2023-03-30 16:51:45

by Jonathan Cameron

[permalink] [raw]
Subject: [PATCH v4 0/5] CXL 3.0 Performance Monitoring Unit support

Changes since v3: Kan Liang gave feedback on v2 incorporated here.
- All updates in patch 4, see details there.

Updated cover letter.

The CXL rev 3.0 specification introduces a CXL Performance Monitoring
Unit definition. CXL components may have any number of these blocks. The
definition is highly flexible, but that does bring complexity in the
driver.

Notes.

1) The QEMU model against which this was developed needs tidying up.
Latest tree at https://gitlab.com/jic23/qemu branch cxl-2023-02-28
It's backed up behind other series that I plan to upstream first.
2) There are quite a lot of corner cases that will need working through
with variants of the model, or I'll have to design a pathological
set of CPMUs to hit all the corner cases in one go. So whilst I believe
the driver should be fine for what it supports we may find issues
as those corners of what is allowed are explored.
3) I'm not sure it makes sense to hang this of the cxl/pci driver but
couldn't really figure out where else in the current structure we could
make it fit cleanly.
4) Driver location. In past perf maintainers have requested perf drivers
for PCI etc be under drivers/perf. That would require moving some
CXL headers to be more generally visible, but is certainly possible
if there is agreement between CXL and perf maintainers on the correct
location.
5) Patch 1 led to a discussion of how to handle the self describing
and extensible nature of the CPMU counters. That is likely to involve
a description in the "caps" sysfs directory and perf tool code that is
aware of the different event combinations that make sense and can
establish which sets are available on a given device.
That is likely to take a while to converge on, so as the driver is useful
in the current state, I'm looking to upstream this first and deal with
the more complex handling later.
6) There is a lot of other functionality that can be added in future
include allowing for simpler hardware implementations that may not
support the minimum level of features currently required by the driver
(freeze, overflow interrupts etc).

CXL rev 3.0 specification available from https://www.computeexpresslink.org


Jonathan Cameron (5):
cxl: Add function to count regblocks of a given type
perf: Allow a PMU to have a parent
cxl/pci: Find and register CXL PMU devices
cxl: CXL Performance Monitoring Unit driver
docs: perf: Minimal introduction the the CXL PMU device and driver

Documentation/admin-guide/perf/cxl.rst | 65 ++
Documentation/admin-guide/perf/index.rst | 1 +
drivers/cxl/Kconfig | 13 +
drivers/cxl/Makefile | 1 +
drivers/cxl/core/Makefile | 1 +
drivers/cxl/core/core.h | 1 +
drivers/cxl/core/cpmu.c | 72 ++
drivers/cxl/core/pci.c | 2 +-
drivers/cxl/core/port.c | 4 +-
drivers/cxl/core/regs.c | 50 +-
drivers/cxl/cpmu.c | 940 +++++++++++++++++++++++
drivers/cxl/cpmu.h | 56 ++
drivers/cxl/cxl.h | 17 +-
drivers/cxl/cxlpci.h | 1 +
drivers/cxl/pci.c | 27 +-
include/linux/perf_event.h | 1 +
kernel/events/core.c | 1 +
17 files changed, 1245 insertions(+), 8 deletions(-)
create mode 100644 Documentation/admin-guide/perf/cxl.rst
create mode 100644 drivers/cxl/core/cpmu.c
create mode 100644 drivers/cxl/cpmu.c
create mode 100644 drivers/cxl/cpmu.h

--
2.37.2


2023-03-30 16:52:18

by Jonathan Cameron

[permalink] [raw]
Subject: [PATCH v4 5/5] docs: perf: Minimal introduction the the CXL PMU device and driver

Very basic introduction to the device and the current driver support
provided. I expect to expand on this in future versions of this patch
set.

Reviewed-by: Dave Jiang <[email protected]>
Signed-off-by: Jonathan Cameron <[email protected]>

--
v4: No change
---
Documentation/admin-guide/perf/cxl.rst | 65 ++++++++++++++++++++++++
Documentation/admin-guide/perf/index.rst | 1 +
2 files changed, 66 insertions(+)

diff --git a/Documentation/admin-guide/perf/cxl.rst b/Documentation/admin-guide/perf/cxl.rst
new file mode 100644
index 000000000000..46235dff4b21
--- /dev/null
+++ b/Documentation/admin-guide/perf/cxl.rst
@@ -0,0 +1,65 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================================
+CXL Performance Monitoring Unit (CPMU)
+======================================
+
+The CXL rev 3.0 specification provides a definition of CXL Performance
+Monitoring Unit in section 13.2: Performance Monitoring.
+
+CXL components (e.g. Root Port, Switch Upstream Port, End Point) may have
+any number of CPMU instances. CPMU capabilities are fully discoverable from
+the devices. The specification provides event definitions for all CXL protocol
+message types and a set of additional events for things commonly counted on
+CXL devices (e.g. DRAM events).
+
+CPMU driver
+===========
+
+The CPMU driver register a perf PMU with the name cpmu<id> on the CXL bus.
+
+ /sys/bus/cxl/device/cpmu<id>
+
+The associated PMU is registered as
+
+ /sys/bus/event_sources/devices/cpmu<id>
+
+In common with other CXL bus devices, the id has no specific meaning and the
+relationship to specific CXL device should be established via the device parent
+of the device on the CXL bus.
+
+PMU driver provides description of available events and filter options in sysfs.
+
+The "format" directory describes all formats of the config (event vendor id,
+group id and mask) config1 (threshold, filter enables) and config2 (filter
+parameters) fields of the perf_event_attr structure. The "events" directory
+describes all documented events show in perf list.
+
+The events shown in perf list are the most fine grained events with a single
+bit of the event mask set. More general events may be enable by setting
+multiple mask bits in config. For example, all Device to Host Read Requests
+may be captured on a single counter by setting the bits for all of
+
+* d2h_req_rdcurr
+* d2h_req_rdown
+* d2h_req_rdshared
+* d2h_req_rdany
+* d2h_req_rdownnodata
+
+Example of usage::
+
+ $#perf list
+ cpmu0/clock_ticks/ [Kernel PMU event]
+ cpmu0/d2h_req_itomwr/ [Kernel PMU event]
+ cpmu0/d2h_req_rdany/ [Kernel PMU event]
+ cpmu0/d2h_req_rdcurr/ [Kernel PMU event]
+ -----------------------------------------------------------
+
+ $# perf stat -e cpmu0/clock_ticks/ -e cpmu0/d2h_req_itowrm/
+
+Vendor specific events may also be available and if so can be used via
+
+ $# perf stat -e cpmu0/vid=VID,gid=GID,mask=MASK/
+
+The driver does not support sampling. So "perf record" and attaching to
+a task are unsupported.
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 9de64a40adab..f60be04e4e33 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -21,3 +21,4 @@ Performance monitor support
alibaba_pmu
nvidia-pmu
meson-ddr-pmu
+ cxl
--
2.37.2

2023-03-30 16:52:45

by Jonathan Cameron

[permalink] [raw]
Subject: [PATCH v4 2/5] perf: Allow a PMU to have a parent

Some PMUs have well defined parents such as PCI devices.
As the device_initialize() and device_add() are all within
pmu_dev_alloc() which is called from perf_pmu_register()
there is no opportunity to set the parent from within a driver.

Add a struct device *parent field to struct pmu and use that
to set the parent.

Signed-off-by: Jonathan Cameron <[email protected]>
---
v4: No change
Note that this may first merge as part of a larger series I
plan to post next week that adds parents for many of the of the
other struct pmu instances. If so please drop this patch whilst
applying.

---
include/linux/perf_event.h | 1 +
kernel/events/core.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d5628a7b5eaa..b99db1eda72c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -303,6 +303,7 @@ struct pmu {

struct module *module;
struct device *dev;
+ struct device *parent;
const struct attribute_group **attr_groups;
const struct attribute_group **attr_update;
const char *name;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index fb3e436bcd4a..a84c282221f2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11367,6 +11367,7 @@ static int pmu_dev_alloc(struct pmu *pmu)

dev_set_drvdata(pmu->dev, pmu);
pmu->dev->bus = &pmu_bus;
+ pmu->dev->parent = pmu->parent;
pmu->dev->release = pmu_dev_release;

ret = dev_set_name(pmu->dev, "%s", pmu->name);
--
2.37.2

2023-04-03 17:50:31

by Liang, Kan

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] docs: perf: Minimal introduction the the CXL PMU device and driver



On 2023-03-30 12:45 p.m., Jonathan Cameron wrote:
> Very basic introduction to the device and the current driver support
> provided. I expect to expand on this in future versions of this patch
> set.
>
> Reviewed-by: Dave Jiang <[email protected]>
> Signed-off-by: Jonathan Cameron <[email protected]>
>
> --
> v4: No change
> ---
> Documentation/admin-guide/perf/cxl.rst | 65 ++++++++++++++++++++++++
> Documentation/admin-guide/perf/index.rst | 1 +
> 2 files changed, 66 insertions(+)
>
> diff --git a/Documentation/admin-guide/perf/cxl.rst b/Documentation/admin-guide/perf/cxl.rst
> new file mode 100644
> index 000000000000..46235dff4b21
> --- /dev/null
> +++ b/Documentation/admin-guide/perf/cxl.rst
> @@ -0,0 +1,65 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +======================================
> +CXL Performance Monitoring Unit (CPMU)
> +======================================
> +
> +The CXL rev 3.0 specification provides a definition of CXL Performance
> +Monitoring Unit in section 13.2: Performance Monitoring.
> +
> +CXL components (e.g. Root Port, Switch Upstream Port, End Point) may have
> +any number of CPMU instances. CPMU capabilities are fully discoverable from
> +the devices. The specification provides event definitions for all CXL protocol
> +message types and a set of additional events for things commonly counted on
> +CXL devices (e.g. DRAM events).
> +
> +CPMU driver
> +===========
> +
> +The CPMU driver register a perf PMU with the name cpmu<id> on the CXL bus.
> +
> + /sys/bus/cxl/device/cpmu<id>
> +
> +The associated PMU is registered as
> +
> + /sys/bus/event_sources/devices/cpmu<id>
> +
> +In common with other CXL bus devices, the id has no specific meaning and the
> +relationship to specific CXL device should be established via the device parent
> +of the device on the CXL bus.
> +
> +PMU driver provides description of available events and filter options in sysfs.
> +
> +The "format" directory describes all formats of the config (event vendor id,
> +group id and mask) config1 (threshold, filter enables) and config2 (filter
> +parameters) fields of the perf_event_attr structure. The "events" directory
> +describes all documented events show in perf list.
> +
> +The events shown in perf list are the most fine grained events with a single
> +bit of the event mask set. More general events may be enable by setting
> +multiple mask bits in config. For example, all Device to Host Read Requests
> +may be captured on a single counter by setting the bits for all of
> +
> +* d2h_req_rdcurr
> +* d2h_req_rdown
> +* d2h_req_rdshared
> +* d2h_req_rdany
> +* d2h_req_rdownnodata
> +
> +Example of usage::
> +
> + $#perf list
> + cpmu0/clock_ticks/ [Kernel PMU event]
> + cpmu0/d2h_req_itomwr/ [Kernel PMU event]
> + cpmu0/d2h_req_rdany/ [Kernel PMU event]
> + cpmu0/d2h_req_rdcurr/ [Kernel PMU event]
> + -----------------------------------------------------------
> +
> + $# perf stat -e cpmu0/clock_ticks/ -e cpmu0/d2h_req_itowrm/
> +
> +Vendor specific events may also be available and if so can be used via
> +
> + $# perf stat -e cpmu0/vid=VID,gid=GID,mask=MASK/
> +
> +The driver does not support sampling. So "perf record" and attaching to
> +a task are unsupported.

The PMU only supports system-wide counting. That's the reason it doesn't
support per-task profiling. Not because of missing sampling.

Thanks,
Kan
> diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
> index 9de64a40adab..f60be04e4e33 100644
> --- a/Documentation/admin-guide/perf/index.rst
> +++ b/Documentation/admin-guide/perf/index.rst
> @@ -21,3 +21,4 @@ Performance monitor support
> alibaba_pmu
> nvidia-pmu
> meson-ddr-pmu
> + cxl

2023-04-04 04:00:02

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v4 0/5] CXL 3.0 Performance Monitoring Unit support

Jonathan Cameron wrote:
> Changes since v3: Kan Liang gave feedback on v2 incorporated here.
> - All updates in patch 4, see details there.
>
> Updated cover letter.
>
> The CXL rev 3.0 specification introduces a CXL Performance Monitoring
> Unit definition. CXL components may have any number of these blocks. The
> definition is highly flexible, but that does bring complexity in the
> driver.
>
> Notes.
>
> 1) The QEMU model against which this was developed needs tidying up.
> Latest tree at https://gitlab.com/jic23/qemu branch cxl-2023-02-28
> It's backed up behind other series that I plan to upstream first.
> 2) There are quite a lot of corner cases that will need working through
> with variants of the model, or I'll have to design a pathological
> set of CPMUs to hit all the corner cases in one go. So whilst I believe
> the driver should be fine for what it supports we may find issues
> as those corners of what is allowed are explored.
> 3) I'm not sure it makes sense to hang this of the cxl/pci driver but
> couldn't really figure out where else in the current structure we could
> make it fit cleanly.
> 4) Driver location. In past perf maintainers have requested perf drivers
> for PCI etc be under drivers/perf. That would require moving some
> CXL headers to be more generally visible, but is certainly possible
> if there is agreement between CXL and perf maintainers on the correct
> location.

I am ok the bulk of the logic living under drivers/perf/

Are drivers/perf/ folks ok with a:

CFLAGS_cxl.o := -I$(srctree)/drivers/cxl/

...or similar in their Makefile, because I don't think this by itself is
a reason to push CXL headers to include/.

I say this without having looked at the code yet and whether this driver
will need new exports from the cxl/core etc.

> 5) Patch 1 led to a discussion of how to handle the self describing
> and extensible nature of the CPMU counters. That is likely to involve
> a description in the "caps" sysfs directory and perf tool code that is
> aware of the different event combinations that make sense and can
> establish which sets are available on a given device.
> That is likely to take a while to converge on, so as the driver is useful
> in the current state, I'm looking to upstream this first and deal with
> the more complex handling later.

What's "this" in this last sentence, a canned set of base counters?

> 6) There is a lot of other functionality that can be added in future
> include allowing for simpler hardware implementations that may not
> support the minimum level of features currently required by the driver
> (freeze, overflow interrupts etc).

Curious if the the minimal set determined by the specification, like the
minimal features a CXL Memory Expander device must implement, or by what
you decided was fit to be emulated in QEMU?

>
> CXL rev 3.0 specification available from https://www.computeexpresslink.org
>
>
> Jonathan Cameron (5):
> cxl: Add function to count regblocks of a given type
> perf: Allow a PMU to have a parent
> cxl/pci: Find and register CXL PMU devices
> cxl: CXL Performance Monitoring Unit driver
> docs: perf: Minimal introduction the the CXL PMU device and driver
>
> Documentation/admin-guide/perf/cxl.rst | 65 ++
> Documentation/admin-guide/perf/index.rst | 1 +
> drivers/cxl/Kconfig | 13 +
> drivers/cxl/Makefile | 1 +
> drivers/cxl/core/Makefile | 1 +
> drivers/cxl/core/core.h | 1 +
> drivers/cxl/core/cpmu.c | 72 ++
> drivers/cxl/core/pci.c | 2 +-
> drivers/cxl/core/port.c | 4 +-
> drivers/cxl/core/regs.c | 50 +-
> drivers/cxl/cpmu.c | 940 +++++++++++++++++++++++
> drivers/cxl/cpmu.h | 56 ++
> drivers/cxl/cxl.h | 17 +-
> drivers/cxl/cxlpci.h | 1 +
> drivers/cxl/pci.c | 27 +-
> include/linux/perf_event.h | 1 +
> kernel/events/core.c | 1 +
> 17 files changed, 1245 insertions(+), 8 deletions(-)
> create mode 100644 Documentation/admin-guide/perf/cxl.rst
> create mode 100644 drivers/cxl/core/cpmu.c
> create mode 100644 drivers/cxl/cpmu.c
> create mode 100644 drivers/cxl/cpmu.h
>
> --
> 2.37.2
>

2023-04-04 04:37:54

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v4 2/5] perf: Allow a PMU to have a parent

Jonathan Cameron wrote:
> Some PMUs have well defined parents such as PCI devices.
> As the device_initialize() and device_add() are all within
> pmu_dev_alloc() which is called from perf_pmu_register()
> there is no opportunity to set the parent from within a driver.
>
> Add a struct device *parent field to struct pmu and use that
> to set the parent.
>
> Signed-off-by: Jonathan Cameron <[email protected]>
> ---
> v4: No change
> Note that this may first merge as part of a larger series I
> plan to post next week that adds parents for many of the of the
> other struct pmu instances. If so please drop this patch whilst
> applying.

Feel free to add my:

Reviewed-by: Dan Williams <[email protected]>

...whereever this gets applied, and yes it makes sense especially as
more device attached PMUs are showing up.

2023-04-04 16:57:53

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] docs: perf: Minimal introduction the the CXL PMU device and driver

On Mon, 3 Apr 2023 13:45:52 -0400
"Liang, Kan" <[email protected]> wrote:

> On 2023-03-30 12:45 p.m., Jonathan Cameron wrote:
> > Very basic introduction to the device and the current driver support
> > provided. I expect to expand on this in future versions of this patch
> > set.
> >
> > Reviewed-by: Dave Jiang <[email protected]>
> > Signed-off-by: Jonathan Cameron <[email protected]>
> >
> > --
> > v4: No change
> > ---
> > Documentation/admin-guide/perf/cxl.rst | 65 ++++++++++++++++++++++++
> > Documentation/admin-guide/perf/index.rst | 1 +
> > 2 files changed, 66 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/perf/cxl.rst b/Documentation/admin-guide/perf/cxl.rst
> > new file mode 100644
> > index 000000000000..46235dff4b21
> > --- /dev/null
> > +++ b/Documentation/admin-guide/perf/cxl.rst
> > @@ -0,0 +1,65 @@
> > +.. SPDX-License-Identifier: GPL-2.0
> > +
> > +======================================
> > +CXL Performance Monitoring Unit (CPMU)
> > +======================================
> > +
> > +The CXL rev 3.0 specification provides a definition of CXL Performance
> > +Monitoring Unit in section 13.2: Performance Monitoring.
> > +
> > +CXL components (e.g. Root Port, Switch Upstream Port, End Point) may have
> > +any number of CPMU instances. CPMU capabilities are fully discoverable from
> > +the devices. The specification provides event definitions for all CXL protocol
> > +message types and a set of additional events for things commonly counted on
> > +CXL devices (e.g. DRAM events).
> > +
> > +CPMU driver
> > +===========
> > +
> > +The CPMU driver register a perf PMU with the name cpmu<id> on the CXL bus.
> > +
> > + /sys/bus/cxl/device/cpmu<id>
> > +
> > +The associated PMU is registered as
> > +
> > + /sys/bus/event_sources/devices/cpmu<id>
> > +
> > +In common with other CXL bus devices, the id has no specific meaning and the
> > +relationship to specific CXL device should be established via the device parent
> > +of the device on the CXL bus.
> > +
> > +PMU driver provides description of available events and filter options in sysfs.
> > +
> > +The "format" directory describes all formats of the config (event vendor id,
> > +group id and mask) config1 (threshold, filter enables) and config2 (filter
> > +parameters) fields of the perf_event_attr structure. The "events" directory
> > +describes all documented events show in perf list.
> > +
> > +The events shown in perf list are the most fine grained events with a single
> > +bit of the event mask set. More general events may be enable by setting
> > +multiple mask bits in config. For example, all Device to Host Read Requests
> > +may be captured on a single counter by setting the bits for all of
> > +
> > +* d2h_req_rdcurr
> > +* d2h_req_rdown
> > +* d2h_req_rdshared
> > +* d2h_req_rdany
> > +* d2h_req_rdownnodata
> > +
> > +Example of usage::
> > +
> > + $#perf list
> > + cpmu0/clock_ticks/ [Kernel PMU event]
> > + cpmu0/d2h_req_itomwr/ [Kernel PMU event]
> > + cpmu0/d2h_req_rdany/ [Kernel PMU event]
> > + cpmu0/d2h_req_rdcurr/ [Kernel PMU event]
> > + -----------------------------------------------------------
> > +
> > + $# perf stat -e cpmu0/clock_ticks/ -e cpmu0/d2h_req_itowrm/
> > +
> > +Vendor specific events may also be available and if so can be used via
> > +
> > + $# perf stat -e cpmu0/vid=VID,gid=GID,mask=MASK/
> > +
> > +The driver does not support sampling. So "perf record" and attaching to
> > +a task are unsupported.
>
> The PMU only supports system-wide counting. That's the reason it doesn't
> support per-task profiling. Not because of missing sampling.

Ah. I've managed to fuse two different conditions. I'll break them apart for
v5.

Thanks,

Jonathan

>
> Thanks,
> Kan
> > diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
> > index 9de64a40adab..f60be04e4e33 100644
> > --- a/Documentation/admin-guide/perf/index.rst
> > +++ b/Documentation/admin-guide/perf/index.rst
> > @@ -21,3 +21,4 @@ Performance monitor support
> > alibaba_pmu
> > nvidia-pmu
> > meson-ddr-pmu
> > + cxl
>

2023-04-04 22:25:37

by Dan Williams

[permalink] [raw]
Subject: RE: [PATCH v4 5/5] docs: perf: Minimal introduction the the CXL PMU device and driver

Jonathan Cameron wrote:
> Very basic introduction to the device and the current driver support
> provided. I expect to expand on this in future versions of this patch
> set.
>
> Reviewed-by: Dave Jiang <[email protected]>
> Signed-off-by: Jonathan Cameron <[email protected]>
>
> --
> v4: No change
> ---
> Documentation/admin-guide/perf/cxl.rst | 65 ++++++++++++++++++++++++
> Documentation/admin-guide/perf/index.rst | 1 +
> 2 files changed, 66 insertions(+)
>
> diff --git a/Documentation/admin-guide/perf/cxl.rst b/Documentation/admin-guide/perf/cxl.rst
> new file mode 100644
> index 000000000000..46235dff4b21
> --- /dev/null
> +++ b/Documentation/admin-guide/perf/cxl.rst
> @@ -0,0 +1,65 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +======================================
> +CXL Performance Monitoring Unit (CPMU)
> +======================================
> +
> +The CXL rev 3.0 specification provides a definition of CXL Performance
> +Monitoring Unit in section 13.2: Performance Monitoring.
> +
> +CXL components (e.g. Root Port, Switch Upstream Port, End Point) may have
> +any number of CPMU instances. CPMU capabilities are fully discoverable from
> +the devices. The specification provides event definitions for all CXL protocol
> +message types and a set of additional events for things commonly counted on
> +CXL devices (e.g. DRAM events).
> +
> +CPMU driver
> +===========
> +
> +The CPMU driver register a perf PMU with the name cpmu<id> on the CXL bus.

s/register/registers/

> +
> + /sys/bus/cxl/device/cpmu<id>
> +
> +The associated PMU is registered as
> +
> + /sys/bus/event_sources/devices/cpmu<id>
> +
> +In common with other CXL bus devices, the id has no specific meaning and the
> +relationship to specific CXL device should be established via the device parent
> +of the device on the CXL bus.

So I went to go add some text about how to identify PMUs in a persistent
manner from one boot to the next. For CXL memdevs this is done by the
'serial' attribute which is always stable regardless of the device init
order. That's harder to get to from the pmu device because it may be
associated with a device that does not have a memdev.

I think it's also going to be frustrating for userspace to see
randomized pmu ids across devices since that probing will happen in
parallel. So how about:

1/ Add serial as an attribute for each PMU to export
2/ Change the device name format to be "pmuX.Y" where X can just reuse
the memdev id for endpoints and be another value for switches, and Y is
guaranteed to be 0-based and in hardware discovery order.

...with that, someone can write a udev script that can persistently
identify PMU[Y] on device[serial] each boot.

That also cleans up a /sys/bus/cxl/devices listing to make it clear
which pmu instances belong together.

> +
> +PMU driver provides description of available events and filter options in sysfs.
> +
> +The "format" directory describes all formats of the config (event vendor id,
> +group id and mask) config1 (threshold, filter enables) and config2 (filter
> +parameters) fields of the perf_event_attr structure. The "events" directory
> +describes all documented events show in perf list.
> +
> +The events shown in perf list are the most fine grained events with a single
> +bit of the event mask set. More general events may be enable by setting
> +multiple mask bits in config. For example, all Device to Host Read Requests
> +may be captured on a single counter by setting the bits for all of
> +
> +* d2h_req_rdcurr
> +* d2h_req_rdown
> +* d2h_req_rdshared
> +* d2h_req_rdany
> +* d2h_req_rdownnodata
> +
> +Example of usage::
> +
> + $#perf list
> + cpmu0/clock_ticks/ [Kernel PMU event]
> + cpmu0/d2h_req_itomwr/ [Kernel PMU event]
> + cpmu0/d2h_req_rdany/ [Kernel PMU event]
> + cpmu0/d2h_req_rdcurr/ [Kernel PMU event]
> + -----------------------------------------------------------
> +
> + $# perf stat -e cpmu0/clock_ticks/ -e cpmu0/d2h_req_itowrm/

Ah here's the examples I was looking for in the last patch, nice.

> +
> +Vendor specific events may also be available and if so can be used via
> +
> + $# perf stat -e cpmu0/vid=VID,gid=GID,mask=MASK/
> +
> +The driver does not support sampling. So "perf record" and attaching to
> +a task are unsupported.

Is this a common restriction for CPU-external pmus, or do you see
sampling support required to get this upstream?

2023-04-06 16:53:22

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v4 5/5] docs: perf: Minimal introduction the the CXL PMU device and driver


>
> > +
> > + /sys/bus/cxl/device/cpmu<id>
> > +
> > +The associated PMU is registered as
> > +
> > + /sys/bus/event_sources/devices/cpmu<id>
> > +
> > +In common with other CXL bus devices, the id has no specific meaning and the
> > +relationship to specific CXL device should be established via the device parent
> > +of the device on the CXL bus.
>
> So I went to go add some text about how to identify PMUs in a persistent
> manner from one boot to the next. For CXL memdevs this is done by the
> 'serial' attribute which is always stable regardless of the device init
> order. That's harder to get to from the pmu device because it may be
> associated with a device that does not have a memdev.
>
> I think it's also going to be frustrating for userspace to see
> randomized pmu ids across devices since that probing will happen in
> parallel. So how about:

Solving this in general for perf PMU drivers was what the parent device thing
was about. There is an argument that enabling any other path to get to
this association is both unnecessary and just possibly unwise.

The nice advantage of just using an IDA and relying on parentage for the
association was that I could avoid naming questions for all the other
places these might turn in a CXL topology. The Lazy / efficient option ;)

You can now see exactly which PCI device a given instance is associated with.
Custom ABI is going to be harder for anyone to use than that.

I suppose we can potentially enable both paths - but it's not quite
as straight forwards as you suggest.

>
> 1/ Add serial as an attribute for each PMU to export

Where does it come from? We only have one source of serial number per device.
That's no where near enough to work out where a PMU is.

> 2/ Change the device name format to be "pmuX.Y" where X can just reuse

Could use something a little more detailed cxl bus, but the one registered and use
to address this in bus/event_sources needs to be cxl specific so a cxl_ prefix
is needed I think

Given we need to namespace what the ids refer to, I'm currently going with
pmu_memX.Y pmu_dspX.Y.Z pmu_uspX.Y
on the cxl bus and
cxl_pmu_memX.Y cxl_pmu_dspX.Y.Z cxl_pmu_uspX.Y on even sources bus.
(Z needed because dsp index from 0 for each usp)
We can figure out what to do about other CXL EPs later and for now at least
there is no way to hand a CPMU instance off a host bridge (nothing in CEDT
to tell you where to find it).

I've had a fun day hacking PMUs onto the other emulated CXL devices to test
this.
There is a can of worms I'll avoid for this series by just sticking to type3
device PMUs for now.

I have no idea yet how we handle the interrupts safely for ports as those
interrupts are in control the pcie port driver not the CXL dport one.
At somepoint I'll send out an RFC about that if no one gets to it before
me. For now I've hacked portdrv to always allocate max vectors and am ignoring the
lovely back traces due to thing getting torn down in the wrong order on shutdown.
For upstream ports I've hacked portdrv to pretend it knows there is something to handle.
As a starting point I think we'll need to teach portdrv enough about CXL
to be able to tell if it should provide interrupt services..

Hence I'll keep the code to register the other PMUs for a future patch set
and just make sure the code is structured to enable that in this series.


> the memdev id for endpoints and be another value for switches, and Y is
> guaranteed to be 0-based and in hardware discovery order.

Also need to change registration order as PMUs were registered before the
memdev, but that's easy enough to do.

>
> ...with that, someone can write a udev script that can persistently
> identify PMU[Y] on device[serial] each boot.

>
> That also cleans up a /sys/bus/cxl/devices listing to make it clear
> which pmu instances belong together.
>
> > +
> > +PMU driver provides description of available events and filter options in sysfs.
> > +
> > +The "format" directory describes all formats of the config (event vendor id,
> > +group id and mask) config1 (threshold, filter enables) and config2 (filter
> > +parameters) fields of the perf_event_attr structure. The "events" directory
> > +describes all documented events show in perf list.
> > +
> > +The events shown in perf list are the most fine grained events with a single
> > +bit of the event mask set. More general events may be enable by setting
> > +multiple mask bits in config. For example, all Device to Host Read Requests
> > +may be captured on a single counter by setting the bits for all of
> > +
> > +* d2h_req_rdcurr
> > +* d2h_req_rdown
> > +* d2h_req_rdshared
> > +* d2h_req_rdany
> > +* d2h_req_rdownnodata
> > +
> > +Example of usage::
> > +
> > + $#perf list
> > + cpmu0/clock_ticks/ [Kernel PMU event]
> > + cpmu0/d2h_req_itomwr/ [Kernel PMU event]
> > + cpmu0/d2h_req_rdany/ [Kernel PMU event]
> > + cpmu0/d2h_req_rdcurr/ [Kernel PMU event]
> > + -----------------------------------------------------------
> > +
> > + $# perf stat -e cpmu0/clock_ticks/ -e cpmu0/d2h_req_itowrm/
>
> Ah here's the examples I was looking for in the last patch, nice.
>
> > +
> > +Vendor specific events may also be available and if so can be used via
> > +
> > + $# perf stat -e cpmu0/vid=VID,gid=GID,mask=MASK/
> > +
> > +The driver does not support sampling. So "perf record" and attaching to
> > +a task are unsupported.
>
> Is this a common restriction for CPU-external pmus, or do you see
> sampling support required to get this upstream?

It's a common restriction. Whilst we could potentially implement sampling
based on the presence of a suitable clock_ticks event it don't see it
as a requirement initially.

Jonathan


2023-04-11 13:38:07

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH v4 0/5] CXL 3.0 Performance Monitoring Unit support

On Mon, 3 Apr 2023 20:55:35 -0700
Dan Williams <[email protected]> wrote:

> Jonathan Cameron wrote:
> > Changes since v3: Kan Liang gave feedback on v2 incorporated here.
> > - All updates in patch 4, see details there.
> >
> > Updated cover letter.
> >
> > The CXL rev 3.0 specification introduces a CXL Performance Monitoring
> > Unit definition. CXL components may have any number of these blocks. The
> > definition is highly flexible, but that does bring complexity in the
> > driver.
> >
> > Notes.
> >
> > 1) The QEMU model against which this was developed needs tidying up.
> > Latest tree at https://gitlab.com/jic23/qemu branch cxl-2023-02-28
> > It's backed up behind other series that I plan to upstream first.
> > 2) There are quite a lot of corner cases that will need working through
> > with variants of the model, or I'll have to design a pathological
> > set of CPMUs to hit all the corner cases in one go. So whilst I believe
> > the driver should be fine for what it supports we may find issues
> > as those corners of what is allowed are explored.
> > 3) I'm not sure it makes sense to hang this of the cxl/pci driver but
> > couldn't really figure out where else in the current structure we could
> > make it fit cleanly.
> > 4) Driver location. In past perf maintainers have requested perf drivers
> > for PCI etc be under drivers/perf. That would require moving some
> > CXL headers to be more generally visible, but is certainly possible
> > if there is agreement between CXL and perf maintainers on the correct
> > location.
>
> I am ok the bulk of the logic living under drivers/perf/
>
> Are drivers/perf/ folks ok with a:
>
> CFLAGS_cxl.o := -I$(srctree)/drivers/cxl/
>
> ...or similar in their Makefile, because I don't think this by itself is
> a reason to push CXL headers to include/.

For now I've gone with ../cxl/ which doesn't look too bad.
If cflags stuff is preferred can change that.


>
> I say this without having looked at the code yet and whether this driver
> will need new exports from the cxl/core etc.
>
> > 5) Patch 1 led to a discussion of how to handle the self describing
> > and extensible nature of the CPMU counters. That is likely to involve
> > a description in the "caps" sysfs directory and perf tool code that is
> > aware of the different event combinations that make sense and can
> > establish which sets are available on a given device.
> > That is likely to take a while to converge on, so as the driver is useful
> > in the current state, I'm looking to upstream this first and deal with
> > the more complex handling later.
>
> What's "this" in this last sentence, a canned set of base counters?

All the counters defined in the CXL spec itself without any summing of
counters (so you can't do all 'reads' for example without 'guessing' the
magic numbers).

>
> > 6) There is a lot of other functionality that can be added in future
> > include allowing for simpler hardware implementations that may not
> > support the minimum level of features currently required by the driver
> > (freeze, overflow interrupts etc).
>
> Curious if the the minimal set determined by the specification, like the
> minimal features a CXL Memory Expander device must implement, or by what
> you decided was fit to be emulated in QEMU?

There is no specification requirement to implement any particular subset
of counters or features for those counters. I think that's considered out
of scope for the CXL spec itself. Of course particularly customers or
industry bodies may define a minimal set of requirements. I don't think anyone
has done so yet.

Current set of counters is all the ones in the specification itself at their
finest granularity (summed cases for the configurable counters 'work' but
aren't advertised - working out sensible choices is a job for perftool +
an enhanced driver that advertises what combinations it can sum).

Counter features / types include the most generic, fully featured and flexible
versions. In many cases it's harder to support the less complex hardware
because a configuration dance can be needed.
I'd imagine the remaining 'likely' hardware support will be added over the
next kernel cycle or two.

Jonathan

>
> >
> > CXL rev 3.0 specification available from https://www.computeexpresslink.org
> >
> >
> > Jonathan Cameron (5):
> > cxl: Add function to count regblocks of a given type
> > perf: Allow a PMU to have a parent
> > cxl/pci: Find and register CXL PMU devices
> > cxl: CXL Performance Monitoring Unit driver
> > docs: perf: Minimal introduction the the CXL PMU device and driver
> >
> > Documentation/admin-guide/perf/cxl.rst | 65 ++
> > Documentation/admin-guide/perf/index.rst | 1 +
> > drivers/cxl/Kconfig | 13 +
> > drivers/cxl/Makefile | 1 +
> > drivers/cxl/core/Makefile | 1 +
> > drivers/cxl/core/core.h | 1 +
> > drivers/cxl/core/cpmu.c | 72 ++
> > drivers/cxl/core/pci.c | 2 +-
> > drivers/cxl/core/port.c | 4 +-
> > drivers/cxl/core/regs.c | 50 +-
> > drivers/cxl/cpmu.c | 940 +++++++++++++++++++++++
> > drivers/cxl/cpmu.h | 56 ++
> > drivers/cxl/cxl.h | 17 +-
> > drivers/cxl/cxlpci.h | 1 +
> > drivers/cxl/pci.c | 27 +-
> > include/linux/perf_event.h | 1 +
> > kernel/events/core.c | 1 +
> > 17 files changed, 1245 insertions(+), 8 deletions(-)
> > create mode 100644 Documentation/admin-guide/perf/cxl.rst
> > create mode 100644 drivers/cxl/core/cpmu.c
> > create mode 100644 drivers/cxl/cpmu.c
> > create mode 100644 drivers/cxl/cpmu.h
> >
> > --
> > 2.37.2
> >