2022-09-17 12:20:46

by Shuai Xue

Subject: [PATCH v1 1/3] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver

Alibaba's T-Head Yitan 710 SoC is built on Synopsys' widely deployed and
silicon-proven DesignWare Core PCIe controller which implements PMU for
performance and functional debugging to facilitate system maintenance.
Document it to provide guidance on how to use it.

Signed-off-by: Shuai Xue <[email protected]>
---
.../admin-guide/perf/dwc_pcie_pmu.rst | 61 +++++++++++++++++++
Documentation/admin-guide/perf/index.rst | 1 +
2 files changed, 62 insertions(+)
create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst

diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
new file mode 100644
index 000000000000..fbcbf10b23b7
--- /dev/null
+++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
@@ -0,0 +1,61 @@
+======================================================================
+Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
+======================================================================
+
+DesignWare Cores (DWC) PCIe PMU
+===============================
+
+To facilitate collection of statistics, Synopsys DesignWare Cores PCIe
+controller provides the following two features:
+
+- Time Based Analysis (RX/TX data throughput and time spent in each
+ low-power LTSSM state)
+- Lane Event counters (Error and Non-Error for lanes)
+
+The PMU is not a PCIe Root Complex integrated End Point (RCiEP) device but
+only register counters provided by each PCIe Root Port.
+
+Time Based Analysis
+-------------------
+
+Using this feature you can obtain information regarding RX/TX data
+throughput and time spent in each low-power LTSSM state by the controller.
+
+The counters are 64-bit width and measure data in two categories,
+
+- percentage of time does the controller stay in LTSSM state in a
+ configurable duration. The measurement range of each Event in Group#0.
+- amount of data processed (Units of 16 bytes). The measurement range of
+ each Event in Group#1.
+
+Lane Event counters
+-------------------
+
+Using this feature you can obtain Error and Non-Error information in
+specific lane by the controller.
+
+The counters are 32-bit width and the measured event is select by:
+
+- Group i
+- Event j within the Group i
+- and Lank k
+
+Some of the event counters only exist for specific configurations.
+
+DesignWare Cores (DWC) PCIe PMU Driver
+=======================================
+
+This driver add PMU devices for each PCIe Root Port. And the PMU device is
+named based the BDF of Root Port. For example,
+
+ 10:00.0 PCI bridge: Device 1ded:8000 (rev 01)
+
+the PMU device name for this Root Port is pcie_bdf_100000.
+
+Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
+
+ $# perf stat -a -e pcie_bdf_200/Rx_PCIe_TLP_Data_Payload/
+
+average RX bandwidth can be calculated like this:
+
+ PCIe TX Bandwidth = PCIE_TX_DATA * 16B / Measure_Time_Window
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 9c9ece88ce53..8e6a5472aeb3 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -18,3 +18,4 @@ Performance monitor support
xgene-pmu
arm_dsu_pmu
thunderx2-pmu
+ dwc_pcie_pmu
--
2.20.1.12.g72788fdb
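
Editorial illustration (not part of the submitted patch): the bandwidth formula at the end of the documentation above multiplies a Time Based Analysis data counter, which counts in units of 16 bytes, by 16 and divides by the measurement window. A minimal Python sketch of that arithmetic follows. The function name and the idea of passing the window length in seconds are assumptions made for illustration; the patch does not define them.

```python
UNIT_BYTES = 16  # the patch states the data counters count in units of 16 bytes


def pcie_bandwidth_bytes_per_sec(counter_value: int, window_seconds: float) -> float:
    """Convert a raw data-payload counter reading to bytes per second.

    counter_value  -- raw count reported for a data-payload event
    window_seconds -- length of the measurement window in seconds (hypothetical
                      parameter; e.g. the elapsed time printed by perf stat)
    """
    if counter_value < 0:
        raise ValueError("counter_value must be non-negative")
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return counter_value * UNIT_BYTES / window_seconds


if __name__ == "__main__":
    # Example: 1000 counter units observed over a 2 second window.
    print(pcie_bandwidth_bytes_per_sec(1000, 2.0))
```

The same function applies to either direction; only the event that produced the counter value differs.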


2022-09-22 13:36:28

by Will Deacon

Subject: Re: [PATCH v1 1/3] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver

On Sat, Sep 17, 2022 at 08:10:34PM +0800, Shuai Xue wrote:
> Alibaba's T-Head Yitan 710 SoC is built on Synopsys' widely deployed and
> silicon-proven DesignWare Core PCIe controller which implements PMU for
> performance and functional debugging to facilitate system maintenance.
> Document it to provide guidance on how to use it.
>
> Signed-off-by: Shuai Xue <[email protected]>
> ---
> .../admin-guide/perf/dwc_pcie_pmu.rst | 61 +++++++++++++++++++
> Documentation/admin-guide/perf/index.rst | 1 +
> 2 files changed, 62 insertions(+)
> create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>
> diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> new file mode 100644
> index 000000000000..fbcbf10b23b7
> --- /dev/null
> +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> @@ -0,0 +1,61 @@
> +======================================================================
> +Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
> +======================================================================
> +
> +DesignWare Cores (DWC) PCIe PMU
> +===============================
> +
> +To facilitate collection of statistics, Synopsys DesignWare Cores PCIe
> +controller provides the following two features:
> +
> +- Time Based Analysis (RX/TX data throughput and time spent in each
> + low-power LTSSM state)
> +- Lane Event counters (Error and Non-Error for lanes)
> +
> +The PMU is not a PCIe Root Complex integrated End Point (RCiEP) device but
> +only register counters provided by each PCIe Root Port.
> +
> +Time Based Analysis
> +-------------------
> +
> +Using this feature you can obtain information regarding RX/TX data
> +throughput and time spent in each low-power LTSSM state by the controller.
> +
> +The counters are 64-bit width and measure data in two categories,
> +
> +- percentage of time does the controller stay in LTSSM state in a
> + configurable duration. The measurement range of each Event in Group#0.
> +- amount of data processed (Units of 16 bytes). The measurement range of
> + each Event in Group#1.
> +
> +Lane Event counters
> +-------------------
> +
> +Using this feature you can obtain Error and Non-Error information in
> +specific lane by the controller.
> +
> +The counters are 32-bit width and the measured event is select by:
> +
> +- Group i
> +- Event j within the Group i
> +- and Lank k
> +
> +Some of the event counters only exist for specific configurations.
> +
> +DesignWare Cores (DWC) PCIe PMU Driver
> +=======================================
> +
> +This driver add PMU devices for each PCIe Root Port. And the PMU device is
> +named based the BDF of Root Port. For example,
> +
> + 10:00.0 PCI bridge: Device 1ded:8000 (rev 01)
> +
> +the PMU device name for this Root Port is pcie_bdf_100000.
> +
> +Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
> +
> + $# perf stat -a -e pcie_bdf_200/Rx_PCIe_TLP_Data_Payload/

Do you really need to expose a separate PMU instance to userspace for each
BDF? I think it would be much cleaner if you could follow the approach used
by hisilicon/hisi_pcie_pmu.c and hide these details in the driver, exposing
a `bdf=' selector to userspace instead.

Will

2022-09-23 02:22:18

by Yicong Yang

Subject: Re: [PATCH v1 1/3] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver

On 2022/9/17 20:10, Shuai Xue wrote:
> Alibaba's T-Head Yitan 710 SoC is built on Synopsys' widely deployed and
> silicon-proven DesignWare Core PCIe controller which implements PMU for
> performance and functional debugging to facilitate system maintenance.
> Document it to provide guidance on how to use it.
>
> Signed-off-by: Shuai Xue <[email protected]>
> ---
> .../admin-guide/perf/dwc_pcie_pmu.rst | 61 +++++++++++++++++++
> Documentation/admin-guide/perf/index.rst | 1 +
> 2 files changed, 62 insertions(+)
> create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>
> diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> new file mode 100644
> index 000000000000..fbcbf10b23b7
> --- /dev/null
> +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> @@ -0,0 +1,61 @@
> +======================================================================
> +Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
> +======================================================================
> +
> +DesignWare Cores (DWC) PCIe PMU
> +===============================
> +
> +To facilitate collection of statistics, Synopsys DesignWare Cores PCIe
> +controller provides the following two features:
> +
> +- Time Based Analysis (RX/TX data throughput and time spent in each
> + low-power LTSSM state)
> +- Lane Event counters (Error and Non-Error for lanes)
> +
> +The PMU is not a PCIe Root Complex integrated End Point (RCiEP) device but
> +only register counters provided by each PCIe Root Port.
> +
> +Time Based Analysis
> +-------------------
> +
> +Using this feature you can obtain information regarding RX/TX data
> +throughput and time spent in each low-power LTSSM state by the controller.
> +
> +The counters are 64-bit width and measure data in two categories,
> +
> +- percentage of time does the controller stay in LTSSM state in a
> + configurable duration. The measurement range of each Event in Group#0.
> +- amount of data processed (Units of 16 bytes). The measurement range of
> + each Event in Group#1.
> +
> +Lane Event counters
> +-------------------
> +
> +Using this feature you can obtain Error and Non-Error information in
> +specific lane by the controller.
> +
> +The counters are 32-bit width and the measured event is select by:
> +
> +- Group i
> +- Event j within the Group i
> +- and Lank k

Typo here? I guess it's "Lane k"?

> +
> +Some of the event counters only exist for specific configurations.
> +
> +DesignWare Cores (DWC) PCIe PMU Driver
> +=======================================
> +
> +This driver add PMU devices for each PCIe Root Port. And the PMU device is
> +named based the BDF of Root Port. For example,
> +
> + 10:00.0 PCI bridge: Device 1ded:8000 (rev 01)
> +
> +the PMU device name for this Root Port is pcie_bdf_100000.
> +
> +Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
> +
> + $# perf stat -a -e pcie_bdf_200/Rx_PCIe_TLP_Data_Payload/
> +
> +average RX bandwidth can be calculated like this:
> +
> + PCIe TX Bandwidth = PCIE_TX_DATA * 16B / Measure_Time_Window
> diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
> index 9c9ece88ce53..8e6a5472aeb3 100644
> --- a/Documentation/admin-guide/perf/index.rst
> +++ b/Documentation/admin-guide/perf/index.rst
> @@ -18,3 +18,4 @@ Performance monitor support
> xgene-pmu
> arm_dsu_pmu
> thunderx2-pmu
> + dwc_pcie_pmu
>

2022-09-23 14:11:37

by Shuai Xue

Subject: Re: [PATCH v1 1/3] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver



On 2022/9/22 9:25 PM, Will Deacon wrote:
> On Sat, Sep 17, 2022 at 08:10:34PM +0800, Shuai Xue wrote:
>> Alibaba's T-Head Yitan 710 SoC is built on Synopsys' widely deployed and
>> silicon-proven DesignWare Core PCIe controller which implements PMU for
>> performance and functional debugging to facilitate system maintenance.
>> Document it to provide guidance on how to use it.
>>
>> Signed-off-by: Shuai Xue <[email protected]>
>> ---
>> .../admin-guide/perf/dwc_pcie_pmu.rst | 61 +++++++++++++++++++
>> Documentation/admin-guide/perf/index.rst | 1 +
>> 2 files changed, 62 insertions(+)
>> create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>>
>> diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>> new file mode 100644
>> index 000000000000..fbcbf10b23b7
>> --- /dev/null
>> +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>> @@ -0,0 +1,61 @@
>> +======================================================================
>> +Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
>> +======================================================================
>> +
>> +DesignWare Cores (DWC) PCIe PMU
>> +===============================
>> +
>> +To facilitate collection of statistics, Synopsys DesignWare Cores PCIe
>> +controller provides the following two features:
>> +
>> +- Time Based Analysis (RX/TX data throughput and time spent in each
>> + low-power LTSSM state)
>> +- Lane Event counters (Error and Non-Error for lanes)
>> +
>> +The PMU is not a PCIe Root Complex integrated End Point (RCiEP) device but
>> +only register counters provided by each PCIe Root Port.
>> +
>> +Time Based Analysis
>> +-------------------
>> +
>> +Using this feature you can obtain information regarding RX/TX data
>> +throughput and time spent in each low-power LTSSM state by the controller.
>> +
>> +The counters are 64-bit width and measure data in two categories,
>> +
>> +- percentage of time does the controller stay in LTSSM state in a
>> + configurable duration. The measurement range of each Event in Group#0.
>> +- amount of data processed (Units of 16 bytes). The measurement range of
>> + each Event in Group#1.
>> +
>> +Lane Event counters
>> +-------------------
>> +
>> +Using this feature you can obtain Error and Non-Error information in
>> +specific lane by the controller.
>> +
>> +The counters are 32-bit width and the measured event is select by:
>> +
>> +- Group i
>> +- Event j within the Group i
>> +- and Lank k
>> +
>> +Some of the event counters only exist for specific configurations.
>> +
>> +DesignWare Cores (DWC) PCIe PMU Driver
>> +=======================================
>> +
>> +This driver add PMU devices for each PCIe Root Port. And the PMU device is
>> +named based the BDF of Root Port. For example,
>> +
>> + 10:00.0 PCI bridge: Device 1ded:8000 (rev 01)
>> +
>> +the PMU device name for this Root Port is pcie_bdf_100000.
>> +
>> +Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>> +
>> + $# perf stat -a -e pcie_bdf_200/Rx_PCIe_TLP_Data_Payload/
>
> Do you really need to expose a separate PMU instance to userspace for each
> BDF? I think it would be much cleaner if you could follow the approach used
> by hisilicon/hisi_pcie_pmu.c and hide these details in the driver, exposing
> a `bdf=' selector to userspace instead.

Thank you for your valuable comments.

Encoding the BDF in a bitmap and exposing a `bdf=' selector to userspace is a
good idea. The problem with a BDF selector is that the user needs to compute
the BDF from the lanes. Do you think that is user friendly? I'm worried about
increasing the burden on users.

Best Regards
Shuai


2022-09-23 15:41:55

by Shuai Xue

Subject: Re: [PATCH v1 1/3] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver



On 2022/9/23 9:27 AM, Yicong Yang wrote:
> On 2022/9/17 20:10, Shuai Xue wrote:
>> Alibaba's T-Head Yitan 710 SoC is built on Synopsys' widely deployed and
>> silicon-proven DesignWare Core PCIe controller which implements PMU for
>> performance and functional debugging to facilitate system maintenance.
>> Document it to provide guidance on how to use it.
>>
>> Signed-off-by: Shuai Xue <[email protected]>
>> ---
>> .../admin-guide/perf/dwc_pcie_pmu.rst | 61 +++++++++++++++++++
>> Documentation/admin-guide/perf/index.rst | 1 +
>> 2 files changed, 62 insertions(+)
>> create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>>
>> diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>> new file mode 100644
>> index 000000000000..fbcbf10b23b7
>> --- /dev/null
>> +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>> @@ -0,0 +1,61 @@
>> +======================================================================
>> +Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
>> +======================================================================
>> +
>> +DesignWare Cores (DWC) PCIe PMU
>> +===============================
>> +
>> +To facilitate collection of statistics, Synopsys DesignWare Cores PCIe
>> +controller provides the following two features:
>> +
>> +- Time Based Analysis (RX/TX data throughput and time spent in each
>> + low-power LTSSM state)
>> +- Lane Event counters (Error and Non-Error for lanes)
>> +
>> +The PMU is not a PCIe Root Complex integrated End Point (RCiEP) device but
>> +only register counters provided by each PCIe Root Port.
>> +
>> +Time Based Analysis
>> +-------------------
>> +
>> +Using this feature you can obtain information regarding RX/TX data
>> +throughput and time spent in each low-power LTSSM state by the controller.
>> +
>> +The counters are 64-bit width and measure data in two categories,
>> +
>> +- percentage of time does the controller stay in LTSSM state in a
>> + configurable duration. The measurement range of each Event in Group#0.
>> +- amount of data processed (Units of 16 bytes). The measurement range of
>> + each Event in Group#1.
>> +
>> +Lane Event counters
>> +-------------------
>> +
>> +Using this feature you can obtain Error and Non-Error information in
>> +specific lane by the controller.
>> +
>> +The counters are 32-bit width and the measured event is select by:
>> +
>> +- Group i
>> +- Event j within the Group i
>> +- and Lank k
>
> Typo here? I guess it's "Lane k"?

Good catch, thank you. Will fix in next version.

Best Regards,
Shuai

>
>> +
>> +Some of the event counters only exist for specific configurations.
>> +
>> +DesignWare Cores (DWC) PCIe PMU Driver
>> +=======================================
>> +
>> +This driver add PMU devices for each PCIe Root Port. And the PMU device is
>> +named based the BDF of Root Port. For example,
>> +
>> + 10:00.0 PCI bridge: Device 1ded:8000 (rev 01)
>> +
>> +the PMU device name for this Root Port is pcie_bdf_100000.
>> +
>> +Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>> +
>> + $# perf stat -a -e pcie_bdf_200/Rx_PCIe_TLP_Data_Payload/
>> +
>> +average RX bandwidth can be calculated like this:
>> +
>> + PCIe TX Bandwidth = PCIE_TX_DATA * 16B / Measure_Time_Window
>> diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
>> index 9c9ece88ce53..8e6a5472aeb3 100644
>> --- a/Documentation/admin-guide/perf/index.rst
>> +++ b/Documentation/admin-guide/perf/index.rst
>> @@ -18,3 +18,4 @@ Performance monitor support
>> xgene-pmu
>> arm_dsu_pmu
>> thunderx2-pmu
>> + dwc_pcie_pmu
>>

2022-11-07 16:20:03

by Will Deacon

Subject: Re: [PATCH v1 1/3] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver

On Fri, Sep 23, 2022 at 09:51:40PM +0800, Shuai Xue wrote:
>
>
> On 2022/9/22 9:25 PM, Will Deacon wrote:
> > On Sat, Sep 17, 2022 at 08:10:34PM +0800, Shuai Xue wrote:
> >> Alibaba's T-Head Yitan 710 SoC is built on Synopsys' widely deployed and
> >> silicon-proven DesignWare Core PCIe controller which implements PMU for
> >> performance and functional debugging to facilitate system maintenance.
> >> Document it to provide guidance on how to use it.
> >>
> >> Signed-off-by: Shuai Xue <[email protected]>
> >> ---
> >> .../admin-guide/perf/dwc_pcie_pmu.rst | 61 +++++++++++++++++++
> >> Documentation/admin-guide/perf/index.rst | 1 +
> >> 2 files changed, 62 insertions(+)
> >> create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> >>
> >> diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> >> new file mode 100644
> >> index 000000000000..fbcbf10b23b7
> >> --- /dev/null
> >> +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> >> @@ -0,0 +1,61 @@
> >> +======================================================================
> >> +Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
> >> +======================================================================
> >> +
> >> +DesignWare Cores (DWC) PCIe PMU
> >> +===============================
> >> +
> >> +To facilitate collection of statistics, Synopsys DesignWare Cores PCIe
> >> +controller provides the following two features:
> >> +
> >> +- Time Based Analysis (RX/TX data throughput and time spent in each
> >> + low-power LTSSM state)
> >> +- Lane Event counters (Error and Non-Error for lanes)
> >> +
> >> +The PMU is not a PCIe Root Complex integrated End Point (RCiEP) device but
> >> +only register counters provided by each PCIe Root Port.
> >> +
> >> +Time Based Analysis
> >> +-------------------
> >> +
> >> +Using this feature you can obtain information regarding RX/TX data
> >> +throughput and time spent in each low-power LTSSM state by the controller.
> >> +
> >> +The counters are 64-bit width and measure data in two categories,
> >> +
> >> +- percentage of time does the controller stay in LTSSM state in a
> >> + configurable duration. The measurement range of each Event in Group#0.
> >> +- amount of data processed (Units of 16 bytes). The measurement range of
> >> + each Event in Group#1.
> >> +
> >> +Lane Event counters
> >> +-------------------
> >> +
> >> +Using this feature you can obtain Error and Non-Error information in
> >> +specific lane by the controller.
> >> +
> >> +The counters are 32-bit width and the measured event is select by:
> >> +
> >> +- Group i
> >> +- Event j within the Group i
> >> +- and Lank k
> >> +
> >> +Some of the event counters only exist for specific configurations.
> >> +
> >> +DesignWare Cores (DWC) PCIe PMU Driver
> >> +=======================================
> >> +
> >> +This driver add PMU devices for each PCIe Root Port. And the PMU device is
> >> +named based the BDF of Root Port. For example,
> >> +
> >> + 10:00.0 PCI bridge: Device 1ded:8000 (rev 01)
> >> +
> >> +the PMU device name for this Root Port is pcie_bdf_100000.
> >> +
> >> +Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
> >> +
> >> + $# perf stat -a -e pcie_bdf_200/Rx_PCIe_TLP_Data_Payload/
> >
> > Do you really need to expose a separate PMU instance to userspace for each
> > BDF? I think it would be much cleaner if you could follow the approach used
> > by hisilicon/hisi_pcie_pmu.c and hide these details in the driver, exposing
> > a `bdf=' selector to userspace instead.
>
> Thank you for your valuable comments.
>
> Encoding the BDF in a bitmap and exposing a `bdf=' selector to userspace is a
> good idea. The problem with a BDF selector is that the user needs to compute
> the BDF from the lanes. Do you think that is user friendly? I'm worried about
> increasing the burden on users.

I don't see this as being an issue, particularly if you document how to do
it.

Will
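
Editorial illustration (not part of the thread): the kind of documentation Will suggests could include a short recipe for turning the BDF printed by lspci into a number. The Python sketch below uses the conventional 16-bit PCI encoding (bus in bits 15:8, device in bits 7:3, function in bits 2:0). Whether a future `bdf=' selector in this driver would accept exactly this encoding is an assumption, and the helper name is hypothetical.

```python
def bdf_to_u16(bdf: str) -> int:
    """Encode 'bb:dd.f' (optionally prefixed with a 'dddd:' domain) as a 16-bit value.

    Uses the conventional PCI layout: bus[15:8], device[7:3], function[2:0].
    A leading domain, if present, is ignored.
    """
    parts = bdf.strip().split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"unrecognised BDF string: {bdf!r}")
    bus_s, dev_fn = parts[-2], parts[-1]
    dev_s, sep, fn_s = dev_fn.partition(".")
    if not sep:
        raise ValueError(f"missing function number in: {bdf!r}")
    bus, dev, fn = int(bus_s, 16), int(dev_s, 16), int(fn_s, 16)
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 0x7):
        raise ValueError(f"BDF field out of range in: {bdf!r}")
    return (bus << 8) | (dev << 3) | fn


if __name__ == "__main__":
    # The Root Port from the patch's lspci example line.
    print(hex(bdf_to_u16("10:00.0")))
```

For the Root Port at 10:00.0 shown in the patch, this encoding gives 0x1000.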