2021-11-15 09:21:10

by Matthias Schiffer

[permalink] [raw]
Subject: [PATCH net 0/4] Fix bit timings for m_can_pci (Elkhart Lake)

This series fixes two issues we found with the setup of the CAN
controller of Intel Elkhart Lake CPUs:

- Patch 1 fixes an incorrect reference clock rate, which caused the
configured and the actual bitrate always to differ by a factor of 2.
- Patches 2-4 fix a deviation between the driver and the documentation.
We did not actually see issues without these patches, however we did
only superficial testing and may just not have hit the specific
bittiming values that violate the documented limits.


Matthias Schiffer (4):
can: m_can: pci: fix incorrect reference clock rate
Revert "can: m_can: remove support for custom bit timing"
can: m_can: make custom bittiming fields const
can: m_can: pci: use custom bit timings for Elkhart Lake

drivers/net/can/m_can/m_can.c | 24 ++++++++++++----
drivers/net/can/m_can/m_can.h | 3 ++
drivers/net/can/m_can/m_can_pci.c | 48 ++++++++++++++++++++++++++++---
3 files changed, 65 insertions(+), 10 deletions(-)

--
2.17.1



2021-11-15 09:21:17

by Matthias Schiffer

[permalink] [raw]
Subject: [PATCH net 4/4] can: m_can: pci: use custom bit timings for Elkhart Lake

The relevant datasheet [1] specifies nonstandard limits for the bit timing
parameters. While it is unclear what the exact effect of violating these
limits is, it seems like a good idea to adhere to the documentation.

[1] Intel Atom® x6000E Series, and Intel® Pentium® and Celeron® N and J
Series Processors for IoT Applications Datasheet,
Volume 2 (Book 3 of 3), July 2021, Revision 001

Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
Signed-off-by: Matthias Schiffer <[email protected]>
---
drivers/net/can/m_can/m_can_pci.c | 48 ++++++++++++++++++++++++++++---
1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
index d3c030a13cbe..8bbbaa264f0d 100644
--- a/drivers/net/can/m_can/m_can_pci.c
+++ b/drivers/net/can/m_can/m_can_pci.c
@@ -18,9 +18,14 @@

#define M_CAN_PCI_MMIO_BAR 0

-#define M_CAN_CLOCK_FREQ_EHL 200000000
#define CTL_CSR_INT_CTL_OFFSET 0x508

+struct m_can_pci_config {
+ const struct can_bittiming_const *bit_timing;
+ const struct can_bittiming_const *data_timing;
+ unsigned int clock_freq;
+};
+
struct m_can_pci_priv {
struct m_can_classdev cdev;

@@ -74,9 +79,40 @@ static struct m_can_ops m_can_pci_ops = {
.read_fifo = iomap_read_fifo,
};

+static const struct can_bittiming_const m_can_bittiming_const_ehl = {
+ .name = KBUILD_MODNAME,
+ .tseg1_min = 2, /* Time segment 1 = prop_seg + phase_seg1 */
+ .tseg1_max = 64,
+ .tseg2_min = 1, /* Time segment 2 = phase_seg2 */
+ .tseg2_max = 128,
+ .sjw_max = 128,
+ .brp_min = 1,
+ .brp_max = 512,
+ .brp_inc = 1,
+};
+
+static const struct can_bittiming_const m_can_data_bittiming_const_ehl = {
+ .name = KBUILD_MODNAME,
+ .tseg1_min = 2, /* Time segment 1 = prop_seg + phase_seg1 */
+ .tseg1_max = 16,
+ .tseg2_min = 1, /* Time segment 2 = phase_seg2 */
+ .tseg2_max = 8,
+ .sjw_max = 4,
+ .brp_min = 1,
+ .brp_max = 32,
+ .brp_inc = 1,
+};
+
+static const struct m_can_pci_config m_can_pci_ehl = {
+ .bit_timing = &m_can_bittiming_const_ehl,
+ .data_timing = &m_can_data_bittiming_const_ehl,
+ .clock_freq = 200000000,
+};
+
static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
{
struct device *dev = &pci->dev;
+ const struct m_can_pci_config *cfg;
struct m_can_classdev *mcan_class;
struct m_can_pci_priv *priv;
void __iomem *base;
@@ -104,6 +140,8 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
if (!mcan_class)
return -ENOMEM;

+ cfg = (const struct m_can_pci_config *)id->driver_data;
+
priv = cdev_to_priv(mcan_class);

priv->base = base;
@@ -115,7 +153,9 @@ static int m_can_pci_probe(struct pci_dev *pci, const struct pci_device_id *id)
mcan_class->dev = &pci->dev;
mcan_class->net->irq = pci_irq_vector(pci, 0);
mcan_class->pm_clock_support = 1;
- mcan_class->can.clock.freq = id->driver_data;
+ mcan_class->bit_timing = cfg->bit_timing;
+ mcan_class->data_timing = cfg->data_timing;
+ mcan_class->can.clock.freq = cfg->clock_freq;
mcan_class->ops = &m_can_pci_ops;

pci_set_drvdata(pci, mcan_class);
@@ -168,8 +208,8 @@ static SIMPLE_DEV_PM_OPS(m_can_pci_pm_ops,
m_can_pci_suspend, m_can_pci_resume);

static const struct pci_device_id m_can_pci_id_table[] = {
- { PCI_VDEVICE(INTEL, 0x4bc1), M_CAN_CLOCK_FREQ_EHL, },
- { PCI_VDEVICE(INTEL, 0x4bc2), M_CAN_CLOCK_FREQ_EHL, },
+ { PCI_VDEVICE(INTEL, 0x4bc1), (kernel_ulong_t)&m_can_pci_ehl, },
+ { PCI_VDEVICE(INTEL, 0x4bc2), (kernel_ulong_t)&m_can_pci_ehl, },
{ } /* Terminating Entry */
};
MODULE_DEVICE_TABLE(pci, m_can_pci_id_table);
--
2.17.1


2021-11-15 09:21:39

by Matthias Schiffer

[permalink] [raw]
Subject: [PATCH net 3/4] can: m_can: make custom bittiming fields const

The assigned timing structs will be defined a const anyway, so we can
avoid a few casts by declaring the struct fields as const as well.

Signed-off-by: Matthias Schiffer <[email protected]>
---
drivers/net/can/m_can/m_can.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index ad063b101411..2c5d40997168 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -85,8 +85,8 @@ struct m_can_classdev {
struct sk_buff *tx_skb;
struct phy *transceiver;

- struct can_bittiming_const *bit_timing;
- struct can_bittiming_const *data_timing;
+ const struct can_bittiming_const *bit_timing;
+ const struct can_bittiming_const *data_timing;

struct m_can_ops *ops;

--
2.17.1


2021-11-15 09:21:49

by Matthias Schiffer

[permalink] [raw]
Subject: [PATCH net 2/4] Revert "can: m_can: remove support for custom bit timing"

The timing limits specified by the Elkhart Lake CPU datasheets do not
match the defaults. Let's reintroduce the support for custom bit timings.

This reverts commit 0ddd83fbebbc5537f9d180d31f659db3564be708.

Signed-off-by: Matthias Schiffer <[email protected]>
---
drivers/net/can/m_can/m_can.c | 24 ++++++++++++++++++------
drivers/net/can/m_can/m_can.h | 3 +++
2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 2470c47b2e31..529f754faae6 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1494,20 +1494,32 @@ static int m_can_dev_setup(struct m_can_classdev *cdev)
case 30:
/* CAN_CTRLMODE_FD_NON_ISO is fixed with M_CAN IP v3.0.x */
can_set_static_ctrlmode(dev, CAN_CTRLMODE_FD_NON_ISO);
- cdev->can.bittiming_const = &m_can_bittiming_const_30X;
- cdev->can.data_bittiming_const = &m_can_data_bittiming_const_30X;
+ cdev->can.bittiming_const = cdev->bit_timing ?
+ cdev->bit_timing : &m_can_bittiming_const_30X;
+
+ cdev->can.data_bittiming_const = cdev->data_timing ?
+ cdev->data_timing :
+ &m_can_data_bittiming_const_30X;
break;
case 31:
/* CAN_CTRLMODE_FD_NON_ISO is fixed with M_CAN IP v3.1.x */
can_set_static_ctrlmode(dev, CAN_CTRLMODE_FD_NON_ISO);
- cdev->can.bittiming_const = &m_can_bittiming_const_31X;
- cdev->can.data_bittiming_const = &m_can_data_bittiming_const_31X;
+ cdev->can.bittiming_const = cdev->bit_timing ?
+ cdev->bit_timing : &m_can_bittiming_const_31X;
+
+ cdev->can.data_bittiming_const = cdev->data_timing ?
+ cdev->data_timing :
+ &m_can_data_bittiming_const_31X;
break;
case 32:
case 33:
/* Support both MCAN version v3.2.x and v3.3.0 */
- cdev->can.bittiming_const = &m_can_bittiming_const_31X;
- cdev->can.data_bittiming_const = &m_can_data_bittiming_const_31X;
+ cdev->can.bittiming_const = cdev->bit_timing ?
+ cdev->bit_timing : &m_can_bittiming_const_31X;
+
+ cdev->can.data_bittiming_const = cdev->data_timing ?
+ cdev->data_timing :
+ &m_can_data_bittiming_const_31X;

cdev->can.ctrlmode_supported |=
(m_can_niso_supported(cdev) ?
diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index d18b515e6ccc..ad063b101411 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -85,6 +85,9 @@ struct m_can_classdev {
struct sk_buff *tx_skb;
struct phy *transceiver;

+ struct can_bittiming_const *bit_timing;
+ struct can_bittiming_const *data_timing;
+
struct m_can_ops *ops;

int version;
--
2.17.1


2021-11-15 09:21:53

by Matthias Schiffer

[permalink] [raw]
Subject: [PATCH net 1/4] can: m_can: pci: fix incorrect reference clock rate

When testing the CAN controller on our Ekhart Lake hardware, we
determined that all communication was running with twice the configured
bitrate. Changing the reference clock rate from 100MHz to 200MHz fixed
this. Intel's support has confirmed to us that 200MHz is indeed the
correct clock rate.

Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
Signed-off-by: Matthias Schiffer <[email protected]>
---
drivers/net/can/m_can/m_can_pci.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
index 89cc3d41e952..d3c030a13cbe 100644
--- a/drivers/net/can/m_can/m_can_pci.c
+++ b/drivers/net/can/m_can/m_can_pci.c
@@ -18,7 +18,7 @@

#define M_CAN_PCI_MMIO_BAR 0

-#define M_CAN_CLOCK_FREQ_EHL 100000000
+#define M_CAN_CLOCK_FREQ_EHL 200000000
#define CTL_CSR_INT_CTL_OFFSET 0x508

struct m_can_pci_priv {
--
2.17.1


2021-11-15 14:49:14

by Jarkko Nikula

[permalink] [raw]
Subject: Re: [PATCH net 1/4] can: m_can: pci: fix incorrect reference clock rate

Hi

On 11/15/21 11:18 AM, Matthias Schiffer wrote:
> When testing the CAN controller on our Ekhart Lake hardware, we
> determined that all communication was running with twice the configured
> bitrate. Changing the reference clock rate from 100MHz to 200MHz fixed
> this. Intel's support has confirmed to us that 200MHz is indeed the
> correct clock rate.
>
> Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel Elkhart Lake")
> Signed-off-by: Matthias Schiffer <[email protected]>
> ---
> drivers/net/can/m_can/m_can_pci.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/can/m_can/m_can_pci.c b/drivers/net/can/m_can/m_can_pci.c
> index 89cc3d41e952..d3c030a13cbe 100644
> --- a/drivers/net/can/m_can/m_can_pci.c
> +++ b/drivers/net/can/m_can/m_can_pci.c
> @@ -18,7 +18,7 @@
>
> #define M_CAN_PCI_MMIO_BAR 0
>
> -#define M_CAN_CLOCK_FREQ_EHL 100000000
> +#define M_CAN_CLOCK_FREQ_EHL 200000000
> #define CTL_CSR_INT_CTL_OFFSET 0x508
>
I'll double check this from HW people but at quick test on an HW I have
the signals on an oscilloscope were having 1 us shortest cycle (~500 ns
low, ~500 ns high) when testing like below:

ip link set can0 type can bitrate 1000000 dbitrate 2000000 fd on
ip link set can0 up
ip link set can1 type can bitrate 1000000 dbitrate 2000000 fd on
ip link set can1 up

candump can0 &

cansend can1 01a#11223344AABBCCDD

Caveat: I'm not an CAN signaling expert at all.

Jarkko

2021-11-16 07:13:16

by Jarkko Nikula

[permalink] [raw]
Subject: Re: [PATCH net 1/4] can: m_can: pci: fix incorrect reference clock rate

Hi

On 11/15/21 4:48 PM, Jarkko Nikula wrote:
> Hi
>
> On 11/15/21 11:18 AM, Matthias Schiffer wrote:
>> When testing the CAN controller on our Ekhart Lake hardware, we
>> determined that all communication was running with twice the configured
>> bitrate. Changing the reference clock rate from 100MHz to 200MHz fixed
>> this. Intel's support has confirmed to us that 200MHz is indeed the
>> correct clock rate.
>>
>> Fixes: cab7ffc0324f ("can: m_can: add PCI glue driver for Intel
>> Elkhart Lake")
>> Signed-off-by: Matthias Schiffer <[email protected]>
>> ---
>>   drivers/net/can/m_can/m_can_pci.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/can/m_can/m_can_pci.c
>> b/drivers/net/can/m_can/m_can_pci.c
>> index 89cc3d41e952..d3c030a13cbe 100644
>> --- a/drivers/net/can/m_can/m_can_pci.c
>> +++ b/drivers/net/can/m_can/m_can_pci.c
>> @@ -18,7 +18,7 @@
>>   #define M_CAN_PCI_MMIO_BAR        0
>> -#define M_CAN_CLOCK_FREQ_EHL        100000000
>> +#define M_CAN_CLOCK_FREQ_EHL        200000000
>>   #define CTL_CSR_INT_CTL_OFFSET        0x508
> I'll double check this from HW people but at quick test on an HW I have
> the signals on an oscilloscope were having 1 us shortest cycle (~500 ns
> low, ~500 ns high) when testing like below:
>
> ip link set can0 type can bitrate 1000000 dbitrate 2000000 fd on

I got confirmation the clock to CAN controller is indeed changed from
100 MHz to 200 MHz in release HW & firmware.

I haven't upgraded the FW in a while on our HW so that perhaps explain
why I was seeing expected rate :-)

So which one is more appropriate:

Acked-by: Jarkko Nikula <[email protected]>
or
Reviewed-by: Jarkko Nikula <[email protected]>

2021-11-16 07:16:29

by Marc Kleine-Budde

[permalink] [raw]
Subject: Re: [PATCH net 1/4] can: m_can: pci: fix incorrect reference clock rate

On 16.11.2021 09:11:40, Jarkko Nikula wrote:
> > ip link set can0 type can bitrate 1000000 dbitrate 2000000 fd on
>
> I got confirmation the clock to CAN controller is indeed changed from 100
> MHz to 200 MHz in release HW & firmware.
>
> I haven't upgraded the FW in a while on our HW so that perhaps explain
> why I was seeing expected rate :-)

Can we query the FW version in the driver and set the clock rate
accordingly?

> So which one is more appropriate:
>
> Acked-by: Jarkko Nikula <[email protected]>
> or
> Reviewed-by: Jarkko Nikula <[email protected]>

regards,
Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |


Attachments:
(No filename) (872.00 B)
signature.asc (488.00 B)
Download all attachments

2021-11-16 13:58:26

by Matthias Schiffer

[permalink] [raw]
Subject: Re: [PATCH net 0/4] Fix bit timings for m_can_pci (Elkhart Lake)

On Mon, 2021-11-15 at 10:18 +0100, Matthias Schiffer wrote:
> This series fixes two issues we found with the setup of the CAN
> controller of Intel Elkhart Lake CPUs:
>
> - Patch 1 fixes an incorrect reference clock rate, which caused the
> configured and the actual bitrate always to differ by a factor of 2.
> - Patches 2-4 fix a deviation between the driver and the documentation.
> We did not actually see issues without these patches, however we did
> only superficial testing and may just not have hit the specific
> bittiming values that violate the documented limits.
>
>
> Matthias Schiffer (4):
> can: m_can: pci: fix incorrect reference clock rate
> Revert "can: m_can: remove support for custom bit timing"
> can: m_can: make custom bittiming fields const
> can: m_can: pci: use custom bit timings for Elkhart Lake
>
> drivers/net/can/m_can/m_can.c | 24 ++++++++++++----
> drivers/net/can/m_can/m_can.h | 3 ++
> drivers/net/can/m_can/m_can_pci.c | 48 ++++++++++++++++++++++++++++---
> 3 files changed, 65 insertions(+), 10 deletions(-)
>

I just noticed that m_can_pci is completely broken on 5.15.2, while
it's working fine on 5.14.y.

I assume something simliar to [1] will be necessary in m_can_pci as
well, however I'm not really familiar with the driver. There is no
"mram_base" in m_can_plat_pci, only "base". Is using "base" with
iowrite32/ioread32 + manual increment the correct solution here?


[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=99d173fbe8944861a00ebd1c73817a1260d21e60


2021-11-16 14:51:26

by Jarkko Nikula

[permalink] [raw]
Subject: Re: [PATCH net 1/4] can: m_can: pci: fix incorrect reference clock rate

On 11/16/21 9:15 AM, Marc Kleine-Budde wrote:
> On 16.11.2021 09:11:40, Jarkko Nikula wrote:
>>> ip link set can0 type can bitrate 1000000 dbitrate 2000000 fd on
>>
>> I got confirmation the clock to CAN controller is indeed changed from 100
>> MHz to 200 MHz in release HW & firmware.
>>
>> I haven't upgraded the FW in a while on our HW so that perhaps explain
>> why I was seeing expected rate :-)
>
> Can we query the FW version in the driver and set the clock rate
> accordingly?
>
Perhaps or check some clock register. I guess for now it can be
considered fixed clock since I understood rate was changed before
released to customers. I.e. customers shouldn't have firmware with
different rates. At least I hope so :-)

Jarkko

2021-11-17 12:14:34

by Jarkko Nikula

[permalink] [raw]
Subject: Re: [PATCH net 0/4] Fix bit timings for m_can_pci (Elkhart Lake)

Hi

On 11/16/21 3:58 PM, Matthias Schiffer wrote:
> I just noticed that m_can_pci is completely broken on 5.15.2, while
> it's working fine on 5.14.y.
>
Hmm.. so that may explain why I once saw candump received just zeroes on
v5.15-rc something but earlier kernels were ok. What's odd then next
time v5.15-rc was ok so went blaming sun spots instead of bisecting.

> I assume something simliar to [1] will be necessary in m_can_pci as
> well, however I'm not really familiar with the driver. There is no
> "mram_base" in m_can_plat_pci, only "base". Is using "base" with
> iowrite32/ioread32 + manual increment the correct solution here?
>
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=99d173fbe8944861a00ebd1c73817a1260d21e60
>
If your test case after 5.15 reliably fails are you able to bisect or
check does the regression originate from the same commit?

Jarkko

2021-11-18 14:47:57

by Matthias Schiffer

[permalink] [raw]
Subject: Re: [PATCH net 0/4] Fix bit timings for m_can_pci (Elkhart Lake)

On Wed, 2021-11-17 at 14:14 +0200, Jarkko Nikula wrote:
> Hi
>
> On 11/16/21 3:58 PM, Matthias Schiffer wrote:
> > I just noticed that m_can_pci is completely broken on 5.15.2, while
> > it's working fine on 5.14.y.
> >
>
> Hmm.. so that may explain why I once saw candump received just zeroes on
> v5.15-rc something but earlier kernels were ok. What's odd then next
> time v5.15-rc was ok so went blaming sun spots instead of bisecting.
>
> > I assume something simliar to [1] will be necessary in m_can_pci as
> > well, however I'm not really familiar with the driver. There is no
> > "mram_base" in m_can_plat_pci, only "base". Is using "base" with
> > iowrite32/ioread32 + manual increment the correct solution here?
> >
> >
> > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=99d173fbe8944861a00ebd1c73817a1260d21e60
> >
>
> If your test case after 5.15 reliably fails are you able to bisect or
> check does the regression originate from the same commit?
>
> Jarkko

The Fixes tag of 99d173fbe894 ("can: m_can: fix iomap_read_fifo() and
iomap_write_fifo()") is off AFAICT, the actual breakage happened with
812270e5445b ("can: m_can: Batch FIFO writes during CAN transmit") and
1aa6772f64b4 ("can: m_can: Batch FIFO reads during CAN receive");
reverting these two patches fixes the regression.

I just sent a patch for m_can_pci that applies the same fix that was
done in m_can_platform in 99d173fbe894, and verified that this makes
CAN communication work on Elkhart Lake with Linux 5.15.y together with
my other patches.

Matthias