I've detected a massive loss of CAN frames on i.MX6 using the flexcan
driver with 4.1-rc8 and tracked this down to the following commit:
514ac99c64b22d83b52dfee3b8becaa69a92bc4a - "can: fix multiple delivery
of a single CAN frame for overlapping CAN filters"
514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality
check. Since the sk_buff pointer is not sufficient to do this (buffers
are reused), the check also compares time stamps.
In short: pointer + time stamp was assumed to be a unique key for a
specific frame.
The problem is that the time stamp is an optional property and is not
set by default.
In our case (flexcan) the time stamp is always zero, so the equality
check is reduced to equality of buffer pointers, resulting in a lot of
dropped frames.
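
To illustrate the failure mode, here is a minimal userspace sketch of the
idea behind the check. It is not the kernel code: the types and the function
(model_skb, model_uniq, model_is_duplicate) are made up for illustration, and
a time stamp of 0 is assumed to mean "not set".

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an sk_buff: only the time stamp matters here. */
struct model_skb {
	int64_t tstamp;	/* 0 is assumed to mean "no time stamp set" */
};

/* Hypothetical record of the last frame that was delivered. */
struct model_uniq {
	const struct model_skb *skb;
	int64_t tstamp;
};

/*
 * Returns true if the frame is treated as a duplicate of the last one
 * (same buffer pointer and same time stamp). Otherwise the frame is
 * remembered and false is returned.
 */
static bool model_is_duplicate(struct model_uniq *u,
			       const struct model_skb *skb)
{
	if (u->skb == skb && u->tstamp == skb->tstamp)
		return true;

	u->skb = skb;
	u->tstamp = skb->tstamp;
	return false;
}
```

If a buffer is reused for a new frame and its time stamp is still 0, the
second call reports a duplicate although the frame is new. As soon as each
frame carries a distinct time stamp, the reused pointer no longer matters.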
Possible solutions I thought of:
1. Every driver has to set a time stamp
(possibly error prone and hard to enforce?)
2. Change the equality check
3. Fulfil the requirements of the equality check by setting a
time stamp by default.
This patch fixes the problem with solution 3. A time stamp is set at
time of allocation in alloc_can_skb.
The time stamp may be overridden later, but the function of the equality
check is ensured.
I'm not deeply familiar with the Linux network subsystem, so there may
exist more elegant solutions for the problem.
Signed-off-by: Manfred Schlaegl <[email protected]>
---
drivers/net/can/dev.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index b0f6924..282e2e7 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -575,6 +575,7 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf)
 	if (unlikely(!skb))
 		return NULL;
 
+	__net_timestamp(skb);
 	skb->protocol = htons(ETH_P_CAN);
 	skb->pkt_type = PACKET_BROADCAST;
 	skb->ip_summed = CHECKSUM_UNNECESSARY;
--
1.7.10.4
Hello Manfred,
On 06/20/2015 07:21 PM, Manfred Schlaegl wrote:
> I've detected a massive loss of can frames on i.MX6 using flexcan
> driver with 4.1-rc8 and tracked this down to following commit:
> 514ac99c64b22d83b52dfee3b8becaa69a92bc4a - "can: fix multiple delivery
> of a single CAN frame for overlapping CAN filters"
Thanks for detecting this issue!
> 514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality
> check. Since the sk_buff pointer is not sufficient to do this (buffers
> are reused), the check also compares time stamps.
> In short: pointer+time stamp was assumed as unique key to a specific
> frame.
> The problem with this is, that the time stamp is an optional property
> and not set per default.
> In our case (flexcan) the time stamp is always zero, so the equality
> check is reduced to equality of buffer pointers, resulting in a lot of
> dropped frames.
The question is why your system did not generate a timestamp at the time of
skb reception.
Usually, when netif_rx() or netif_rx_ni() is invoked, the timestamp is set
during the subsequent reception processing.
flexcan.c only uses netif_receive_skb() - but all these functions set the
timestamp via
net_timestamp_check(netdev_tstamp_prequeue, skb);
depending on netdev_tstamp_prequeue which is configured by
/proc/sys/net/core/netdev_tstamp_prequeue
See the idea of netdev_tstamp_prequeue here:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/core/sysctl_net_core.c?id=3b098e2d7c693796cc4dffb07caa249fc0f70771
Can you tell me the output of /proc/sys/net/core/netdev_tstamp_prequeue on
your machine?
If it's not '1' can you set it to '1' for a test?
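
If a shell is not at hand on the target, the value can also be read from a
small C helper. This is only a sketch; the helper name read_int_file is made
up for illustration.

```c
#include <stdio.h>

/*
 * Read a single integer from a sysctl-style file.
 * Returns 0 on success and stores the value in *out, -1 on any failure.
 */
static int read_int_file(const char *path, int *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;

	ok = (fscanf(f, "%d", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

Calling read_int_file("/proc/sys/net/core/netdev_tstamp_prequeue", &val)
should then give the current setting in val.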
>
> Possible solutions I thought of:
> 1. Every driver has to set a time stamp
> (possibly error prone and hard to enforce?)
> 2. Change the equality check
> 3. Fulfil the requirements of the equality check by setting a
> time stamp per default.
>
> This patch fixes the problem with solution 3. A time stamp is set at
> time of allocation in alloc_can_skb.
That's a feasible approach if we don't find a better way to make sure the
timestamps are generally set before the skb is processed in the NET_RX softirq.
> The time stamp may be overridden later, but the function of the equality
> check is ensured.
>
> I'm not really deep in linux network subsystem, so there may exists
> more elegant solutions for the problem.
>
> Signed-off-by: Manfred Schlaegl <[email protected]>
> ---
> drivers/net/can/dev.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
> index b0f6924..282e2e7 100644
> --- a/drivers/net/can/dev.c
> +++ b/drivers/net/can/dev.c
> @@ -575,6 +575,7 @@ struct sk_buff *alloc_can_skb(struct net_device *dev, struct can_frame **cf)
> if (unlikely(!skb))
> return NULL;
>
> + __net_timestamp(skb);
> skb->protocol = htons(ETH_P_CAN);
> skb->pkt_type = PACKET_BROADCAST;
> skb->ip_summed = CHECKSUM_UNNECESSARY;
>
Please check the netdev_tstamp_prequeue value first.
If we do need solution 3, __net_timestamp(skb) should be placed in
alloc_canfd_skb() too.
Thanks again for your investigation!
Best regards,
Oliver
Hello Oliver,
On 2015-06-21 00:42, Oliver Hartkopp wrote:
>> 514ac99c64b22d83b52dfee3b8becaa69a92bc4a introduces a frame equality
>> check. Since the sk_buff pointer is not sufficient to do this (buffers
>> are reused), the check also compares time stamps.
>> In short: pointer+time stamp was assumed as unique key to a specific
>> frame.
>> The problem with this is, that the time stamp is an optional property
>> and not set per default.
>> In our case (flexcan) the time stamp is always zero, so the equality
>> check is reduced to equality of buffer pointers, resulting in a lot of
>> dropped frames.
>
> The question is why your system did not generate a timestamp at the time of
> skb reception.
>
> Usually when netif_rx(), netif_rx_ni() is invoked the timestamp is set in the
> following reception process.
>
> flexcan.c only uses netif_receive_skb() - but all theses functions set the
> timestamp
>
> net_timestamp_check(netdev_tstamp_prequeue, skb);
>
> depending on netdev_tstamp_prequeue which is configured by
>
> /proc/sys/net/core/netdev_tstamp_prequeue
>
> See the idea of netdev_tstamp_prequeue here:
>
> http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/net/core/sysctl_net_core.c?id=3b098e2d7c693796cc4dffb07caa249fc0f70771
>
Thank you for the background information!
I've also noticed your patch [PATCH - regression 4.1-rc8] can: fix loss of CAN frames in raw_rcv
> Can you tell me the output of /proc/sys/net/core/netdev_tstamp_prequeue on
> your machine?
/proc/sys/net/core/netdev_tstamp_prequeue is set to 1 (unmodified, default)
I tried to dig a little deeper into timestamping:
1. (net/core/dev.c) I found that static_key_false(&netstamp_needed) is always 0, with the result that the timestamp is never set by net_timestamp_check in netif_receive_skb_internal.
2. (net/core/dev.c) static_key_false(&netstamp_needed) is 0 because net_enable_timestamp is never called.
3. (net/core/sock.c) net_enable_timestamp is never called because SK_FLAGS_TIMESTAMP is not set.
4. (net/core/sock.c) SK_FLAGS_TIMESTAMP is not set because neither SOCK_TIMESTAMP nor SOCK_TIMESTAMPING_RX_SOFTWARE is set.
5. (net/core/sock.c) Neither SOCK_TIMESTAMP nor SOCK_TIMESTAMPING_RX_SOFTWARE is set because timestamping is an optional feature (according to http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/timestamping.txt?id=b953c0d234bc72e8489d3bf51a276c5c4ec85345) that is not enabled in my use case (even if netdev_tstamp_prequeue is set to 1).
So the original assumption for the patch was correct: the correctness of the skb equality check depends on a feature that is not enabled by default (it is user configurable).
Do you agree with this?
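
To make the chain above concrete, here is a small userspace model of how I
understand the conditional stamping. It is not the kernel code: the names
(model_netstamp_needed, rx_skb, model_timestamp_check) are made up, and the
global flag merely stands in for the netstamp_needed static key.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag standing in for the netstamp_needed static key. */
static bool model_netstamp_needed;

/* Hypothetical stand-in for an sk_buff; 0 means "no time stamp set". */
struct rx_skb {
	int64_t tstamp;
};

/*
 * Model of the conditional stamping: the frame is stamped only if
 * timestamping has been requested at all, the caller's condition holds
 * (e.g. the prequeue setting) and no time stamp is present yet.
 */
static void model_timestamp_check(bool cond, struct rx_skb *skb, int64_t now)
{
	if (!model_netstamp_needed)
		return;

	if (cond && skb->tstamp == 0)
		skb->tstamp = now;
}
```

With model_netstamp_needed left at false, the time stamp stays 0 no matter
what is passed as cond, which matches what I see with
netdev_tstamp_prequeue set to 1.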
>
> Thanks again for your investigation!
Sure!
Best regards,
Manfred
Hello Manfred,
On 22.06.2015 11:48, Manfred Schlaegl wrote:
>> Can you tell me the output of /proc/sys/net/core/netdev_tstamp_prequeue on
>> your machine?
>
> /proc/sys/net/core/netdev_tstamp_prequeue is set to 1 (unmodified, default)
>
> I tried to dig a little deeper in timestamping:
> 1. (net/core/dev.c) I found that static_key_false(&netstamp_needed) is always 0, resulting that the timestamp is never set by net_timestamp_check in netif_receive_skb_internal.
> 2. (net/core/dev.c) static_key_false(&netstamp_needed) is 0 because net_enable_timestamp is never called.
> 3. (net/core/sock.c) net_enable_timestamp is never called because SK_FLAGS_TIMESTAMP is not set
> 4. (net/core/sock.c) SK_FLAGS_TIMESTAMP is not set because neither of SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is set
> 5. (net/core/sock.c) SOCK_TIMESTAMP or SOCK_TIMESTAMPING_RX_SOFTWARE is not set because timestamping is an optional feature (according to http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/timestamping.txt?id=b953c0d234bc72e8489d3bf51a276c5c4ec85345) not enabled in my use case (even if netdev_tstamp_prequeue is set to 1)
>
> So the original assumption for the was correct: The correctness of the skb equality check depends on a feature that is not enabled by default (respectively user configurable).
> Do you agree with this?
Yes.
But this only becomes an issue when there's no userspace application that
requires timestamps.
I did my testing while having at least one "candump" instance running, which
enables timestamping. So when no one is requesting timestamps, the check
in can_rcv does not work properly.
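
For reference, one way a userspace program can request software receive
timestamps is the SO_TIMESTAMP socket option. The sketch below shows only
that call; the helper name enable_rx_timestamp is made up, and whether a
given tool uses exactly this option is an assumption here.

```c
#include <sys/socket.h>

/*
 * Request software receive time stamps on an already opened socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt).
 */
static int enable_rx_timestamp(int fd)
{
	const int on = 1;

	return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMP, &on, sizeof(on));
}
```

For a CAN application, fd would typically come from
socket(PF_CAN, SOCK_RAW, CAN_RAW). As long as no socket on the system has
made such a request, nothing stamps the received skbs.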
Therefore my patch picks up your idea of setting the timestamps for CAN skbs
unconditionally. But there were some more places in the code where we need to
take care of that.
Regards,
Oliver