2017-07-21 15:22:37

by John Crispin

Subject: [RFC 0/2] net-next: hw flow offloading

Hi,

I managed to bring up flow offloading on the latest MediaTek silicon.

When enabling HW flow offloading, the traffic coming in on either of the
GMACs is first sent to the PPE for processing. Any traffic not offloaded
at this point will then be forwarded to the normal RX DMA ring for SW path
processing. In this case the PPE will send additional data inside RXD4
that is later required by the upper layers to populate the flow offloading
engine's HW tables properly.

This series is an RFC as I am not sure how best to propagate the additional
info from the RX DMA descriptor. The driver is still using NF hooks, and
I plan to rebase it and send it upstream once the flow table offloading
patches that folks are working on are upstream.

I am right now trying to get rid of the remaining hacks in the code and
wanted to know if this series would be a feasible solution.

John

John Crispin (2):
net-next: add a dma_desc element to struct skb_shared_info
net-next: mediatek: populate the shared

drivers/net/ethernet/mediatek/mtk_eth_soc.c | 4 ++++
include/linux/skbuff.h | 1 +
2 files changed, 5 insertions(+)

--
2.11.0


2017-07-21 15:22:38

by John Crispin

Subject: [RFC 2/2] net-next: mediatek: populate the shared

When enabling HW flow offloading, the traffic coming in on either of the
GMACs is first sent to the PPE for processing. Any traffic not offloaded
at this point will then be forwarded to the normal RX DMA ring for SW path
processing. In this case the PPE will send additional data inside RXD4
that is later required by the upper layers to populate the flow offloading
engine's HW tables properly. This patch sets the skb_shared_info's dma_desc
field so that we can use the value later on.

Signed-off-by: John Crispin <[email protected]>
---
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index a455d1b4f1d8..42d162cd6363 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -918,6 +918,7 @@ static void mtk_update_rx_cpu_idx(struct mtk_eth *eth)
 static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		       struct mtk_eth *eth)
 {
+	struct skb_shared_info *sh;
 	struct mtk_rx_ring *ring;
 	int idx;
 	struct sk_buff *skb;
@@ -1000,6 +1001,9 @@ static int mtk_poll_rx(struct napi_struct *napi, int budget,
 		else
 			skb_checksum_none_assert(skb);
 		skb->protocol = eth_type_trans(skb, netdev);
+		sh = skb_shinfo(skb);
+
+		sh->dma_desc = trxd.rxd4;

 		if (netdev->features & NETIF_F_HW_VLAN_CTAG_RX &&
 		    RX_DMA_VID(trxd.rxd3))
--
2.11.0

2017-07-21 15:22:35

by John Crispin

Subject: [RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info

In order to make HW flow offloading work on the latest MediaTek silicon we need
to propagate part of the RX DMA descriptor to the upper layers populating
the flow offload engine's HW tables. This patch adds an extra element to
struct skb_shared_info, allowing the ethernet driver's RX NAPI code to store
the required information and make it persistent for the lifecycle of the
skb and its clones.

Signed-off-by: John Crispin <[email protected]>
---
include/linux/skbuff.h | 1 +
1 file changed, 1 insertion(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4093552be1de..db9576cd946b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -426,6 +426,7 @@ struct skb_shared_info {
 	unsigned int	gso_type;
 	u32		tskey;
 	__be32          ip6_frag_id;
+	u32		dma_desc;

 	/*
 	 * Warning : all fields before dataref are cleared in __alloc_skb()
--
2.11.0
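
The warning comment in the hunk above says that every field placed before dataref is cleared when an skb is allocated, so the proposed dma_desc would start out as zero for each new buffer. A minimal userspace sketch of that partial-clear idiom follows. The struct is a hypothetical, heavily trimmed stand-in (mock_shared_info and mock_shinfo_init are illustration names, not kernel symbols), and the real structure has many more members.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed stand-in for struct skb_shared_info. */
struct mock_shared_info {
	unsigned int gso_type;
	uint32_t tskey;
	uint32_t ip6_frag_id;
	uint32_t dma_desc;	/* the field proposed by this patch */

	/* Only the members above this point are zeroed on allocation. */
	int dataref;
	void *destructor_arg;
};

/* Zero just the bytes that precede dataref, then take one reference. */
static void mock_shinfo_init(struct mock_shared_info *sh)
{
	assert(sh != NULL);
	memset(sh, 0, offsetof(struct mock_shared_info, dataref));
	sh->dataref = 1;
}
```

Placing the new member before dataref means a recycled buffer never leaks a stale descriptor value into the next packet; members after dataref keep whatever the previous user left there unless they are set explicitly.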

2017-07-21 15:56:44

by Paolo Abeni

Subject: Re: [RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info

Hi,

On Fri, 2017-07-21 at 17:20 +0200, John Crispin wrote:
> In order to make HW flow offloading work on the latest MediaTek silicon we need
> to propagate part of the RX DMA descriptor to the upper layers populating
> the flow offload engine's HW tables. This patch adds an extra element to
> struct skb_shared_info, allowing the ethernet driver's RX NAPI code to store
> the required information and make it persistent for the lifecycle of the
> skb and its clones.
>
> Signed-off-by: John Crispin <[email protected]>
> ---
> include/linux/skbuff.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 4093552be1de..db9576cd946b 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -426,6 +426,7 @@ struct skb_shared_info {
> unsigned int gso_type;
> u32 tskey;
> __be32 ip6_frag_id;
> + u32 dma_desc;
>
> /*
> * Warning : all fields before dataref are cleared in __alloc_skb()

This will increase the skb_shared_info struct size, which is already
quite large, and can have several kinds of performance drawbacks.
AFAIK this is discouraged.

I don't understand the use case; the driver will set this field, but
who is going to consume it?

Thanks,

Paolo
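
The size concern can be made concrete with a small userspace sketch. Everything here is an assumption made for illustration: the mock layouts are deliberately sized to fill exactly one 64-byte cache line before the change, and ASSUMED_CACHE_LINE is a guess at a typical line size. The real skb_shared_info layout depends on the kernel configuration and architecture, so this only shows why a single extra u32 can matter, not what it actually costs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ASSUMED_CACHE_LINE 64u	/* assumption: typical line size */

/* Hypothetical layout that exactly fills one assumed cache line. */
struct mock_info_before {
	uint32_t words[16];
};

/* Same layout with the proposed 32-bit field appended. */
struct mock_info_after {
	uint32_t words[16];
	uint32_t dma_desc;
};

static_assert(sizeof(struct mock_info_before) == ASSUMED_CACHE_LINE,
	      "mock layout is meant to fill one line exactly");

/* Number of cache lines a structure of the given size touches. */
static size_t cache_lines_spanned(size_t bytes)
{
	return (bytes + ASSUMED_CACHE_LINE - 1) / ASSUMED_CACHE_LINE;
}
```

In this worst-case mock, four extra bytes push the structure from one line to two. Checking the real structure with a tool such as pahole on the target build would show whether the proposed field lands in existing padding or grows the structure.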

2017-07-21 17:02:07

by John Crispin

Subject: Re: [RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info



On 21/07/17 17:56, Paolo Abeni wrote:
> Hi,
>
> On Fri, 2017-07-21 at 17:20 +0200, John Crispin wrote:
>> In order to make HW flow offloading work on the latest MediaTek silicon we need
>> to propagate part of the RX DMA descriptor to the upper layers populating
>> the flow offload engine's HW tables. This patch adds an extra element to
>> struct skb_shared_info, allowing the ethernet driver's RX NAPI code to store
>> the required information and make it persistent for the lifecycle of the
>> skb and its clones.
>>
>> Signed-off-by: John Crispin <[email protected]>
>> ---
>> include/linux/skbuff.h | 1 +
>> 1 file changed, 1 insertion(+)
>>
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 4093552be1de..db9576cd946b 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -426,6 +426,7 @@ struct skb_shared_info {
>> unsigned int gso_type;
>> u32 tskey;
>> __be32 ip6_frag_id;
>> + u32 dma_desc;
>>
>> /*
>> * Warning : all fields before dataref are cleared in __alloc_skb()
> This will increase the skb_shared_info struct size, which is already
> quite large, and can have several kinds of performance drawbacks.
> AFAIK this is discouraged.
>
> I don't understand the use case; the driver will set this field, but
> who is going to consume it?
>
> Thanks,
>
> Paolo
Hi Paolo,

When the flow offloading engine forwards a packet to the DMA, it will
send additional info to the SW path. This includes
* physical switch port
* internal flow hash - this is required to populate the correct flow
  table entry
* ppe state - this indicates what state the PPE's internal table is in
  for the flow
* the reason why the packet was forwarded - these are things like bind,
  unbind, timed out, ...

Once the flow table offloading patches are ready and upstream, the
netfilter layer will see the SKB and pass it on to the flow table
offloading code, at which point it will finally end up inside the
offloading driver. The driver will need access to the info sent to
the SW path inside the RX descriptor to properly work out what state the
flow is in and which table entry to populate in the HW table for
offloading to work.

Hope that is a little clearer. The current hackish driver is here [1], and
the patch to the ethernet driver is here [2].

John

[1]
https://git.lede-project.org/?p=lede/blogic/staging.git;a=tree;f=target/linux/mediatek/files/drivers/net/ethernet/mediatek/mtk_hnat;hb=bc0518b9d928b43d965d8a1f8860281f0ae6a31c
[2]
https://git.lede-project.org/?p=lede/blogic/staging.git;a=blob;f=target/linux/mediatek/patches-4.9/0310-hwnat.patch;h=57bd0c07b39d2169f3ba08e1aa83b92dffcee025;hb=bc0518b9d928b43d965d8a1f8860281f0ae6a31c
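
The four pieces of information listed in this message all travel in a single 32-bit descriptor word. The sketch below shows how a consumer might unpack such a word. The bit positions and widths are purely hypothetical and chosen for illustration, as the thread does not give the real RXD4 layout; the DESC_* macros and struct flow_rx_info are likewise invented names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout; the real RXD4 layout is hardware-defined. */
#define DESC_FLOW_HASH_SHIFT	0
#define DESC_FLOW_HASH_MASK	0x3fffu	/* 14 bits */
#define DESC_REASON_SHIFT	14
#define DESC_REASON_MASK	0x1fu	/* 5 bits */
#define DESC_SPORT_SHIFT	19
#define DESC_SPORT_MASK		0xfu	/* 4 bits */
#define DESC_STATE_SHIFT	23
#define DESC_STATE_MASK		0x3u	/* 2 bits */

/* Decoded view of the per-packet info described in the mail. */
struct flow_rx_info {
	uint32_t flow_hash;	/* index of the flow table entry */
	uint32_t reason;	/* why the packet reached the SW path */
	uint32_t switch_port;	/* physical switch port */
	uint32_t ppe_state;	/* state of the PPE entry for the flow */
};

/* Split a raw descriptor word into its (assumed) fields. */
static struct flow_rx_info decode_dma_desc(uint32_t desc)
{
	struct flow_rx_info info = {
		.flow_hash = (desc >> DESC_FLOW_HASH_SHIFT) & DESC_FLOW_HASH_MASK,
		.reason = (desc >> DESC_REASON_SHIFT) & DESC_REASON_MASK,
		.switch_port = (desc >> DESC_SPORT_SHIFT) & DESC_SPORT_MASK,
		.ppe_state = (desc >> DESC_STATE_SHIFT) & DESC_STATE_MASK,
	};
	return info;
}

/* Inverse of decode_dma_desc(); handy for tests and debugging. */
static uint32_t encode_dma_desc(const struct flow_rx_info *info)
{
	assert(info->flow_hash <= DESC_FLOW_HASH_MASK);
	assert(info->reason <= DESC_REASON_MASK);
	assert(info->switch_port <= DESC_SPORT_MASK);
	assert(info->ppe_state <= DESC_STATE_MASK);
	return (info->flow_hash << DESC_FLOW_HASH_SHIFT) |
	       (info->reason << DESC_REASON_SHIFT) |
	       (info->switch_port << DESC_SPORT_SHIFT) |
	       (info->ppe_state << DESC_STATE_SHIFT);
}
```

Keeping the raw word in one u32 and decoding it only in the offload driver is what lets the RFC get away with a single new field; the decode step itself does not need to live in core code.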

2017-07-21 19:21:17

by David Miller

Subject: Re: [RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info

From: John Crispin <[email protected]>
Date: Fri, 21 Jul 2017 19:01:57 +0200

> When the flow offloading engine forwards a packet to the DMA, it will
> send additional info to the SW path. This includes
> * physical switch port
> * internal flow hash - this is required to populate the correct flow
>   table entry
> * ppe state - this indicates what state the PPE's internal table is in
>   for the flow
> * the reason why the packet was forwarded - these are things like bind,
>   unbind, timed out, ...
>
> Once the flow table offloading patches are ready and upstream, the
> netfilter layer will see the SKB and pass it on to the flow table
> offloading code, at which point it will finally end up inside the
> offloading driver. The driver will need access to the info sent to
> the SW path inside the RX descriptor to properly work out what state
> the flow is in and which table entry to populate in the HW table for
> offloading to work.

You absolutely must justify any change to a core data structure
alongside the complete and full set of patches that actually make use
of that data structure change.

You can't just say "here is the data structure change and BTW what
actually uses this is somewhere else, and not here on the list yet."

That makes it impossible to 1) evaluate the correctness of your change
and 2) validate the actual use so we can suggest alternative schemes
and/or approaches.

So please don't suggest changes this way.

Thanks.

2017-07-21 20:39:41

by Florian Westphal

Subject: Re: [RFC 1/2] net-next: add a dma_desc element to struct skb_shared_info

John Crispin <[email protected]> wrote:
> When the flow offloading engine forwards a packet to the DMA, it will send
> additional info to the SW path. This includes
> * physical switch port
> * internal flow hash - this is required to populate the correct flow table
>   entry
> * ppe state - this indicates what state the PPE's internal table is in for
>   the flow
> * the reason why the packet was forwarded - these are things like bind,
>   unbind, timed out, ...
>
> Once the flow table offloading patches are ready and upstream, the netfilter
> layer will see the SKB and pass it on to the flow table offloading code,

If this is about conntrack offloading, then I prefer if this is done
without changing any core network structure.

What about adding a new conntrack extension to hold whatever info
you need, and then allocate a conntrack entry in the driver?

This would obviously need core changes in conntrack (such as allowing
calls into conntrack from drivers without hard module dependencies),
and a thorough check of whether this causes backwards-compatibility
problems (e.g. right now a "-m conntrack" check in the raw table can only
succeed for packets from the lo interface).

But I think that could be worked around, esp. if we assume that we
won't see such entries a lot (assuming sw is slowpath and hw handles
most packets).

Thanks,
Florian
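
The extension idea suggested above can be modelled in userspace as an optional, on-demand side allocation attached to the connection entry, so that only flows that need the hardware info pay for it. The sketch below is not the kernel's conntrack extension API. All names (mock_conn, mock_ext_add, mock_ext_find, mock_conn_release, mock_hw_flow_ext) are hypothetical and only illustrate the shape of the approach.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical extension slots; only one is needed for this sketch. */
enum mock_ext_id { MOCK_EXT_HW_FLOW, MOCK_EXT_NUM };

/* Payload a driver would attach: the raw RX descriptor word. */
struct mock_hw_flow_ext {
	uint32_t dma_desc;
};

/* Hypothetical connection entry with optional extension pointers. */
struct mock_conn {
	void *ext[MOCK_EXT_NUM];
};

/* Allocate the extension on first use; later calls return the same area. */
static void *mock_ext_add(struct mock_conn *ct, enum mock_ext_id id,
			  size_t size)
{
	assert(id < MOCK_EXT_NUM);
	if (!ct->ext[id])
		ct->ext[id] = calloc(1, size);
	return ct->ext[id];
}

/* Look up an extension; NULL means this flow never had one attached. */
static void *mock_ext_find(const struct mock_conn *ct, enum mock_ext_id id)
{
	return id < MOCK_EXT_NUM ? ct->ext[id] : NULL;
}

/* Free all extensions when the connection entry goes away. */
static void mock_conn_release(struct mock_conn *ct)
{
	for (int i = 0; i < MOCK_EXT_NUM; i++) {
		free(ct->ext[i]);
		ct->ext[i] = NULL;
	}
}
```

Compared with a new member in skb_shared_info, this keeps the cost off every packet and ties the data to the flow rather than to the buffer, at the price of the conntrack changes described in the mail.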