2021-04-03 11:50:52

by Oleksij Rempel

[permalink] [raw]
Subject: [PATCH net-next v1 1/9] net: dsa: add rcv_post call back

Some switches (for example ar9331) do not provide enough information
about forwarded packets. If the switch decision was made based on IPv4
or IPv6 header, we need to analyze it and set proper flag.

Potentially we can do it in existing rcv path, on other hand we can
avoid part of duplicated work and let the dsa framework set skb header
pointers and then use preprocessed skb one step later withing the rcv_post
call back.

This patch is needed for ar9331 switch.

Signed-off-by: Oleksij Rempel <[email protected]>
---
include/net/dsa.h | 2 ++
net/dsa/dsa.c | 4 ++++
net/dsa/port.c | 1 +
3 files changed, 7 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 57b2c49f72f4..f1a7aa4303a7 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -84,6 +84,7 @@ struct dsa_device_ops {
struct sk_buff *(*xmit)(struct sk_buff *skb, struct net_device *dev);
struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt);
+ void (*rcv_post)(struct sk_buff *skb);
void (*flow_dissect)(const struct sk_buff *skb, __be16 *proto,
int *offset);
/* Used to determine which traffic should match the DSA filter in
@@ -247,6 +248,7 @@ struct dsa_port {
struct dsa_switch_tree *dst;
struct sk_buff *(*rcv)(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt);
+ void (*rcv_post)(struct sk_buff *skb);
bool (*filter)(const struct sk_buff *skb, struct net_device *dev);

enum {
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 84cad1be9ce4..fa3e7201e760 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -249,6 +249,10 @@ static int dsa_switch_rcv(struct sk_buff *skb, struct net_device *dev,
skb->pkt_type = PACKET_HOST;
skb->protocol = eth_type_trans(skb, skb->dev);

+
+ if (cpu_dp->rcv_post)
+ cpu_dp->rcv_post(skb);
+
if (unlikely(!dsa_slave_dev_check(skb->dev))) {
/* Packet is to be injected directly on an upper
* device, e.g. a team/bond, so skip all DSA-port
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 01e30264b25b..859957688c62 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -720,6 +720,7 @@ void dsa_port_set_tag_protocol(struct dsa_port *cpu_dp,
{
cpu_dp->filter = tag_ops->filter;
cpu_dp->rcv = tag_ops->rcv;
+ cpu_dp->rcv_post = tag_ops->rcv_post;
cpu_dp->tag_ops = tag_ops;
}

--
2.29.2


2021-04-03 14:10:08

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v1 1/9] net: dsa: add rcv_post call back

On Sat, Apr 03, 2021 at 01:48:40PM +0200, Oleksij Rempel wrote:
> Some switches (for example ar9331) do not provide enough information
> about forwarded packets. If the switch decision was made based on IPv4
> or IPv6 header, we need to analyze it and set proper flag.
>
> Potentially we can do it in existing rcv path, on other hand we can
> avoid part of duplicated work and let the dsa framework set skb header
> pointers and then use preprocessed skb one step later withing the rcv_post
> call back.
>
> This patch is needed for ar9331 switch.
>
> Signed-off-by: Oleksij Rempel <[email protected]>
> ---

I don't necessarily disagree with this, perhaps we can even move
Florian's dsa_untag_bridge_pvid() call inside a rcv_post() method
implemented by the DSA_TAG_PROTO_BRCM_LEGACY, DSA_TAG_PROTO_BRCM_PREPEND
and DSA_TAG_PROTO_BRCM taggers. Or even better, because Oleksij's
rcv_post is already prototype-compatible with dsa_untag_bridge_pvid, we
can already do:

.rcv_post = dsa_untag_bridge_pvid,

This should be generally useful for stuff that DSA taggers need to do
which is easiest done after eth_type_trans() was called.

2021-04-03 23:26:48

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v1 1/9] net: dsa: add rcv_post call back

On Sat, Apr 03, 2021 at 05:05:34PM +0300, Vladimir Oltean wrote:
> On Sat, Apr 03, 2021 at 01:48:40PM +0200, Oleksij Rempel wrote:
> > Some switches (for example ar9331) do not provide enough information
> > about forwarded packets. If the switch decision was made based on IPv4
> > or IPv6 header, we need to analyze it and set proper flag.
> >
> > Potentially we can do it in existing rcv path, on other hand we can
> > avoid part of duplicated work and let the dsa framework set skb header
> > pointers and then use preprocessed skb one step later withing the rcv_post
> > call back.
> >
> > This patch is needed for ar9331 switch.
> >
> > Signed-off-by: Oleksij Rempel <[email protected]>
> > ---
>
> I don't necessarily disagree with this, perhaps we can even move
> Florian's dsa_untag_bridge_pvid() call inside a rcv_post() method
> implemented by the DSA_TAG_PROTO_BRCM_LEGACY, DSA_TAG_PROTO_BRCM_PREPEND
> and DSA_TAG_PROTO_BRCM taggers. Or even better, because Oleksij's
> rcv_post is already prototype-compatible with dsa_untag_bridge_pvid, we
> can already do:
>
> .rcv_post = dsa_untag_bridge_pvid,
>
> This should be generally useful for stuff that DSA taggers need to do
> which is easiest done after eth_type_trans() was called.

I had some fun with an alternative method of parsing the frame for IGMP
so that you can clear skb->offload_fwd_mark, which doesn't rely on the
introduction of a new method in DSA. It should also have several other
advantages compared to your solution such as the fact that it should
work with VLAN-tagged packets.

Background: we made Receive Packet Steering work on DSA master interfaces
(echo 3 > /sys/class/net/eth0/queues/rx-1/rps_cpus) even when the DSA
tag shifts to the right the IP headers and everything that comes
afterwards. The flow dissector had to be patched for that, just grep for
DSA in net/core/flow_dissector.c.

The problem you're facing is that you can't parse the IP and IGMP
headers in the tagger's rcv() method, since the network header,
transport header offsets and skb->protocol are all messed up, since
eth_type_trans hasn't been called yet.

And that's the trick right there, you're between a rock and a hard
place: too early because eth_type_trans wasn't called yet, and too late
because skb->dev was changed and no longer points to the DSA master, so
the flow dissector adjustment we made doesn't apply.

But if you call the flow dissector _before_ you call "skb->dev =
dsa_master_find_slave" (and yes, while the DSA tag is still there), then
it's virtually as if you had called that while the skb belonged to the
DSA master, so it should work with __skb_flow_dissect.

In fact I prototyped this idea below. I wanted to check whether I can
match something as fine-grained as an IGMPv2 Membership Report message,
and I could.

I prototyped it inside the ocelot tagging protocol driver because that's
what I had handy. I used __skb_flow_dissect with my own flow dissector
which had to be initialized at the tagger module_init time, even though
I think I could have probably just called skb_flow_dissect_flow_keys
with a standard dissector, and that would have removed the need for the
custom module_init in tag_ocelot.c. One thing that is interesting is
that I had to add the bits for IGMP parsing to the flow dissector
myself (based on the existing ICMP code). I was too lazy to do that for
MLD as well, but it is really not hard. Or even better, if you don't
need to look at all inside the IGMP/MLD header, I think you can even
omit adding this parsing code to the flow dissector and just look at
basic.n_proto and basic.ip_proto.

See the snippet below. Hope it helps.

-----------------------------[ cut here ]-----------------------------
diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index ffd386ea0dbb..4c25fa47637a 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -190,6 +190,20 @@ struct flow_dissector_key_icmp {
u16 id;
};

+/**
+ * flow_dissector_key_igmp:
+ * type: indicates the message type, see include/uapi/linux/igmp.h
+ * code: Max Resp Code, the maximum time in 1/10 second
+ * increments before sending a responding report
+ * group: the multicast address being queried when sending a
+ * Group-Specific or Group-and-Source-Specific Query.
+ */
+struct flow_dissector_key_igmp {
+ u8 type;
+ u8 code; /* Max Resp Time in IGMPv2 */
+ __be32 group;
+};
+
/**
* struct flow_dissector_key_eth_addrs:
* @src: source Ethernet address
@@ -259,6 +273,7 @@ enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
FLOW_DISSECTOR_KEY_PORTS_RANGE, /* struct flow_dissector_key_ports */
FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
+ FLOW_DISSECTOR_KEY_IGMP, /* struct flow_dissector_key_igmp */
FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
FLOW_DISSECTOR_KEY_TIPC, /* struct flow_dissector_key_tipc */
FLOW_DISSECTOR_KEY_ARP, /* struct flow_dissector_key_arp */
@@ -314,6 +329,7 @@ struct flow_keys {
struct flow_dissector_key_keyid keyid;
struct flow_dissector_key_ports ports;
struct flow_dissector_key_icmp icmp;
+ struct flow_dissector_key_igmp igmp;
/* 'addrs' must be the last member */
struct flow_dissector_key_addrs addrs;
};
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 5985029e43d4..8cc8c34ea5cd 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -202,6 +202,30 @@ static void __skb_flow_dissect_icmp(const struct sk_buff *skb,
skb_flow_get_icmp_tci(skb, key_icmp, data, thoff, hlen);
}

+static void __skb_flow_dissect_igmp(const struct sk_buff *skb,
+ struct flow_dissector *flow_dissector,
+ void *target_container, const void *data,
+ int thoff, int hlen)
+{
+ struct flow_dissector_key_igmp *key_igmp;
+ struct igmphdr *ih, _ih;
+
+ if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_IGMP))
+ return;
+
+ ih = __skb_header_pointer(skb, thoff, sizeof(_ih), data, hlen, &_ih);
+ if (!ih)
+ return;
+
+ key_igmp = skb_flow_dissector_target(flow_dissector,
+ FLOW_DISSECTOR_KEY_IGMP,
+ target_container);
+
+ key_igmp->type = ih->type;
+ key_igmp->code = ih->code;
+ key_igmp->group = ih->group;
+}
+
void skb_flow_dissect_meta(const struct sk_buff *skb,
struct flow_dissector *flow_dissector,
void *target_container)
@@ -1398,6 +1422,11 @@ bool __skb_flow_dissect(const struct net *net,
data, nhoff, hlen);
break;

+ case IPPROTO_IGMP:
+ __skb_flow_dissect_igmp(skb, flow_dissector, target_container,
+ data, nhoff, hlen);
+ break;
+
default:
break;
}
diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
index f9df9cac81c5..a2cc824ddeec 100644
--- a/net/dsa/tag_ocelot.c
+++ b/net/dsa/tag_ocelot.c
@@ -2,9 +2,51 @@
/* Copyright 2019 NXP Semiconductors
*/
#include <linux/dsa/ocelot.h>
+#include <linux/igmp.h>
#include <soc/mscc/ocelot.h>
#include "dsa_priv.h"

+static const struct flow_dissector_key ocelot_flow_keys[] = {
+ {
+ .key_id = FLOW_DISSECTOR_KEY_CONTROL,
+ .offset = offsetof(struct flow_keys, control),
+ },
+ {
+ .key_id = FLOW_DISSECTOR_KEY_BASIC,
+ .offset = offsetof(struct flow_keys, basic),
+ },
+ {
+ .key_id = FLOW_DISSECTOR_KEY_IGMP,
+ .offset = offsetof(struct flow_keys, igmp),
+ },
+};
+
+static struct flow_dissector ocelot_flow_dissector __read_mostly;
+
+static struct sk_buff *ocelot_drop_igmp(struct sk_buff *skb)
+{
+ struct flow_keys fk;
+
+ memset(&fk, 0, sizeof(fk));
+
+ if (!__skb_flow_dissect(NULL, skb, &ocelot_flow_dissector,
+ &fk, NULL, 0, 0, 0, 0))
+ return skb;
+
+ if (fk.basic.n_proto != htons(ETH_P_IP))
+ return skb;
+
+ if (fk.basic.ip_proto != IPPROTO_IGMP)
+ return skb;
+
+ if (fk.igmp.type != IGMPV2_HOST_MEMBERSHIP_REPORT)
+ return skb;
+
+ skb_dump(KERN_ERR, skb, true);
+
+ return NULL;
+}
+
static void ocelot_xmit_ptp(struct dsa_port *dp, void *injection,
struct sk_buff *clone)
{
@@ -84,6 +126,10 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
u8 *extraction;
u16 vlan_tpid;

+ skb = ocelot_drop_igmp(skb);
+ if (!skb)
+ return NULL;
+
/* Revert skb->data by the amount consumed by the DSA master,
* so it points to the beginning of the frame.
*/
@@ -186,6 +232,23 @@ static struct dsa_tag_driver *ocelot_tag_driver_array[] = {
&DSA_TAG_DRIVER_NAME(seville_netdev_ops),
};

-module_dsa_tag_drivers(ocelot_tag_driver_array);
+static int __init dsa_tag_driver_module_init(void)
+{
+ skb_flow_dissector_init(&ocelot_flow_dissector, ocelot_flow_keys,
+ ARRAY_SIZE(ocelot_flow_keys));
+
+ dsa_tag_drivers_register(ocelot_tag_driver_array,
+ ARRAY_SIZE(ocelot_tag_driver_array),
+ THIS_MODULE);
+ return 0;
+}
+module_init(dsa_tag_driver_module_init);
+
+static void __exit dsa_tag_driver_module_exit(void)
+{
+ dsa_tag_drivers_unregister(ocelot_tag_driver_array,
+ ARRAY_SIZE(ocelot_tag_driver_array));
+}
+module_exit(dsa_tag_driver_module_exit)

MODULE_LICENSE("GPL v2");
-----------------------------[ cut here ]-----------------------------

2021-04-04 02:35:09

by Florian Fainelli

[permalink] [raw]
Subject: Re: [PATCH net-next v1 1/9] net: dsa: add rcv_post call back



On 4/3/2021 16:21, Vladimir Oltean wrote:
> On Sat, Apr 03, 2021 at 05:05:34PM +0300, Vladimir Oltean wrote:
>> On Sat, Apr 03, 2021 at 01:48:40PM +0200, Oleksij Rempel wrote:
>>> Some switches (for example ar9331) do not provide enough information
>>> about forwarded packets. If the switch decision was made based on IPv4
>>> or IPv6 header, we need to analyze it and set proper flag.
>>>
>>> Potentially we can do it in existing rcv path, on other hand we can
>>> avoid part of duplicated work and let the dsa framework set skb header
>>> pointers and then use preprocessed skb one step later withing the rcv_post
>>> call back.
>>>
>>> This patch is needed for ar9331 switch.
>>>
>>> Signed-off-by: Oleksij Rempel <[email protected]>
>>> ---
>>
>> I don't necessarily disagree with this, perhaps we can even move
>> Florian's dsa_untag_bridge_pvid() call inside a rcv_post() method
>> implemented by the DSA_TAG_PROTO_BRCM_LEGACY, DSA_TAG_PROTO_BRCM_PREPEND
>> and DSA_TAG_PROTO_BRCM taggers. Or even better, because Oleksij's
>> rcv_post is already prototype-compatible with dsa_untag_bridge_pvid, we
>> can already do:
>>
>> .rcv_post = dsa_untag_bridge_pvid,
>>
>> This should be generally useful for stuff that DSA taggers need to do
>> which is easiest done after eth_type_trans() was called.
>
> I had some fun with an alternative method of parsing the frame for IGMP
> so that you can clear skb->offload_fwd_mark, which doesn't rely on the
> introduction of a new method in DSA. It should also have several other
> advantages compared to your solution such as the fact that it should
> work with VLAN-tagged packets.
>
> Background: we made Receive Packet Steering work on DSA master interfaces
> (echo 3 > /sys/class/net/eth0/queues/rx-1/rps_cpus) even when the DSA
> tag shifts to the right the IP headers and everything that comes
> afterwards. The flow dissector had to be patched for that, just grep for
> DSA in net/core/flow_dissector.c.
>
> The problem you're facing is that you can't parse the IP and IGMP
> headers in the tagger's rcv() method, since the network header,
> transport header offsets and skb->protocol are all messed up, since
> eth_type_trans hasn't been called yet.
>
> And that's the trick right there, you're between a rock and a hard
> place: too early because eth_type_trans wasn't called yet, and too late
> because skb->dev was changed and no longer points to the DSA master, so
> the flow dissector adjustment we made doesn't apply.
>
> But if you call the flow dissector _before_ you call "skb->dev =
> dsa_master_find_slave" (and yes, while the DSA tag is still there), then
> it's virtually as if you had called that while the skb belonged to the
> DSA master, so it should work with __skb_flow_dissect.
>
> In fact I prototyped this idea below. I wanted to check whether I can
> match something as fine-grained as an IGMPv2 Membership Report message,
> and I could.
>
> I prototyped it inside the ocelot tagging protocol driver because that's
> what I had handy. I used __skb_flow_dissect with my own flow dissector
> which had to be initialized at the tagger module_init time, even though
> I think I could have probably just called skb_flow_dissect_flow_keys
> with a standard dissector, and that would have removed the need for the
> custom module_init in tag_ocelot.c. One thing that is interesting is
> that I had to add the bits for IGMP parsing to the flow dissector
> myself (based on the existing ICMP code). I was too lazy to do that for
> MLD as well, but it is really not hard. Or even better, if you don't
> need to look at all inside the IGMP/MLD header, I think you can even
> omit adding this parsing code to the flow dissector and just look at
> basic.n_proto and basic.ip_proto.
>
> See the snippet below. Hope it helps.

This looks a lot better than introducing hooks at various points in
dsa_switch_rcv().
--
Florian

2021-04-04 05:52:34

by Oleksij Rempel

[permalink] [raw]
Subject: Re: [PATCH net-next v1 1/9] net: dsa: add rcv_post call back

Am 04.04.21 um 01:21 schrieb Vladimir Oltean:
> On Sat, Apr 03, 2021 at 05:05:34PM +0300, Vladimir Oltean wrote:
>> On Sat, Apr 03, 2021 at 01:48:40PM +0200, Oleksij Rempel wrote:
>>> Some switches (for example ar9331) do not provide enough information
>>> about forwarded packets. If the switch decision was made based on IPv4
>>> or IPv6 header, we need to analyze it and set proper flag.
>>>
>>> Potentially we can do it in existing rcv path, on other hand we can
>>> avoid part of duplicated work and let the dsa framework set skb header
>>> pointers and then use preprocessed skb one step later withing the rcv_post
>>> call back.
>>>
>>> This patch is needed for ar9331 switch.
>>>
>>> Signed-off-by: Oleksij Rempel <[email protected]>
>>> ---
>>
>> I don't necessarily disagree with this, perhaps we can even move
>> Florian's dsa_untag_bridge_pvid() call inside a rcv_post() method
>> implemented by the DSA_TAG_PROTO_BRCM_LEGACY, DSA_TAG_PROTO_BRCM_PREPEND
>> and DSA_TAG_PROTO_BRCM taggers. Or even better, because Oleksij's
>> rcv_post is already prototype-compatible with dsa_untag_bridge_pvid, we
>> can already do:
>>
>> .rcv_post = dsa_untag_bridge_pvid,
>>
>> This should be generally useful for stuff that DSA taggers need to do
>> which is easiest done after eth_type_trans() was called.
>
> I had some fun with an alternative method of parsing the frame for IGMP
> so that you can clear skb->offload_fwd_mark, which doesn't rely on the
> introduction of a new method in DSA. It should also have several other
> advantages compared to your solution such as the fact that it should
> work with VLAN-tagged packets.
>
> Background: we made Receive Packet Steering work on DSA master interfaces
> (echo 3 > /sys/class/net/eth0/queues/rx-1/rps_cpus) even when the DSA
> tag shifts to the right the IP headers and everything that comes
> afterwards. The flow dissector had to be patched for that, just grep for
> DSA in net/core/flow_dissector.c.
>
> The problem you're facing is that you can't parse the IP and IGMP
> headers in the tagger's rcv() method, since the network header,
> transport header offsets and skb->protocol are all messed up, since
> eth_type_trans hasn't been called yet.
>
> And that's the trick right there, you're between a rock and a hard
> place: too early because eth_type_trans wasn't called yet, and too late
> because skb->dev was changed and no longer points to the DSA master, so
> the flow dissector adjustment we made doesn't apply.
>
> But if you call the flow dissector _before_ you call "skb->dev =
> dsa_master_find_slave" (and yes, while the DSA tag is still there), then
> it's virtually as if you had called that while the skb belonged to the
> DSA master, so it should work with __skb_flow_dissect.
>
> In fact I prototyped this idea below. I wanted to check whether I can
> match something as fine-grained as an IGMPv2 Membership Report message,
> and I could.
>
> I prototyped it inside the ocelot tagging protocol driver because that's
> what I had handy. I used __skb_flow_dissect with my own flow dissector
> which had to be initialized at the tagger module_init time, even though
> I think I could have probably just called skb_flow_dissect_flow_keys
> with a standard dissector, and that would have removed the need for the
> custom module_init in tag_ocelot.c. One thing that is interesting is
> that I had to add the bits for IGMP parsing to the flow dissector
> myself (based on the existing ICMP code). I was too lazy to do that for
> MLD as well, but it is really not hard. Or even better, if you don't
> need to look at all inside the IGMP/MLD header, I think you can even
> omit adding this parsing code to the flow dissector and just look at
> basic.n_proto and basic.ip_proto.
>
> See the snippet below. Hope it helps.
>
> -----------------------------[ cut here ]-----------------------------
> diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> index ffd386ea0dbb..4c25fa47637a 100644
> --- a/include/net/flow_dissector.h
> +++ b/include/net/flow_dissector.h
> @@ -190,6 +190,20 @@ struct flow_dissector_key_icmp {
> u16 id;
> };
>
> +/**
> + * flow_dissector_key_igmp:
> + * type: indicates the message type, see include/uapi/linux/igmp.h
> + * code: Max Resp Code, the maximum time in 1/10 second
> + * increments before sending a responding report
> + * group: the multicast address being queried when sending a
> + * Group-Specific or Group-and-Source-Specific Query.
> + */
> +struct flow_dissector_key_igmp {
> + u8 type;
> + u8 code; /* Max Resp Time in IGMPv2 */
> + __be32 group;
> +};
> +
> /**
> * struct flow_dissector_key_eth_addrs:
> * @src: source Ethernet address
> @@ -259,6 +273,7 @@ enum flow_dissector_key_id {
> FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
> FLOW_DISSECTOR_KEY_PORTS_RANGE, /* struct flow_dissector_key_ports */
> FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
> + FLOW_DISSECTOR_KEY_IGMP, /* struct flow_dissector_key_igmp */
> FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
> FLOW_DISSECTOR_KEY_TIPC, /* struct flow_dissector_key_tipc */
> FLOW_DISSECTOR_KEY_ARP, /* struct flow_dissector_key_arp */
> @@ -314,6 +329,7 @@ struct flow_keys {
> struct flow_dissector_key_keyid keyid;
> struct flow_dissector_key_ports ports;
> struct flow_dissector_key_icmp icmp;
> + struct flow_dissector_key_igmp igmp;
> /* 'addrs' must be the last member */
> struct flow_dissector_key_addrs addrs;
> };
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 5985029e43d4..8cc8c34ea5cd 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -202,6 +202,30 @@ static void __skb_flow_dissect_icmp(const struct sk_buff *skb,
> skb_flow_get_icmp_tci(skb, key_icmp, data, thoff, hlen);
> }
>
> +static void __skb_flow_dissect_igmp(const struct sk_buff *skb,
> + struct flow_dissector *flow_dissector,
> + void *target_container, const void *data,
> + int thoff, int hlen)
> +{
> + struct flow_dissector_key_igmp *key_igmp;
> + struct igmphdr *ih, _ih;
> +
> + if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_IGMP))
> + return;
> +
> + ih = __skb_header_pointer(skb, thoff, sizeof(_ih), data, hlen, &_ih);
> + if (!ih)
> + return;
> +
> + key_igmp = skb_flow_dissector_target(flow_dissector,
> + FLOW_DISSECTOR_KEY_IGMP,
> + target_container);
> +
> + key_igmp->type = ih->type;
> + key_igmp->code = ih->code;
> + key_igmp->group = ih->group;
> +}
> +
> void skb_flow_dissect_meta(const struct sk_buff *skb,
> struct flow_dissector *flow_dissector,
> void *target_container)
> @@ -1398,6 +1422,11 @@ bool __skb_flow_dissect(const struct net *net,
> data, nhoff, hlen);
> break;
>
> + case IPPROTO_IGMP:
> + __skb_flow_dissect_igmp(skb, flow_dissector, target_container,
> + data, nhoff, hlen);
> + break;
> +
> default:
> break;
> }
> diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
> index f9df9cac81c5..a2cc824ddeec 100644
> --- a/net/dsa/tag_ocelot.c
> +++ b/net/dsa/tag_ocelot.c
> @@ -2,9 +2,51 @@
> /* Copyright 2019 NXP Semiconductors
> */
> #include <linux/dsa/ocelot.h>
> +#include <linux/igmp.h>
> #include <soc/mscc/ocelot.h>
> #include "dsa_priv.h"
>
> +static const struct flow_dissector_key ocelot_flow_keys[] = {
> + {
> + .key_id = FLOW_DISSECTOR_KEY_CONTROL,
> + .offset = offsetof(struct flow_keys, control),
> + },
> + {
> + .key_id = FLOW_DISSECTOR_KEY_BASIC,
> + .offset = offsetof(struct flow_keys, basic),
> + },
> + {
> + .key_id = FLOW_DISSECTOR_KEY_IGMP,
> + .offset = offsetof(struct flow_keys, igmp),
> + },
> +};
> +
> +static struct flow_dissector ocelot_flow_dissector __read_mostly;
> +
> +static struct sk_buff *ocelot_drop_igmp(struct sk_buff *skb)
> +{
> + struct flow_keys fk;
> +
> + memset(&fk, 0, sizeof(fk));
> +
> + if (!__skb_flow_dissect(NULL, skb, &ocelot_flow_dissector,
> + &fk, NULL, 0, 0, 0, 0))
> + return skb;
> +
> + if (fk.basic.n_proto != htons(ETH_P_IP))
> + return skb;
> +
> + if (fk.basic.ip_proto != IPPROTO_IGMP)
> + return skb;
> +
> + if (fk.igmp.type != IGMPV2_HOST_MEMBERSHIP_REPORT)
> + return skb;
> +
> + skb_dump(KERN_ERR, skb, true);
> +
> + return NULL;
> +}
> +
> static void ocelot_xmit_ptp(struct dsa_port *dp, void *injection,
> struct sk_buff *clone)
> {
> @@ -84,6 +126,10 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
> u8 *extraction;
> u16 vlan_tpid;
>
> + skb = ocelot_drop_igmp(skb);
> + if (!skb)
> + return NULL;

I probably like this idea, but i need more understanding :)

In case of ocelot_drop_igmp() you wont to really drop it? Or it is just
example on how it may potentially work?
I ask, because IGMP should not be dropped.

--
Regards,
Oleksij

2021-04-04 12:58:19

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v1 1/9] net: dsa: add rcv_post call back

On Sun, Apr 04, 2021 at 07:49:03AM +0200, Oleksij Rempel wrote:
> Am 04.04.21 um 01:21 schrieb Vladimir Oltean:
> > On Sat, Apr 03, 2021 at 05:05:34PM +0300, Vladimir Oltean wrote:
> >> On Sat, Apr 03, 2021 at 01:48:40PM +0200, Oleksij Rempel wrote:
> >>> Some switches (for example ar9331) do not provide enough information
> >>> about forwarded packets. If the switch decision was made based on IPv4
> >>> or IPv6 header, we need to analyze it and set proper flag.
> >>>
> >>> Potentially we can do it in existing rcv path, on other hand we can
> >>> avoid part of duplicated work and let the dsa framework set skb header
> >>> pointers and then use preprocessed skb one step later withing the rcv_post
> >>> call back.
> >>>
> >>> This patch is needed for ar9331 switch.
> >>>
> >>> Signed-off-by: Oleksij Rempel <[email protected]>
> >>> ---
> >>
> >> I don't necessarily disagree with this, perhaps we can even move
> >> Florian's dsa_untag_bridge_pvid() call inside a rcv_post() method
> >> implemented by the DSA_TAG_PROTO_BRCM_LEGACY, DSA_TAG_PROTO_BRCM_PREPEND
> >> and DSA_TAG_PROTO_BRCM taggers. Or even better, because Oleksij's
> >> rcv_post is already prototype-compatible with dsa_untag_bridge_pvid, we
> >> can already do:
> >>
> >> .rcv_post = dsa_untag_bridge_pvid,
> >>
> >> This should be generally useful for stuff that DSA taggers need to do
> >> which is easiest done after eth_type_trans() was called.
> >
> > I had some fun with an alternative method of parsing the frame for IGMP
> > so that you can clear skb->offload_fwd_mark, which doesn't rely on the
> > introduction of a new method in DSA. It should also have several other
> > advantages compared to your solution such as the fact that it should
> > work with VLAN-tagged packets.
> >
> > Background: we made Receive Packet Steering work on DSA master interfaces
> > (echo 3 > /sys/class/net/eth0/queues/rx-1/rps_cpus) even when the DSA
> > tag shifts to the right the IP headers and everything that comes
> > afterwards. The flow dissector had to be patched for that, just grep for
> > DSA in net/core/flow_dissector.c.
> >
> > The problem you're facing is that you can't parse the IP and IGMP
> > headers in the tagger's rcv() method, since the network header,
> > transport header offsets and skb->protocol are all messed up, since
> > eth_type_trans hasn't been called yet.
> >
> > And that's the trick right there, you're between a rock and a hard
> > place: too early because eth_type_trans wasn't called yet, and too late
> > because skb->dev was changed and no longer points to the DSA master, so
> > the flow dissector adjustment we made doesn't apply.
> >
> > But if you call the flow dissector _before_ you call "skb->dev =
> > dsa_master_find_slave" (and yes, while the DSA tag is still there), then
> > it's virtually as if you had called that while the skb belonged to the
> > DSA master, so it should work with __skb_flow_dissect.
> >
> > In fact I prototyped this idea below. I wanted to check whether I can
> > match something as fine-grained as an IGMPv2 Membership Report message,
> > and I could.
> >
> > I prototyped it inside the ocelot tagging protocol driver because that's
> > what I had handy. I used __skb_flow_dissect with my own flow dissector
> > which had to be initialized at the tagger module_init time, even though
> > I think I could have probably just called skb_flow_dissect_flow_keys
> > with a standard dissector, and that would have removed the need for the
> > custom module_init in tag_ocelot.c. One thing that is interesting is
> > that I had to add the bits for IGMP parsing to the flow dissector
> > myself (based on the existing ICMP code). I was too lazy to do that for
> > MLD as well, but it is really not hard. Or even better, if you don't
> > need to look at all inside the IGMP/MLD header, I think you can even
> > omit adding this parsing code to the flow dissector and just look at
> > basic.n_proto and basic.ip_proto.
> >
> > See the snippet below. Hope it helps.
> >
> > -----------------------------[ cut here ]-----------------------------
> > diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
> > index ffd386ea0dbb..4c25fa47637a 100644
> > --- a/include/net/flow_dissector.h
> > +++ b/include/net/flow_dissector.h
> > @@ -190,6 +190,20 @@ struct flow_dissector_key_icmp {
> > u16 id;
> > };
> >
> > +/**
> > + * flow_dissector_key_igmp:
> > + * type: indicates the message type, see include/uapi/linux/igmp.h
> > + * code: Max Resp Code, the maximum time in 1/10 second
> > + * increments before sending a responding report
> > + * group: the multicast address being queried when sending a
> > + * Group-Specific or Group-and-Source-Specific Query.
> > + */
> > +struct flow_dissector_key_igmp {
> > + u8 type;
> > + u8 code; /* Max Resp Time in IGMPv2 */
> > + __be32 group;
> > +};
> > +
> > /**
> > * struct flow_dissector_key_eth_addrs:
> > * @src: source Ethernet address
> > @@ -259,6 +273,7 @@ enum flow_dissector_key_id {
> > FLOW_DISSECTOR_KEY_PORTS, /* struct flow_dissector_key_ports */
> > FLOW_DISSECTOR_KEY_PORTS_RANGE, /* struct flow_dissector_key_ports */
> > FLOW_DISSECTOR_KEY_ICMP, /* struct flow_dissector_key_icmp */
> > + FLOW_DISSECTOR_KEY_IGMP, /* struct flow_dissector_key_igmp */
> > FLOW_DISSECTOR_KEY_ETH_ADDRS, /* struct flow_dissector_key_eth_addrs */
> > FLOW_DISSECTOR_KEY_TIPC, /* struct flow_dissector_key_tipc */
> > FLOW_DISSECTOR_KEY_ARP, /* struct flow_dissector_key_arp */
> > @@ -314,6 +329,7 @@ struct flow_keys {
> > struct flow_dissector_key_keyid keyid;
> > struct flow_dissector_key_ports ports;
> > struct flow_dissector_key_icmp icmp;
> > + struct flow_dissector_key_igmp igmp;
> > /* 'addrs' must be the last member */
> > struct flow_dissector_key_addrs addrs;
> > };
> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > index 5985029e43d4..8cc8c34ea5cd 100644
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -202,6 +202,30 @@ static void __skb_flow_dissect_icmp(const struct sk_buff *skb,
> > skb_flow_get_icmp_tci(skb, key_icmp, data, thoff, hlen);
> > }
> >
> > +static void __skb_flow_dissect_igmp(const struct sk_buff *skb,
> > + struct flow_dissector *flow_dissector,
> > + void *target_container, const void *data,
> > + int thoff, int hlen)
> > +{
> > + struct flow_dissector_key_igmp *key_igmp;
> > + struct igmphdr *ih, _ih;
> > +
> > + if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_IGMP))
> > + return;
> > +
> > + ih = __skb_header_pointer(skb, thoff, sizeof(_ih), data, hlen, &_ih);
> > + if (!ih)
> > + return;
> > +
> > + key_igmp = skb_flow_dissector_target(flow_dissector,
> > + FLOW_DISSECTOR_KEY_IGMP,
> > + target_container);
> > +
> > + key_igmp->type = ih->type;
> > + key_igmp->code = ih->code;
> > + key_igmp->group = ih->group;
> > +}
> > +
> > void skb_flow_dissect_meta(const struct sk_buff *skb,
> > struct flow_dissector *flow_dissector,
> > void *target_container)
> > @@ -1398,6 +1422,11 @@ bool __skb_flow_dissect(const struct net *net,
> > data, nhoff, hlen);
> > break;
> >
> > + case IPPROTO_IGMP:
> > + __skb_flow_dissect_igmp(skb, flow_dissector, target_container,
> > + data, nhoff, hlen);
> > + break;
> > +
> > default:
> > break;
> > }
> > diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c
> > index f9df9cac81c5..a2cc824ddeec 100644
> > --- a/net/dsa/tag_ocelot.c
> > +++ b/net/dsa/tag_ocelot.c
> > @@ -2,9 +2,51 @@
> > /* Copyright 2019 NXP Semiconductors
> > */
> > #include <linux/dsa/ocelot.h>
> > +#include <linux/igmp.h>
> > #include <soc/mscc/ocelot.h>
> > #include "dsa_priv.h"
> >
> > +static const struct flow_dissector_key ocelot_flow_keys[] = {
> > + {
> > + .key_id = FLOW_DISSECTOR_KEY_CONTROL,
> > + .offset = offsetof(struct flow_keys, control),
> > + },
> > + {
> > + .key_id = FLOW_DISSECTOR_KEY_BASIC,
> > + .offset = offsetof(struct flow_keys, basic),
> > + },
> > + {
> > + .key_id = FLOW_DISSECTOR_KEY_IGMP,
> > + .offset = offsetof(struct flow_keys, igmp),
> > + },
> > +};
> > +
> > +static struct flow_dissector ocelot_flow_dissector __read_mostly;
> > +
> > +static struct sk_buff *ocelot_drop_igmp(struct sk_buff *skb)
> > +{
> > + struct flow_keys fk;
> > +
> > + memset(&fk, 0, sizeof(fk));
> > +
> > + if (!__skb_flow_dissect(NULL, skb, &ocelot_flow_dissector,
> > + &fk, NULL, 0, 0, 0, 0))
> > + return skb;
> > +
> > + if (fk.basic.n_proto != htons(ETH_P_IP))
> > + return skb;
> > +
> > + if (fk.basic.ip_proto != IPPROTO_IGMP)
> > + return skb;
> > +
> > + if (fk.igmp.type != IGMPV2_HOST_MEMBERSHIP_REPORT)
> > + return skb;
> > +
> > + skb_dump(KERN_ERR, skb, true);
> > +
> > + return NULL;
> > +}
> > +
> > static void ocelot_xmit_ptp(struct dsa_port *dp, void *injection,
> > struct sk_buff *clone)
> > {
> > @@ -84,6 +126,10 @@ static struct sk_buff *ocelot_rcv(struct sk_buff *skb,
> > u8 *extraction;
> > u16 vlan_tpid;
> >
> > + skb = ocelot_drop_igmp(skb);
> > + if (!skb)
> > + return NULL;
>
> I probably like this idea, but i need more understanding :)
>
> In case of ocelot_drop_igmp() you wont to really drop it? Or it is just
> example on how it may potentially work?
> I ask, because IGMP should not be dropped.

Uhm, yeah, it is just an example of 'how to do X with IGMP packets from
the DSA receive path'. You don't have to drop them.