In act_mirred's ingress redirection, if the skb's dst_entry is valid
when call function netif_receive_skb, the fllowing l3 stack process
(ip_rcv_finish_core) will check dst_entry and skip the routing
decision. Using the old dst_entry is unexpected and may discard the
skb in some case. For example dst->dst_input points to dst_discard.
This patch drops the skb's dst_entry before calling netif_receive_skb
so that the skb can be made routing decision like a normal ingress
skb.
Signed-off-by: Zhiyuan Hou <[email protected]>
---
net/sched/act_mirred.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 9ce073a05414..6108a64c0cd5 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -18,6 +18,7 @@
#include <linux/gfp.h>
#include <linux/if_arp.h>
#include <net/net_namespace.h>
+#include <net/dst.h>
#include <net/netlink.h>
#include <net/pkt_sched.h>
#include <net/pkt_cls.h>
@@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
if (!want_ingress)
err = dev_queue_xmit(skb2);
- else
+ else {
+ skb_dst_drop(skb2);
err = netif_receive_skb(skb2);
+ }
if (err) {
out:
--
2.21.0
On 10/12/19 12:16 AM, Zhiyuan Hou wrote:
> In act_mirred's ingress redirection, if the skb's dst_entry is valid
> when call function netif_receive_skb, the fllowing l3 stack process
> (ip_rcv_finish_core) will check dst_entry and skip the routing
> decision. Using the old dst_entry is unexpected and may discard the
> skb in some case. For example dst->dst_input points to dst_discard.
>
> This patch drops the skb's dst_entry before calling netif_receive_skb
> so that the skb can be made routing decision like a normal ingress
> skb.
>
> Signed-off-by: Zhiyuan Hou <[email protected]>
> ---
> net/sched/act_mirred.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 9ce073a05414..6108a64c0cd5 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -18,6 +18,7 @@
> #include <linux/gfp.h>
> #include <linux/if_arp.h>
> #include <net/net_namespace.h>
> +#include <net/dst.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> #include <net/pkt_cls.h>
> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>
> if (!want_ingress)
> err = dev_queue_xmit(skb2);
> - else
> + else {
> + skb_dst_drop(skb2);
> err = netif_receive_skb(skb2);
> + }
>
> if (err) {
> out:
>
Why is dst_discard used ?
This could actually drop packets, for loopback.
A Fixes: tag would tremendously help, I wonder if you are not working around
the other issue Wei was tracking yesterday ( https://www.spinics.net/lists/netdev/msg604397.html )
Hello!
On 10/12/2019 10:16 AM, Zhiyuan Hou wrote:
> In act_mirred's ingress redirection, if the skb's dst_entry is valid
> when call function netif_receive_skb, the fllowing l3 stack process
Following or flowing?
> (ip_rcv_finish_core) will check dst_entry and skip the routing
> decision. Using the old dst_entry is unexpected and may discard the
> skb in some case. For example dst->dst_input points to dst_discard.
>
> This patch drops the skb's dst_entry before calling netif_receive_skb
> so that the skb can be made routing decision like a normal ingress
> skb.
>
> Signed-off-by: Zhiyuan Hou <[email protected]>
> ---
> net/sched/act_mirred.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 9ce073a05414..6108a64c0cd5 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
[...]
> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>
> if (!want_ingress)
> err = dev_queue_xmit(skb2);
> - else
> + else {
> + skb_dst_drop(skb2);
> err = netif_receive_skb(skb2);
> + }
If you introduce {} in one *if* branch, {} should be added to the other branches as well,
says CodingStyle.
[...]
MBR, Sergei
On 2019/10/12 6:59 下午, Eric Dumazet wrote:
>
> On 10/12/19 12:16 AM, Zhiyuan Hou wrote:
>> In act_mirred's ingress redirection, if the skb's dst_entry is valid
>> when call function netif_receive_skb, the fllowing l3 stack process
>> (ip_rcv_finish_core) will check dst_entry and skip the routing
>> decision. Using the old dst_entry is unexpected and may discard the
>> skb in some case. For example dst->dst_input points to dst_discard.
>>
>> This patch drops the skb's dst_entry before calling netif_receive_skb
>> so that the skb can be made routing decision like a normal ingress
>> skb.
>>
>> Signed-off-by: Zhiyuan Hou <[email protected]>
>> ---
>> net/sched/act_mirred.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>> index 9ce073a05414..6108a64c0cd5 100644
>> --- a/net/sched/act_mirred.c
>> +++ b/net/sched/act_mirred.c
>> @@ -18,6 +18,7 @@
>> #include <linux/gfp.h>
>> #include <linux/if_arp.h>
>> #include <net/net_namespace.h>
>> +#include <net/dst.h>
>> #include <net/netlink.h>
>> #include <net/pkt_sched.h>
>> #include <net/pkt_cls.h>
>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>>
>> if (!want_ingress)
>> err = dev_queue_xmit(skb2);
>> - else
>> + else {
>> + skb_dst_drop(skb2);
>> err = netif_receive_skb(skb2);
>> + }
>>
>> if (err) {
>> out:
>>
> Why is dst_discard used ?
When send a skb from local to external, the dst->dst_input will be
assigned dst_discard after routing decision. So if we redirect these
skbs to ingress stack, it will be dropped.
For ipvlan l2 mode or macvlan, clsact egress filters on master deivce
may also meet these skbs even if they came from slave device. Ingress
redirection on these skbs may drop them on l3 stack.
> This could actually drop packets, for loopback.
>
> A Fixes: tag would tremendously help, I wonder if you are not working around
> the other issue Wei was tracking yesterday ( https://www.spinics.net/lists/netdev/msg604397.html )
No, this is a different issue ^_^.
On 2019/10/13 3:34 上午, Sergei Shtylyov wrote:
> Hello!
>
> On 10/12/2019 10:16 AM, Zhiyuan Hou wrote:
>
>> In act_mirred's ingress redirection, if the skb's dst_entry is valid
>> when call function netif_receive_skb, the fllowing l3 stack process
> Following or flowing?
Sorry, it should be following. I will change it in next version.
>> (ip_rcv_finish_core) will check dst_entry and skip the routing
>> decision. Using the old dst_entry is unexpected and may discard the
>> skb in some case. For example dst->dst_input points to dst_discard.
>>
>> This patch drops the skb's dst_entry before calling netif_receive_skb
>> so that the skb can be made routing decision like a normal ingress
>> skb.
>>
>> Signed-off-by: Zhiyuan Hou<[email protected]>
>> ---
>> net/sched/act_mirred.c | 5 ++++-
>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>> index 9ce073a05414..6108a64c0cd5 100644
>> --- a/net/sched/act_mirred.c
>> +++ b/net/sched/act_mirred.c
> [...]
>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>>
>> if (!want_ingress)
>> err = dev_queue_xmit(skb2);
>> - else
>> + else {
>> + skb_dst_drop(skb2);
>> err = netif_receive_skb(skb2);
>> + }
> If you introduce {} in one *if* branch, {} should be added to the other branches as well,
> says CodingStyle.
I will change it in next version . Thanks.
> [...]
>
> MBR, Sergei
On 10/14/19 12:07 AM, Zhiyuan Hou wrote:
>
> On 2019/10/12 6:59 下午, Eric Dumazet wrote:
>>
>> On 10/12/19 12:16 AM, Zhiyuan Hou wrote:
>>> In act_mirred's ingress redirection, if the skb's dst_entry is valid
>>> when call function netif_receive_skb, the fllowing l3 stack process
>>> (ip_rcv_finish_core) will check dst_entry and skip the routing
>>> decision. Using the old dst_entry is unexpected and may discard the
>>> skb in some case. For example dst->dst_input points to dst_discard.
>>>
>>> This patch drops the skb's dst_entry before calling netif_receive_skb
>>> so that the skb can be made routing decision like a normal ingress
>>> skb.
>>>
>>> Signed-off-by: Zhiyuan Hou <[email protected]>
>>> ---
>>> net/sched/act_mirred.c | 5 ++++-
>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>>> index 9ce073a05414..6108a64c0cd5 100644
>>> --- a/net/sched/act_mirred.c
>>> +++ b/net/sched/act_mirred.c
>>> @@ -18,6 +18,7 @@
>>> #include <linux/gfp.h>
>>> #include <linux/if_arp.h>
>>> #include <net/net_namespace.h>
>>> +#include <net/dst.h>
>>> #include <net/netlink.h>
>>> #include <net/pkt_sched.h>
>>> #include <net/pkt_cls.h>
>>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>>> if (!want_ingress)
>>> err = dev_queue_xmit(skb2);
>>> - else
>>> + else {
>>> + skb_dst_drop(skb2);
>>> err = netif_receive_skb(skb2);
>>> + }
>>> if (err) {
>>> out:
>>>
>> Why is dst_discard used ?
> When send a skb from local to external, the dst->dst_input will be
> assigned dst_discard after routing decision. So if we redirect these
> skbs to ingress stack, it will be dropped.
>
> For ipvlan l2 mode or macvlan, clsact egress filters on master deivce
> may also meet these skbs even if they came from slave device. Ingress
> redirection on these skbs may drop them on l3 stack.
Can you please add a test, so that we can see what you are trying to do exactly ?
>> This could actually drop packets, for loopback.
>>
>> A Fixes: tag would tremendously help, I wonder if you are not working around
>> the other issue Wei was tracking yesterday ( https://www.spinics.net/lists/netdev/msg604397.html )
> No, this is a different issue ^_^.
Please add a Fixes: tag then.
Thanks.
On Sat, Oct 12, 2019 at 12:16 AM Zhiyuan Hou
<[email protected]> wrote:
> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> index 9ce073a05414..6108a64c0cd5 100644
> --- a/net/sched/act_mirred.c
> +++ b/net/sched/act_mirred.c
> @@ -18,6 +18,7 @@
> #include <linux/gfp.h>
> #include <linux/if_arp.h>
> #include <net/net_namespace.h>
> +#include <net/dst.h>
> #include <net/netlink.h>
> #include <net/pkt_sched.h>
> #include <net/pkt_cls.h>
> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>
> if (!want_ingress)
> err = dev_queue_xmit(skb2);
> - else
> + else {
> + skb_dst_drop(skb2);
> err = netif_receive_skb(skb2);
> + }
Good catch!
I don't want to be picky, but it seems this is only needed
when redirecting from egress to ingress, right? That is,
ingress to ingress, or ingress to egress is okay? If not,
please fix all the cases while you are on it?
Thanks.
On 2019/10/15 1:57 上午, Cong Wang wrote:
> On Sat, Oct 12, 2019 at 12:16 AM Zhiyuan Hou
> <[email protected]> wrote:
>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>> index 9ce073a05414..6108a64c0cd5 100644
>> --- a/net/sched/act_mirred.c
>> +++ b/net/sched/act_mirred.c
>> @@ -18,6 +18,7 @@
>> #include <linux/gfp.h>
>> #include <linux/if_arp.h>
>> #include <net/net_namespace.h>
>> +#include <net/dst.h>
>> #include <net/netlink.h>
>> #include <net/pkt_sched.h>
>> #include <net/pkt_cls.h>
>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>>
>> if (!want_ingress)
>> err = dev_queue_xmit(skb2);
>> - else
>> + else {
>> + skb_dst_drop(skb2);
>> err = netif_receive_skb(skb2);
>> + }
> Good catch!
>
> I don't want to be picky, but it seems this is only needed
> when redirecting from egress to ingress, right? That is,
> ingress to ingress, or ingress to egress is okay? If not,
> please fix all the cases while you are on it?
Sure. But I think this patch is also needed when redirecting from
ingress to ingress. Because we cannot assure that a skb has null dst
in ingress redirection path. For example, if redirecting a skb from
loopback's ingress to other device's ingress, the skb will take a
dst.
As commit logs point out, skb with valid dst cannot be made routing
decision in following process. original dst may cause skb loss or
other unexpected behavior.
>
> Thanks.
Hi,
On Wed, 16 Oct 2019 01:22:01 +0800
Zhiyuan Hou <[email protected]> wrote:
> On 2019/10/15 1:57 上午, Cong Wang wrote:
> > On Sat, Oct 12, 2019 at 12:16 AM Zhiyuan Hou
> > <[email protected]> wrote:
> >> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> >> index 9ce073a05414..6108a64c0cd5 100644
> >> --- a/net/sched/act_mirred.c
> >> +++ b/net/sched/act_mirred.c
> >> @@ -18,6 +18,7 @@
> >> #include <linux/gfp.h>
> >> #include <linux/if_arp.h>
> >> #include <net/net_namespace.h>
> >> +#include <net/dst.h>
> >> #include <net/netlink.h>
> >> #include <net/pkt_sched.h>
> >> #include <net/pkt_cls.h>
> >> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff
> >> *skb, const struct tc_action *a,
> >>
> >> if (!want_ingress)
> >> err = dev_queue_xmit(skb2);
> >> - else
> >> + else {
> >> + skb_dst_drop(skb2);
> >> err = netif_receive_skb(skb2);
> >> + }
> > Good catch!
Indeed! Thanks for fixing this!
> >
> > I don't want to be picky, but it seems this is only needed
> > when redirecting from egress to ingress, right? That is,
> > ingress to ingress, or ingress to egress is okay? If not,
> > please fix all the cases while you are on it?
> Sure. But I think this patch is also needed when redirecting from
> ingress to ingress. Because we cannot assure that a skb has null dst
> in ingress redirection path. For example, if redirecting a skb from
> loopback's ingress to other device's ingress, the skb will take a
> dst.
>
> As commit logs point out, skb with valid dst cannot be made routing
> decision in following process. original dst may cause skb loss or
> other unexpected behavior.
On the other hand, removing the dst on ingress-to-ingress redirection
may remove LWT information on incoming packets, which may be undesired.
Eyal.
On 2019/10/14 8:46 下午, Eric Dumazet wrote:
>
> On 10/14/19 12:07 AM, Zhiyuan Hou wrote:
>> On 2019/10/12 6:59 下午, Eric Dumazet wrote:
>>> On 10/12/19 12:16 AM, Zhiyuan Hou wrote:
>>>> In act_mirred's ingress redirection, if the skb's dst_entry is valid
>>>> when call function netif_receive_skb, the fllowing l3 stack process
>>>> (ip_rcv_finish_core) will check dst_entry and skip the routing
>>>> decision. Using the old dst_entry is unexpected and may discard the
>>>> skb in some case. For example dst->dst_input points to dst_discard.
>>>>
>>>> This patch drops the skb's dst_entry before calling netif_receive_skb
>>>> so that the skb can be made routing decision like a normal ingress
>>>> skb.
>>>>
>>>> Signed-off-by: Zhiyuan Hou <[email protected]>
>>>> ---
>>>> net/sched/act_mirred.c | 5 ++++-
>>>> 1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>>>> index 9ce073a05414..6108a64c0cd5 100644
>>>> --- a/net/sched/act_mirred.c
>>>> +++ b/net/sched/act_mirred.c
>>>> @@ -18,6 +18,7 @@
>>>> #include <linux/gfp.h>
>>>> #include <linux/if_arp.h>
>>>> #include <net/net_namespace.h>
>>>> +#include <net/dst.h>
>>>> #include <net/netlink.h>
>>>> #include <net/pkt_sched.h>
>>>> #include <net/pkt_cls.h>
>>>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
>>>> if (!want_ingress)
>>>> err = dev_queue_xmit(skb2);
>>>> - else
>>>> + else {
>>>> + skb_dst_drop(skb2);
>>>> err = netif_receive_skb(skb2);
>>>> + }
>>>> if (err) {
>>>> out:
>>>>
>>> Why is dst_discard used ?
>> When send a skb from local to external, the dst->dst_input will be
>> assigned dst_discard after routing decision. So if we redirect these
>> skbs to ingress stack, it will be dropped.
>>
>> For ipvlan l2 mode or macvlan, clsact egress filters on master deivce
>> may also meet these skbs even if they came from slave device. Ingress
>> redirection on these skbs may drop them on l3 stack.
> Can you please add a test, so that we can see what you are trying to do exactly ?
Sure. Suppose a linux box has two interfaces (eth0 and eth1). We
create a vrf (vrf0) and put eth1 in it, as following commands:
# ip link add vrf0 type vrf table 10
# ip link set dev vrf0 up
# ip link set eth1 master vrf0
Then let's intercept some egress flows through eth0 and redirect them
to vrf0 using act_mirred.
# ip route add table 10 ... # add routes to vrf0
# tc qdisc add dev eth0 clsact
# tc filter add dev eth0 egress proto ip u32 \
match ip dst 10.0.0.0/24 \
action mirred ingress redirect dev eth1
We expect that the matching skb will be received or forwarded via
vrf0's route table. But the fact is, the skb is dropped as commit log
notes.
>
>
>
>
>>> This could actually drop packets, for loopback.
>>>
>>> A Fixes: tag would tremendously help, I wonder if you are not working around
>>> the other issue Wei was tracking yesterday ( https://www.spinics.net/lists/netdev/msg604397.html )
>> No, this is a different issue ^_^.
> Please add a Fixes: tag then.
Yes, I will add following tag in v2.
Fixes: 53592b364001 ("net/sched: act_mirred: Implement ingress actions")
>
> Thanks.
On 2019/10/16 8:13 下午, Eyal Birger wrote:
> Hi,
>
> On Wed, 16 Oct 2019 01:22:01 +0800
> Zhiyuan Hou <[email protected]> wrote:
>
>> On 2019/10/15 1:57 上午, Cong Wang wrote:
>>> On Sat, Oct 12, 2019 at 12:16 AM Zhiyuan Hou
>>> <[email protected]> wrote:
>>>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>>>> index 9ce073a05414..6108a64c0cd5 100644
>>>> --- a/net/sched/act_mirred.c
>>>> +++ b/net/sched/act_mirred.c
>>>> @@ -18,6 +18,7 @@
>>>> #include <linux/gfp.h>
>>>> #include <linux/if_arp.h>
>>>> #include <net/net_namespace.h>
>>>> +#include <net/dst.h>
>>>> #include <net/netlink.h>
>>>> #include <net/pkt_sched.h>
>>>> #include <net/pkt_cls.h>
>>>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff
>>>> *skb, const struct tc_action *a,
>>>>
>>>> if (!want_ingress)
>>>> err = dev_queue_xmit(skb2);
>>>> - else
>>>> + else {
>>>> + skb_dst_drop(skb2);
>>>> err = netif_receive_skb(skb2);
>>>> + }
>>> Good catch!
> Indeed! Thanks for fixing this!
>
>>> I don't want to be picky, but it seems this is only needed
>>> when redirecting from egress to ingress, right? That is,
>>> ingress to ingress, or ingress to egress is okay? If not,
>>> please fix all the cases while you are on it?
>> Sure. But I think this patch is also needed when redirecting from
>> ingress to ingress. Because we cannot assure that a skb has null dst
>> in ingress redirection path. For example, if redirecting a skb from
>> loopback's ingress to other device's ingress, the skb will take a
>> dst.
>>
>> As commit logs point out, skb with valid dst cannot be made routing
>> decision in following process. original dst may cause skb loss or
>> other unexpected behavior.
> On the other hand, removing the dst on ingress-to-ingress redirection
> may remove LWT information on incoming packets, which may be undesired.
Sorry, I do not understand why lwt information is needed on
ingress-to-ingress redirection. lwt is used on output path, isn't it?
Can you please give more information?
> Eyal.
Hi,
On Fri, 18 Oct 2019 00:33:53 +0800
Zhiyuan Hou <[email protected]> wrote:
> On 2019/10/16 8:13 下午, Eyal Birger wrote:
> > Hi,
> >
> > On Wed, 16 Oct 2019 01:22:01 +0800
> > Zhiyuan Hou <[email protected]> wrote:
> >
> >> On 2019/10/15 1:57 上午, Cong Wang wrote:
> >>> On Sat, Oct 12, 2019 at 12:16 AM Zhiyuan Hou
> >>> <[email protected]> wrote:
> >>>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> >>>> index 9ce073a05414..6108a64c0cd5 100644
> >>>> --- a/net/sched/act_mirred.c
> >>>> +++ b/net/sched/act_mirred.c
> >>>> @@ -18,6 +18,7 @@
> >>>> #include <linux/gfp.h>
> >>>> #include <linux/if_arp.h>
> >>>> #include <net/net_namespace.h>
> >>>> +#include <net/dst.h>
> >>>> #include <net/netlink.h>
> >>>> #include <net/pkt_sched.h>
> >>>> #include <net/pkt_cls.h>
> >>>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff
> >>>> *skb, const struct tc_action *a,
> >>>>
> >>>> if (!want_ingress)
> >>>> err = dev_queue_xmit(skb2);
> >>>> - else
> >>>> + else {
> >>>> + skb_dst_drop(skb2);
> >>>> err = netif_receive_skb(skb2);
> >>>> + }
> >>> Good catch!
> > Indeed! Thanks for fixing this!
> >
> >>> I don't want to be picky, but it seems this is only needed
> >>> when redirecting from egress to ingress, right? That is,
> >>> ingress to ingress, or ingress to egress is okay? If not,
> >>> please fix all the cases while you are on it?
> >> Sure. But I think this patch is also needed when redirecting from
> >> ingress to ingress. Because we cannot assure that a skb has null
> >> dst in ingress redirection path. For example, if redirecting a skb
> >> from loopback's ingress to other device's ingress, the skb will
> >> take a dst.
> >>
> >> As commit logs point out, skb with valid dst cannot be made routing
> >> decision in following process. original dst may cause skb loss or
> >> other unexpected behavior.
> > On the other hand, removing the dst on ingress-to-ingress
> > redirection may remove LWT information on incoming packets, which
> > may be undesired.
> Sorry, I do not understand why lwt information is needed on
> ingress-to-ingress redirection. lwt is used on output path, isn't it?
> Can you please give more information?
On rx path tunnelled packets parameters received on a collect_md tunnel device
are kept in a metadata dst. See ip_tunnel_rcv() 'tun_dst' parameter.
The rx metadata dst can be matched by a number of mechanisms like routing
rules, eBPF, OVS, and netfilter.
Eyal.
On 2019/10/19 5:25 上午, Eyal Birger wrote:
> Hi,
>
> On Fri, 18 Oct 2019 00:33:53 +0800
> Zhiyuan Hou <[email protected]> wrote:
>
>> On 2019/10/16 8:13 下午, Eyal Birger wrote:
>>> Hi,
>>>
>>> On Wed, 16 Oct 2019 01:22:01 +0800
>>> Zhiyuan Hou <[email protected]> wrote:
>>>
>>>> On 2019/10/15 1:57 上午, Cong Wang wrote:
>>>>> On Sat, Oct 12, 2019 at 12:16 AM Zhiyuan Hou
>>>>> <[email protected]> wrote:
>>>>>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>>>>>> index 9ce073a05414..6108a64c0cd5 100644
>>>>>> --- a/net/sched/act_mirred.c
>>>>>> +++ b/net/sched/act_mirred.c
>>>>>> @@ -18,6 +18,7 @@
>>>>>> #include <linux/gfp.h>
>>>>>> #include <linux/if_arp.h>
>>>>>> #include <net/net_namespace.h>
>>>>>> +#include <net/dst.h>
>>>>>> #include <net/netlink.h>
>>>>>> #include <net/pkt_sched.h>
>>>>>> #include <net/pkt_cls.h>
>>>>>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff
>>>>>> *skb, const struct tc_action *a,
>>>>>>
>>>>>> if (!want_ingress)
>>>>>> err = dev_queue_xmit(skb2);
>>>>>> - else
>>>>>> + else {
>>>>>> + skb_dst_drop(skb2);
>>>>>> err = netif_receive_skb(skb2);
>>>>>> + }
>>>>> Good catch!
>>> Indeed! Thanks for fixing this!
>>>
>>>>> I don't want to be picky, but it seems this is only needed
>>>>> when redirecting from egress to ingress, right? That is,
>>>>> ingress to ingress, or ingress to egress is okay? If not,
>>>>> please fix all the cases while you are on it?
>>>> Sure. But I think this patch is also needed when redirecting from
>>>> ingress to ingress. Because we cannot assure that a skb has null
>>>> dst in ingress redirection path. For example, if redirecting a skb
>>>> from loopback's ingress to other device's ingress, the skb will
>>>> take a dst.
>>>>
>>>> As commit logs point out, skb with valid dst cannot be made routing
>>>> decision in following process. original dst may cause skb loss or
>>>> other unexpected behavior.
>>> On the other hand, removing the dst on ingress-to-ingress
>>> redirection may remove LWT information on incoming packets, which
>>> may be undesired.
>> Sorry, I do not understand why lwt information is needed on
>> ingress-to-ingress redirection. lwt is used on output path, isn't it?
>> Can you please give more information?
> On rx path tunnelled packets parameters received on a collect_md tunnel device
> are kept in a metadata dst. See ip_tunnel_rcv() 'tun_dst' parameter.
>
> The rx metadata dst can be matched by a number of mechanisms like routing
> rules, eBPF, OVS, and netfilter.
Yes, you are right. Thanks for your explanations.
The metadata dst should not be removed in redirection path and also
does not affect L3's routing decision.
Maybe we can add a following check to solve it before removing a dst,
what do you think?
if (skb_valid_dst(skb2))
skb_dst_drop(sbk2);
>
> Eyal.
On Fri, Oct 18, 2019 at 2:25 PM Eyal Birger <[email protected]> wrote:
>
> Hi,
>
> On Fri, 18 Oct 2019 00:33:53 +0800
> Zhiyuan Hou <[email protected]> wrote:
>
> > On 2019/10/16 8:13 下午, Eyal Birger wrote:
> > > Hi,
> > >
> > > On Wed, 16 Oct 2019 01:22:01 +0800
> > > Zhiyuan Hou <[email protected]> wrote:
> > >
> > >> On 2019/10/15 1:57 上午, Cong Wang wrote:
> > >>> On Sat, Oct 12, 2019 at 12:16 AM Zhiyuan Hou
> > >>> <[email protected]> wrote:
> > >>>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> > >>>> index 9ce073a05414..6108a64c0cd5 100644
> > >>>> --- a/net/sched/act_mirred.c
> > >>>> +++ b/net/sched/act_mirred.c
> > >>>> @@ -18,6 +18,7 @@
> > >>>> #include <linux/gfp.h>
> > >>>> #include <linux/if_arp.h>
> > >>>> #include <net/net_namespace.h>
> > >>>> +#include <net/dst.h>
> > >>>> #include <net/netlink.h>
> > >>>> #include <net/pkt_sched.h>
> > >>>> #include <net/pkt_cls.h>
> > >>>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff
> > >>>> *skb, const struct tc_action *a,
> > >>>>
> > >>>> if (!want_ingress)
> > >>>> err = dev_queue_xmit(skb2);
> > >>>> - else
> > >>>> + else {
> > >>>> + skb_dst_drop(skb2);
> > >>>> err = netif_receive_skb(skb2);
> > >>>> + }
> > >>> Good catch!
> > > Indeed! Thanks for fixing this!
> > >
> > >>> I don't want to be picky, but it seems this is only needed
> > >>> when redirecting from egress to ingress, right? That is,
> > >>> ingress to ingress, or ingress to egress is okay? If not,
> > >>> please fix all the cases while you are on it?
> > >> Sure. But I think this patch is also needed when redirecting from
> > >> ingress to ingress. Because we cannot assure that a skb has null
> > >> dst in ingress redirection path. For example, if redirecting a skb
> > >> from loopback's ingress to other device's ingress, the skb will
> > >> take a dst.
> > >>
> > >> As commit logs point out, skb with valid dst cannot be made routing
> > >> decision in following process. original dst may cause skb loss or
> > >> other unexpected behavior.
> > > On the other hand, removing the dst on ingress-to-ingress
> > > redirection may remove LWT information on incoming packets, which
> > > may be undesired.
> > Sorry, I do not understand why lwt information is needed on
> > ingress-to-ingress redirection. lwt is used on output path, isn't it?
> > Can you please give more information?
>
> On rx path tunnelled packets parameters received on a collect_md tunnel device
> are kept in a metadata dst. See ip_tunnel_rcv() 'tun_dst' parameter.
>
> The rx metadata dst can be matched by a number of mechanisms like routing
> rules, eBPF, OVS, and netfilter.
Should this meta information be kept when redirecting? The dest device
may be a non-tunnel device, so I don't know if it is still useful when
for non-tunnel devices.
Thanks.
Hi,
On Mon, 21 Oct 2019 13:50:13 -0700
Cong Wang <[email protected]> wrote:
> On Fri, Oct 18, 2019 at 2:25 PM Eyal Birger <[email protected]>
> wrote:
> >
> > Hi,
> >
> > On Fri, 18 Oct 2019 00:33:53 +0800
> > Zhiyuan Hou <[email protected]> wrote:
> >
> > > On 2019/10/16 8:13 下午, Eyal Birger wrote:
> > > > Hi,
> > > >
> > > > On Wed, 16 Oct 2019 01:22:01 +0800
> > > > Zhiyuan Hou <[email protected]> wrote:
> > > >
> > > >> On 2019/10/15 1:57 上午, Cong Wang wrote:
> > > >>> On Sat, Oct 12, 2019 at 12:16 AM Zhiyuan Hou
> > > >>> <[email protected]> wrote:
> > > >>>> diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
> > > >>>> index 9ce073a05414..6108a64c0cd5 100644
> > > >>>> --- a/net/sched/act_mirred.c
> > > >>>> +++ b/net/sched/act_mirred.c
> > > >>>> @@ -18,6 +18,7 @@
> > > >>>> #include <linux/gfp.h>
> > > >>>> #include <linux/if_arp.h>
> > > >>>> #include <net/net_namespace.h>
> > > >>>> +#include <net/dst.h>
> > > >>>> #include <net/netlink.h>
> > > >>>> #include <net/pkt_sched.h>
> > > >>>> #include <net/pkt_cls.h>
> > > >>>> @@ -298,8 +299,10 @@ static int tcf_mirred_act(struct sk_buff
> > > >>>> *skb, const struct tc_action *a,
> > > >>>>
> > > >>>> if (!want_ingress)
> > > >>>> err = dev_queue_xmit(skb2);
> > > >>>> - else
> > > >>>> + else {
> > > >>>> + skb_dst_drop(skb2);
> > > >>>> err = netif_receive_skb(skb2);
> > > >>>> + }
> > > >>> Good catch!
> > > > Indeed! Thanks for fixing this!
> > > >
> > > >>> I don't want to be picky, but it seems this is only needed
> > > >>> when redirecting from egress to ingress, right? That is,
> > > >>> ingress to ingress, or ingress to egress is okay? If not,
> > > >>> please fix all the cases while you are on it?
> > > >> Sure. But I think this patch is also needed when redirecting
> > > >> from ingress to ingress. Because we cannot assure that a skb
> > > >> has null dst in ingress redirection path. For example, if
> > > >> redirecting a skb from loopback's ingress to other device's
> > > >> ingress, the skb will take a dst.
> > > >>
> > > >> As commit logs point out, skb with valid dst cannot be made
> > > >> routing decision in following process. original dst may cause
> > > >> skb loss or other unexpected behavior.
> > > > On the other hand, removing the dst on ingress-to-ingress
> > > > redirection may remove LWT information on incoming packets,
> > > > which may be undesired.
> > > Sorry, I do not understand why lwt information is needed on
> > > ingress-to-ingress redirection. lwt is used on output path, isn't
> > > it? Can you please give more information?
> >
> > On rx path tunnelled packets parameters received on a collect_md
> > tunnel device are kept in a metadata dst. See ip_tunnel_rcv()
> > 'tun_dst' parameter.
> >
> > The rx metadata dst can be matched by a number of mechanisms like
> > routing rules, eBPF, OVS, and netfilter.
>
> Should this meta information be kept when redirecting? The dest device
> may be a non-tunnel device, so I don't know if it is still useful when
> for non-tunnel devices.
I think that on ingress-to-ingress redirect it would make sense to keep the
metadata.
The dest device does not have to be a tunnel device AFAICT in order to use
tunnel info as skb_tunnel_info() does not observe skb->dev.
I don't see why going through mirred redirect should prevent the admin from
matching the packet based on LWT metadata - a packet may arrive on a collect_md
tunnel device, be ingress-redirected to different devices based on different
criteria, then routed based also on the tunnel parameters.
Eyal.