2023-05-24 12:32:53

by Mirsad Todorovac

[permalink] [raw]
Subject: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

Hi,

The very recent 6.4-rc3 kernel build with AlmaLinux 8.7 on LENOVO 10TX000VCR
desktop box fails one test:

[root@host net]# ./fcnal-test.sh
[...]
TEST: ping out, vrf device+address bind - ns-B loopback IPv6 [ OK ]
TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [FAIL]
TEST: ping in - ns-A IPv6 [ OK ]
[...]
Tests passed: 887
Tests failed: 1
[root@host net]#

Please find the config, + dmesg and lshw output here:

https://domac.alu.unizg.hr/~mtodorov/linux/selftests/net-fcnal-test/config-6.4-rc3
https://domac.alu.unizg.hr/~mtodorov/linux/selftests/net-fcnal-test/dmesg.log
https://domac.alu.unizg.hr/~mtodorov/linux/selftests/net-fcnal-test/lshw.txt

I believe that I have all required configs merged for the selftest/net tests.

Maybe we have a regression?

My knowledge of fcnal-test.sh isn't sufficient to build a smaller reproducer.

Guillaume said in January he could help with the net/fcnal-test.sh, but I was doing
the other things in the meantime. Tempus fugit :-/

Best regards,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia

"What’s this thing suddenly coming towards me very fast? Very very fast.
... I wonder if it will be friends with me?"


2023-05-31 18:18:58

by Guillaume Nault

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On Wed, May 24, 2023 at 02:17:09PM +0200, Mirsad Todorovac wrote:
> Hi,

Hi Mirsad,

> The very recent 6.4-rc3 kernel build with AlmaLinux 8.7 on LENOVO 10TX000VCR
> desktop box fails one test:
>
> [root@host net]# ./fcnal-test.sh
> [...]
> TEST: ping out, vrf device+address bind - ns-B loopback IPv6 [ OK ]
> TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [FAIL]
> TEST: ping in - ns-A IPv6 [ OK ]
> [...]
> Tests passed: 887
> Tests failed: 1
> [root@host net]#

This test also fails on -net. The problem is specific to ping sockets
(same test passes with raw sockets). I believe this test has always
failed since fcnal-test.sh started using net.ipv4.ping_group_range
(commit e71b7f1f44d3 ("selftests: add ping test with ping_group_range
tuned")).

The executed command is:

ip netns exec ns-A ip vrf exec red /usr/bin/ping6 -c1 -w1 -I 2001:db8:3::1 fe80::a846:b5ff:fe4c:da4e%eth1

So ping6 is executed inside VRF 'red' and sets .sin6_scope_id to 'eth1'
(which is a slave device of VRF 'red'). Therefore, we have
sk->sk_bound_dev_if == 'red' and .sin6_scope_id == 'eth1'. This fails
because ping_v6_sendmsg() expects them to be equal:

static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
{
...
if (__ipv6_addr_needs_scope_id(ipv6_addr_type(daddr)))
oif = u->sin6_scope_id;
...
if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
(addr_type & IPV6_ADDR_MAPPED) ||
(oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if)) <-- oif='eth1', but ->sk_bound_dev_if='red'
return -EINVAL;
...
}

I believe this condition should be relaxed to allow the case where
->sk_bound_dev_if is oif's master device (and maybe there are other
VRF cases to also consider).


2023-06-02 12:50:34

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On 5/31/23 20:11, Guillaume Nault wrote:
> On Wed, May 24, 2023 at 02:17:09PM +0200, Mirsad Todorovac wrote:
>> Hi,
>
> Hi Mirsad,

Hi Guillaume,

>> The very recent 6.4-rc3 kernel build with AlmaLinux 8.7 on LENOVO 10TX000VCR
>> desktop box fails one test:
>>
>> [root@host net]# ./fcnal-test.sh
>> [...]
>> TEST: ping out, vrf device+address bind - ns-B loopback IPv6 [ OK ]
>> TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [FAIL]
>> TEST: ping in - ns-A IPv6 [ OK ]
>> [...]
>> Tests passed: 887
>> Tests failed: 1
>> [root@host net]#
>
> This test also fails on -net. The problem is specific to ping sockets
> (same test passes with raw sockets). I believe this test has always
> failed since fcnal-test.sh started using net.ipv4.ping_group_range
> (commit e71b7f1f44d3 ("selftests: add ping test with ping_group_range
> tuned")).
>
> The executed command is:
>
> ip netns exec ns-A ip vrf exec red /usr/bin/ping6 -c1 -w1 -I 2001:db8:3::1 fe80::a846:b5ff:fe4c:da4e%eth1
>
> So ping6 is executed inside VRF 'red' and sets .sin6_scope_id to 'eth1'
> (which is a slave device of VRF 'red'). Therefore, we have
> sk->sk_bound_dev_if == 'red' and .sin6_scope_id == 'eth1'. This fails
> because ping_v6_sendmsg() expects them to be equal:
>
> static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> {
> ...
> if (__ipv6_addr_needs_scope_id(ipv6_addr_type(daddr)))
> oif = u->sin6_scope_id;
> ...
> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
> (addr_type & IPV6_ADDR_MAPPED) ||
> (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if)) <-- oif='eth1', but ->sk_bound_dev_if='red'
> return -EINVAL;
> ...
> }

Thank you for your thorough investigation. It helps a great deal to
understand the issue.

I am really not that into the network stack, though I can always smuggle
the work on the network stack as a work on high-bandwidth multimedia
and do it in day hours.

Probably I need to catch up with the network stack homework.

> I believe this condition should be relaxed to allow the case where
> ->sk_bound_dev_if is oif's master device (and maybe there are other
> VRF cases to also consider).

I have looked into the code, but currently my knowledge of the code is
not sufficient for the intervention.

Thank you,
Mirsad

2023-06-06 07:02:26

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On 5/31/23 20:11, Guillaume Nault wrote:
> On Wed, May 24, 2023 at 02:17:09PM +0200, Mirsad Todorovac wrote:
>> Hi,
>
> Hi Mirsad,
>
>> The very recent 6.4-rc3 kernel build with AlmaLinux 8.7 on LENOVO 10TX000VCR
>> desktop box fails one test:
>>
>> [root@host net]# ./fcnal-test.sh
>> [...]
>> TEST: ping out, vrf device+address bind - ns-B loopback IPv6 [ OK ]
>> TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [FAIL]
>> TEST: ping in - ns-A IPv6 [ OK ]
>> [...]
>> Tests passed: 887
>> Tests failed: 1
>> [root@host net]#
>
> This test also fails on -net. The problem is specific to ping sockets
> (same test passes with raw sockets). I believe this test has always
> failed since fcnal-test.sh started using net.ipv4.ping_group_range
> (commit e71b7f1f44d3 ("selftests: add ping test with ping_group_range
> tuned")).
>
> The executed command is:
>
> ip netns exec ns-A ip vrf exec red /usr/bin/ping6 -c1 -w1 -I 2001:db8:3::1 fe80::a846:b5ff:fe4c:da4e%eth1
>
> So ping6 is executed inside VRF 'red' and sets .sin6_scope_id to 'eth1'
> (which is a slave device of VRF 'red'). Therefore, we have
> sk->sk_bound_dev_if == 'red' and .sin6_scope_id == 'eth1'. This fails
> because ping_v6_sendmsg() expects them to be equal:
>
> static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> {
> ...
> if (__ipv6_addr_needs_scope_id(ipv6_addr_type(daddr)))
> oif = u->sin6_scope_id;
> ...
> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
> (addr_type & IPV6_ADDR_MAPPED) ||
> (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if)) <-- oif='eth1', but ->sk_bound_dev_if='red'
> return -EINVAL;
> ...
> }
>
> I believe this condition should be relaxed to allow the case where
> ->sk_bound_dev_if is oif's master device (and maybe there are other
> VRF cases to also consider).

I've tried something like this, but something makes the kernel stuck
here:

TEST: ping out, blocked by route - ns-B loopback IPv6 [ OK ]
TEST: ping out, device bind, blocked by route - ns-B loopback IPv6 [ OK ]
TEST: ping in, blocked by route - ns-A loopback IPv6 [ OK ]
TEST: ping out, unreachable route - ns-B loopback IPv6 [ OK ]
TEST: ping out, device bind, unreachable route - ns-B loopback IPv6 [ OK ]

#################################################################
With VRF

[hanged process and kernel won't shutdown]

The code is:

---
net/ipv6/ping.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index c4835dbdfcff..81293e902293 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -73,6 +73,9 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
struct rt6_info *rt;
struct pingfakehdr pfh;
struct ipcm6_cookie ipc6;
+ struct net *net = sock_net(sk);
+ struct net_device *dev = NULL;
+ struct net_device *mdev = NULL;

err = ping_common_sendmsg(AF_INET6, msg, len, &user_icmph,
sizeof(user_icmph));
@@ -111,10 +114,17 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
else if (!oif)
oif = np->ucast_oif;

+ if (oif) {
+ dev = dev_get_by_index(net, oif);
+ mdev = netdev_master_upper_dev_get(dev);
+ }
+
addr_type = ipv6_addr_type(daddr);
if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
(addr_type & IPV6_ADDR_MAPPED) ||
- (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
+ (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
+ !(mdev && sk->sk_bound_dev_if &&
+ mdev != dev_get_by_index(net, sk->sk_bound_dev_if))))
return -EINVAL;

ipcm6_init_sk(&ipc6, np);

I am obviously doing something very stupid.

Regards,
Mirsad

2023-06-06 14:08:07

by Guillaume Nault

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On Tue, Jun 06, 2023 at 08:24:54AM +0200, Mirsad Goran Todorovac wrote:
> On 5/31/23 20:11, Guillaume Nault wrote:
> > I believe this condition should be relaxed to allow the case where
> > ->sk_bound_dev_if is oif's master device (and maybe there are other
> > VRF cases to also consider).
>
> I've tried something like this, but something makes the kernel stuck
> here:
>
> TEST: ping out, blocked by route - ns-B loopback IPv6 [ OK ]
> TEST: ping out, device bind, blocked by route - ns-B loopback IPv6 [ OK ]
> TEST: ping in, blocked by route - ns-A loopback IPv6 [ OK ]
> TEST: ping out, unreachable route - ns-B loopback IPv6 [ OK ]
> TEST: ping out, device bind, unreachable route - ns-B loopback IPv6 [ OK ]
>
> #################################################################
> With VRF
>
> [hanged process and kernel won't shutdown]
>
> The code is:
>
> ---
> net/ipv6/ping.c | 12 +++++++++++-
> 1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
> index c4835dbdfcff..81293e902293 100644
> --- a/net/ipv6/ping.c
> +++ b/net/ipv6/ping.c
> @@ -73,6 +73,9 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> struct rt6_info *rt;
> struct pingfakehdr pfh;
> struct ipcm6_cookie ipc6;
> + struct net *net = sock_net(sk);
> + struct net_device *dev = NULL;
> + struct net_device *mdev = NULL;
> err = ping_common_sendmsg(AF_INET6, msg, len, &user_icmph,
> sizeof(user_icmph));
> @@ -111,10 +114,17 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> else if (!oif)
> oif = np->ucast_oif;
> + if (oif) {
> + dev = dev_get_by_index(net, oif);
> + mdev = netdev_master_upper_dev_get(dev);
> + }
> +
> addr_type = ipv6_addr_type(daddr);
> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
> (addr_type & IPV6_ADDR_MAPPED) ||
> - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
> + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
> + !(mdev && sk->sk_bound_dev_if &&
> + mdev != dev_get_by_index(net, sk->sk_bound_dev_if))))
> return -EINVAL;
> ipcm6_init_sk(&ipc6, np);
>
> I am obviously doing something very stupid.

The problem is that dev_get_by_index() holds a reference on 'dev' which
your code never releases. Also netdev_master_upper_dev_get() needs rtnl
protection. These should have generated some kernel oops.

You can try this instead:

-------- >8 --------

diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index c4835dbdfcff..f804c11e2146 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -114,7 +114,8 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
addr_type = ipv6_addr_type(daddr);
if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
(addr_type & IPV6_ADDR_MAPPED) ||
- (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
+ (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
+ l3mdev_master_ifindex_by_index(sock_net(sk), oif) != sk->sk_bound_dev_if))
return -EINVAL;

ipcm6_init_sk(&ipc6, np);


2023-06-06 14:24:52

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On 6/6/23 15:46, Guillaume Nault wrote:
> On Tue, Jun 06, 2023 at 08:24:54AM +0200, Mirsad Goran Todorovac wrote:
>> On 5/31/23 20:11, Guillaume Nault wrote:
>>> I believe this condition should be relaxed to allow the case where
>>> ->sk_bound_dev_if is oif's master device (and maybe there are other
>>> VRF cases to also consider).
>>
>> I've tried something like this, but something makes the kernel stuck
>> here:
>>
>> TEST: ping out, blocked by route - ns-B loopback IPv6 [ OK ]
>> TEST: ping out, device bind, blocked by route - ns-B loopback IPv6 [ OK ]
>> TEST: ping in, blocked by route - ns-A loopback IPv6 [ OK ]
>> TEST: ping out, unreachable route - ns-B loopback IPv6 [ OK ]
>> TEST: ping out, device bind, unreachable route - ns-B loopback IPv6 [ OK ]
>>
>> #################################################################
>> With VRF
>>
>> [hanged process and kernel won't shutdown]
>>
>> The code is:
>>
>> ---
>> net/ipv6/ping.c | 12 +++++++++++-
>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
>> index c4835dbdfcff..81293e902293 100644
>> --- a/net/ipv6/ping.c
>> +++ b/net/ipv6/ping.c
>> @@ -73,6 +73,9 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>> struct rt6_info *rt;
>> struct pingfakehdr pfh;
>> struct ipcm6_cookie ipc6;
>> + struct net *net = sock_net(sk);
>> + struct net_device *dev = NULL;
>> + struct net_device *mdev = NULL;
>> err = ping_common_sendmsg(AF_INET6, msg, len, &user_icmph,
>> sizeof(user_icmph));
>> @@ -111,10 +114,17 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>> else if (!oif)
>> oif = np->ucast_oif;
>> + if (oif) {
>> + dev = dev_get_by_index(net, oif);
>> + mdev = netdev_master_upper_dev_get(dev);
>> + }
>> +
>> addr_type = ipv6_addr_type(daddr);
>> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
>> (addr_type & IPV6_ADDR_MAPPED) ||
>> - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
>> + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
>> + !(mdev && sk->sk_bound_dev_if &&
>> + mdev != dev_get_by_index(net, sk->sk_bound_dev_if))))
>> return -EINVAL;
>> ipcm6_init_sk(&ipc6, np);
>>
>> I am obviously doing something very stupid.
>
> The problem is that dev_get_by_index() holds a reference on 'dev' which
> your code never releases. Also netdev_master_upper_dev_get() needs rtnl
> protection. These should have generated some kernel oops.
>
> You can try this instead:
>
> -------- >8 --------
>
> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
> index c4835dbdfcff..f804c11e2146 100644
> --- a/net/ipv6/ping.c
> +++ b/net/ipv6/ping.c
> @@ -114,7 +114,8 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> addr_type = ipv6_addr_type(daddr);
> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
> (addr_type & IPV6_ADDR_MAPPED) ||
> - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
> + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
> + l3mdev_master_ifindex_by_index(sock_net(sk), oif) != sk->sk_bound_dev_if))
> return -EINVAL;
>
> ipcm6_init_sk(&ipc6, np);

Yes, I stupidly forgot that.

I came across a fixed version:

------
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index c4835dbdfcff..c1d81c49b775 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -73,6 +73,10 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
struct rt6_info *rt;
struct pingfakehdr pfh;
struct ipcm6_cookie ipc6;
+ struct net *net = sock_net(sk);
+ struct net_device *dev = NULL;
+ struct net_device *mdev = NULL;
+ struct net_device *bdev = NULL;

err = ping_common_sendmsg(AF_INET6, msg, len, &user_icmph,
sizeof(user_icmph));
@@ -111,10 +115,26 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
else if (!oif)
oif = np->ucast_oif;

+ if (oif) {
+ rcu_read_lock();
+ dev = dev_get_by_index_rcu(net, oif);
+ rcu_read_unlock();
+ rtnl_lock();
+ mdev = netdev_master_upper_dev_get(dev);
+ rtnl_unlock();
+ }
+
+ if (sk->sk_bound_dev_if) {
+ rcu_read_lock();
+ bdev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
+ rcu_read_unlock();
+ }
+
addr_type = ipv6_addr_type(daddr);
if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
(addr_type & IPV6_ADDR_MAPPED) ||
- (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
+ (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
+ !(mdev && sk->sk_bound_dev_if && bdev && mdev == bdev)))
return -EINVAL;

ipcm6_init_sk(&ipc6, np);

However, this works by the test (888 passed) but your two liner is obviously
better :-)

Best regards,
Mirsad

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia

2023-06-06 14:26:01

by Guillaume Nault

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On Tue, Jun 06, 2023 at 03:57:35PM +0200, Mirsad Todorovac wrote:
> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
> index c4835dbdfcff..c1d81c49b775 100644
> --- a/net/ipv6/ping.c
> +++ b/net/ipv6/ping.c
> @@ -73,6 +73,10 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> struct rt6_info *rt;
> struct pingfakehdr pfh;
> struct ipcm6_cookie ipc6;
> + struct net *net = sock_net(sk);
> + struct net_device *dev = NULL;
> + struct net_device *mdev = NULL;
> + struct net_device *bdev = NULL;
>
> err = ping_common_sendmsg(AF_INET6, msg, len, &user_icmph,
> sizeof(user_icmph));
> @@ -111,10 +115,26 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> else if (!oif)
> oif = np->ucast_oif;
>
> + if (oif) {
> + rcu_read_lock();
> + dev = dev_get_by_index_rcu(net, oif);
> + rcu_read_unlock();

You can't assume '*dev' is still valid after rcu_read_unlock() unless
you hold a reference on it.

> + rtnl_lock();
> + mdev = netdev_master_upper_dev_get(dev);
> + rtnl_unlock();

Because of that, 'dev' might have already disappeared at the time
netdev_master_upper_dev_get() is called. So it may dereference an
invalid pointer here.

> + }
> +
> + if (sk->sk_bound_dev_if) {
> + rcu_read_lock();
> + bdev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
> + rcu_read_unlock();
> + }
> +
> addr_type = ipv6_addr_type(daddr);
> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
> (addr_type & IPV6_ADDR_MAPPED) ||
> - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
> + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
> + !(mdev && sk->sk_bound_dev_if && bdev && mdev == bdev)))
> return -EINVAL;
>
> ipcm6_init_sk(&ipc6, np);
>
> However, this works by the test (888 passed) but your two liner is obviously
> better :-)

:)

> Best regards,
> Mirsad
>
> --
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu
>
> System engineer
> Faculty of Graphic Arts | Academy of Fine Arts
> University of Zagreb, Republic of Croatia
>


2023-06-06 14:57:12

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On 6/6/23 16:11, Guillaume Nault wrote:
> On Tue, Jun 06, 2023 at 03:57:35PM +0200, Mirsad Todorovac wrote:
>> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
>> index c4835dbdfcff..c1d81c49b775 100644
>> --- a/net/ipv6/ping.c
>> +++ b/net/ipv6/ping.c
>> @@ -73,6 +73,10 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>> struct rt6_info *rt;
>> struct pingfakehdr pfh;
>> struct ipcm6_cookie ipc6;
>> + struct net *net = sock_net(sk);
>> + struct net_device *dev = NULL;
>> + struct net_device *mdev = NULL;
>> + struct net_device *bdev = NULL;
>>
>> err = ping_common_sendmsg(AF_INET6, msg, len, &user_icmph,
>> sizeof(user_icmph));
>> @@ -111,10 +115,26 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>> else if (!oif)
>> oif = np->ucast_oif;
>>
>> + if (oif) {
>> + rcu_read_lock();
>> + dev = dev_get_by_index_rcu(net, oif);
>> + rcu_read_unlock();
>
> You can't assume '*dev' is still valid after rcu_read_unlock() unless
> you hold a reference on it.
>
>> + rtnl_lock();
>> + mdev = netdev_master_upper_dev_get(dev);
>> + rtnl_unlock();
>
> Because of that, 'dev' might have already disappeared at the time
> netdev_master_upper_dev_get() is called. So it may dereference an
> invalid pointer here.

Good point, thanks. I didn't expect those to change.

This can be fixed, provided that RCU and RTNL locks can be nested:

rcu_read_lock();
if (oif) {
dev = dev_get_by_index_rcu(net, oif);
rtnl_lock();
mdev = netdev_master_upper_dev_get(dev);
rtnl_unlock();
}

if (sk->sk_bound_dev_if) {
bdev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
}

addr_type = ipv6_addr_type(daddr);
if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
(addr_type & IPV6_ADDR_MAPPED) ||
(oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
!(mdev && sk->sk_bound_dev_if && bdev && mdev == bdev))) {
rcu_read_unlock();
return -EINVAL;
}
rcu_read_unlock();

But again this is still probably not race-free (bdev might also disappear before
the mdev == bdev test), even if it passed fcnal-test.sh, there is much duplication
of code, so your one-line solution is obviously by far better. :-)

Much obliged.

Best regards,
Mirsad

>> + }
>> +
>> + if (sk->sk_bound_dev_if) {
>> + rcu_read_lock();
>> + bdev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
>> + rcu_read_unlock();
>> + }
>> +
>> addr_type = ipv6_addr_type(daddr);
>> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
>> (addr_type & IPV6_ADDR_MAPPED) ||
>> - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
>> + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
>> + !(mdev && sk->sk_bound_dev_if && bdev && mdev == bdev)))
>> return -EINVAL;
>>
>> ipcm6_init_sk(&ipc6, np);
>>
>> However, this works by the test (888 passed) but your two liner is obviously
>> better :-)
>
> :)

--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu

System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia

2023-06-06 18:10:34

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

On 6/6/23 15:46, Guillaume Nault wrote:
> On Tue, Jun 06, 2023 at 08:24:54AM +0200, Mirsad Goran Todorovac wrote:
>> On 5/31/23 20:11, Guillaume Nault wrote:
>>> I believe this condition should be relaxed to allow the case where
>>> ->sk_bound_dev_if is oif's master device (and maybe there are other
>>> VRF cases to also consider).
>>
>> I've tried something like this, but something makes the kernel stuck
>> here:
>>
>> TEST: ping out, blocked by route - ns-B loopback IPv6 [ OK ]
>> TEST: ping out, device bind, blocked by route - ns-B loopback IPv6 [ OK ]
>> TEST: ping in, blocked by route - ns-A loopback IPv6 [ OK ]
>> TEST: ping out, unreachable route - ns-B loopback IPv6 [ OK ]
>> TEST: ping out, device bind, unreachable route - ns-B loopback IPv6 [ OK ]
>>
>> #################################################################
>> With VRF
>>
>> [hanged process and kernel won't shutdown]
>>
>> The code is:
>>
>> ---
>> net/ipv6/ping.c | 12 +++++++++++-
>> 1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
>> index c4835dbdfcff..81293e902293 100644
>> --- a/net/ipv6/ping.c
>> +++ b/net/ipv6/ping.c
>> @@ -73,6 +73,9 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>> struct rt6_info *rt;
>> struct pingfakehdr pfh;
>> struct ipcm6_cookie ipc6;
>> + struct net *net = sock_net(sk);
>> + struct net_device *dev = NULL;
>> + struct net_device *mdev = NULL;
>> err = ping_common_sendmsg(AF_INET6, msg, len, &user_icmph,
>> sizeof(user_icmph));
>> @@ -111,10 +114,17 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>> else if (!oif)
>> oif = np->ucast_oif;
>> + if (oif) {
>> + dev = dev_get_by_index(net, oif);
>> + mdev = netdev_master_upper_dev_get(dev);
>> + }
>> +
>> addr_type = ipv6_addr_type(daddr);
>> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
>> (addr_type & IPV6_ADDR_MAPPED) ||
>> - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
>> + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
>> + !(mdev && sk->sk_bound_dev_if &&
>> + mdev != dev_get_by_index(net, sk->sk_bound_dev_if))))
>> return -EINVAL;
>> ipcm6_init_sk(&ipc6, np);
>>
>> I am obviously doing something very stupid.
>
> The problem is that dev_get_by_index() holds a reference on 'dev' which
> your code never releases. Also netdev_master_upper_dev_get() needs rtnl
> protection. These should have generated some kernel oops.
>
> You can try this instead:
>
> -------- >8 --------
>
> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
> index c4835dbdfcff..f804c11e2146 100644
> --- a/net/ipv6/ping.c
> +++ b/net/ipv6/ping.c
> @@ -114,7 +114,8 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> addr_type = ipv6_addr_type(daddr);
> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
> (addr_type & IPV6_ADDR_MAPPED) ||
> - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
> + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
> + l3mdev_master_ifindex_by_index(sock_net(sk), oif) != sk->sk_bound_dev_if))
> return -EINVAL;
>
> ipcm6_init_sk(&ipc6, np);

The problem appears to be fixed:

# ./fcnal-test.sh
[...]
TEST: ping out, vrf device+address bind - ns-B loopback IPv6 [ OK ]
TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [ OK ]
TEST: ping in - ns-A IPv6 [ OK ]
[...]
Tests passed: 888
Tests failed: 0
#

The test passed in both environments that manifested the bug.

Tested-by: Mirsad Todorovac <[email protected]>

However, test on my AMD Ubuntu 22.04 box with 6.4-rc5 commit a4d7d7011219
has shown additional four failed tests:

root@host # grep -n FAIL ../fcnal-test-4.log
90:TEST: ping local, VRF bind - VRF IP [FAIL]
92:TEST: ping local, device bind - ns-A IP [FAIL]
116:TEST: ping local, VRF bind - VRF IP [FAIL]
118:TEST: ping local, device bind - ns-A IP [FAIL]
root@host #

But you would probably want me to file a separate bug report?

Best regards,
Mirsad

2023-06-06 19:01:27

by Guillaume Nault

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On Tue, Jun 06, 2023 at 04:28:02PM +0200, Mirsad Todorovac wrote:
> On 6/6/23 16:11, Guillaume Nault wrote:
> > On Tue, Jun 06, 2023 at 03:57:35PM +0200, Mirsad Todorovac wrote:
> > > + if (oif) {
> > > + rcu_read_lock();
> > > + dev = dev_get_by_index_rcu(net, oif);
> > > + rcu_read_unlock();
> >
> > You can't assume '*dev' is still valid after rcu_read_unlock() unless
> > you hold a reference on it.
> >
> > > + rtnl_lock();
> > > + mdev = netdev_master_upper_dev_get(dev);
> > > + rtnl_unlock();
> >
> > Because of that, 'dev' might have already disappeared at the time
> > netdev_master_upper_dev_get() is called. So it may dereference an
> > invalid pointer here.
>
> Good point, thanks. I didn't expect those to change.
>
> This can be fixed, provided that RCU and RTNL locks can be nested:

Well, yes and no. You can call rcu_read_{lock,unlock}() while under the
rtnl protection, but not the other way around.

> rcu_read_lock();
> if (oif) {
> dev = dev_get_by_index_rcu(net, oif);
> rtnl_lock();
> mdev = netdev_master_upper_dev_get(dev);
> rtnl_unlock();
> }

This is invalid: rtnl_lock() uses a mutex, so it can sleep and that's
forbidden inside an RCU critical section.

> if (sk->sk_bound_dev_if) {
> bdev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
> }
>
> addr_type = ipv6_addr_type(daddr);
> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
> (addr_type & IPV6_ADDR_MAPPED) ||
> (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
> !(mdev && sk->sk_bound_dev_if && bdev && mdev == bdev))) {
> rcu_read_unlock();
> return -EINVAL;
> }
> rcu_read_unlock();
>
> But again this is still probably not race-free (bdev might also disappear before
> the mdev == bdev test), even if it passed fcnal-test.sh, there is much duplication
> of code, so your one-line solution is obviously by far better. :-)

The real problem is choosing the right function for getting the master
device. In particular netdev_master_upper_dev_get() was a bad choice.
It forces you to take the rtnl, which is unnatural here and obliges you
to add extra code, while all this shouldn't be necessary in the first
place.

> Much obliged.
>
> Best regards,
> Mirsad


2023-06-06 19:30:29

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On 6/6/23 20:50, Guillaume Nault wrote:
> On Tue, Jun 06, 2023 at 04:28:02PM +0200, Mirsad Todorovac wrote:
>> On 6/6/23 16:11, Guillaume Nault wrote:
>>> On Tue, Jun 06, 2023 at 03:57:35PM +0200, Mirsad Todorovac wrote:
>>>> + if (oif) {
>>>> + rcu_read_lock();
>>>> + dev = dev_get_by_index_rcu(net, oif);
>>>> + rcu_read_unlock();
>>>
>>> You can't assume '*dev' is still valid after rcu_read_unlock() unless
>>> you hold a reference on it.
>>>
>>>> + rtnl_lock();
>>>> + mdev = netdev_master_upper_dev_get(dev);
>>>> + rtnl_unlock();
>>>
>>> Because of that, 'dev' might have already disappeared at the time
>>> netdev_master_upper_dev_get() is called. So it may dereference an
>>> invalid pointer here.
>>
>> Good point, thanks. I didn't expect those to change.
>>
>> This can be fixed, provided that RCU and RTNL locks can be nested:
>
> Well, yes and no. You can call rcu_read_{lock,unlock}() while under the
> rtnl protection, but not the other way around.
>
>> rcu_read_lock();
>> if (oif) {
>> dev = dev_get_by_index_rcu(net, oif);
>> rtnl_lock();
>> mdev = netdev_master_upper_dev_get(dev);
>> rtnl_unlock();
>> }
>
> This is invalid: rtnl_lock() uses a mutex, so it can sleep and that's
> forbidden inside an RCU critical section.

Obviously, that's bad. Mea culpa.

>> if (sk->sk_bound_dev_if) {
>> bdev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
>> }
>>
>> addr_type = ipv6_addr_type(daddr);
>> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
>> (addr_type & IPV6_ADDR_MAPPED) ||
>> (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
>> !(mdev && sk->sk_bound_dev_if && bdev && mdev == bdev))) {
>> rcu_read_unlock();
>> return -EINVAL;
>> }
>> rcu_read_unlock();
>>
>> But again this is still probably not race-free (bdev might also disappear before
>> the mdev == bdev test), even if it passed fcnal-test.sh, there is much duplication
>> of code, so your one-line solution is obviously by far better. :-)
>
> The real problem is choosing the right function for getting the master
> device. In particular netdev_master_upper_dev_get() was a bad choice.
> It forces you to take the rtnl, which is unnatural here and obliges you
> to add extra code, while all this shouldn't be necessary in the first
> place.

Thank you for the additional insight. I had poor luck with Googling on
these.

I made a blunder after blunder. But it was insightful and brainstorming.
Good exercise for my little grey cells.

However, learning without making any errors appears to be simply a lot
of blunt memorising. :-/

It's good to be in an environment when one can learn from errors.

:-)

Regards,
Mirsad

2023-06-06 19:30:30

by Guillaume Nault

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

On Tue, Jun 06, 2023 at 08:07:36PM +0200, Mirsad Goran Todorovac wrote:
> On 6/6/23 15:46, Guillaume Nault wrote:
> > diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
> > index c4835dbdfcff..f804c11e2146 100644
> > --- a/net/ipv6/ping.c
> > +++ b/net/ipv6/ping.c
> > @@ -114,7 +114,8 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
> > addr_type = ipv6_addr_type(daddr);
> > if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
> > (addr_type & IPV6_ADDR_MAPPED) ||
> > - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
> > + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
> > + l3mdev_master_ifindex_by_index(sock_net(sk), oif) != sk->sk_bound_dev_if))
> > return -EINVAL;
> > ipcm6_init_sk(&ipc6, np);
>
> The problem appears to be fixed:
>
> # ./fcnal-test.sh
> [...]
> TEST: ping out, vrf device+address bind - ns-B loopback IPv6 [ OK ]
> TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [ OK ]
> TEST: ping in - ns-A IPv6 [ OK ]
> [...]
> Tests passed: 888
> Tests failed: 0
> #
>
> The test passed in both environments that manifested the bug.
>
> Tested-by: Mirsad Todorovac <[email protected]>

Thanks. I'll review and post this patch (probably tomorrow).

> However, test on my AMD Ubuntu 22.04 box with 6.4-rc5 commit a4d7d7011219
> has shown additional four failed tests:
>
> root@host # grep -n FAIL ../fcnal-test-4.log
> 90:TEST: ping local, VRF bind - VRF IP [FAIL]
> 92:TEST: ping local, device bind - ns-A IP [FAIL]
> 116:TEST: ping local, VRF bind - VRF IP [FAIL]
> 118:TEST: ping local, device bind - ns-A IP [FAIL]
> root@host #
>
> But you would probably want me to file a separate bug report?

Are these new failures? Do they also happen on the -net tree?

> Best regards,
> Mirsad
>


2023-06-06 19:31:09

by Guillaume Nault

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL] in vrf "bind - ns-B IPv6 LLA" test

On Tue, Jun 06, 2023 at 09:17:24PM +0200, Mirsad Goran Todorovac wrote:
> On 6/6/23 20:50, Guillaume Nault wrote:
> > On Tue, Jun 06, 2023 at 04:28:02PM +0200, Mirsad Todorovac wrote:
> > > On 6/6/23 16:11, Guillaume Nault wrote:
> > > > On Tue, Jun 06, 2023 at 03:57:35PM +0200, Mirsad Todorovac wrote:
> > > > > + if (oif) {
> > > > > + rcu_read_lock();
> > > > > + dev = dev_get_by_index_rcu(net, oif);
> > > > > + rcu_read_unlock();
> > > >
> > > > You can't assume '*dev' is still valid after rcu_read_unlock() unless
> > > > you hold a reference on it.
> > > >
> > > > > + rtnl_lock();
> > > > > + mdev = netdev_master_upper_dev_get(dev);
> > > > > + rtnl_unlock();
> > > >
> > > > Because of that, 'dev' might have already disappeared at the time
> > > > netdev_master_upper_dev_get() is called. So it may dereference an
> > > > invalid pointer here.
> > >
> > > Good point, thanks. I didn't expect those to change.
> > >
> > > This can be fixed, provided that RCU and RTNL locks can be nested:
> >
> > Well, yes and no. You can call rcu_read_{lock,unlock}() while under the
> > rtnl protection, but not the other way around.
> >
> > > rcu_read_lock();
> > > if (oif) {
> > > dev = dev_get_by_index_rcu(net, oif);
> > > rtnl_lock();
> > > mdev = netdev_master_upper_dev_get(dev);
> > > rtnl_unlock();
> > > }
> >
> > This is invalid: rtnl_lock() uses a mutex, so it can sleep and that's
> > forbidden inside an RCU critical section.
>
> Obviously, that's bad. Mea culpa.
>
> > > if (sk->sk_bound_dev_if) {
> > > bdev = dev_get_by_index_rcu(net, sk->sk_bound_dev_if);
> > > }
> > >
> > > addr_type = ipv6_addr_type(daddr);
> > > if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
> > > (addr_type & IPV6_ADDR_MAPPED) ||
> > > (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
> > > !(mdev && sk->sk_bound_dev_if && bdev && mdev == bdev))) {
> > > rcu_read_unlock();
> > > return -EINVAL;
> > > }
> > > rcu_read_unlock();
> > >
> > > But again this is still probably not race-free (bdev might also disappear before
> > > the mdev == bdev test), even if it passed fcnal-test.sh, there is much duplication
> > > of code, so your one-line solution is obviously by far better. :-)
> >
> > The real problem is choosing the right function for getting the master
> > device. In particular netdev_master_upper_dev_get() was a bad choice.
> > It forces you to take the rtnl, which is unnatural here and obliges you
> > to add extra code, while all this shouldn't be necessary in the first
> > place.
>
> Thank you for the additional insight. I had poor luck with Googling on
> these.
>
> I made a blunder after blunder. But it was insightful and brainstorming.
> Good exercise for my little grey cells.
>
> However, learning without making any errors appears to be simply a lot
> of blunt memorising. :-/
>
> It's good to be in an environment when one can learn from errors.
>
> :-)

I'm happy you found this useful.

> Regards,
> Mirsad
>


2023-06-06 22:36:31

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

On 6/6/23 20:57, Guillaume Nault wrote:
> On Tue, Jun 06, 2023 at 08:07:36PM +0200, Mirsad Goran Todorovac wrote:
>> On 6/6/23 15:46, Guillaume Nault wrote:
>>> diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
>>> index c4835dbdfcff..f804c11e2146 100644
>>> --- a/net/ipv6/ping.c
>>> +++ b/net/ipv6/ping.c
>>> @@ -114,7 +114,8 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
>>> addr_type = ipv6_addr_type(daddr);
>>> if ((__ipv6_addr_needs_scope_id(addr_type) && !oif) ||
>>> (addr_type & IPV6_ADDR_MAPPED) ||
>>> - (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if))
>>> + (oif && sk->sk_bound_dev_if && oif != sk->sk_bound_dev_if &&
>>> + l3mdev_master_ifindex_by_index(sock_net(sk), oif) != sk->sk_bound_dev_if))
>>> return -EINVAL;
>>> ipcm6_init_sk(&ipc6, np);
>>
>> The problem appears to be fixed:
>>
>> # ./fcnal-test.sh
>> [...]
>> TEST: ping out, vrf device+address bind - ns-B loopback IPv6 [ OK ]
>> TEST: ping out, vrf device+address bind - ns-B IPv6 LLA [ OK ]
>> TEST: ping in - ns-A IPv6 [ OK ]
>> [...]
>> Tests passed: 888
>> Tests failed: 0
>> #
>>
>> The test passed in both environments that manifested the bug.
>>
>> Tested-by: Mirsad Todorovac <[email protected]>
>
> Thanks. I'll review and post this patch (probably tomorrow).
>
>> However, test on my AMD Ubuntu 22.04 box with 6.4-rc5 commit a4d7d7011219
>> has shown additional four failed tests:
>>
>> root@host # grep -n FAIL ../fcnal-test-4.log
>> 90:TEST: ping local, VRF bind - VRF IP [FAIL]
>> 92:TEST: ping local, device bind - ns-A IP [FAIL]
>> 116:TEST: ping local, VRF bind - VRF IP [FAIL]
>> 118:TEST: ping local, device bind - ns-A IP [FAIL]
>> root@host #
>>
>> But you would probably want me to file a separate bug report?
>
> Are these new failures? Do they also happen on the -net tree?

I cannot tell if those are new for the architecture (Ubuntu 22.04 + AMD Ryzen)

However, Ubuntu's unsigned 6.3.1 generic mainline kernel is also affected.
So, it might seem like an old problem.

(If you could isolate the exact tests, I could try a bisect.)

[...]
TEST: ping local, VRF bind - ns-A IP [ OK ]
TEST: ping local, VRF bind - VRF IP [FAIL]
TEST: ping local, VRF bind - loopback [ OK ]
TEST: ping local, device bind - ns-A IP [FAIL]
TEST: ping local, device bind - VRF IP [ OK ]
[...]

SYSCTL: net.ipv4.raw_l3mdev_accept=1

[...]
TEST: ping local, VRF bind - ns-A IP [ OK ]
TEST: ping local, VRF bind - VRF IP [FAIL]
TEST: ping local, VRF bind - loopback [ OK ]
TEST: ping local, device bind - ns-A IP [FAIL]
TEST: ping local, device bind - VRF IP [ OK ]
[...]

Yes, just tested, w commit 42510dffd0e2 these are still present
in fcnal-test.sh output:

[...]
TEST: ping local, VRF bind - ns-A IP [ OK ]
TEST: ping local, VRF bind - VRF IP [FAIL]
TEST: ping local, VRF bind - loopback [ OK ]
TEST: ping local, device bind - ns-A IP [FAIL]
TEST: ping local, device bind - VRF IP [ OK ]
[...]
TEST: ping local, VRF bind - ns-A IP [ OK ]
TEST: ping local, VRF bind - VRF IP [FAIL]
TEST: ping local, VRF bind - loopback [ OK ]
TEST: ping local, device bind - ns-A IP [FAIL]
TEST: ping local, device bind - VRF IP [ OK ]
[...]

I have attached the config used and lshw.

Best regards,
Mirsad


Attachments:
config-6.4.0-rc4-net-00208-g42510dffd0e2.xz (56.39 kB)
lshw.txt.xz (6.40 kB)
Download all attachments

2023-06-07 17:07:57

by Guillaume Nault

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

On Wed, Jun 07, 2023 at 12:04:52AM +0200, Mirsad Goran Todorovac wrote:
> I cannot tell if those are new for the architecture (Ubuntu 22.04 + AMD Ryzen)
>
> However, Ubuntu's unsigned 6.3.1 generic mainline kernel is also affected.
> So, it might seem like an old problem.
>
> (If you could isolate the exact tests, I could try a bisect.)
>
> [...]
> TEST: ping local, VRF bind - ns-A IP [ OK ]
> TEST: ping local, VRF bind - VRF IP [FAIL]
> TEST: ping local, VRF bind - loopback [ OK ]
> TEST: ping local, device bind - ns-A IP [FAIL]
> TEST: ping local, device bind - VRF IP [ OK ]
> [...]
>
> SYSCTL: net.ipv4.raw_l3mdev_accept=1
>
> [...]
> TEST: ping local, VRF bind - ns-A IP [ OK ]
> TEST: ping local, VRF bind - VRF IP [FAIL]
> TEST: ping local, VRF bind - loopback [ OK ]
> TEST: ping local, device bind - ns-A IP [FAIL]
> TEST: ping local, device bind - VRF IP [ OK ]
> [...]
>
> Yes, just tested, w commit 42510dffd0e2 these are still present
> in fcnal-test.sh output:
>
> [...]
> TEST: ping local, VRF bind - ns-A IP [ OK ]
> TEST: ping local, VRF bind - VRF IP [FAIL]
> TEST: ping local, VRF bind - loopback [ OK ]
> TEST: ping local, device bind - ns-A IP [FAIL]
> TEST: ping local, device bind - VRF IP [ OK ]
> [...]
> TEST: ping local, VRF bind - ns-A IP [ OK ]
> TEST: ping local, VRF bind - VRF IP [FAIL]
> TEST: ping local, VRF bind - loopback [ OK ]
> TEST: ping local, device bind - ns-A IP [FAIL]
> TEST: ping local, device bind - VRF IP [ OK ]
> [...]

I have the same failures here. They don't seem to be recent.
I'll take a look.

> I have attached the config used and lshw.
>
> Best regards,
> Mirsad




2023-06-08 06:19:18

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

On 6/7/23 18:51, Guillaume Nault wrote:
> On Wed, Jun 07, 2023 at 12:04:52AM +0200, Mirsad Goran Todorovac wrote:
>> I cannot tell if those are new for the architecture (Ubuntu 22.04 + AMD Ryzen)
>>
>> However, Ubuntu's unsigned 6.3.1 generic mainline kernel is also affected.
>> So, it might seem like an old problem.
>>
>> (If you could isolate the exact tests, I could try a bisect.)
>>
>> [...]
>> TEST: ping local, VRF bind - ns-A IP [ OK ]
>> TEST: ping local, VRF bind - VRF IP [FAIL]
>> TEST: ping local, VRF bind - loopback [ OK ]
>> TEST: ping local, device bind - ns-A IP [FAIL]
>> TEST: ping local, device bind - VRF IP [ OK ]
>> [...]
>>
>> SYSCTL: net.ipv4.raw_l3mdev_accept=1
>>
>> [...]
>> TEST: ping local, VRF bind - ns-A IP [ OK ]
>> TEST: ping local, VRF bind - VRF IP [FAIL]
>> TEST: ping local, VRF bind - loopback [ OK ]
>> TEST: ping local, device bind - ns-A IP [FAIL]
>> TEST: ping local, device bind - VRF IP [ OK ]
>> [...]
>>
>> Yes, just tested, w commit 42510dffd0e2 these are still present
>> in fcnal-test.sh output:
>>
>> [...]
>> TEST: ping local, VRF bind - ns-A IP [ OK ]
>> TEST: ping local, VRF bind - VRF IP [FAIL]
>> TEST: ping local, VRF bind - loopback [ OK ]
>> TEST: ping local, device bind - ns-A IP [FAIL]
>> TEST: ping local, device bind - VRF IP [ OK ]
>> [...]
>> TEST: ping local, VRF bind - ns-A IP [ OK ]
>> TEST: ping local, VRF bind - VRF IP [FAIL]
>> TEST: ping local, VRF bind - loopback [ OK ]
>> TEST: ping local, device bind - ns-A IP [FAIL]
>> TEST: ping local, device bind - VRF IP [ OK ]
>> [...]
>
> I have the same failures here. They don't seem to be recent.
> I'll take a look.

Certainly. I thought it might be something architecture-specific?

I have reproduced it also on a Lenovo IdeaPad 3 with Ubuntu 22.10,
but on Lenovo desktop with AlmaLinux 8.8 (CentOS fork), the result
was "888/888 passed".

However, I have a question:

In the ping + "With VRF" section, the tests with net.ipv4.raw_l3mdev_accept=1
are repeated twice, while "No VRF" section has the versions:

SYSCTL: net.ipv4.raw_l3mdev_accept=0

and

SYSCTL: net.ipv4.raw_l3mdev_accept=1

The same happens with the IPv6 ping tests.

In that case, it could be that we have only 2 actual FAIL cases,
because the error is reported twice.

Is this intentional?

Thanks,
Mirsad

74 #################################################################
75 With VRF
76
77 SYSCTL: net.ipv4.raw_l3mdev_accept=1
78
79 TEST: ping out, VRF bind - ns-B IP [ OK ]
80 TEST: ping out, device bind - ns-B IP [ OK ]
81 TEST: ping out, vrf device + dev address bind - ns-B IP [ OK ]
82 TEST: ping out, vrf device + vrf address bind - ns-B IP [ OK ]
83 TEST: ping out, VRF bind - ns-B loopback IP [ OK ]
84 TEST: ping out, device bind - ns-B loopback IP [ OK ]
85 TEST: ping out, vrf device + dev address bind - ns-B loopback IP [ OK ]
86 TEST: ping out, vrf device + vrf address bind - ns-B loopback IP [ OK ]
87 TEST: ping in - ns-A IP [ OK ]
88 TEST: ping in - VRF IP [ OK ]
89 TEST: ping local, VRF bind - ns-A IP [ OK ]
90 TEST: ping local, VRF bind - VRF IP [FAIL]
91 TEST: ping local, VRF bind - loopback [ OK ]
92 TEST: ping local, device bind - ns-A IP [FAIL]
93 TEST: ping local, device bind - VRF IP [ OK ]
94 TEST: ping local, device bind - loopback [ OK ]
95 TEST: ping out, vrf bind, blocked by rule - ns-B loopback IP [ OK ]
96 TEST: ping out, device bind, blocked by rule - ns-B loopback IP [ OK ]
97 TEST: ping in, blocked by rule - ns-A loopback IP [ OK ]
98 TEST: ping out, vrf bind, unreachable route - ns-B loopback IP [ OK ]
99 TEST: ping out, device bind, unreachable route - ns-B loopback IP [ OK ]
100 TEST: ping in, unreachable route - ns-A loopback IP [ OK ]
101 SYSCTL: net.ipv4.ping_group_range=0 2147483647
102
103 SYSCTL: net.ipv4.raw_l3mdev_accept=1
104
105 TEST: ping out, VRF bind - ns-B IP [ OK ]
106 TEST: ping out, device bind - ns-B IP [ OK ]
107 TEST: ping out, vrf device + dev address bind - ns-B IP [ OK ]
108 TEST: ping out, vrf device + vrf address bind - ns-B IP [ OK ]
109 TEST: ping out, VRF bind - ns-B loopback IP [ OK ]
110 TEST: ping out, device bind - ns-B loopback IP [ OK ]
111 TEST: ping out, vrf device + dev address bind - ns-B loopback IP [ OK ]
112 TEST: ping out, vrf device + vrf address bind - ns-B loopback IP [ OK ]
113 TEST: ping in - ns-A IP [ OK ]
114 TEST: ping in - VRF IP [ OK ]
115 TEST: ping local, VRF bind - ns-A IP [ OK ]
116 TEST: ping local, VRF bind - VRF IP [FAIL]
117 TEST: ping local, VRF bind - loopback [ OK ]
118 TEST: ping local, device bind - ns-A IP [FAIL]
119 TEST: ping local, device bind - VRF IP [ OK ]


2023-06-09 16:34:23

by Guillaume Nault

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

On Thu, Jun 08, 2023 at 07:37:15AM +0200, Mirsad Goran Todorovac wrote:
> On 6/7/23 18:51, Guillaume Nault wrote:
> > On Wed, Jun 07, 2023 at 12:04:52AM +0200, Mirsad Goran Todorovac wrote:
> > > [...]
> > > TEST: ping local, VRF bind - ns-A IP [ OK ]
> > > TEST: ping local, VRF bind - VRF IP [FAIL]
> > > TEST: ping local, VRF bind - loopback [ OK ]
> > > TEST: ping local, device bind - ns-A IP [FAIL]
> > > TEST: ping local, device bind - VRF IP [ OK ]
> > > [...]
> > > TEST: ping local, VRF bind - ns-A IP [ OK ]
> > > TEST: ping local, VRF bind - VRF IP [FAIL]
> > > TEST: ping local, VRF bind - loopback [ OK ]
> > > TEST: ping local, device bind - ns-A IP [FAIL]
> > > TEST: ping local, device bind - VRF IP [ OK ]
> > > [...]
> >
> > I have the same failures here. They don't seem to be recent.
> > I'll take a look.
>
> Certainly. I thought it might be something architecture-specific?
>
> I have reproduced it also on a Lenovo IdeaPad 3 with Ubuntu 22.10,
> but on Lenovo desktop with AlmaLinux 8.8 (CentOS fork), the result
> was "888/888 passed".

I've taken a deeper look at these failures. That's actually a problem in
ping. That's probably why you have different results depending on the
distribution.

The problem is that, for some versions, 'ping -I netdev ...' doesn't
bind the socket to 'netdev' if the IPv4 address to ping is set on that
same device. The VRF tests depend on this socket binding, so they fail
when ping refuses to bind. That was fixed upstream with commit
92ce8ef21393 ("Revert "ping: do not bind to device when destination IP
is on device"") (https://github.com/iputils/iputils/commit/92ce8ef2139353da3bf55fe2280bd4abd2155c9f).

Long story short, the tests should pass with the latest upstream ping
version.

Alternatively, you can modify the commands run by fcnal-test.sh and
provide the -I option twice: one for setting the device binding and one
for setting the source IPv4 address. This way ping should accept to
bind its socket.

Something like (not tested):

- run_cmd ping -c1 -w1 -I ${VRF} ${a}
+ run_cmd ping -c1 -w1 -I ${VRF} -I ${a} ${a}
[...]
- run_cmd ping -c1 -w1 -I ${NSA_DEV} ${a}
+ run_cmd ping -c1 -w1 -I ${NSA_DEV} -I ${a} ${a}

> However, I have a question:
>
> In the ping + "With VRF" section, the tests with net.ipv4.raw_l3mdev_accept=1
> are repeated twice, while "No VRF" section has the versions:
>
> SYSCTL: net.ipv4.raw_l3mdev_accept=0
>
> and
>
> SYSCTL: net.ipv4.raw_l3mdev_accept=1
>
> The same happens with the IPv6 ping tests.
>
> In that case, it could be that we have only 2 actual FAIL cases,
> because the error is reported twice.
>
> Is this intentional?

I don't know why the non-VRF tests are run once with raw_l3mdev_accept=0
and once with raw_l3mdev_accept=1. Unless I'm missing something, this
option shouldn't affect non-VRF users. Maybe the objective is to make
sure that it really doesn't affect them. David certainly knows better.

> Thanks,
> Mirsad
>


2023-06-10 18:56:10

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

On 6/9/23 18:13, Guillaume Nault wrote:
> On Thu, Jun 08, 2023 at 07:37:15AM +0200, Mirsad Goran Todorovac wrote:
>> On 6/7/23 18:51, Guillaume Nault wrote:
>>> On Wed, Jun 07, 2023 at 12:04:52AM +0200, Mirsad Goran Todorovac wrote:
>>>> [...]
>>>> TEST: ping local, VRF bind - ns-A IP [ OK ]
>>>> TEST: ping local, VRF bind - VRF IP [FAIL]
>>>> TEST: ping local, VRF bind - loopback [ OK ]
>>>> TEST: ping local, device bind - ns-A IP [FAIL]
>>>> TEST: ping local, device bind - VRF IP [ OK ]
>>>> [...]
>>>> TEST: ping local, VRF bind - ns-A IP [ OK ]
>>>> TEST: ping local, VRF bind - VRF IP [FAIL]
>>>> TEST: ping local, VRF bind - loopback [ OK ]
>>>> TEST: ping local, device bind - ns-A IP [FAIL]
>>>> TEST: ping local, device bind - VRF IP [ OK ]
>>>> [...]
>>>
>>> I have the same failures here. They don't seem to be recent.
>>> I'll take a look.
>>
>> Certainly. I thought it might be something architecture-specific?
>>
>> I have reproduced it also on a Lenovo IdeaPad 3 with Ubuntu 22.10,
>> but on Lenovo desktop with AlmaLinux 8.8 (CentOS fork), the result
>> was "888/888 passed".
>
> I've taken a deeper look at these failures. That's actually a problem in
> ping. That's probably why you have different results depending on the
> distribution.

Thank you for your work. I feel encouraged by your aim to get to the bottom
of the problem ...

> The problem is that, for some versions, 'ping -I netdev ...' doesn't
> bind the socket to 'netdev' if the IPv4 address to ping is set on that
> same device. The VRF tests depend on this socket binding, so they fail
> when ping refuses to bind. That was fixed upstream with commit
> 92ce8ef21393 ("Revert "ping: do not bind to device when destination IP
> is on device"") (https://github.com/iputils/iputils/commit/92ce8ef2139353da3bf55fe2280bd4abd2155c9f).
>
> Long story short, the tests should pass with the latest upstream ping
> version.
>
> Alternatively, you can modify the commands run by fcnal-test.sh and
> provide the -I option twice: one for setting the device binding and one
> for setting the source IPv4 address. This way ping should accept to
> bind its socket.
>
> Something like (not tested):
>
> - run_cmd ping -c1 -w1 -I ${VRF} ${a}
> + run_cmd ping -c1 -w1 -I ${VRF} -I ${a} ${a}
> [...]
> - run_cmd ping -c1 -w1 -I ${NSA_DEV} ${a}
> + run_cmd ping -c1 -w1 -I ${NSA_DEV} -I ${a} ${a}

I have tested this and the fix appears to work:

#################################################################
With VRF

SYSCTL: net.ipv4.raw_l3mdev_accept=1

TEST: ping out, VRF bind - ns-B IP [ OK ]
TEST: ping out, device bind - ns-B IP [ OK ]
TEST: ping out, vrf device + dev address bind - ns-B IP [ OK ]
TEST: ping out, vrf device + vrf address bind - ns-B IP [ OK ]
TEST: ping out, VRF bind - ns-B loopback IP [ OK ]
TEST: ping out, device bind - ns-B loopback IP [ OK ]
TEST: ping out, vrf device + dev address bind - ns-B loopback IP [ OK ]
TEST: ping out, vrf device + vrf address bind - ns-B loopback IP [ OK ]
TEST: ping in - ns-A IP [ OK ]
TEST: ping in - VRF IP [ OK ]
TEST: ping local, VRF bind - ns-A IP [ OK ]
TEST: ping local, VRF bind - VRF IP [ OK ]
TEST: ping local, VRF bind - loopback [ OK ]
TEST: ping local, device bind - ns-A IP [ OK ]
TEST: ping local, device bind - VRF IP [ OK ]
TEST: ping local, device bind - loopback [ OK ]
TEST: ping out, vrf bind, blocked by rule - ns-B loopback IP [ OK ]
TEST: ping out, device bind, blocked by rule - ns-B loopback IP [ OK ]
TEST: ping in, blocked by rule - ns-A loopback IP [ OK ]
TEST: ping out, vrf bind, unreachable route - ns-B loopback IP [ OK ]
TEST: ping out, device bind, unreachable route - ns-B loopback IP [ OK ]
TEST: ping in, unreachable route - ns-A loopback IP [ OK ]
SYSCTL: net.ipv4.ping_group_range=0 2147483647

SYSCTL: net.ipv4.raw_l3mdev_accept=1

TEST: ping out, VRF bind - ns-B IP [ OK ]
TEST: ping out, device bind - ns-B IP [ OK ]
TEST: ping out, vrf device + dev address bind - ns-B IP [ OK ]
TEST: ping out, vrf device + vrf address bind - ns-B IP [ OK ]
TEST: ping out, VRF bind - ns-B loopback IP [ OK ]
TEST: ping out, device bind - ns-B loopback IP [ OK ]
TEST: ping out, vrf device + dev address bind - ns-B loopback IP [ OK ]
TEST: ping out, vrf device + vrf address bind - ns-B loopback IP [ OK ]
TEST: ping in - ns-A IP [ OK ]
TEST: ping in - VRF IP [ OK ]
TEST: ping local, VRF bind - ns-A IP [ OK ]
TEST: ping local, VRF bind - VRF IP [ OK ]
TEST: ping local, VRF bind - loopback [ OK ]
TEST: ping local, device bind - ns-A IP [ OK ]
TEST: ping local, device bind - VRF IP [ OK ]
TEST: ping local, device bind - loopback [ OK ]
TEST: ping out, vrf bind, blocked by rule - ns-B loopback IP [ OK ]
TEST: ping out, device bind, blocked by rule - ns-B loopback IP [ OK ]
TEST: ping in, blocked by rule - ns-A loopback IP [ OK ]
TEST: ping out, vrf bind, unreachable route - ns-B loopback IP [ OK ]
TEST: ping out, device bind, unreachable route - ns-B loopback IP [ OK ]
TEST: ping in, unreachable route - ns-A loopback IP [ OK ]

###########################################################################

This also works on the Lenovo IdeaPad 3 Ubuntu 22.10 laptop, but on the AlmaLinux 8.8
Lenovo desktop I have a problem:

[root@pc-mtodorov net]# grep FAIL ../fcnal-test-4.log
TEST: ping local, VRF bind - ns-A IP [FAIL]
TEST: ping local, VRF bind - VRF IP [FAIL]
TEST: ping local, device bind - ns-A IP [FAIL]
TEST: ping local, VRF bind - ns-A IP [FAIL]
TEST: ping local, VRF bind - VRF IP [FAIL]
TEST: ping local, device bind - ns-A IP [FAIL]
[root@pc-mtodorov net]#

Kernel is the recent one:

[root@pc-mtodorov net]# uname -rms
Linux 6.4.0-rc5-testnet-00003-g5b23878f7ed9 x86_64
[root@pc-mtodorov net]#

>> However, I have a question:
>>
>> In the ping + "With VRF" section, the tests with net.ipv4.raw_l3mdev_accept=1
>> are repeated twice, while "No VRF" section has the versions:
>>
>> SYSCTL: net.ipv4.raw_l3mdev_accept=0
>>
>> and
>>
>> SYSCTL: net.ipv4.raw_l3mdev_accept=1
>>
>> The same happens with the IPv6 ping tests.
>>
>> In that case, it could be that we have only 2 actual FAIL cases,
>> because the error is reported twice.
>>
>> Is this intentional?
>
> I don't know why the non-VRF tests are run once with raw_l3mdev_accept=0
> and once with raw_l3mdev_accept=1. Unless I'm missing something, this
> option shouldn't affect non-VRF users. Maybe the objective is to make
> sure that it really doesn't affect them. David certainly knows better.

The problem appears to be that non-VRF tests are being ran with
raw_l3mdev_accept={0|1}, while VRF tests w raw_l3mdev_accept={1|1} ...

I will try to fix that, but I am not sure of the semantics either.

Regards,
Mirsad

2023-06-14 09:03:06

by Guillaume Nault

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

On Sat, Jun 10, 2023 at 08:04:02PM +0200, Mirsad Goran Todorovac wrote:
> This also works on the Lenovo IdeaPad 3 Ubuntu 22.10 laptop, but on the AlmaLinux 8.8
> Lenovo desktop I have a problem:
>
> [root@pc-mtodorov net]# grep FAIL ../fcnal-test-4.log
> TEST: ping local, VRF bind - ns-A IP [FAIL]
> TEST: ping local, VRF bind - VRF IP [FAIL]
> TEST: ping local, device bind - ns-A IP [FAIL]
> TEST: ping local, VRF bind - ns-A IP [FAIL]
> TEST: ping local, VRF bind - VRF IP [FAIL]
> TEST: ping local, device bind - ns-A IP [FAIL]
> [root@pc-mtodorov net]#
>
> Kernel is the recent one:
>
> [root@pc-mtodorov net]# uname -rms
> Linux 6.4.0-rc5-testnet-00003-g5b23878f7ed9 x86_64
> [root@pc-mtodorov net]#

Maybe a problem with the ping version used by the distribution.
You can try "./fcnal-test.sh -t ipv4_ping -p -v" to view the commands
run and make the script stop when there's a test failure (so that you
can see the ping output and try your own commands in the testing
environment).

> > > However, I have a question:
> > >
> > > In the ping + "With VRF" section, the tests with net.ipv4.raw_l3mdev_accept=1
> > > are repeated twice, while "No VRF" section has the versions:
> > >
> > > SYSCTL: net.ipv4.raw_l3mdev_accept=0
> > >
> > > and
> > >
> > > SYSCTL: net.ipv4.raw_l3mdev_accept=1
> > >
> > > The same happens with the IPv6 ping tests.
> > >
> > > In that case, it could be that we have only 2 actual FAIL cases,
> > > because the error is reported twice.
> > >
> > > Is this intentional?
> >
> > I don't know why the non-VRF tests are run once with raw_l3mdev_accept=0
> > and once with raw_l3mdev_accept=1. Unless I'm missing something, this
> > option shouldn't affect non-VRF users. Maybe the objective is to make
> > sure that it really doesn't affect them. David certainly knows better.
>
> The problem appears to be that non-VRF tests are being ran with
> raw_l3mdev_accept={0|1}, while VRF tests w raw_l3mdev_accept={1|1} ...

The reason the VRF tests run twice is to test both raw and ping sockets
(using the "net.ipv4.ping_group_range" sysctl). It doesn't seem anyone
ever intended to run the VRF tests with raw_l3mdev_accept=0.

Only the non-VRF tests were intended to be tested with
raw_l3mdev_accept=0 (see commit c032dd8cc7e2 ("selftests: Add ipv4 ping
tests to fcnal-test")). But I have no idea why.

> I will try to fix that, but I am not sure of the semantics either.
>
> Regards,
> Mirsad
>


2023-06-15 20:22:06

by Mirsad Todorovac

[permalink] [raw]
Subject: Re: POSSIBLE BUG: selftests/net/fcnal-test.sh: [FAIL][FIX TESTED] in vrf "bind - ns-B IPv6 LLA" test

On 6/14/23 10:47, Guillaume Nault wrote:
> On Sat, Jun 10, 2023 at 08:04:02PM +0200, Mirsad Goran Todorovac wrote:
>> This also works on the Lenovo IdeaPad 3 Ubuntu 22.10 laptop, but on the AlmaLinux 8.8
>> Lenovo desktop I have a problem:
>>
>> [root@pc-mtodorov net]# grep FAIL ../fcnal-test-4.log
>> TEST: ping local, VRF bind - ns-A IP [FAIL]
>> TEST: ping local, VRF bind - VRF IP [FAIL]
>> TEST: ping local, device bind - ns-A IP [FAIL]
>> TEST: ping local, VRF bind - ns-A IP [FAIL]
>> TEST: ping local, VRF bind - VRF IP [FAIL]
>> TEST: ping local, device bind - ns-A IP [FAIL]
>> [root@pc-mtodorov net]#
>>
>> Kernel is the recent one:
>>
>> [root@pc-mtodorov net]# uname -rms
>> Linux 6.4.0-rc5-testnet-00003-g5b23878f7ed9 x86_64
>> [root@pc-mtodorov net]#
>
> Maybe a problem with the ping version used by the distribution.
> You can try "./fcnal-test.sh -t ipv4_ping -p -v" to view the commands
> run and make the script stop when there's a test failure (so that you
> can see the ping output and try your own commands in the testing
> environment).

Thank you for taking the time for the reply. And thanks for the hint.
But I am sort of on ebb tide on this.

It would be good to have the test run on both versions of Linux to test
the actual kernel faults. Maybe pack a version of ping command w the test?
But I cannot deploy too much time in this.

I hope then the upgrade AlmaLinux 8.8 -> 9.x (or CentOS clones in general)
would solve the issue, but it is not guaranteed, and I would lose bisect
to the old kernels. Which is why I do not upgrade to the latest releases
in the first place. :-/

If it is just the AlmaLinux ping, then it is just an exotic distro, but it
is a CentOS clone, so the issue might exist in the more popular Rocky, too.

I am not sure what is the right way to do in this case or I would already
have done it. Presumptuous maybe, but true.

>>>> However, I have a question:
>>>>
>>>> In the ping + "With VRF" section, the tests with net.ipv4.raw_l3mdev_accept=1
>>>> are repeated twice, while "No VRF" section has the versions:
>>>>
>>>> SYSCTL: net.ipv4.raw_l3mdev_accept=0
>>>>
>>>> and
>>>>
>>>> SYSCTL: net.ipv4.raw_l3mdev_accept=1
>>>>
>>>> The same happens with the IPv6 ping tests.
>>>>
>>>> In that case, it could be that we have only 2 actual FAIL cases,
>>>> because the error is reported twice.
>>>>
>>>> Is this intentional?
>>>
>>> I don't know why the non-VRF tests are run once with raw_l3mdev_accept=0
>>> and once with raw_l3mdev_accept=1. Unless I'm missing something, this
>>> option shouldn't affect non-VRF users. Maybe the objective is to make
>>> sure that it really doesn't affect them. David certainly knows better.
>>
>> The problem appears to be that non-VRF tests are being ran with
>> raw_l3mdev_accept={0|1}, while VRF tests w raw_l3mdev_accept={1|1} ...
>
> The reason the VRF tests run twice is to test both raw and ping sockets
> (using the "net.ipv4.ping_group_range" sysctl). It doesn't seem anyone
> ever intended to run the VRF tests with raw_l3mdev_accept=0.
>
> Only the non-VRF tests were intended to be tested with
> raw_l3mdev_accept=0 (see commit c032dd8cc7e2 ("selftests: Add ipv4 ping
> tests to fcnal-test")). But I have no idea why.

Well, you are not to blame if it is not documented.

This thing doesn't come out of the testsuite save by prayer and fasting,
I'm afraid ;-)

Best regards,
Mirsad