On 2/16/22 3:36 AM, Xiao, Jiguang wrote:
> Hello,
>
> I found a counter in the kernel (5.10.49) that does not follow the RFC4293
> specification. The test steps are as follows:
>
>
>
> Topology:
>
> |VM 1| ------ |linux| ------ |VM 2|
>
>
>
> Steps:
>
> 1. Verify that “VM 1” is reachable from “VM 2” and vice versa using the
> ping6 command.
>
> 2. On the “linux” node, in the appropriate FIB, remove the default route
> to the network in which “VM 2” resides. That way, “linux” will not
> forward the packet, because no route points to the destination address
> of “VM 2”.
>
> 3. Collect the corresponding SNMP counters from “linux” node.
>
> 4. Verify that there is no connectivity from “VM 1” to “VM 2” using
> ping6 command.
>
> 5. Check the counters again.
>
>
>
> The test results:
>
> The counter “ip6InNoRoutes” in “/proc/net/dev_snmp6/” has not increased
> accordingly. In my test environment, it was always zero.
>
>
>
> My question is:
>
> Within RFC4293, “ipSystemStatsInNoRoutes” is defined as follows:
>
> “The number of input IP datagrams discarded because no route could be
> found to transmit them to their destination.”
>
> Does this version of the kernel comply with the RFC4293 specification?
>
>
I see that counter incrementing. Look at the fib6 tracepoints and see
what the lookups are returning:
perf record -e fib6:* -a
<run test>
Ctrl-C
perf script
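
Steps 3 and 5 of the quoted procedure amount to reading the same counter before and after the test. A minimal C sketch of that read follows, assuming the per-interface file holds one "name value" pair per line and that the field is spelled Ip6InNoRoutes there. The function names snmp6_read_counter and snmp6_read_dev_counter are hypothetical helpers, not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one counter in a stream that uses the "name value" line
 * format assumed for /proc/net/dev_snmp6/<ifname>.
 * Returns 0 and stores the value in *out, or -1 if the name is absent.
 */
int snmp6_read_counter(FILE *f, const char *name, unsigned long long *out)
{
	char line[128], key[64];
	unsigned long long val;

	while (fgets(line, sizeof(line), f)) {
		/* Skip lines that do not parse as "name value". */
		if (sscanf(line, "%63s %llu", key, &val) != 2)
			continue;
		if (strcmp(key, name) == 0) {
			*out = val;
			return 0;
		}
	}
	return -1;
}

/* Convenience wrapper: open the per-interface file and read one counter. */
int snmp6_read_dev_counter(const char *ifname, const char *name,
			   unsigned long long *out)
{
	char path[128];
	FILE *f;
	int ret;

	snprintf(path, sizeof(path), "/proc/net/dev_snmp6/%s", ifname);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ret = snmp6_read_counter(f, name, out);
	fclose(f);
	return ret;
}
```

Reading the counter on every interface, including lo, before and after the ping shows which device, if any, the drop was accounted to.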
Hi David
Thanks for the guidance on how to proceed. I have captured the perf output (perf_output_5.10.49).
To confirm the problem, I repeated the test on Ubuntu (kernel 5.4.0-79) using Docker, and the results were the same; the only difference is the kernel version. I also collected the perf results and attached them (perf_output_5.4.0).
Best Regards
Xiao Jiguang
-----Original Message-----
From: David Ahern <[email protected]>
Sent: February 17, 2022 11:00
To: Xiao, Jiguang <[email protected]>; [email protected]; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: This counter "ip6InNoRoutes" does not follow the RFC4293 specification implementation
I see that counter incrementing. Look at the fib6 tracepoints and see what the lookups are returning:
perf record -e fib6:* -a
<run test>
Ctrl-C
perf script
Hi David
To confirm whether my test method is correct, could you please briefly describe your test procedure?
Best Regards
Xiao Jiguang
On 3/8/22 7:16 PM, Xiao, Jiguang wrote:
> Hi David
>
> To confirm whether my test method is correct, could you please briefly describe your test procedure?
>
>
>
No formal test. Code analysis (ip6_pkt_discard{,_out} -> ip6_pkt_drop)
shows which counters should be incrementing, and I then looked at the
counters on a local server.
FIB lookup failures should generate a dst with one of these handlers:
static void ip6_rt_init_dst_reject(struct rt6_info *rt, u8 fib6_type)
{
	rt->dst.error = ip6_rt_type_to_error(fib6_type);
	switch (fib6_type) {
	case RTN_BLACKHOLE:
		rt->dst.output = dst_discard_out;
		rt->dst.input = dst_discard;
		break;
	case RTN_PROHIBIT:
		rt->dst.output = ip6_pkt_prohibit_out;
		rt->dst.input = ip6_pkt_prohibit;
		break;
	case RTN_THROW:
	case RTN_UNREACHABLE:
	default:
		rt->dst.output = ip6_pkt_discard_out;
		rt->dst.input = ip6_pkt_discard;
		break;
	}
}
They all drop the packet with a given counter bumped.
Hi David,
So we end up in ip6_pkt_discard -> ip6_pkt_drop:
---
	if (netif_is_l3_master(skb->dev) &&
	    dst->dev == net->loopback_dev)
		idev = __in6_dev_get_safely(dev_get_by_index_rcu(net, IP6CB(skb)->iif));
	else
		idev = ip6_dst_idev(dst);

	switch (ipstats_mib_noroutes) {
	case IPSTATS_MIB_INNOROUTES:
		type = ipv6_addr_type(&ipv6_hdr(skb)->daddr);
		if (type == IPV6_ADDR_ANY) {
			IP6_INC_STATS(net, idev, IPSTATS_MIB_INADDRERRORS);
			break;
		}
		fallthrough;
	case IPSTATS_MIB_OUTNOROUTES:
		IP6_INC_STATS(net, idev, ipstats_mib_noroutes);
		break;
	}
---
When l3mdev is not used, we take the else branch (idev = ip6_dst_idev(dst);), and we can then see that the counter is incremented on the loopback interface.
So is using l3mdev the only option, or is it unreasonable to expect that, by default, the idev on which INNOROUTES is incremented is the ingress device in this case?
Best Regards,
Filip Pudak
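
The switch quoted above also decides which counter is bumped, independently of which device it is bumped on. A minimal user-space sketch of that decision follows. The enum and the function noroute_counter_for are hypothetical names for illustration, and IN6_IS_ADDR_UNSPECIFIED stands in for the kernel's ipv6_addr_type(...) == IPV6_ADDR_ANY check on the assumption that both test for the all-zero address.

```c
#include <netinet/in.h>

/* Hypothetical stand-ins for the IPSTATS_MIB_* values in the snippet. */
enum noroute_counter {
	CNT_INNOROUTES,
	CNT_OUTNOROUTES,
	CNT_INADDRERRORS,
};

/*
 * Mirror of the quoted switch: an input packet addressed to the
 * unspecified address is counted as an address error; every other
 * case bumps the counter that was requested.
 */
enum noroute_counter noroute_counter_for(enum noroute_counter requested,
					 const struct in6_addr *daddr)
{
	if (requested == CNT_INNOROUTES && IN6_IS_ADDR_UNSPECIFIED(daddr))
		return CNT_INADDRERRORS;
	return requested;
}
```

For the forwarding test in this thread the destination is the address of "VM 2", so the model selects the InNoRoutes counter; the open question is only which device's statistics receive it.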
On 3/31/22 3:13 AM, Pudak, Filip wrote:
> Hi David,
>
> So we end up in ip6_pkt_discard -> ip6_pkt_drop :
>
> ---
> 	if (netif_is_l3_master(skb->dev) &&
> 	    dst->dev == net->loopback_dev)

That's a bug. I cannot think of a case where those 2 conditions will
ever be true at the same time. I think that should be '||'.

> 		idev = __in6_dev_get_safely(dev_get_by_index_rcu(net, IP6CB(skb)->iif));
> 	else
> 		idev = ip6_dst_idev(dst);
>
> 	switch (ipstats_mib_noroutes) {
> 	case IPSTATS_MIB_INNOROUTES:
> 		type = ipv6_addr_type(&ipv6_hdr(skb)->daddr);
> 		if (type == IPV6_ADDR_ANY) {
> 			IP6_INC_STATS(net, idev, IPSTATS_MIB_INADDRERRORS);
> 			break;
> 		}
> 		fallthrough;
> 	case IPSTATS_MIB_OUTNOROUTES:
> 		IP6_INC_STATS(net, idev, ipstats_mib_noroutes);
> 		break;
> 	}
>
> ---
> When l3mdev is not used, we take the else branch (idev = ip6_dst_idev(dst);), and we can then see that the counter is incremented on the loopback interface.
>
> So is using l3mdev the only option, or is it unreasonable to expect that, by default, the idev on which INNOROUTES is incremented is the ingress device in this case?
>
Hi David,
It was indeed a bug. We've retested with '||' and the counter is incremented properly.
How do we go about getting this change into the kernel? Will you submit the update?
Thanks,
Filip Pudak
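
The effect of the two conditions discussed in this thread can be compared in a small user-space model. This is only a sketch: the enum and function names are hypothetical, the two booleans stand in for netif_is_l3_master(skb->dev) and dst->dev == net->loopback_dev, and it assumes, as reported above, that without l3mdev the reject dst points at the loopback device.

```c
#include <stdbool.h>

/* Which device's statistics receive the increment (hypothetical names). */
enum stats_dev {
	STATS_ON_INGRESS_DEV,	/* idev looked up from IP6CB(skb)->iif */
	STATS_ON_DST_DEV,	/* idev taken from ip6_dst_idev(dst) */
};

/* Model of the original condition, joined with '&&'. */
enum stats_dev pick_dev_and(bool skb_dev_is_l3_master, bool dst_dev_is_loopback)
{
	if (skb_dev_is_l3_master && dst_dev_is_loopback)
		return STATS_ON_INGRESS_DEV;
	return STATS_ON_DST_DEV;
}

/* Model of the condition with '||', as retested in the thread. */
enum stats_dev pick_dev_or(bool skb_dev_is_l3_master, bool dst_dev_is_loopback)
{
	if (skb_dev_is_l3_master || dst_dev_is_loopback)
		return STATS_ON_INGRESS_DEV;
	return STATS_ON_DST_DEV;
}
```

For the case reported here (no l3mdev, dst on loopback), pick_dev_and() selects the dst device, so the counter lands on loopback, while pick_dev_or() selects the ingress device.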
On 4/4/22 1:09 AM, Pudak, Filip wrote:
> It was indeed a bug. We've retested with '||' and the counter is incremented properly.
>
> How do we go about getting this change into the kernel? Will you submit the update?
Patch sent.