LinuxLists.cc - [PATCH] tools/testing/selftests/bpf/test_tc_tunnel.sh: Prevent client connect before server bind

2024-02-29 14:01:57

Subject: [PATCH] tools/testing/selftests/bpf/test_tc_tunnel.sh: Prevent client connect before server bind

In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.

Signed-off-by: Alessandro Carminati (Red Hat) <[email protected]>
---
tools/testing/selftests/bpf/test_tc_tunnel.sh | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
index 910044f08908..01c0f4b1a8c2 100755
--- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
+++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
@@ -72,7 +72,6 @@ cleanup() {
server_listen() {
ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
server_pid=$!
- sleep 0.2
}

client_connect() {
@@ -93,6 +92,22 @@ verify_data() {
fi
}

+wait_for_port() {
+ local digits=8
+ local port2check=$(printf ":%04X" $1)
+ local prot=$([ "$2" == "-6" ] && echo 6 && digits=32)
+
+ for i in $(seq 20); do
+ if ip netns exec "${ns2}" cat /proc/net/tcp${prot} | \
+ sed -r 's/^[ \t]+[0-9]+: ([0-9A-F]{'${digits}'}:[0-9A-F]{4}) .*$/\1/' | \
+ grep -q "${port2check}"; then
+ return 0
+ fi
+ sleep 0.1
+ done
+ return 1
+}
+
set -e

# no arguments: automated test, run all
@@ -193,6 +208,7 @@ setup
# basic communication works
echo "test basic connectivity"
server_listen
+wait_for_port ${port} ${netcat_opt}
client_connect
verify_data

@@ -204,6 +220,7 @@ ip netns exec "${ns1}" tc filter add dev veth1 egress \
section "encap_${tuntype}_${mac}"
echo "test bpf encap without decap (expect failure)"
server_listen
+wait_for_port ${port} ${netcat_opt}
! client_connect

if [[ "$tuntype" =~ "udp" ]]; then
--
2.34.1

2024-03-08 01:04:06

by Martin KaFai Lau

[permalink] [raw]

Subject: Re: [PATCH] tools/testing/selftests/bpf/test_tc_tunnel.sh: Prevent client connect before server bind

On 2/29/24 6:00 AM, Alessandro Carminati (Red Hat) wrote:
> In some systems, the netcat server can incur in delay to start listening.
> When this happens, the test can randomly fail in various points.
> This is an example error message:
> # ip gre none gso
> # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
> # test basic connectivity
> # Ncat: Connection refused.

This explained what is the issue. Please also explain how the patch solves it.

>
> Signed-off-by: Alessandro Carminati (Red Hat) <[email protected]>
> ---
> tools/testing/selftests/bpf/test_tc_tunnel.sh | 19 ++++++++++++++++++-
> 1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
> index 910044f08908..01c0f4b1a8c2 100755
> --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
> +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
> @@ -72,7 +72,6 @@ cleanup() {
> server_listen() {
> ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
> server_pid=$!
> - sleep 0.2
> }
>
> client_connect() {
> @@ -93,6 +92,22 @@ verify_data() {
> fi
> }
>
> +wait_for_port() {
> + local digits=8
> + local port2check=$(printf ":%04X" $1)
> + local prot=$([ "$2" == "-6" ] && echo 6 && digits=32)
> +
> + for i in $(seq 20); do
> + if ip netns exec "${ns2}" cat /proc/net/tcp${prot} | \
> + sed -r 's/^[ \t]+[0-9]+: ([0-9A-F]{'${digits}'}:[0-9A-F]{4}) .*$/\1/' | \
> + grep -q "${port2check}"; then

The idea is to check if there is socket listening on port 8888?

May be something simpler like "ss -OHtl src :$1" instead?

--
pw-bot: cr

The check-and-wait fix in this patch is fine to get your test environment going.

Eventually, it will be good to see the test_tc_tunnel.sh test moved to
test_progs. The test_tc_tunnel.sh is not run by bpf CI and issue like this got
unnoticed. Some other "*.sh" tests have already been moved to test_progs.

2024-03-10 09:46:28

by Alessandro Carminati (Red Hat)

[permalink] [raw]

Subject: Re: [PATCH] tools/testing/selftests/bpf/test_tc_tunnel.sh: Prevent client connect before server bind

Hi Martin,
Thanks for the review.

Il giorno ven 8 mar 2024 alle ore 02:03 Martin KaFai Lau
<[email protected]> ha scritto:
>
> On 2/29/24 6:00 AM, Alessandro Carminati (Red Hat) wrote:
> > In some systems, the netcat server can incur in delay to start listening.
> > When this happens, the test can randomly fail in various points.
> > This is an example error message:
> > # ip gre none gso
> > # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
> > # test basic connectivity
> > # Ncat: Connection refused.
>
> This explained what is the issue. Please also explain how the patch solves it.
>
The issue, as stated, depends on a race condition between the netcat client
and server. The test author addressed this problem using a sleep, which I
removed in this patch. To easily solve the issue, one could simply increase
the sleep duration. However, this patch opts to tackle the problem by
querying the /proc directory and verifying TCP binds at the specified port
before letting the client connect.
> >
> > Signed-off-by: Alessandro Carminati (Red Hat) <[email protected]>
> > ---
> > tools/testing/selftests/bpf/test_tc_tunnel.sh | 19 ++++++++++++++++++-
> > 1 file changed, 18 insertions(+), 1 deletion(-)
> >
> > diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
> > index 910044f08908..01c0f4b1a8c2 100755
> > --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
> > +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
> > @@ -72,7 +72,6 @@ cleanup() {
> > server_listen() {
> > ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
> > server_pid=$!
> > - sleep 0.2
> > }
> >
> > client_connect() {
> > @@ -93,6 +92,22 @@ verify_data() {
> > fi
> > }
> >
> > +wait_for_port() {
> > + local digits=8
> > + local port2check=$(printf ":%04X" $1)
> > + local prot=$([ "$2" == "-6" ] && echo 6 && digits=32)
> > +
> > + for i in $(seq 20); do
> > + if ip netns exec "${ns2}" cat /proc/net/tcp${prot} | \
> > + sed -r 's/^[ \t]+[0-9]+: ([0-9A-F]{'${digits}'}:[0-9A-F]{4}) .*$/\1/' | \
> > + grep -q "${port2check}"; then
>
> The idea is to check if there is socket listening on port 8888?
>
> May be something simpler like "ss -OHtl src :$1" instead?
Indeed, the aim is to ensure that the server is bound before the
client attempts to
connect by checking if socket is listening, and yes using 'ss' would be shorter.
However, I chose not to use 'ss' or 'netstat' to avoid adding new dependencies,
considering they are already many.
Nonetheless, 'ss' is preferred, I have no objections.
>
> --
> pw-bot: cr
>
> The check-and-wait fix in this patch is fine to get your test environment going.
>
> Eventually, it will be good to see the test_tc_tunnel.sh test moved to
> test_progs. The test_tc_tunnel.sh is not run by bpf CI and issue like this got
> unnoticed. Some other "*.sh" tests have already been moved to test_progs.
>
>
Regards
Alessandro

2024-03-13 23:45:31

by Martin KaFai Lau

[permalink] [raw]

Subject: Re: [PATCH] tools/testing/selftests/bpf/test_tc_tunnel.sh: Prevent client connect before server bind

On 3/10/24 1:45 AM, Alessandro Carminati wrote:
> Hi Martin,
> Thanks for the review.
>
> Il giorno ven 8 mar 2024 alle ore 02:03 Martin KaFai Lau
> <[email protected]> ha scritto:
>>
>> On 2/29/24 6:00 AM, Alessandro Carminati (Red Hat) wrote:
>>> In some systems, the netcat server can incur in delay to start listening.
>>> When this happens, the test can randomly fail in various points.
>>> This is an example error message:
>>> # ip gre none gso
>>> # encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
>>> # test basic connectivity
>>> # Ncat: Connection refused.
>>
>> This explained what is the issue. Please also explain how the patch solves it.
>>
> The issue, as stated, depends on a race condition between the netcat client
> and server. The test author addressed this problem using a sleep, which I
> removed in this patch. To easily solve the issue, one could simply increase
> the sleep duration. However, this patch opts to tackle the problem by
> querying the /proc directory and verifying TCP binds at the specified port
> before letting the client connect.

Please include this in the commit message.

>>>
>>> Signed-off-by: Alessandro Carminati (Red Hat) <[email protected]>
>>> ---
>>> tools/testing/selftests/bpf/test_tc_tunnel.sh | 19 ++++++++++++++++++-
>>> 1 file changed, 18 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tools/testing/selftests/bpf/test_tc_tunnel.sh b/tools/testing/selftests/bpf/test_tc_tunnel.sh
>>> index 910044f08908..01c0f4b1a8c2 100755
>>> --- a/tools/testing/selftests/bpf/test_tc_tunnel.sh
>>> +++ b/tools/testing/selftests/bpf/test_tc_tunnel.sh
>>> @@ -72,7 +72,6 @@ cleanup() {
>>> server_listen() {
>>> ip netns exec "${ns2}" nc "${netcat_opt}" -l "${port}" > "${outfile}" &
>>> server_pid=$!
>>> - sleep 0.2
>>> }
>>>
>>> client_connect() {
>>> @@ -93,6 +92,22 @@ verify_data() {
>>> fi
>>> }
>>>
>>> +wait_for_port() {
>>> + local digits=8
>>> + local port2check=$(printf ":%04X" $1)
>>> + local prot=$([ "$2" == "-6" ] && echo 6 && digits=32)
>>> +
>>> + for i in $(seq 20); do
>>> + if ip netns exec "${ns2}" cat /proc/net/tcp${prot} | \
>>> + sed -r 's/^[ \t]+[0-9]+: ([0-9A-F]{'${digits}'}:[0-9A-F]{4}) .*$/\1/' | \
>>> + grep -q "${port2check}"; then
>>
>> The idea is to check if there is socket listening on port 8888?
>>
>> May be something simpler like "ss -OHtl src :$1" instead?
> Indeed, the aim is to ensure that the server is bound before the
> client attempts to
> connect by checking if socket is listening, and yes using 'ss' would be shorter.
> However, I chose not to use 'ss' or 'netstat' to avoid adding new dependencies,
> considering they are already many.

ss should be in the same iproute package that "(ip) netns..." lives in also. The
above changes added sed and grep.

Regardless, this external dependency will be all gone once moved to the
selftests/test_progs. The check-and-wait will be gone by creating a listen fd
instead of using "nc -l...". A simpler change such that people doing the future
test_progs migration will have an easier time.

>> The check-and-wait fix in this patch is fine to get your test environment going.
>>
>> Eventually, it will be good to see the test_tc_tunnel.sh test moved to
>> test_progs. The test_tc_tunnel.sh is not run by bpf CI and issue like this got
>> unnoticed. Some other "*.sh" tests have already been moved to test_progs.
>>
>>
> Regards
> Alessandro