LinuxLists.cc - [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context

2022-12-07 10:41:20

Subject: [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context

Currently on ARM, we only permit kernel mode NEON in task context, and
NEON based processing triggered from softirq context is queued for
asynchronous completion via the crypto API's cryptd layer.

For IPsec packet encryption involving highly performant crypto
implementations, this results in a substantial performance hit, and so
it would be desirable to permit those crypto operations to complete
synchronously even when invoked from softirq context.

For example, on a 1 GHz Cortex-A53 machine (SynQuacer), AES-256-GCM
executes in 7.2 cycles per byte, putting an upper bound of ~140 MB/s
on the achievable throughput of a single CPU.
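
The ~140 MB/s bound is simply the clock rate divided by the per-byte cost. A minimal sketch of that arithmetic in C, assuming 1 MB = 10^6 bytes; the function name is made up for illustration and is not part of this series:

```c
/*
 * Upper bound on single-CPU throughput in MB/s (1 MB = 10^6 bytes),
 * given the clock rate in Hz and the cipher cost in cycles per byte.
 * Hypothetical helper for illustration only.
 */
double max_throughput_mb_per_s(double clock_hz, double cycles_per_byte)
{
	/* bytes per second = cycles per second / cycles per byte */
	return clock_hz / cycles_per_byte / 1e6;
}
```

With the figures quoted above, max_throughput_mb_per_s(1e9, 7.2) comes out at roughly 139 MB/s.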

Without these changes, an IPsec tunnel from a 32-bit VM to the 64-bit
host can achieve a throughput of 9.5 MB/s TX and 11.9 MB/s RX.

When the crypto algorithm is permitted to execute in softirq context,
the throughput increases to 16.5 MB/s TX and 41 MB/s RX.

(This is measured using Debian's iperf3 3.11 with the default options)

So let's reorganize the VFP state handling so that its critical
handling of the FPU registers runs with softirqs disabled. Then, update
the kernel_neon_begin()/end() logic to keep softirq processing disabled
as long as the NEON is being used in kernel mode.

Cc: Linus Walleij <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Russell King <[email protected]>

Ard Biesheuvel (2):
ARM: vfp: Manipulate VFP state with softirqs disabled
ARM: permit non-nested kernel mode NEON in softirq context

arch/arm/include/asm/assembler.h | 19 ++++++++++++-------
arch/arm/include/asm/simd.h | 8 ++++++++
arch/arm/kernel/asm-offsets.c | 1 +
arch/arm/vfp/entry.S | 4 ++--
arch/arm/vfp/vfphw.S | 4 ++--
arch/arm/vfp/vfpmodule.c | 19 ++++++++++++-------
6 files changed, 37 insertions(+), 18 deletions(-)
create mode 100644 arch/arm/include/asm/simd.h

--
2.35.1

2022-12-07 10:41:20

by Ard Biesheuvel

Subject: [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context

We currently only permit kernel mode NEON in process context, to avoid
the need to preserve/restore the NEON register file when taking an
exception while running in the kernel.

Like we did on arm64, we can relax this restriction substantially, by
permitting kernel mode NEON from softirq context, while ensuring that
softirq processing is disabled when the NEON is being used in task
context. This guarantees that only NEON context belonging to user space
needs to be preserved and restored, which is already taken care of.

This is especially relevant for network encryption, where incoming
frames are typically handled in softirq context, and deferring software
decryption to a kernel thread or falling back to C code are both
undesirable from a performance PoV.
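
The policy change described above can be sketched as a small user-space model in C. This is only an illustration under stated assumptions: the enum and the model_* names are hypothetical, the kernel derives the context from preempt_count rather than from an argument, and CONFIG_KERNEL_MODE_NEON is assumed to be enabled.

```c
#include <stdbool.h>

/* Execution contexts of the toy model (hypothetical names). */
enum model_ctx { CTX_TASK, CTX_SOFTIRQ, CTX_HARDIRQ };

/* Which code path a caller ends up on. */
enum model_path { PATH_NEON, PATH_FALLBACK };

/* Old policy: kernel mode NEON only in task (process) context. */
bool model_may_use_simd_old(enum model_ctx c)
{
	return c == CTX_TASK;
}

/* New policy, mirroring may_use_simd() in the patch: anything but hardirq. */
bool model_may_use_simd_new(enum model_ctx c)
{
	return c != CTX_HARDIRQ;
}

/*
 * Hypothetical caller: take the NEON path where the policy allows it,
 * otherwise fall back (scalar C code or deferral to task context).
 */
enum model_path model_pick_path(enum model_ctx c, bool new_policy)
{
	bool ok = new_policy ? model_may_use_simd_new(c)
			     : model_may_use_simd_old(c);

	if (ok) {
		/* kernel: kernel_neon_begin(); ...NEON code...; kernel_neon_end(); */
		return PATH_NEON;
	}
	return PATH_FALLBACK;
}
```

In this model, only the softirq case changes between the two policies: it moves from the fallback path to the NEON path, while hardirq context still has to fall back.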

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm/include/asm/simd.h | 8 ++++++++
arch/arm/vfp/vfpmodule.c | 13 ++++++-------
2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/arch/arm/include/asm/simd.h b/arch/arm/include/asm/simd.h
new file mode 100644
index 0000000000000000..82191dbd7e78a036
--- /dev/null
+++ b/arch/arm/include/asm/simd.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#include <linux/hardirq.h>
+
+static __must_check inline bool may_use_simd(void)
+{
+ return IS_ENABLED(CONFIG_KERNEL_MODE_NEON) && !in_hardirq();
+}
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 8f5bc672b4aac04a..4e1a786df76df157 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -723,12 +723,12 @@ void kernel_neon_begin(void)
local_bh_disable();

/*
- * Kernel mode NEON is only allowed outside of interrupt context
- * with preemption disabled. This will make sure that the kernel
- * mode NEON register contents never need to be preserved.
+ * Kernel mode NEON is only allowed outside of hardirq context with
+ * preemption and softirq processing disabled. This will make sure that
+ * the kernel mode NEON register contents never need to be preserved.
*/
- BUG_ON(in_interrupt());
- cpu = get_cpu();
+ BUG_ON(in_hardirq());
+ cpu = __smp_processor_id();

fpexc = fmrx(FPEXC) | FPEXC_EN;
fmxr(FPEXC, fpexc);
@@ -744,7 +744,6 @@ void kernel_neon_begin(void)
vfp_save_state(vfp_current_hw_state[cpu], fpexc);
#endif
vfp_current_hw_state[cpu] = NULL;
- local_bh_enable();
}
EXPORT_SYMBOL(kernel_neon_begin);

@@ -752,7 +751,7 @@ void kernel_neon_end(void)
{
/* Disable the NEON/VFP unit. */
fmxr(FPEXC, fmrx(FPEXC) & ~FPEXC_EN);
- put_cpu();
+ local_bh_enable();
}
EXPORT_SYMBOL(kernel_neon_end);

--
2.35.1

2022-12-07 10:42:10

by Ard Biesheuvel

Subject: [PATCH v2 1/2] ARM: vfp: Manipulate VFP state with softirqs disabled

In a subsequent patch, we will relax the kernel mode NEON policy, and
permit kernel mode NEON to be used not only from task context, as is
permitted today, but also from softirq context.

Given that softirqs may trigger over the back of any IRQ unless they are
explicitly disabled, we need to address the resulting races in the VFP
state handling, by disabling softirq processing in two distinct but
related cases:
- kernel mode NEON will leave the FPU disabled after it completes, so
any kernel code sequence that enables the FPU and subsequently accesses
its registers needs to disable softirqs until it completes;
- kernel_neon_begin() will preserve the userland VFP state in memory,
and if it interrupts the ordinary VFP state preserve sequence, the
latter will resume execution with the VFP registers corrupted, and
happily save them to memory.

Given that disabling softirqs also disables preemption, we can replace
the existing preempt_disable/enable occurrences in the VFP state
handling asm code with new macros that dis/enable softirqs instead.
In the VFP state handling C code, add local_bh_disable/enable() calls
in those places where the VFP state is preserved.

One thing to keep in mind is that, once we allow NEON use in softirq
context, the result of any such interruption is that the FPEXC_EN bit in
the FPEXC register will be cleared, and vfp_current_hw_state[cpu] will
be NULL. This means that any sequence that [conditionally] clears
FPEXC_EN and/or sets vfp_current_hw_state[cpu] to NULL does not need to
run with softirqs disabled, as the result will be the same. Furthermore,
the handling of THREAD_NOTIFY_SWITCH is guaranteed to run with IRQs
disabled, and so it does not need protection from softirq interruptions
either.
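
Why disabling softirqs also disables preemption can be seen from the way both share the preempt count. The following user-space C model is a sketch only: the bit layout is assumed to match include/linux/preempt.h, the model_* names are hypothetical, and the real local_bh_enable() in C additionally runs pending softirqs, which the model (like the asm macros in this patch) does not do.

```c
#include <stdbool.h>

/* Assumed preempt_count layout: 8 preempt bits, 8 softirq bits, 4 hardirq bits. */
#define MODEL_PREEMPT_BITS		8
#define MODEL_SOFTIRQ_BITS		8
#define MODEL_HARDIRQ_BITS		4
#define MODEL_SOFTIRQ_SHIFT		MODEL_PREEMPT_BITS
#define MODEL_HARDIRQ_SHIFT		(MODEL_SOFTIRQ_SHIFT + MODEL_SOFTIRQ_BITS)
#define MODEL_SOFTIRQ_OFFSET		(1u << MODEL_SOFTIRQ_SHIFT)
#define MODEL_SOFTIRQ_DISABLE_OFFSET	(2 * MODEL_SOFTIRQ_OFFSET)
#define MODEL_HARDIRQ_MASK \
	(((1u << MODEL_HARDIRQ_BITS) - 1) << MODEL_HARDIRQ_SHIFT)

/* Stand-in for the TI_PREEMPT field of struct thread_info. */
struct model_thread_info {
	unsigned int preempt_count;
};

/* Mirrors the local_bh_disable asm macro: add the offset to the count. */
void model_local_bh_disable(struct model_thread_info *ti)
{
	ti->preempt_count += MODEL_SOFTIRQ_DISABLE_OFFSET;
}

/* Mirrors the local_bh_enable_ti asm macro: subtract the offset again. */
void model_local_bh_enable(struct model_thread_info *ti)
{
	ti->preempt_count -= MODEL_SOFTIRQ_DISABLE_OFFSET;
}

/* Simplified: preemption is possible only while the whole count is zero. */
bool model_preemptible(const struct model_thread_info *ti)
{
	return ti->preempt_count == 0;
}

/* Simplified in_hardirq(): any bit set in the hardirq field. */
bool model_in_hardirq(const struct model_thread_info *ti)
{
	return (ti->preempt_count & MODEL_HARDIRQ_MASK) != 0;
}
```

Because the softirq-disable offset lands in the same word that the preemption check tests against zero, a section bracketed by model_local_bh_disable()/model_local_bh_enable() is non-preemptible without a separate preempt count increment.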

Signed-off-by: Ard Biesheuvel <[email protected]>
---
arch/arm/include/asm/assembler.h | 19 ++++++++++++-------
arch/arm/kernel/asm-offsets.c | 1 +
arch/arm/vfp/entry.S | 4 ++--
arch/arm/vfp/vfphw.S | 4 ++--
arch/arm/vfp/vfpmodule.c | 8 +++++++-
5 files changed, 24 insertions(+), 12 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 90fbe4a3f9c8472f..df999b75c0e25b01 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -236,21 +236,26 @@ THUMB( fpreg .req r7 )
sub \tmp, \tmp, #1 @ decrement it
str \tmp, [\ti, #TI_PREEMPT]
.endm
-
- .macro dec_preempt_count_ti, ti, tmp
- get_thread_info \ti
- dec_preempt_count \ti, \tmp
- .endm
#else
.macro inc_preempt_count, ti, tmp
.endm

.macro dec_preempt_count, ti, tmp
.endm
+#endif
+
+ .macro local_bh_disable, ti, tmp
+ ldr \tmp, [\ti, #TI_PREEMPT]
+ add \tmp, \tmp, #SOFTIRQ_DISABLE_OFFSET
+ str \tmp, [\ti, #TI_PREEMPT]
+ .endm

- .macro dec_preempt_count_ti, ti, tmp
+ .macro local_bh_enable_ti, ti, tmp
+ get_thread_info \ti
+ ldr \tmp, [\ti, #TI_PREEMPT]
+ sub \tmp, \tmp, #SOFTIRQ_DISABLE_OFFSET
+ str \tmp, [\ti, #TI_PREEMPT]
.endm
-#endif

#define USERL(l, x...) \
9999: x; \
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index 2c8d76fd7c66298a..38121c59cbc26cdd 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -56,6 +56,7 @@ int main(void)
DEFINE(VFP_CPU, offsetof(union vfp_state, hard.cpu));
#endif
#endif
+ DEFINE(SOFTIRQ_DISABLE_OFFSET,SOFTIRQ_DISABLE_OFFSET);
#ifdef CONFIG_ARM_THUMBEE
DEFINE(TI_THUMBEE_STATE, offsetof(struct thread_info, thumbee_state));
#endif
diff --git a/arch/arm/vfp/entry.S b/arch/arm/vfp/entry.S
index 27b0a1f27fbdf392..9a89264cdcc0b46e 100644
--- a/arch/arm/vfp/entry.S
+++ b/arch/arm/vfp/entry.S
@@ -22,7 +22,7 @@
@ IRQs enabled.
@
ENTRY(do_vfp)
- inc_preempt_count r10, r4
+ local_bh_disable r10, r4
ldr r4, .LCvfp
ldr r11, [r10, #TI_CPU] @ CPU number
add r10, r10, #TI_VFPSTATE @ r10 = workspace
@@ -30,7 +30,7 @@ ENTRY(do_vfp)
ENDPROC(do_vfp)

ENTRY(vfp_null_entry)
- dec_preempt_count_ti r10, r4
+ local_bh_enable_ti r10, r4
ret lr
ENDPROC(vfp_null_entry)

diff --git a/arch/arm/vfp/vfphw.S b/arch/arm/vfp/vfphw.S
index 6f7926c9c1790f66..26c4f61ecfa39638 100644
--- a/arch/arm/vfp/vfphw.S
+++ b/arch/arm/vfp/vfphw.S
@@ -175,7 +175,7 @@ vfp_hw_state_valid:
@ else it's one 32-bit instruction, so
@ always subtract 4 from the following
@ instruction address.
- dec_preempt_count_ti r10, r4
+ local_bh_enable_ti r10, r4
ret r9 @ we think we have handled things

@@ -200,7 +200,7 @@ skip:
@ not recognised by VFP

DBGSTR "not VFP"
- dec_preempt_count_ti r10, r4
+ local_bh_enable_ti r10, r4
ret lr

process_exception:
diff --git a/arch/arm/vfp/vfpmodule.c b/arch/arm/vfp/vfpmodule.c
index 2cb355c1b5b71694..8f5bc672b4aac04a 100644
--- a/arch/arm/vfp/vfpmodule.c
+++ b/arch/arm/vfp/vfpmodule.c
@@ -416,7 +416,7 @@ void VFP_bounce(u32 trigger, u32 fpexc, struct pt_regs *regs)
if (exceptions)
vfp_raise_exceptions(exceptions, trigger, orig_fpscr, regs);
exit:
- preempt_enable();
+ local_bh_enable();
}

static void vfp_enable(void *unused)
@@ -517,6 +517,8 @@ void vfp_sync_hwstate(struct thread_info *thread)
{
unsigned int cpu = get_cpu();

+ local_bh_disable();
+
if (vfp_state_in_hw(cpu, thread)) {
u32 fpexc = fmrx(FPEXC);

@@ -528,6 +530,7 @@ void vfp_sync_hwstate(struct thread_info *thread)
fmxr(FPEXC, fpexc);
}

+ local_bh_enable();
put_cpu();
}

@@ -717,6 +720,8 @@ void kernel_neon_begin(void)
unsigned int cpu;
u32 fpexc;

+ local_bh_disable();
+
/*
* Kernel mode NEON is only allowed outside of interrupt context
* with preemption disabled. This will make sure that the kernel
@@ -739,6 +744,7 @@ void kernel_neon_begin(void)
vfp_save_state(vfp_current_hw_state[cpu], fpexc);
#endif
vfp_current_hw_state[cpu] = NULL;
+ local_bh_enable();
}
EXPORT_SYMBOL(kernel_neon_begin);

--
2.35.1

2022-12-12 15:02:16

by Martin Willi

Subject: Re: [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context

Hi Ard,

> Currently on ARM, we only permit kernel mode NEON in task context [...]
> For IPsec packet encryption involving highly performant crypto
> implementations, this results in a substantial performance hit [...]

Thanks for your continued work on this.

> Without these changes, an IPsec tunnel from a 32-bit VM to the 64-bit
> host can achieve a throughput of 9.5 MB/s TX and 11.9 MB/s RX.
>
> When the crypto algorithm is permitted to execute in softirq context,
> the throughput increases to 16.5 MB/s TX and 41 MB/s RX.

In my tests on an Armada 385, I could increase IPsec throughput with
ChaCha20/Poly1305 on RX from ~230 to ~260 MBit/s when using the NEON
code path. So you may add my:

Tested-by: Martin Willi <[email protected]>

Thanks,
Martin

2022-12-13 17:04:19

by Ard Biesheuvel

Subject: Re: [PATCH v2 0/2] ARM: allow kernel mode NEON in softirq context

On Mon, 12 Dec 2022 at 15:38, Martin Willi <[email protected]> wrote:
>
> Hi Ard,
>
> > Currently on ARM, we only permit kernel mode NEON in task context [...]
> > For IPsec packet encryption involving highly performant crypto
> > implementations, this results in a substantial performance hit [...]
>
> Thanks for your continued work on this.
>
> > Without these changes, an IPsec tunnel from a 32-bit VM to the 64-bit
> > host can achieve a throughput of 9.5 MB/s TX and 11.9 MB/s RX.
> >
> > When the crypto algorithm is permitted to execute in softirq context,
> > the throughput increases to 16.5 MB/s TX and 41 MB/s RX.
>
> In my tests on an Armada 385, I could increase IPsec throughput with
> ChaCha20/Poly1305 on RX from ~230 to ~260 MBit/s when using the NEON
> code path. So you may add my:
>
> Tested-by: Martin Willi <[email protected]>
>

Thanks!

2022-12-15 10:37:35

by Linus Walleij

Subject: Re: [PATCH v2 1/2] ARM: vfp: Manipulate VFP state with softirqs disabled

On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <[email protected]> wrote:

> In a subsequent patch, we will relax the kernel mode NEON policy, and
> permit kernel mode NEON to be used not only from task context, as is
> permitted today, but also from softirq context.
>
> Given that softirqs may trigger over the back of any IRQ unless they are
> explicitly disabled, we need to address the resulting races in the VFP
> state handling, by disabling softirq processing in two distinct but
> related cases:
> - kernel mode NEON will leave the FPU disabled after it completes, so
> any kernel code sequence that enables the FPU and subsequently accesses
> its registers needs to disable softirqs until it completes;
> - kernel_neon_begin() will preserve the userland VFP state in memory,
> and if it interrupts the ordinary VFP state preserve sequence, the
> latter will resume execution with the VFP registers corrupted, and
> happily save them to memory.
>
> Given that disabling softirqs also disables preemption, we can replace
> the existing preempt_disable/enable occurrences in the VFP state
> handling asm code with new macros that dis/enable softirqs instead.
> In the VFP state handling C code, add local_bh_disable/enable() calls
> in those places where the VFP state is preserved.
>
> One thing to keep in mind is that, once we allow NEON use in softirq
> context, the result of any such interruption is that the FPEXC_EN bit in
> the FPEXC register will be cleared, and vfp_current_hw_state[cpu] will
> be NULL. This means that any sequence that [conditionally] clears
> FPEXC_EN and/or sets vfp_current_hw_state[cpu] to NULL does not need to
> run with softirqs disabled, as the result will be the same. Furthermore,
> the handling of THREAD_NOTIFY_SWITCH is guaranteed to run with IRQs
> disabled, and so it does not need protection from softirq interruptions
> either.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>

Tricky patch, I had to read it a few times and visualize the concepts,
but I am sufficiently convinced that it does the right thing.
Reviewed-by: Linus Walleij <[email protected]>

Yours,
Linus Walleij

2022-12-15 10:39:36

by Linus Walleij

Subject: Re: [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context

On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <[email protected]> wrote:

> We currently only permit kernel mode NEON in process context, to avoid
> the need to preserve/restore the NEON register file when taking an
> exception while running in the kernel.
>
> Like we did on arm64, we can relax this restriction substantially, by
> permitting kernel mode NEON from softirq context, while ensuring that
> softirq processing is disabled when the NEON is being used in task
> context. This guarantees that only NEON context belonging to user space
> needs to be preserved and restored, which is already taken care of.
>
> This is especially relevant for network encryption, where incoming
> frames are typically handled in softirq context, and deferring software
> decryption to a kernel thread or falling back to C code are both
> undesirable from a performance PoV.
>
> Signed-off-by: Ard Biesheuvel <[email protected]>

So boosting WireGuard as primary SW network encryption user?
This is really neat, BTW:
Reviewed-by: Linus Walleij <[email protected]>

Yours,
Linus Walleij

2022-12-15 10:54:12

by Ard Biesheuvel

Subject: Re: [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context

On Thu, 15 Dec 2022 at 11:27, Linus Walleij <[email protected]> wrote:
>
> On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <[email protected]> wrote:
>
> > We currently only permit kernel mode NEON in process context, to avoid
> > the need to preserve/restore the NEON register file when taking an
> > exception while running in the kernel.
> >
> > Like we did on arm64, we can relax this restriction substantially, by
> > permitting kernel mode NEON from softirq context, while ensuring that
> > softirq processing is disabled when the NEON is being used in task
> > context. This guarantees that only NEON context belonging to user space
> > needs to be preserved and restored, which is already taken care of.
> >
> > This is especially relevant for network encryption, where incoming
> > frames are typically handled in softirq context, and deferring software
> > decryption to a kernel thread or falling back to C code are both
> > undesirable from a performance PoV.
> >
> > Signed-off-by: Ard Biesheuvel <[email protected]>
>
> So boosting WireGuard as primary SW network encryption user?

Essentially, although the use case that inspired this work is related
to IPsec not WireGuard, and the crypto algorithm in that case (GCM) is
~3x faster than WG's chacha20poly1305, which makes the performance
overhead of asynchronous completion even more significant. (Note that
GCM needs the AES and PMULL instructions which are usually only
available when running the 32-bit kernel on a 64-bit core, whereas
chacha20poly1305 uses ordinary NEON instructions.)

But Martin responded with a Tested-by regarding chacha20poly1305 on
IPsec (not WG) where there is also a noticeable speedup, so WG on
ARM32 should definitely benefit from this as well.

> This is really neat, BTW:
> Reviewed-by: Linus Walleij <[email protected]>
>

Thanks!

2022-12-15 10:58:20

by Russell King (Oracle)

Subject: Re: [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context

On Thu, Dec 15, 2022 at 11:43:22AM +0100, Ard Biesheuvel wrote:
> On Thu, 15 Dec 2022 at 11:27, Linus Walleij <[email protected]> wrote:
> >
> > On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <[email protected]> wrote:
> >
> > > We currently only permit kernel mode NEON in process context, to avoid
> > > the need to preserve/restore the NEON register file when taking an
> > > exception while running in the kernel.
> > >
> > > Like we did on arm64, we can relax this restriction substantially, by
> > > permitting kernel mode NEON from softirq context, while ensuring that
> > > softirq processing is disabled when the NEON is being used in task
> > > context. This guarantees that only NEON context belonging to user space
> > > needs to be preserved and restored, which is already taken care of.
> > >
> > > This is especially relevant for network encryption, where incoming
> > > frames are typically handled in softirq context, and deferring software
> > > decryption to a kernel thread or falling back to C code are both
> > > undesirable from a performance PoV.
> > >
> > > Signed-off-by: Ard Biesheuvel <[email protected]>
> >
> > So boosting WireGuard as primary SW network encryption user?
>
> Essentially, although the use case that inspired this work is related
> to IPsec not WireGuard, and the crypto algorithm in that case (GCM) is
> ~3x faster than WG's chacha20poly1305, which makes the performance
> overhead of asynchronous completion even more significant. (Note that
> GCM needs the AES and PMULL instructions which are usually only
> available when running the 32-bit kernel on a 64-bit core, whereas
> chacha20poly1305 uses ordinary NEON instructions.)
>
> But Martin responded with a Tested-by regarding chacha20poly1305 on
> IPsec (not WG) where there is also a noticeable speedup, so WG on
> ARM32 should definitely benefit from this as well.

It'll be interesting to see whether there is any noticeable difference
with my WG VPN.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-12-15 11:59:57

by Ard Biesheuvel

Subject: Re: [PATCH v2 2/2] ARM: permit non-nested kernel mode NEON in softirq context

On Thu, 15 Dec 2022 at 11:51, Russell King (Oracle)
<[email protected]> wrote:
>
> On Thu, Dec 15, 2022 at 11:43:22AM +0100, Ard Biesheuvel wrote:
> > On Thu, 15 Dec 2022 at 11:27, Linus Walleij <[email protected]> wrote:
> > >
> > > On Wed, Dec 7, 2022 at 11:39 AM Ard Biesheuvel <[email protected]> wrote:
> > >
> > > > We currently only permit kernel mode NEON in process context, to avoid
> > > > the need to preserve/restore the NEON register file when taking an
> > > > exception while running in the kernel.
> > > >
> > > > Like we did on arm64, we can relax this restriction substantially, by
> > > > permitting kernel mode NEON from softirq context, while ensuring that
> > > > softirq processing is disabled when the NEON is being used in task
> > > > context. This guarantees that only NEON context belonging to user space
> > > > needs to be preserved and restored, which is already taken care of.
> > > >
> > > > This is especially relevant for network encryption, where incoming
> > > > frames are typically handled in softirq context, and deferring software
> > > > decryption to a kernel thread or falling back to C code are both
> > > > undesirable from a performance PoV.
> > > >
> > > > Signed-off-by: Ard Biesheuvel <[email protected]>
> > >
> > > So boosting WireGuard as primary SW network encryption user?
> >
> > Essentially, although the use case that inspired this work is related
> > to IPsec not WireGuard, and the crypto algorithm in that case (GCM) is
> > ~3x faster than WG's chacha20poly1305, which makes the performance
> > overhead of asynchronous completion even more significant. (Note that
> > GCM needs the AES and PMULL instructions which are usually only
> > available when running the 32-bit kernel on a 64-bit core, whereas
> > chacha20poly1305 uses ordinary NEON instructions.)
> >
> > But Martin responded with a Tested-by regarding chacha20poly1305 on
> > IPsec (not WG) where there is also a noticeable speedup, so WG on
> > ARM32 should definitely benefit from this as well.
>
> It'll be interesting to see whether there is any noticeable difference
> with my WG VPN.
>

Using WireGuard with the same 32-bit KVM guest communicating with its
64-bit host using virtio-net, I get a 44% speedup in the host->guest
direction. The other direction performs exactly the same, which is
unsurprising as it doesn't involve NEON crypto in softirq context at
all.

BEFORE
======

ardb@vm32:~$ iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[ 5] local 192.168.11.1 port 40144 connected to 192.168.11.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 25.8 MBytes 216 Mbits/sec 0 397 KBytes
[ 5] 1.00-2.00 sec 25.9 MBytes 217 Mbits/sec 0 397 KBytes
[ 5] 2.00-3.00 sec 27.0 MBytes 226 Mbits/sec 0 397 KBytes
[ 5] 3.00-4.00 sec 26.5 MBytes 222 Mbits/sec 0 397 KBytes
[ 5] 4.00-5.00 sec 26.2 MBytes 220 Mbits/sec 0 397 KBytes
[ 5] 5.00-6.00 sec 26.1 MBytes 219 Mbits/sec 0 436 KBytes
[ 5] 6.00-7.00 sec 26.2 MBytes 220 Mbits/sec 0 458 KBytes
[ 5] 7.00-8.00 sec 26.2 MBytes 220 Mbits/sec 0 458 KBytes
[ 5] 8.00-9.00 sec 26.5 MBytes 222 Mbits/sec 0 480 KBytes
[ 5] 9.00-10.00 sec 26.9 MBytes 225 Mbits/sec 0 480 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 263 MBytes 221 Mbits/sec 0 sender
[ 5] 0.00-10.00 sec 262 MBytes 220 Mbits/sec receiver

ardb@sudo:~$ iperf3 -c 192.168.11.1
Connecting to host 192.168.11.1, port 5201
[ 5] local 192.168.11.2 port 46340 connected to 192.168.11.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 47.5 MBytes 398 Mbits/sec 0 1.75 MBytes
[ 5] 1.00-2.00 sec 45.0 MBytes 377 Mbits/sec 18 1.35 MBytes
[ 5] 2.00-3.00 sec 43.8 MBytes 367 Mbits/sec 0 1.47 MBytes
[ 5] 3.00-4.00 sec 45.0 MBytes 377 Mbits/sec 0 1.56 MBytes
[ 5] 4.00-5.00 sec 45.0 MBytes 377 Mbits/sec 0 1.63 MBytes
[ 5] 5.00-6.00 sec 42.5 MBytes 357 Mbits/sec 0 1.68 MBytes
[ 5] 6.00-7.00 sec 43.8 MBytes 367 Mbits/sec 0 1.71 MBytes
[ 5] 7.00-8.00 sec 43.8 MBytes 367 Mbits/sec 0 1.73 MBytes
[ 5] 8.00-9.00 sec 45.0 MBytes 377 Mbits/sec 0 1.74 MBytes
[ 5] 9.00-10.00 sec 43.8 MBytes 367 Mbits/sec 0 1.75 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 445 MBytes 373 Mbits/sec 18 sender
[ 5] 0.00-10.04 sec 444 MBytes 371 Mbits/sec receiver

iperf Done.

AFTER
=====

ardb@vm32:~$ iperf3 -c 192.168.11.2
Connecting to host 192.168.11.2, port 5201
[ 5] local 192.168.11.1 port 44004 connected to 192.168.11.2 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 26.2 MBytes 220 Mbits/sec 0 399 KBytes
[ 5] 1.00-2.00 sec 25.9 MBytes 217 Mbits/sec 0 399 KBytes
[ 5] 2.00-3.00 sec 26.0 MBytes 218 Mbits/sec 0 444 KBytes
[ 5] 3.00-4.00 sec 26.8 MBytes 225 Mbits/sec 0 485 KBytes
[ 5] 4.00-5.00 sec 26.4 MBytes 222 Mbits/sec 0 542 KBytes
[ 5] 5.00-6.00 sec 26.6 MBytes 223 Mbits/sec 0 568 KBytes
[ 5] 6.00-7.00 sec 25.4 MBytes 213 Mbits/sec 0 568 KBytes
[ 5] 7.00-8.00 sec 25.9 MBytes 217 Mbits/sec 0 568 KBytes
[ 5] 8.00-9.00 sec 26.7 MBytes 224 Mbits/sec 0 568 KBytes
[ 5] 9.00-10.00 sec 25.9 MBytes 217 Mbits/sec 0 568 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 262 MBytes 220 Mbits/sec 0 sender
[ 5] 0.00-9.99 sec 261 MBytes 219 Mbits/sec receiver

iperf Done.

ardb@sudo:~$ iperf3 -c 192.168.11.1
Connecting to host 192.168.11.1, port 5201
[ 5] local 192.168.11.2 port 49838 connected to 192.168.11.1 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 61.2 MBytes 514 Mbits/sec 0 1.59 MBytes
[ 5] 1.00-2.00 sec 66.2 MBytes 555 Mbits/sec 0 1.67 MBytes
[ 5] 2.00-3.00 sec 65.0 MBytes 545 Mbits/sec 79 1.24 MBytes
[ 5] 3.00-4.00 sec 63.8 MBytes 535 Mbits/sec 0 1.36 MBytes
[ 5] 4.00-5.00 sec 63.8 MBytes 535 Mbits/sec 0 1.46 MBytes
[ 5] 5.00-6.00 sec 63.8 MBytes 535 Mbits/sec 0 1.53 MBytes
[ 5] 6.00-7.00 sec 62.5 MBytes 524 Mbits/sec 0 1.59 MBytes
[ 5] 7.00-8.00 sec 65.0 MBytes 545 Mbits/sec 99 1.18 MBytes
[ 5] 8.00-9.00 sec 65.0 MBytes 545 Mbits/sec 0 1.25 MBytes
[ 5] 9.00-10.00 sec 65.0 MBytes 545 Mbits/sec 0 1.30 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 641 MBytes 538 Mbits/sec 178 sender
[ 5] 0.00-10.02 sec 638 MBytes 535 Mbits/sec receiver

iperf Done.
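
The 44% figure can be reproduced from the receiver-side averages in the host->guest runs above (371 Mbits/sec before, 535 Mbits/sec after). A minimal sketch in C; the helper name is hypothetical:

```c
/*
 * Relative change from `before` to `after`, in percent.
 * Hypothetical helper for illustration only.
 */
double speedup_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

speedup_percent(371.0, 535.0) gives about 44, while the guest->host direction, speedup_percent(220.0, 219.0), is within half a percent of zero.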