From: Menglong Dong <[email protected]>
Once the small queue check in tcp_small_queue_check() fails, TCP
throughput is limited, and it is hard to tell whether the limitation
comes from TSQ or from TCP congestion control.

Add a LINUX_MIB_TCPSMALLQUEUEFAILURE counter for this case.
Signed-off-by: Menglong Dong <[email protected]>
---
v2:
- use NET_INC_STATS() instead of __NET_INC_STATS()
---
include/uapi/linux/snmp.h | 1 +
net/ipv4/proc.c | 1 +
net/ipv4/tcp_output.c | 5 ++++-
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/snmp.h b/include/uapi/linux/snmp.h
index 904909d020e2..e32ec6932e82 100644
--- a/include/uapi/linux/snmp.h
+++ b/include/uapi/linux/snmp.h
@@ -292,6 +292,7 @@ enum
 	LINUX_MIB_TCPDSACKIGNOREDDUBIOUS,	/* TCPDSACKIgnoredDubious */
 	LINUX_MIB_TCPMIGRATEREQSUCCESS,		/* TCPMigrateReqSuccess */
 	LINUX_MIB_TCPMIGRATEREQFAILURE,		/* TCPMigrateReqFailure */
+	LINUX_MIB_TCPSMALLQUEUEFAILURE,		/* TCPSmallQueueFailure */
 	__LINUX_MIB_MAX
 };
diff --git a/net/ipv4/proc.c b/net/ipv4/proc.c
index f30273afb539..43b7a77cd6b4 100644
--- a/net/ipv4/proc.c
+++ b/net/ipv4/proc.c
@@ -297,6 +297,7 @@ static const struct snmp_mib snmp4_net_list[] = {
 	SNMP_MIB_ITEM("TCPDSACKIgnoredDubious", LINUX_MIB_TCPDSACKIGNOREDDUBIOUS),
 	SNMP_MIB_ITEM("TCPMigrateReqSuccess", LINUX_MIB_TCPMIGRATEREQSUCCESS),
 	SNMP_MIB_ITEM("TCPMigrateReqFailure", LINUX_MIB_TCPMIGRATEREQFAILURE),
+	SNMP_MIB_ITEM("TCPSmallQueueFailure", LINUX_MIB_TCPSMALLQUEUEFAILURE),
 	SNMP_MIB_SENTINEL
 };
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 2e6e5a70168e..835a556a597a 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2524,8 +2524,11 @@ static bool tcp_small_queue_check(struct sock *sk, const struct sk_buff *skb,
 		 * test again the condition.
 		 */
 		smp_mb__after_atomic();
-		if (refcount_read(&sk->sk_wmem_alloc) > limit)
+		if (refcount_read(&sk->sk_wmem_alloc) > limit) {
+			NET_INC_STATS(sock_net(sk),
+				      LINUX_MIB_TCPSMALLQUEUEFAILURE);
 			return true;
+		}
 	}
 	return false;
 }
--
2.30.2
Hello:
This patch was applied to netdev/net-next.git (master)
by David S. Miller <[email protected]>:
On Sun, 28 Nov 2021 14:01:02 +0800 you wrote:
> From: Menglong Dong <[email protected]>
>
> Once the small queue check in tcp_small_queue_check() fails, TCP
> throughput is limited, and it is hard to tell whether the limitation
> comes from TSQ or from TCP congestion control.
>
> Add a LINUX_MIB_TCPSMALLQUEUEFAILURE counter for this case.
>
> [...]
Here is the summary with links:
- [v2,net-next] net: snmp: add statistics for tcp small queue check
https://git.kernel.org/netdev/net-next/c/aeeecb889165
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html
On Sun, 28 Nov 2021 14:01:02 +0800 [email protected] wrote:
> Once the small queue check in tcp_small_queue_check() fails, TCP
> throughput is limited, and it is hard to tell whether the limitation
> comes from TSQ or from TCP congestion control.
>
> Add a LINUX_MIB_TCPSMALLQUEUEFAILURE counter for this case.
Isn't this going to trigger all the time and alarm users because of the
"Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
for TCP to bake a full TSQ amount of data and have it paced out onto the
wire? What's your link speed?
On Mon, Nov 29, 2021 at 7:57 AM Jakub Kicinski <[email protected]> wrote:
>
> On Sun, 28 Nov 2021 14:01:02 +0800 [email protected] wrote:
> > Once the small queue check in tcp_small_queue_check() fails, TCP
> > throughput is limited, and it is hard to tell whether the limitation
> > comes from TSQ or from TCP congestion control.
> >
> > Add a LINUX_MIB_TCPSMALLQUEUEFAILURE counter for this case.
>
> Isn't this going to trigger all the time and alarm users because of the
> "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> for TCP to bake a full TSQ amount of data and have it paced out onto the
> wire? What's your link speed?
Yes, I would be curious to have some instructions on how this new SNMP
variable can be used in a concrete case.
Like, how would reading these SNMP values translate into an action that
gives more throughput?
On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <[email protected]> wrote:
>
> On Sun, 28 Nov 2021 14:01:02 +0800 [email protected] wrote:
> > Once the small queue check in tcp_small_queue_check() fails, TCP
> > throughput is limited, and it is hard to tell whether the limitation
> > comes from TSQ or from TCP congestion control.
> >
> > Add a LINUX_MIB_TCPSMALLQUEUEFAILURE counter for this case.
>
> Isn't this going to trigger all the time and alarm users because of the
> "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> for TCP to bake a full TSQ amount of data and have it paced out onto the
> wire? What's your link speed?
Well, it's a little complex. In my case, there is a KVM guest that uses
virtio_net with napi_tx enabled.

With napi_tx enabled, the skb is not orphaned after it is passed to
virtio_net; it stays charged to the socket until it is released. The
problem is that the TX interrupt of virtio_net will be turned off, so
the skb cannot be released until the next net_rx interrupt comes. As a
result, wmem_alloc does not decrease in time and the bandwidth is
limited. When this happens, the bandwidth can drop from 500M to 10M.

In fact, this napi_tx issue is fixed by this commit:
https://lore.kernel.org/lkml/[email protected]/

I added this statistic to monitor the sending failures (which might
better be called sending delays) caused by the qdisc and the net_device.
When the counter increases, maybe users can raise
/proc/sys/net/ipv4/tcp_pacing_ss_ratio to get better bandwidth.
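For example, the counter can be read from the "TcpExt:" name/value line
pair in /proc/net/netstat (nstat should report it as
TcpExtTCPSmallQueueFailure). Below is a minimal user-space sketch, not
part of the patch; the tsqmon.c name is made up, and the field only
exists once this patch is applied:

/* tsqmon.c - hypothetical helper: print the TcpExt
 * TCPSmallQueueFailure counter from /proc/net/netstat.
 * Build: gcc -o tsqmon tsqmon.c
 */
#include <stdio.h>
#include <string.h>

int main(void)
{
	char names[8192], vals[8192], *n, *v, *np, *vp;
	FILE *f = fopen("/proc/net/netstat", "r");

	if (!f) {
		perror("/proc/net/netstat");
		return 1;
	}
	/* The file consists of header/value line pairs, e.g.
	 * "TcpExt: <names...>" followed by "TcpExt: <values...>".
	 */
	while (fgets(names, sizeof(names), f) && fgets(vals, sizeof(vals), f)) {
		if (strncmp(names, "TcpExt:", 7))
			continue;
		n = strtok_r(names, " \n", &np);
		v = strtok_r(vals, " \n", &vp);
		while (n && v) {
			if (!strcmp(n, "TCPSmallQueueFailure")) {
				printf("TCPSmallQueueFailure %s\n", v);
				fclose(f);
				return 0;
			}
			n = strtok_r(NULL, " \n", &np);
			v = strtok_r(NULL, " \n", &vp);
		}
	}
	fclose(f);
	fprintf(stderr, "TCPSmallQueueFailure not found (patch not applied?)\n");
	return 1;
}

Sampling it before and after a run shows how often the TSQ check made
tcp_write_xmit() stop sending early during that run.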
Thanks!
Menglong Dong
On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <[email protected]> wrote:
> >
> > On Sun, 28 Nov 2021 14:01:02 +0800 [email protected] wrote:
> > > Once the small queue check in tcp_small_queue_check() fails, TCP
> > > throughput is limited, and it is hard to tell whether the limitation
> > > comes from TSQ or from TCP congestion control.
> > >
> > > Add a LINUX_MIB_TCPSMALLQUEUEFAILURE counter for this case.
> >
> > Isn't this going to trigger all the time and alarm users because of the
> > "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> > for TCP to bake a full TSQ amount of data and have it paced out onto the
> > wire? What's your link speed?
>
> Well, it's a little complex. In my case, there is a KVM guest that uses
> virtio_net with napi_tx enabled.
>
> With napi_tx enabled, the skb is not orphaned after it is passed to
> virtio_net; it stays charged to the socket until it is released. The
> problem is that the TX interrupt of virtio_net will be turned off, so
> the skb cannot be released until the next net_rx interrupt comes. As a
> result, wmem_alloc does not decrease in time and the bandwidth is
> limited. When this happens, the bandwidth can drop from 500M to 10M.
>
> In fact, this napi_tx issue is fixed by this commit:
> https://lore.kernel.org/lkml/[email protected]/
>
> I added this statistic to monitor the sending failures (which might
> better be called sending delays) caused by the qdisc and the net_device.
> When the counter increases, maybe users can raise
> /proc/sys/net/ipv4/tcp_pacing_ss_ratio to get better bandwidth.
Sounds very second-order and particular to a buggy driver :/
Let's see what Eric says but I vote revert.
On Tue, Nov 30, 2021 at 7:23 AM Jakub Kicinski <[email protected]> wrote:
>
> On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> > On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <[email protected]> wrote:
> > >
> > > On Sun, 28 Nov 2021 14:01:02 +0800 [email protected] wrote:
> > > > Once the small queue check in tcp_small_queue_check() fails, TCP
> > > > throughput is limited, and it is hard to tell whether the limitation
> > > > comes from TSQ or from TCP congestion control.
> > > >
> > > > Add a LINUX_MIB_TCPSMALLQUEUEFAILURE counter for this case.
> > >
> > > Isn't this going to trigger all the time and alarm users because of the
> > > "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> > > for TCP to bake a full TSQ amount of data and have it paced out onto the
> > > wire? What's your link speed?
> >
> > Well, it's a little complex. In my case, there is a KVM guest that uses
> > virtio_net with napi_tx enabled.
> >
> > With napi_tx enabled, the skb is not orphaned after it is passed to
> > virtio_net; it stays charged to the socket until it is released. The
> > problem is that the TX interrupt of virtio_net will be turned off, so
> > the skb cannot be released until the next net_rx interrupt comes. As a
> > result, wmem_alloc does not decrease in time and the bandwidth is
> > limited. When this happens, the bandwidth can drop from 500M to 10M.
> >
> > In fact, this napi_tx issue is fixed by this commit:
> > https://lore.kernel.org/lkml/[email protected]/
> >
> > I added this statistic to monitor the sending failures (which might
> > better be called sending delays) caused by the qdisc and the net_device.
> > When the counter increases, maybe users can raise
> > /proc/sys/net/ipv4/tcp_pacing_ss_ratio to get better bandwidth.
>
> Sounds very second-order and particular to a buggy driver :/
> Let's see what Eric says but I vote revert.
I did some tests yesterday, using one high-speed TCP_STREAM (~90Gbit),
and got plenty of increments when using the pfifo_fast qdisc.
Yet speed was nominal (the bottleneck is the copyout() cost at the receiver).

I got few counter increments when the qdisc is fq, as expected
(because of commit c73e5807e4f6fc6d "tcp: tsq: no longer use
limit_output_bytes for paced flows").
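(For readers following the thread: the limit being tested is roughly the
following. This is a paraphrase of tcp_small_queue_check() from kernels
of this era, not a verbatim copy; helper and sysctl names are simplified
and details vary by version.)

	unsigned long limit;

	/* Roughly 1 ms worth of data at the current pacing rate
	 * (sk_pacing_shift defaults to 10), but never less than
	 * two full skbs.
	 */
	limit = max(2UL * skb->truesize,
		    sk->sk_pacing_rate >> sk->sk_pacing_shift);

	/* Only non-paced flows (e.g. under pfifo_fast) are further
	 * capped by tcp_limit_output_bytes; fq-paced flows are not,
	 * hence the few increments with fq.
	 */
	if (sk->sk_pacing_status == SK_PACING_NONE)
		limit = min(limit, sysctl_tcp_limit_output_bytes);

	limit <<= factor;

	/* The new counter fires only when sk_wmem_alloc (bytes still
	 * sitting in the qdisc/driver) is above this limit even after
	 * TSQ_THROTTLED has been set.
	 */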
So this new SNMP counter is not a proxy for the kind of problems that a buggy
driver would trigger.
I also suggest we revert this patch.
Thanks !
On Tue, Nov 30, 2021 at 11:56 PM Eric Dumazet <[email protected]> wrote:
>
> On Tue, Nov 30, 2021 at 7:23 AM Jakub Kicinski <[email protected]> wrote:
> >
> > On Tue, 30 Nov 2021 22:36:59 +0800 Menglong Dong wrote:
> > > On Mon, Nov 29, 2021 at 11:57 PM Jakub Kicinski <[email protected]> wrote:
> > > >
> > > > On Sun, 28 Nov 2021 14:01:02 +0800 [email protected] wrote:
> > > > > Once the small queue check in tcp_small_queue_check() fails, TCP
> > > > > throughput is limited, and it is hard to tell whether the limitation
> > > > > comes from TSQ or from TCP congestion control.
> > > > >
> > > > > Add a LINUX_MIB_TCPSMALLQUEUEFAILURE counter for this case.
> > > >
> > > > Isn't this going to trigger all the time and alarm users because of the
> > > > "Failure" in the TCPSmallQueueFailure name? Isn't it perfectly fine
> > > > for TCP to bake a full TSQ amount of data and have it paced out onto the
> > > > wire? What's your link speed?
> > >
> > > Well, it's a little complex. In my case, there is a KVM guest that uses
> > > virtio_net with napi_tx enabled.
> > >
> > > With napi_tx enabled, the skb is not orphaned after it is passed to
> > > virtio_net; it stays charged to the socket until it is released. The
> > > problem is that the TX interrupt of virtio_net will be turned off, so
> > > the skb cannot be released until the next net_rx interrupt comes. As a
> > > result, wmem_alloc does not decrease in time and the bandwidth is
> > > limited. When this happens, the bandwidth can drop from 500M to 10M.
> > >
> > > In fact, this napi_tx issue is fixed by this commit:
> > > https://lore.kernel.org/lkml/[email protected]/
> > >
> > > I added this statistic to monitor the sending failures (which might
> > > better be called sending delays) caused by the qdisc and the net_device.
> > > When the counter increases, maybe users can raise
> > > /proc/sys/net/ipv4/tcp_pacing_ss_ratio to get better bandwidth.
> >
> > Sounds very second-order and particular to a buggy driver :/
> > Let's see what Eric says but I vote revert.
>
> I did some tests yesterday, using one high-speed TCP_STREAM (~90Gbit),
> and got plenty of increments when using the pfifo_fast qdisc.
> Yet speed was nominal (the bottleneck is the copyout() cost at the receiver).
>
> I got few counter increments when the qdisc is fq, as expected
> (because of commit c73e5807e4f6fc6d "tcp: tsq: no longer use
> limit_output_bytes for paced flows").
>
>
> So this new SNMP counter is not a proxy for the kind of problems that a buggy
> driver would trigger.
>
> I also suggest we revert this patch.
It seems this SNMP counter is indeed not suitable... I vote to revert too.
(It seems Jakub has already done the revert, thanks~)
In fact, I'm a little curious that this patch was applied directly; I
used to receive emails with the message 'applied'.
>
> Thanks !