On Wed, 29 Nov 2006 17:22:10 -0600
Wenji Wu <[email protected]> wrote:
> From: Wenji Wu <[email protected]>
>
> Greetings,
>
> For Linux TCP, when a network application makes a system call to move
> data from the socket's receive buffer to user space via tcp_recvmsg(),
> the socket is locked. During that period, all incoming packets for the
> TCP socket go to the backlog queue without being TCP processed. Since
> Linux 2.6 can be preempted mid-task, if the network application's
> timeslice expires and it is moved to the expired array with the socket
> still locked, none of the packets in the backlog queue will be TCP
> processed until the network application resumes execution. If the
> system is heavily loaded, the sender side can easily hit a TCP RTO.
>
> Attached is the detailed description of the problem and one possible
> solution.
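For context, the mechanism described above is the socket-lock/backlog
dispatch in the softirq receive path. A minimal sketch, paraphrasing the
2.6-era tcp_v4_rcv() rather than quoting it verbatim:

        bh_lock_sock(sk);
        if (!sock_owned_by_user(sk)) {
                /* No process holds the socket lock: the packet gets
                 * full TCP processing right away (possibly via the
                 * prequeue if a reader waits in tcp_recvmsg()). */
                if (!tcp_prequeue(sk, skb))
                        tcp_v4_do_rcv(sk, skb);
        } else {
                /* A process (e.g. one inside tcp_recvmsg()) owns the
                 * lock: the packet is parked on the backlog and is not
                 * TCP processed until that process calls
                 * release_sock().  If the process is preempted here,
                 * ACKs sit unprocessed for as long as it stays off
                 * the CPU. */
                sk_add_backlog(sk, skb);
        }
        bh_unlock_sock(sk);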
Thanks. The attachment will be too large for the mailing-list servers so I
uploaded a copy to
http://userweb.kernel.org/~akpm/Linux-TCP-Bottleneck-Analysis-Report.pdf
From a quick peek it appears that you're getting around 10% improvement in
TCP throughput, best case.
* Andrew Morton <[email protected]> wrote:
> > Attached is the detailed description of the problem and one possible
> > solution.
>
> Thanks. The attachment will be too large for the mailing-list servers
> so I uploaded a copy to
> http://userweb.kernel.org/~akpm/Linux-TCP-Bottleneck-Analysis-Report.pdf
>
> From a quick peek it appears that you're getting around 10%
> improvement in TCP throughput, best case.
Wenji, have you tried renicing the receiving task (to, say, nice -20) to
see how much TCP throughput you get under a "background load of 10.0"?
(Similarly, you could also renice the background-load tasks to nice +19
and/or set their scheduling policy to SCHED_BATCH.)
As far as I can see, the numbers in the paper and the patch prove the
following two points:
- a task doing TCP receive with 10 other tasks running on the CPU will
  see lower TCP throughput than if it had the CPU to itself;
- a patch that tweaks the scheduler to give the receiving task more
  timeslices (i.e. one that in essence raises its nice level) results in
  more timeslices, which results in higher receive numbers.
So the most important thing to check, before any scheduler or TCP code
change is considered, would be: if you give the task higher priority
/explicitly/, via nice -20, do the numbers improve? Similarly, if all
the other "background load" tasks are reniced to nice +19 (or their
policy is set to SCHED_BATCH), do you get a similar improvement?
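For concreteness, a minimal user-space sketch of the second experiment:
renicing a background task to +19 and switching it to SCHED_BATCH
(assumes Linux 2.6.16+ for SCHED_BATCH; the nice -20 variant for the
receiver additionally needs CAP_SYS_NICE):

        #define _GNU_SOURCE
        #include <sched.h>
        #include <stdio.h>
        #include <sys/resource.h>

        int main(void)
        {
                struct sched_param sp = { .sched_priority = 0 };

                /* Drop this background task to the lowest nice level... */
                if (setpriority(PRIO_PROCESS, 0, 19) < 0)
                        perror("setpriority");

                /* ...and mark it as batch work for the scheduler. */
                if (sched_setscheduler(0, SCHED_BATCH, &sp) < 0)
                        perror("sched_setscheduler");

                /* ...run the background workload here... */
                return 0;
        }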
Ingo
I noticed this bit of discussion in tcp_recvmsg(). It implies that a
better queuing policy would be good, but the English is confusing
(Alexey?), so I'm not sure where to start.
> if (!sysctl_tcp_low_latency && tp->ucopy.task == user_recv) {
> /* Install new reader */
> if (!user_recv && !(flags & (MSG_TRUNC | MSG_PEEK))) {
> user_recv = current;
> tp->ucopy.task = user_recv;
> tp->ucopy.iov = msg->msg_iov;
> }
>
> tp->ucopy.len = len;
>
> BUG_TRAP(tp->copied_seq == tp->rcv_nxt ||
> (flags & (MSG_PEEK | MSG_TRUNC)));
>
> /* Ugly... If prequeue is not empty, we have to
> * process it before releasing socket, otherwise
> * order will be broken at second iteration.
> * More elegant solution is required!!!
> *
> * Look: we have the following (pseudo)queues:
> *
> * 1. packets in flight
> * 2. backlog
> * 3. prequeue
> * 4. receive_queue
> *
> * Each queue can be processed only if the next ones
> * are empty. At this point we have empty receive_queue.
> * But prequeue _can_ be not empty after 2nd iteration,
> * when we jumped to start of loop because backlog
> * processing added something to receive_queue.
> * We cannot release_sock(), because backlog contains
> * packets arrived _after_ prequeued ones.
> *
> * Shortly, algorithm is clear --- to process all
> * the queues in order. We could make it more directly,
> * requeueing packets from backlog to prequeue, if
> * is not empty. It is more elegant, but eats cycles,
> * unfortunately.
> */
> if (!skb_queue_empty(&tp->ucopy.prequeue))
> goto do_prequeue;
>
> /* __ Set realtime policy in scheduler __ */
> }
>
> if (copied >= target) {
> /* Do not sleep, just process backlog. */
> release_sock(sk);
> lock_sock(sk);
> } else
>
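The release_sock()/lock_sock() pair above is what drains the backlog.
Roughly, paraphrasing the 2.6-era __release_sock() in net/core/sock.c:

        /* Paraphrase of 2.6-era __release_sock(): every skb parked on
         * the backlog is fed through the protocol's receive handler
         * (tcp_v4_do_rcv() for TCP) before the lock is handed back. */
        do {
                struct sk_buff *skb = sk->sk_backlog.head;

                sk->sk_backlog.head = sk->sk_backlog.tail = NULL;
                bh_unlock_sock(sk);

                while (skb != NULL) {
                        struct sk_buff *next = skb->next;

                        skb->next = NULL;
                        sk->sk_backlog_rcv(sk, skb); /* full TCP processing */
                        skb = next;
                }

                bh_lock_sock(sk);
        } while (sk->sk_backlog.head != NULL);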
--
Stephen Hemminger <[email protected]>
Stephen Hemminger <[email protected]> wrote:
> I noticed this bit of discussion in tcp_recvmsg(). It implies that a
> better queuing policy would be good, but the English is confusing
> (Alexey?), so I'm not sure where to start.
Actually I think the comment says that the current code isn't the
most elegant but is more efficient.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
From: Herbert Xu <[email protected]>
Date: Wed, 20 Dec 2006 10:52:19 +1100
> Stephen Hemminger <[email protected]> wrote:
> > I noticed this bit of discussion in tcp_recvmsg(). It implies that a
> > better queuing policy would be good, but the English is confusing
> > (Alexey?), so I'm not sure where to start.
>
> Actually I think the comment says that the current code isn't the
> most elegant but is more efficient.
It's just explaining the hierarchy of queues that need to
be purged, and in what order, for correctness.
Alexey added that code when I mentioned to him, right after
we added the prequeue, that it was possible to process the
normal backlog before the prequeue, which is illegal.
In fixing that bug, he added the comment we are discussing.
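Concretely, the ordering constraint looks like this (the function names
are the 2.6-era ones from net/ipv4/tcp.c, but the fragment is
illustrative rather than verbatim source):

        /* Suppose seq 1000 landed in the prequeue while the socket was
         * unlocked, and seq 2000 landed in the backlog later, while
         * tcp_recvmsg() held the lock.  The correct drain order is: */
        tcp_prequeue_process(sk);       /* delivers seq 1000 first...    */
        release_sock(sk);               /* ...then the backlog: seq 2000 */
        lock_sock(sk);

        /* Calling release_sock() first would push seq 2000 through TCP
         * processing ahead of seq 1000 -- exactly the "process the
         * backlog before the prequeue" bug the comment guards against. */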
On Tue, 19 Dec 2006 18:55:25 -0800 (PST)
David Miller <[email protected]> wrote:
> From: Herbert Xu <[email protected]>
> Date: Wed, 20 Dec 2006 10:52:19 +1100
>
> > Stephen Hemminger <[email protected]> wrote:
> > > I noticed this bit of discussion in tcp_recvmsg(). It implies that a
> > > better queuing policy would be good, but the English is confusing
> > > (Alexey?), so I'm not sure where to start.
> >
> > Actually I think the comment says that the current code isn't the
> > most elegant but is more efficient.
>
> It's just explaining the hierarchy of queues that need to
> be purged, and in what order, for correctness.
>
> Alexey added that code when I mentioned to him, right after
> we added the prequeue, that it was possible to process the
> normal backlog before the prequeue, which is illegal.
> In fixing that bug, he added the comment we are discussing.
It was the realtime/normal comments that piqued my interest.
Perhaps we should either tweak process priority or remove
the comments.
From: Stephen Hemminger <[email protected]>
Date: Tue, 19 Dec 2006 21:11:24 -0800
> It was the realtime/normal comments that piqued my interest.
> Perhaps we should either tweak process priority or remove
> the comments.
I mentioned that to Linus once and he said the entire
idea was bogus.
With the recent tcp_recvmsg() preemption issue thread,
I agree with his sentiments even more than I did previously.
What needs to happen is to liberate the locking so that
input packet processing can occur in parallel with
tcp_recvmsg(), instead of doing this bogus backlog thing
which can wedge TCP ACK processing for an entire quantum
if we take a kernel preemption while the process has the
socket lock held.
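To make that direction concrete: a toy user-space illustration (not
kernel code; all names here are invented for the example) of splitting
the work so a producer can keep doing full protocol processing while a
reader copies data out, instead of parking packets behind the reader's
lock:

        #include <pthread.h>
        #include <stddef.h>

        struct pkt {
                struct pkt *next;
                /* ...payload... */
        };

        struct rx_queue {
                pthread_mutex_t lock;   /* protects head/tail only */
                struct pkt *head, *tail;
        };

        /* Producer (softirq analogue): protocol processing happens
         * before this call; only a short critical section links the
         * packet in, so the producer never waits on the reader. */
        void rx_enqueue(struct rx_queue *q, struct pkt *p)
        {
                p->next = NULL;
                pthread_mutex_lock(&q->lock);
                if (q->tail)
                        q->tail->next = p;
                else
                        q->head = p;
                q->tail = p;
                pthread_mutex_unlock(&q->lock);
        }

        /* Consumer (tcp_recvmsg() analogue): detach one packet under
         * the lock, then copy it out with the lock dropped, so a
         * preemption during the copy cannot wedge the producer. */
        struct pkt *rx_dequeue(struct rx_queue *q)
        {
                struct pkt *p;

                pthread_mutex_lock(&q->lock);
                p = q->head;
                if (p) {
                        q->head = p->next;
                        if (!q->head)
                                q->tail = NULL;
                }
                pthread_mutex_unlock(&q->lock);
                return p;       /* the copy-to-user step happens here */
        }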