2001-07-27 17:11:20

by Sridhar Samudrala

Subject: [PATCH] Inbound Connection Control mechanism: Prioritized Accept Queue

The following patch provides a mechanism called Prioritized Accept Queues (PAQ)
to prioritize incoming connection requests on a socket based on the source/dest
IP addresses and ports.

For example, this feature can be used to guarantee low delay and high throughput
to preferred clients on a web server by assigning higher priority to connection
requests whose source ip address matches the ip address of the preferred clients.
It can also be used on a server hosting multiple websites each identified by its
own ip address. In this case the prioritization can be done based on the
destination ip address of the connection requests.

The HOWTO for this patch and the test results, which show an
improvement in connection rate for higher priority classes, can be found at our
project website:
http://oss.software.ibm.com/qos

We would appreciate any comments or suggestions.

Thanks
Sridhar

---------------------------
Sridhar Samudrala
IBM Linux Technology Centre
[email protected]

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
diff -urN -X dontdiff linux-2.4.6/Documentation/Configure.help linux-2.4.6-paq/Documentation/Configure.help
--- linux-2.4.6/Documentation/Configure.help Mon Jul 2 14:07:55 2001
+++ linux-2.4.6-paq/Documentation/Configure.help Thu Jul 5 16:34:05 2001
@@ -1955,6 +1955,14 @@
If you want to compile it as a module, say M here and read
Documentation/modules.txt. If unsure, say `N'.

+Prioritized Accept Queue (EXPERIMENTAL)
+CONFIG_PRIO_ACCEPTQ
+ When enabled, this option allows you to set priorities to incoming
+ connection requests using the rules created by the iptables MARK target
+ option. The nfmark field set by the rules is used as a priority value
+ when the connection is added to accept queue. The priority value can
+ range between 0-7 with 0 being the highest priority and 7 the lowest.
+
Packet filtering
CONFIG_IP_NF_FILTER
Packet filtering defines a table `filter', which has a series of
diff -urN -X dontdiff linux-2.4.6/include/net/sock.h linux-2.4.6-paq/include/net/sock.h
--- linux-2.4.6/include/net/sock.h Tue Jul 3 15:44:12 2001
+++ linux-2.4.6-paq/include/net/sock.h Thu Jul 5 16:45:31 2001
@@ -239,6 +239,11 @@
#define pppoe_relay proto.pppoe.relay
#endif

+#ifdef CONFIG_PRIO_ACCEPTQ
+/* Priorities range from 0-7 */
+#define MAX_ACCEPTQ_PRIO 7
+#endif
+
/* This defines a selective acknowledgement block. */
struct tcp_sack_block {
__u32 start_seq;
@@ -409,7 +414,11 @@

/* FIFO of established children */
struct open_request *accept_queue;
+#ifdef CONFIG_PRIO_ACCEPTQ
+ struct open_request *accept_queue_tail[MAX_ACCEPTQ_PRIO + 1];
+#else
struct open_request *accept_queue_tail;
+#endif

int write_pending; /* A write to socket waits to start. */

diff -urN -X dontdiff linux-2.4.6/include/net/tcp.h linux-2.4.6-paq/include/net/tcp.h
--- linux-2.4.6/include/net/tcp.h Tue Jul 3 15:44:20 2001
+++ linux-2.4.6-paq/include/net/tcp.h Thu Jul 5 16:49:18 2001
@@ -519,6 +519,9 @@
struct tcp_v6_open_req v6_req;
#endif
} af;
+#ifdef CONFIG_PRIO_ACCEPTQ
+ int acceptq_prio;
+#endif
};

/* SLAB cache for open requests. */
@@ -1566,10 +1569,33 @@
struct sock *child)
{
struct tcp_opt *tp = &sk->tp_pinfo.af_tcp;
+#ifdef CONFIG_PRIO_ACCEPTQ
+ int prio = req->acceptq_prio;
+ int prev_prio;
+#endif

req->sk = child;
tcp_acceptq_added(sk);

+#ifdef CONFIG_PRIO_ACCEPTQ
+ if (!tp->accept_queue_tail[prio]) {
+ for (prev_prio = prio - 1; prev_prio >= 0; prev_prio--)
+ if (tp->accept_queue_tail[prev_prio])
+ break;
+ tp->accept_queue_tail[prio] = req;
+ if (prev_prio >= 0) {
+ req->dl_next = tp->accept_queue_tail[prev_prio]->dl_next;
+ tp->accept_queue_tail[prev_prio]->dl_next = req;
+ } else {
+ req->dl_next = tp->accept_queue;
+ tp->accept_queue = req;
+ }
+ } else {
+ req->dl_next = tp->accept_queue_tail[prio]->dl_next;
+ tp->accept_queue_tail[prio]->dl_next = req;
+ tp->accept_queue_tail[prio] = req;
+ }
+#else
if (!tp->accept_queue_tail) {
tp->accept_queue = req;
} else {
@@ -1577,6 +1603,7 @@
}
tp->accept_queue_tail = req;
req->dl_next = NULL;
+#endif
}

struct tcp_listen_opt
@@ -1643,6 +1670,10 @@
struct tcp_opt *tp,
struct sk_buff *skb)
{
+#ifdef CONFIG_PRIO_ACCEPTQ
+ int nfmark = (int)skb->nfmark;
+#endif
+
req->rcv_wnd = 0; /* So that tcp_send_synack() knows! */
req->rcv_isn = TCP_SKB_CB(skb)->seq;
req->mss = tp->mss_clamp;
@@ -1654,6 +1685,9 @@
req->acked = 0;
req->ecn_ok = 0;
req->rmt_port = skb->h.th->source;
+#ifdef CONFIG_PRIO_ACCEPTQ
+ req->acceptq_prio = (nfmark < 0) ? 0 : ((nfmark > MAX_ACCEPTQ_PRIO) ? MAX_ACCEPTQ_PRIO : nfmark);
+#endif
}

#define TCP_MEM_QUANTUM ((int)PAGE_SIZE)
diff -urN -X dontdiff linux-2.4.6/net/ipv4/netfilter/Config.in linux-2.4.6-paq/net/ipv4/netfilter/Config.in
--- linux-2.4.6/net/ipv4/netfilter/Config.in Tue Mar 6 22:44:16 2001
+++ linux-2.4.6-paq/net/ipv4/netfilter/Config.in Thu Jul 5 16:34:05 2001
@@ -27,6 +27,7 @@
if [ "$CONFIG_EXPERIMENTAL" = "y" ]; then
dep_tristate ' Unclean match support (EXPERIMENTAL)' CONFIG_IP_NF_MATCH_UNCLEAN $CONFIG_IP_NF_IPTABLES
dep_tristate ' Owner match support (EXPERIMENTAL)' CONFIG_IP_NF_MATCH_OWNER $CONFIG_IP_NF_IPTABLES
+ bool ' Prioritized Accept Queues (EXPERIMENTAL)' CONFIG_PRIO_ACCEPTQ
fi
# The targets
dep_tristate ' Packet filtering' CONFIG_IP_NF_FILTER $CONFIG_IP_NF_IPTABLES
diff -urN -X dontdiff linux-2.4.6/net/ipv4/tcp.c linux-2.4.6-paq/net/ipv4/tcp.c
--- linux-2.4.6/net/ipv4/tcp.c Wed May 16 10:31:27 2001
+++ linux-2.4.6-paq/net/ipv4/tcp.c Thu Jul 5 16:34:05 2001
@@ -529,7 +529,12 @@

sk->max_ack_backlog = 0;
sk->ack_backlog = 0;
+#ifdef CONFIG_PRIO_ACCEPTQ
+ tp->accept_queue = NULL;
+ memset(tp->accept_queue_tail, 0, (sizeof(struct open_request *) * (MAX_ACCEPTQ_PRIO + 1)));
+#else
tp->accept_queue = tp->accept_queue_tail = NULL;
+#endif
tp->syn_wait_lock = RW_LOCK_UNLOCKED;
tcp_delack_init(tp);

@@ -588,7 +593,12 @@
write_lock_bh(&tp->syn_wait_lock);
tp->listen_opt =NULL;
write_unlock_bh(&tp->syn_wait_lock);
+#ifdef CONFIG_PRIO_ACCEPTQ
+ tp->accept_queue = NULL;
+ memset(tp->accept_queue_tail, 0, (sizeof(struct open_request *) * (MAX_ACCEPTQ_PRIO + 1)));
+#else
tp->accept_queue = tp->accept_queue_tail = NULL;
+#endif

if (lopt->qlen) {
for (i=0; i<TCP_SYNQ_HSIZE; i++) {
@@ -2109,6 +2119,9 @@
struct open_request *req;
struct sock *newsk;
int error;
+#ifdef CONFIG_PRIO_ACCEPTQ
+ int prio;
+#endif

lock_sock(sk);

@@ -2134,8 +2147,17 @@
}

req = tp->accept_queue;
+#ifdef CONFIG_PRIO_ACCEPTQ
+ tp->accept_queue = req->dl_next;
+ for (prio = 0; prio <= MAX_ACCEPTQ_PRIO; prio++)
+ if (req == tp->accept_queue_tail[prio]) {
+ tp->accept_queue_tail[prio] = NULL;
+ break;
+ }
+#else
if ((tp->accept_queue = req->dl_next) == NULL)
tp->accept_queue_tail = NULL;
+#endif

newsk = req->sk;
tcp_acceptq_removed(sk);
diff -urN -X dontdiff linux-2.4.6/net/ipv4/tcp_minisocks.c linux-2.4.6-paq/net/ipv4/tcp_minisocks.c
--- linux-2.4.6/net/ipv4/tcp_minisocks.c Thu Apr 12 12:11:39 2001
+++ linux-2.4.6-paq/net/ipv4/tcp_minisocks.c Thu Jul 5 16:34:05 2001
@@ -733,7 +733,12 @@
newtp->num_sacks = 0;
newtp->urg_data = 0;
newtp->listen_opt = NULL;
+#ifdef CONFIG_PRIO_ACCEPTQ
+ newtp->accept_queue = NULL;
+ memset(newtp->accept_queue_tail, 0, (sizeof(struct open_request *) * (MAX_ACCEPTQ_PRIO + 1)));
+#else
newtp->accept_queue = newtp->accept_queue_tail = NULL;
+#endif
/* Deinitialize syn_wait_lock to trap illegal accesses. */
memset(&newtp->syn_wait_lock, 0, sizeof(newtp->syn_wait_lock));
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
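
For readers who want to trace the queue logic outside the kernel, here is a minimal user-space sketch of the ordering the patch appears to implement: FIFO within a priority, and lower numbers ahead of higher ones. The names `paq`, `paq_req`, `paq_enqueue`, `paq_dequeue` and the `id` field are hypothetical stand-ins, not kernel identifiers; only `MAX_ACCEPTQ_PRIO`, `dl_next` and the clamping of the mark come from the patch. The tail array is sized `MAX_ACCEPTQ_PRIO + 1` so that priority 7 has a valid slot.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACCEPTQ_PRIO 7   /* priorities run from 0 (highest) to 7 (lowest) */

/* Simplified stand-in for struct open_request (assumption, not kernel code). */
struct paq_req {
    int id;                  /* hypothetical tag, used only to observe ordering */
    int prio;
    struct paq_req *dl_next;
};

struct paq {
    struct paq_req *head;                        /* next request to be accepted */
    struct paq_req *tail[MAX_ACCEPTQ_PRIO + 1];  /* last queued request per priority */
};

/* Clamp a packet mark into the valid priority range, as the patch does. */
static int paq_prio_from_mark(long nfmark)
{
    if (nfmark < 0)
        return 0;
    if (nfmark > MAX_ACCEPTQ_PRIO)
        return MAX_ACCEPTQ_PRIO;
    return (int)nfmark;
}

static void paq_init(struct paq *q)
{
    q->head = NULL;
    for (int p = 0; p <= MAX_ACCEPTQ_PRIO; p++)
        q->tail[p] = NULL;
}

/* Insert behind the last request of the same priority; if none is queued,
 * behind the tail of the nearest higher priority; otherwise at the head. */
static void paq_enqueue(struct paq *q, struct paq_req *req)
{
    int p = req->prio;
    struct paq_req *prev = NULL;

    assert(p >= 0 && p <= MAX_ACCEPTQ_PRIO);
    for (int i = p; i >= 0 && prev == NULL; i--)
        prev = q->tail[i];

    if (prev != NULL) {
        req->dl_next = prev->dl_next;
        prev->dl_next = req;
    } else {
        req->dl_next = q->head;
        q->head = req;
    }
    q->tail[p] = req;
}

/* Remove the head; clear its priority's tail pointer if it was the last one. */
static struct paq_req *paq_dequeue(struct paq *q)
{
    struct paq_req *req = q->head;

    if (req == NULL)
        return NULL;
    q->head = req->dl_next;
    if (q->tail[req->prio] == req)
        q->tail[req->prio] = NULL;
    req->dl_next = NULL;
    return req;
}
```

Enqueuing requests with priorities 5, 2, 5, 0, 2 in that order and then draining the queue returns the priority 0 request first, then the two priority 2 requests in arrival order, then the two priority 5 requests in arrival order.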


2001-07-27 17:24:39

by Alan

Subject: Re: [PATCH] Inbound Connection Control mechanism: Prioritized Accept

> The documentation on HOWTO use this patch and the test results which show an
> improvement in connection rate for higher priority classes can be found at our
> project website.
> http://oss.software.ibm.com/qos
>
> We would appreciate any comments or suggestions.

Simple question.

How is this different from having a single userspace thread in your
application which accepts connections as they come in and then hands them
out in an order it chooses, if need be erroring and closing some?

Alan

2001-07-27 18:01:50

by Sridhar Samudrala

Subject: Re: [PATCH] Inbound Connection Control mechanism: Prioritized Accept

There are a couple of reasons why prioritization in the kernel works better than at
user level.
* The kernel mechanisms are more efficient and scalable than the user space
mechanism. Non-compliant connection requests are discarded earlier, reducing the
queuing time of the compliant requests; in particular, less CPU is consumed and
the context switch to userspace is avoided.
* Doing it in user space requires changes to existing applications, which is not
always possible.

Thanks
Sridhar

On Fri, 27 Jul 2001, Alan Cox wrote:

> > The documentation on HOWTO use this patch and the test results which show an
> > improvement in connection rate for higher priority classes can be found at our
> > project website.
> > http://oss.software.ibm.com/qos
> >
> > We would appreciate any comments or suggestions.
>
> Simple question.
>
> How is this different from having a single userspace thread in your
> application which accepts connections as they come in and then hands them
> out in an order it chooses, if need be erroring and closing some?
>
> Alan
>

2001-07-27 18:05:00

by Alexey Kuznetsov

Subject: Re: [PATCH] Inbound Connection Control mechanism: Prioritized Accept

Hello!

> How is this different from having a single userspace thread in your
> application which accepts connections as they come in and then hands them
> out in an order it chooses, if need be erroring and closing some?

It seems I can answer: because closing some would break the service.

The idea is that when the kernel accept queue is full we stop
moving open requests to the established state, and hence spurious
aborts are not generated. So accepting cannot be artificially
sped up, and extending the accept queue into user space is impossible.
A similar problem was open with TUX, which relays requests
to the slow path. I do not know how Ingo solved it, by the way,
but it looked terrible: either a massive socket leak (no limit on the accept queue)
or massive aborts. :-)


Another question to the author: prioritization of drops is missing.
"Low priority" connections will clog the accept queue, so that no room
remains for high priority connections. That is not good.
Any scheme with priorities either reserves some room for each high priority band
or drops based on priority.

Alexey
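
One way to picture "reserving room for each high priority band" is an admission check with a per-priority occupancy limit. The sketch below is only an illustration under assumptions: the linear limit, the `reserve_per_band` parameter and the names `admit_cfg`, `admit_limit` and `admit_ok` are hypothetical and do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_ACCEPTQ_PRIO 7   /* 0 is the highest priority, 7 the lowest */

/* Hypothetical policy parameters (assumption, not part of the patch). */
struct admit_cfg {
    int backlog;           /* total accept queue capacity */
    int reserve_per_band;  /* slots withheld from each successively lower band */
};

/* Highest queue length at which a request of this priority is still admitted.
 * Priority 0 may fill the whole backlog; each lower band sees a smaller limit. */
static int admit_limit(const struct admit_cfg *cfg, int prio)
{
    int limit;

    assert(prio >= 0 && prio <= MAX_ACCEPTQ_PRIO);
    limit = cfg->backlog - prio * cfg->reserve_per_band;
    return limit > 0 ? limit : 0;
}

/* Admit only while the current queue length is below the band's limit, so
 * low priority requests are dropped before the queue is completely full. */
static bool admit_ok(const struct admit_cfg *cfg, int qlen, int prio)
{
    return qlen < admit_limit(cfg, prio);
}
```

With a backlog of 16 and 2 slots per band, priority 7 is refused once 2 requests are queued, while priority 0 is admitted until all 16 slots are taken. The bookkeeping is a single comparison against the existing queue length, so no per-class counters are needed for this particular policy.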

2001-07-27 18:06:50

by Alan

Subject: Re: [PATCH] Inbound Connection Control mechanism: Prioritized Accept

> There are couple of reasons why prioritization in kernel works better than at
> user level.
> * The kernel mechanisms are more efficient and scalable than the user space
> mechanism. Non compliant connection requests are discarded earlier reducing the
> queuing time of the compliant requests, in particular less CPU is consumed and
> the context switch to userspace is avoided.

I'm not sure this is that true. I just added a user space implementation to
thttpd to favour one network range and close under load on others to keep
capacity there. It's a ten minute hack, and I'm still seeing the same 1400
hits per second or so that I was before.

Alan
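
The user space approach described above could look roughly like the following sketch. This is not the actual thttpd change, which is not shown in the thread; the names `net_range`, `range_parse`, `range_contains` and `keep_connection`, and the `soft_limit` load measure, are assumptions made for illustration. Only `inet_pton` and `ntohl` are real library calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* An IPv4 network in host byte order (hypothetical helper type). */
struct net_range {
    uint32_t net;
    uint32_t mask;
};

/* Build a range from dotted-quad text and a prefix length, e.g. "10.0.0.0", 8. */
static bool range_parse(struct net_range *r, const char *net, int prefix_len)
{
    struct in_addr a;

    if (prefix_len < 0 || prefix_len > 32)
        return false;
    if (inet_pton(AF_INET, net, &a) != 1)
        return false;
    /* A prefix of 0 is handled separately to avoid shifting by 32 bits. */
    r->mask = prefix_len ? (uint32_t)0xffffffffu << (32 - prefix_len) : 0;
    r->net = ntohl(a.s_addr) & r->mask;
    return true;
}

static bool range_contains(const struct net_range *r, struct in_addr peer)
{
    return (ntohl(peer.s_addr) & r->mask) == r->net;
}

/* Decide what to do with a connection that accept() has just returned:
 * peers in the preferred range are always kept, others only while the
 * number of active connections is below the soft limit. */
static bool keep_connection(const struct net_range *preferred,
                            struct in_addr peer, int active, int soft_limit)
{
    assert(preferred != NULL);
    if (range_contains(preferred, peer))
        return true;
    return active < soft_limit;
}
```

A server would call `keep_connection()` with the peer address filled in by `accept()` and close the new socket when it returns false. Note that the connection is already established by then, which is the difference from in-kernel queueing that the rest of the thread discusses.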

2001-07-27 19:56:09

by Sridhar Samudrala

Subject: Re: [PATCH] Inbound Connection Control mechanism: Prioritized Accept

On Fri, 27 Jul 2001 [email protected] wrote:

> Hello!
>
> > How is this different from having a single userspace thread in your
> > application which accepts connections as they come in and then hands them
> > out in an order it chooses, if need be erroring and closing some?
>
> Seems, I can answer. Because closing some would break the service.
>
> The idea is that when kernel accept queue is full we stop to
> move open requests to established state and hence spurious
> aborts are not generated. So, accepting cannot be artificially
> speed up and extension of accept queue to user space is impossible.
> The similar problem was open with TUX, which relays requests
> to slow path. I do not know how Ingo solved it, by the way,
> but it looked terrible: either massive socket leak (no limit on accept queue)
> or massive aborts. :-)
>
>
> Another question to author: missing prioritization of drops.
> "Low priority" connections will clog accept queue, so that no room
> for high priority connections remains. It is not good.
> Any scheme with priority reserves some room for each high priority band
> or does dropping based on priority.

Low priority connections can clog the accept queue only when there are no
high priority connection requests coming along. As soon as a slot becomes empty
in the accept queue, it becomes available for a high priority connection. This
should work fine when we are receiving a steady flow of low priority and high
priority connections. But as you said, we may have a problem when there is a
burst of low priority connections filling the accept queue followed by a burst of
high priority connections.

In our testing, we did not notice any starvation of higher priority connection
requests due to clogging of the accept queue by low priority connections.
If that happens, TCP SYN policing can be employed to limit the rate of low
priority connections getting into the accept queue.

Reserving room in the accept queue for each priority class may help higher
priority connections, but this may cause lower priority connections to get
dropped simply because there is no room for that class although there is room
for higher priority classes and there are no incoming higher priority
connections.

-Sridhar

>
> Alexey
>

2001-07-28 19:12:42

by Alexey Kuznetsov

Subject: Re: [PATCH] Inbound Connection Control mechanism: Prioritized Accept

Hello!

> Low priority connections can clog the accept queue only when there are no
> high priority connection requests coming along. As soon as a slot becomes empty
> in the accept queue, it becomes available for a high priority connection.

And in the presence of persistent low priority traffic, a high priority connection
will not have any chance to take this slot. When a high priority connection
arrives, all the slots are permanently busy with low ones.

> If that happens, TCP SYN policing can be employed to limit the rate of low
> priority connections getting into accept queue.

After this your patch is not required at all. :-)

The whole effect is slightly better latency, not a big win.


> dropped simply because there is no room for that class although there is room
> for higher priority classes and there are no incoming higher priority
> connections.

The ABC of resource control: if you have a finite resource and want to give
better service to class A, you must reserve some part of the resource for it
or be able to preempt other classes.

Alexey
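
The preemption alternative can be sketched with per-band counters: when the queue is full, an arriving request may displace one from a strictly lower priority band. This is an illustrative model only; `band_q`, `bandq_offer` and the result codes are hypothetical names, and in a real TCP stack displacing a queued connection would mean aborting it.

```c
#include <assert.h>

#define NUM_BANDS 8   /* band 0 is the highest priority, band 7 the lowest */

/* Occupancy of a bounded queue, tracked per priority band (hypothetical). */
struct band_q {
    int count[NUM_BANDS];
    int total;
    int capacity;
};

enum offer_result {
    OFFER_QUEUED,         /* a free slot was available */
    OFFER_QUEUED_EVICT,   /* a lower priority entry was displaced */
    OFFER_DROPPED         /* queue full and nothing lower to displace */
};

static void bandq_init(struct band_q *q, int capacity)
{
    assert(capacity > 0);
    for (int i = 0; i < NUM_BANDS; i++)
        q->count[i] = 0;
    q->total = 0;
    q->capacity = capacity;
}

/* Offer a request of priority prio. On eviction, *evicted_prio receives the
 * band that lost an entry; otherwise it is set to -1. */
static enum offer_result bandq_offer(struct band_q *q, int prio, int *evicted_prio)
{
    assert(prio >= 0 && prio < NUM_BANDS);
    *evicted_prio = -1;

    if (q->total < q->capacity) {
        q->count[prio]++;
        q->total++;
        return OFFER_QUEUED;
    }
    /* Full: look for a victim, starting from the lowest priority band and
     * never touching bands at or above the newcomer's priority. */
    for (int victim = NUM_BANDS - 1; victim > prio; victim--) {
        if (q->count[victim] > 0) {
            q->count[victim]--;
            q->count[prio]++;
            *evicted_prio = victim;
            return OFFER_QUEUED_EVICT;
        }
    }
    return OFFER_DROPPED;
}
```

With a capacity of 2 filled by two band 7 requests, a band 0 arrival displaces one of them, while a further band 7 arrival is dropped because no lower band exists.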

2001-07-28 20:01:50

by Thiemo Voigt

Subject: Re: [PATCH] Inbound Connection Control mechanism: Prioritized Accept

[email protected] wrote:

> Hello!
>
> > Low priority connections can clog the accept queue only when there are no
> > high priority connection requests coming along. As soon as a slot becomes empty
> > in the accept queue, it becomes available for a high priority connection.
>
> And in presence of persistent low priority traffic, high priority connection
> will not have any chances to take this slot. When high priority connection
> arrives all the slots are permanently busy with low ones.
>
> > If that happens, TCP SYN policing can be employed to limit the rate of low
> > priority connections getting into accept queue.
>

The aim of TCP SYN policing is to prevent server overload by discarding
connection requests early when the server system is about to reach overload.
One of the indicators of overload might be that the accept queue is
close to being filled up, that there is little CPU time left, etc.
In these cases, TCP SYN policing should adapt (i.e. lower) the acceptance rates.
In such an adaptive system, the accept queue is not supposed to be completely
filled, thus low priority connections are not able to starve high priority
connections. By the way, different acceptance rates can be given
to different priority classes.

A more detailed discussion than on the website can be found
in the paper "In-kernel mechanisms for adaptive control of
overloaded web servers", available at
http://wwwtgs.cs.utwente.nl/Docs/eunice/summerschool/papers/programme.html
This paper discusses TCP SYN policing and prioritized listen queue.


Cheers,
Thiemo
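
As a rough illustration of per-class acceptance rates that can be lowered under load, the sketch below uses one token bucket per priority class. The token bucket is an assumption chosen for the example; the thread does not say how SYN policing is implemented, and the names `syn_bucket`, `syn_policer`, `policer_admit` and `policer_scale` are hypothetical. Time is passed in explicitly, in seconds, so the behaviour is deterministic.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CLASSES 8   /* one bucket per priority class, 0..7 */

struct syn_bucket {
    double rate;     /* connection requests admitted per second */
    double burst;    /* maximum number of stored tokens */
    double tokens;   /* tokens currently available */
    double last;     /* time of the last refill, in seconds */
};

struct syn_policer {
    struct syn_bucket cls[NUM_CLASSES];
};

static void bucket_init(struct syn_bucket *b, double rate, double burst, double now)
{
    assert(rate >= 0.0 && burst >= 1.0);
    b->rate = rate;
    b->burst = burst;
    b->tokens = burst;   /* start full so an initial burst is allowed */
    b->last = now;
}

/* Refill according to elapsed time, then spend one token if available. */
static bool bucket_admit(struct syn_bucket *b, double now)
{
    if (now > b->last) {
        b->tokens += (now - b->last) * b->rate;
        if (b->tokens > b->burst)
            b->tokens = b->burst;
        b->last = now;
    }
    if (b->tokens < 1.0)
        return false;    /* over the class rate: discard the request early */
    b->tokens -= 1.0;
    return true;
}

static bool policer_admit(struct syn_policer *p, int prio, double now)
{
    assert(prio >= 0 && prio < NUM_CLASSES);
    return bucket_admit(&p->cls[prio], now);
}

/* Adapt all acceptance rates, e.g. factor 0.5 when overload is detected.
 * The new rate applies to time elapsed since each bucket's last refill. */
static void policer_scale(struct syn_policer *p, double factor)
{
    assert(factor > 0.0);
    for (int i = 0; i < NUM_CLASSES; i++)
        p->cls[i].rate *= factor;
}
```

Giving higher priority classes a larger `rate` and calling `policer_scale()` when an overload indicator fires is one way to keep the accept queue from filling completely; the choice of indicator and scaling factor is left open here.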


2001-07-29 16:26:17

by Alexey Kuznetsov

Subject: Re: [PATCH] Inbound Connection Control mechanism: Prioritized Accept

Hello!

> The aim of TCP SYN policing is to prevent server overload by discarding
> connection requests

Well, I alluded to this particularly. :-)

But if Sridhar meant this when talking about SYN policing, I would
prefer this to bare prioritization, which is pretty
dubious when taken alone.

Alexey

2001-07-30 07:41:39

by Sridhar Samudrala

Subject: Re: [PATCH] Inbound Connection Control mechanism: Prioritized Accept

On Sun, 29 Jul 2001 [email protected] wrote:

> Hello!
>
> > The aim of TCP SYN policing is to prevent server overload by discarding
> > connection requests
>
> Well, I alluded to this particularly. :-)
>
> But if Sridhar meant this when talking about SYN policing, I would
> prefer this to bare prioritization, which is pretty
> dubious when taken alone.

Alexey,

Yes. I also meant that in-kernel prioritization of connections needs to be
complemented with SYN policing so that starvation of a particular class of
connections is avoided. We do mention this in the HOWTO for our patch.

I also agree with your suggestion that an enhancement to our patch could be
to reserve some slots for each class based on the priority and drop lower
priority connections even when the accept queue is not full.
I am not sure how much overhead is involved in maintaining the number of
slots left for each priority class. Also, what should be the ratio of slots
that need to be reserved for each class?

Do you think that the existing PAQ patch with SYN policing is a reasonable
way of prioritizing incoming connection requests? Or would it be worthwhile
to enhance our patch to add dropping of connections based on priority?
Preempting existing low priority connections in the acceptq with high priority
ones may not be a good idea, as we would need to abort them by sending a RST.

Thanks
Sridhar
>
> Alexey
>