Let's skip stuff which is smaller than 100 bytes.
 Size Uses Wasted Name and definition
===== ==== ====== ================================================
   58   20    722 __skb_dequeue     include/linux/skbuff.h
  122  119  12036 skb_dequeue       include/linux/skbuff.h
  103   10    747 __skb_queue_purge include/linux/skbuff.h
  164   78  11088 skb_queue_purge   include/linux/skbuff.h
   46   26    650 __skb_queue_tail  include/linux/skbuff.h
  109  101   8900 skb_queue_tail    include/linux/skbuff.h
   47   11    270 __skb_queue_head  include/linux/skbuff.h
  110   44   3870 skb_queue_head    include/linux/skbuff.h
   51   16    465 __skb_unlink      include/linux/skbuff.h
  132    9    896 skb_unlink        include/linux/skbuff.h
   46    2     26 __skb_append      include/linux/skbuff.h
  114    5    376 skb_append        include/linux/skbuff.h
'Uses' was counted over .c files only; .h files were excluded, so there is
no double counting here. (It looks like 'Wasted' = (Uses - 1) * (Size - 20),
i.e. every call site beyond the first pays the full inlined body minus
roughly 20 bytes of call overhead.)
As you can see, the non-locking functions are more or less sanely
sized (except __skb_queue_purge). Once the spin_lock/unlock code gets added,
we cross the 100-byte mark.
What shall be done with this? I'll make a patch to move the locking functions
into net/core/skbuff.c unless there is some reason not to.
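For reference, each locking variant is just its non-locking counterpart
bracketed by spin_lock_irqsave()/spin_unlock_irqrestore(), and that whole
lock/unlock sequence gets duplicated at every call site. Roughly (a sketch
paraphrased from the header, not the exact kernel source):

/* Inline locking wrapper as it sits in include/linux/skbuff.h today.
 * __skb_dequeue() is the small non-locking primitive; the
 * irqsave/irqrestore bracket is what pushes each inlined copy past
 * the 100-byte mark. */
static inline struct sk_buff *skb_dequeue(struct sk_buff_head *list)
{
        unsigned long flags;
        struct sk_buff *result;

        spin_lock_irqsave(&list->lock, flags);
        result = __skb_dequeue(list);
        spin_unlock_irqrestore(&list->lock, flags);
        return result;
}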
Just to give you a feel for what 100+ bytes looks like in assembly:
void inline_skb_dequeue_1(void) { skb_dequeue(0); }
gets compiled into:
00003513 <inline_skb_dequeue_1>:
3513: 55 push %ebp
3514: 89 e5 mov %esp,%ebp
3516: 53 push %ebx
3517: 9c pushf
3518: 5b pop %ebx
3519: fa cli
351a: b9 00 e0 ff ff mov $0xffffe000,%ecx
351f: 21 e1 and %esp,%ecx
3521: ff 41 14 incl 0x14(%ecx)
3524: f0 fe 0d 0c 00 00 00 lock decb 0xc
352b: 78 02 js 352f <inline_skb_dequeue_1+0x1c>
352d: eb 0d jmp 353c <inline_skb_dequeue_1+0x29>
352f: f3 90 repz nop
3531: 80 3d 0c 00 00 00 00 cmpb $0x0,0xc
3538: 7e f5 jle 352f <inline_skb_dequeue_1+0x1c>
353a: eb e8 jmp 3524 <inline_skb_dequeue_1+0x11>
353c: 8b 15 00 00 00 00 mov 0x0,%edx
3542: 85 d2 test %edx,%edx
3544: 74 2b je 3571 <inline_skb_dequeue_1+0x5e>
3546: 89 d0 mov %edx,%eax
3548: 8b 12 mov (%edx),%edx
354a: 89 15 00 00 00 00 mov %edx,0x0
3550: ff 0d 08 00 00 00 decl 0x8
3556: c7 42 04 00 00 00 00 movl $0x0,0x4(%edx)
355d: c7 00 00 00 00 00 movl $0x0,(%eax)
3563: c7 40 04 00 00 00 00 movl $0x0,0x4(%eax)
356a: c7 40 08 00 00 00 00 movl $0x0,0x8(%eax)
3571: b0 01 mov $0x1,%al
3573: 86 05 0c 00 00 00 xchg %al,0xc
3579: 53 push %ebx
357a: 9d popf
357b: 8b 41 08 mov 0x8(%ecx),%eax
357e: ff 49 14 decl 0x14(%ecx)
3581: a8 08 test $0x8,%al
3583: 74 05 je 358a <inline_skb_dequeue_1+0x77>
3585: e8 fc ff ff ff call 3586 <inline_skb_dequeue_1+0x73>
358a: 5b pop %ebx
358b: c9 leave
358c: c3 ret
--
vda
On Wed, 21 Apr 2004, Denis Vlasenko wrote:
> What shall be done with this? I'll make patch to move locking functions
> into net/core/skbuff.c unless there is some reason not to do it.
How will these changes impact performance? I asked this last time you
posted about inlines and didn't see any response.
- James
--
James Morris
<[email protected]>
James Morris <[email protected]> writes:
> On Wed, 21 Apr 2004, Denis Vlasenko wrote:
>
>> What shall be done with this? I'll make patch to move locking functions
>> into net/core/skbuff.c unless there is some reason not to do it.
>
> How will these changes impact performance? I asked this last time you
> posted about inlines and didn't see any response.
I don't think it will be an issue. The optimization guidelines
of AMD and Intel recommend moving functions that generate
more than 30-40 instructions out of line. 100 instructions
is certainly enough to amortize the call overhead, and you
save some icache too, so it may even be faster.
-Andi
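Concretely, 'moving out of line' means the header keeps only a declaration,
while a single definition (plus an EXPORT_SYMBOL so modules can keep using it)
lives in net/core/skbuff.c; the tiny non-locking __skb_* helpers stay inline.
A rough sketch of that shape, mirroring the patch that appears later in this
thread:

/* include/linux/skbuff.h: declaration only, no inline body here. */
extern void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk);

/* net/core/skbuff.c: the one out-of-line copy of the locking wrapper. */
void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk)
{
        unsigned long flags;

        spin_lock_irqsave(&list->lock, flags);
        __skb_queue_tail(list, newsk);
        spin_unlock_irqrestore(&list->lock, flags);
}
EXPORT_SYMBOL(skb_queue_tail);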
On Thu, 22 Apr 2004, Andi Kleen wrote:
> > How will these changes impact performance? I asked this last time you
> > posted about inlines and didn't see any response.
>
> I don't think it will be an issue. The optimization guidelines
> of AMD and Intel recommend moving functions that generate
> more than 30-40 instructions out of line. 100 instructions
> is certainly enough to amortize the call overhead, and you
> save some icache too, so it may even be faster.
Of course, but it would be good to see some measurements.
- James
--
James Morris
<[email protected]>
James Morris wrote:
> On Thu, 22 Apr 2004, Andi Kleen wrote:
>
>>> How will these changes impact performance? I asked this last time you
>>> posted about inlines and didn't see any response.
>>
>> I don't think it will be an issue. The optimization guidelines
>> of AMD and Intel recommend moving functions that generate
>> more than 30-40 instructions out of line. 100 instructions
>> is certainly enough to amortize the call overhead, and you
>> save some icache too, so it may even be faster.
>
> Of course, but it would be good to see some measurements.
>
> - James
It depends a lot on the workload of the machine.
If this is a specialized machine, with a small program that mostly uses
recv() & send() syscalls, then inlining functions is a gain, since
the icache may have a 100% hit ratio. Optimization guidelines are good for
the common cases, not for every case.
Eric Dumazet
On Thu, 22 Apr 2004 17:56:46 +0300
Denis Vlasenko <[email protected]> wrote:
> On Thursday 22 April 2004 03:47, James Morris wrote:
> > On Wed, 21 Apr 2004, Denis Vlasenko wrote:
> > > What shall be done with this? I'll make patch to move locking functions
> > > into net/core/skbuff.c unless there is some reason not to do it.
> >
> > How will these changes impact performance? I asked this last time you
> > posted about inlines and didn't see any response.
>
> Hard to say. We will lose ~2% to extra call/return instructions
> but will gain 40kb of icache space.
He's asking you to test and quantify the performance changes that
occur as a result of this patch, not to figure it out via calculations
in your head :-)
Denis Vlasenko wrote:
>> If this is a specialized machine, with a small program that mostly uses
>> recv() & send() syscalls, then inlining functions is a gain, since
>> the icache may have a 100% hit ratio. Optimization guidelines are good for
>> the common cases, not for every case.
>
> And if it is NOT a specialized machine? An icache miss will nullify
> any speed advantage, while you still pay the space penalty.
> We don't need to pay for at least ~250 kbytes wasted overall
> in an allyesconfig kernel for each and every specialized
> setup conceivable, IMHO.
The point is: if this IS a specialized machine, then the kernel is a
custom one, not allyesconfig.
This is, IMHO, what I do for specialized machines, and yes, I even inline
some specific functions, like fget() and others.
But I didn't ask you not to do the un-inlining; I was just pointing out
that some people (like me) do micro-optimizations that 'seem' to go
against the optimization guidelines from Intel or AMD.
Eric
"David S. Miller" <[email protected]> writes:
>
> He's asking you to test and quantify the performance changes that
> occur as a result of this patch, not to figure it out via calculations
> in your head :-)
I bet it will be completely unnoticeable in macrobenchmarks.
The only way to measure a difference would likely be to use some
unrealistic microbenchmark (a program that calls this in a tight loop).
-Andi
On Thu, 22 Apr 2004 22:39:34 +0300
Denis Vlasenko <[email protected]> wrote:
> Okay, I am willing to do it. Hmm...
> Which type of test will highlight the difference?
> I don't have a gigabit network to play with. Any hints on how I can measure
> it on 100mbit?
Get someone to run specweb before and after your changes.
On Thu, Apr 22, 2004 at 11:15:47AM -0400, James Morris wrote:
> On Thu, 22 Apr 2004, Andi Kleen wrote:
>
> > > How will these changes impact performance? I asked this last time you
> > > posted about inlines and didn't see any response.
> >
> > I don't think it will be an issue. The optimization guidelines
> > of AMD and Intel recommend moving functions that generate
> > more than 30-40 instructions out of line. 100 instructions
> > is certainly enough to amortize the call overhead, and you
> > save some icache too, so it may even be faster.
>
> Of course, but it would be good to see some measurements.
It's useless in this case. Networking is dominated by cache misses
and locks, and a modern CPU can execute hundreds of function calls in
the time of a single cache miss.
-Andi
On 23 Apr 2004 00:23:26 +0200
Andi Kleen <[email protected]> wrote:
> > Of course, but it would be good to see some measurements.
>
> It's useless in this case. Networking is dominated by cache misses
> and locks, and a modern CPU can execute hundreds of function calls in
> the time of a single cache miss.
Indeed, this change does influence the I-cache footprint of the
networking code, so I conclude that it is relevant in this case.
On Thu, Apr 22, 2004 at 03:25:25PM -0700, David S. Miller wrote:
> On 23 Apr 2004 00:23:26 +0200
> Andi Kleen <[email protected]> wrote:
>
> > > Of course, but it would be good to see some measurements.
> >
> > It's useless in this case. Networking is dominated by cache misses
> > and locks, and a modern CPU can execute hundreds of function calls in
> > the time of a single cache miss.
>
> Indeed, this change does influence the I-cache footprint of the
> networking code, so I conclude that it is relevant in this case.
All it does is lower the cache footprint, not enlarge it.
So even if it made a difference, things could only get better.
It's extremely unlikely to make anything detectably worse.
-Andi
On 23 Apr 2004 00:34:01 +0200
Andi Kleen <[email protected]> wrote:
> All it does is lower the cache footprint, not enlarge it.
> So even if it made a difference, things could only get better.
> It's extremely unlikely to make anything detectably worse.
That's a good argument. I would still like to see some
numbers, you know, to help me sleep at night :-)
This takes the suggestion and makes all the locked skb_* functions non-inline.
It showed a 3% performance improvement when doing a single TCP stream over 1G
Ethernet between SMP machines. The test was the average of 10 iterations of
iperf for 30 seconds, and the SUT was a 4-way Xeon. The HTTP performance
difference was not noticeable (less than the std. deviation of the test runs).
diff -Nru a/include/linux/skbuff.h b/include/linux/skbuff.h
--- a/include/linux/skbuff.h Tue Apr 27 10:40:45 2004
+++ b/include/linux/skbuff.h Tue Apr 27 10:40:45 2004
@@ -511,6 +511,7 @@
*
* A buffer cannot be placed on two lists at the same time.
*/
+extern void skb_queue_head(struct sk_buff_head *list, struct sk_buff *newsk);
static inline void __skb_queue_head(struct sk_buff_head *list,
struct sk_buff *newsk)
{
@@ -525,28 +526,6 @@
next->prev = prev->next = newsk;
}
-
-/**
- * skb_queue_head - queue a buffer at the list head
- * @list: list to use
- * @newsk: buffer to queue
- *
- * Queue a buffer at the start of the list. This function takes the
- * list lock and can be used safely with other locking &sk_buff functions
- * safely.
- *
- * A buffer cannot be placed on two lists at the same time.
- */
-static inline void skb_queue_head(struct sk_buff_head *list,
- struct sk_buff *newsk)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&list->lock, flags);
- __skb_queue_head(list, newsk);
- spin_unlock_irqrestore(&list->lock, flags);
-}
-
/**
* __skb_queue_tail - queue a buffer at the list tail
* @list: list to use
@@ -557,6 +536,7 @@
*
* A buffer cannot be placed on two lists at the same time.
*/
+extern void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk);
static inline void __skb_queue_tail(struct sk_buff_head *list,
struct sk_buff *newsk)
{
@@ -571,26 +551,6 @@
next->prev = prev->next = newsk;
}
-/**
- * skb_queue_tail - queue a buffer at the list tail
- * @list: list to use
- * @newsk: buffer to queue
- *
- * Queue a buffer at the tail of the list. This function takes the
- * list lock and can be used safely with other locking &sk_buff functions
- * safely.
- *
- * A buffer cannot be placed on two lists at the same time.
- */
-static inline void skb_queue_tail(struct sk_buff_head *list,
- struct sk_buff *newsk)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&list->lock, flags);
- __skb_queue_tail(list, newsk);
- spin_unlock_irqrestore(&list->lock, flags);
-}
/**
* __skb_dequeue - remove from the head of the queue
@@ -600,6 +560,7 @@
* so must be used with appropriate locks held only. The head item is
* returned or %NULL if the list is empty.
*/
+extern struct sk_buff *skb_dequeue(struct sk_buff_head *list);
static inline struct sk_buff *__skb_dequeue(struct sk_buff_head *list)
{
struct sk_buff *next, *prev, *result;
@@ -619,30 +580,11 @@
return result;
}
-/**
- * skb_dequeue - remove from the head of the queue
- * @list: list to dequeue from
- *
- * Remove the head of the list. The list lock is taken so the function
- * may be used safely with other locking list functions. The head item is
- * returned or %NULL if the list is empty.
- */
-
-static inline struct sk_buff *skb_dequeue(struct sk_buff_head *list)
-{
- unsigned long flags;
- struct sk_buff *result;
-
- spin_lock_irqsave(&list->lock, flags);
- result = __skb_dequeue(list);
- spin_unlock_irqrestore(&list->lock, flags);
- return result;
-}
/*
* Insert a packet on a list.
*/
-
+extern void skb_insert(struct sk_buff *old, struct sk_buff *newsk);
static inline void __skb_insert(struct sk_buff *newsk,
struct sk_buff *prev, struct sk_buff *next,
struct sk_buff_head *list)
@@ -654,58 +596,20 @@
list->qlen++;
}
-/**
- * skb_insert - insert a buffer
- * @old: buffer to insert before
- * @newsk: buffer to insert
- *
- * Place a packet before a given packet in a list. The list locks are taken
- * and this function is atomic with respect to other list locked calls
- * A buffer cannot be placed on two lists at the same time.
- */
-
-static inline void skb_insert(struct sk_buff *old, struct sk_buff *newsk)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&old->list->lock, flags);
- __skb_insert(newsk, old->prev, old, old->list);
- spin_unlock_irqrestore(&old->list->lock, flags);
-}
-
/*
* Place a packet after a given packet in a list.
*/
-
+extern void skb_append(struct sk_buff *old, struct sk_buff *newsk);
static inline void __skb_append(struct sk_buff *old, struct sk_buff *newsk)
{
__skb_insert(newsk, old, old->next, old->list);
}
-/**
- * skb_append - append a buffer
- * @old: buffer to insert after
- * @newsk: buffer to insert
- *
- * Place a packet after a given packet in a list. The list locks are taken
- * and this function is atomic with respect to other list locked calls.
- * A buffer cannot be placed on two lists at the same time.
- */
-
-
-static inline void skb_append(struct sk_buff *old, struct sk_buff *newsk)
-{
- unsigned long flags;
-
- spin_lock_irqsave(&old->list->lock, flags);
- __skb_append(old, newsk);
- spin_unlock_irqrestore(&old->list->lock, flags);
-}
-
/*
* remove sk_buff from list. _Must_ be called atomically, and with
* the list known..
*/
+extern void skb_unlink(struct sk_buff *skb);
static inline void __skb_unlink(struct sk_buff *skb, struct sk_buff_head *list)
{
struct sk_buff *next, *prev;
@@ -719,31 +623,6 @@
prev->next = next;
}
-/**
- * skb_unlink - remove a buffer from a list
- * @skb: buffer to remove
- *
- * Place a packet after a given packet in a list. The list locks are taken
- * and this function is atomic with respect to other list locked calls
- *
- * Works even without knowing the list it is sitting on, which can be
- * handy at times. It also means that THE LIST MUST EXIST when you
- * unlink. Thus a list must have its contents unlinked before it is
- * destroyed.
- */
-static inline void skb_unlink(struct sk_buff *skb)
-{
- struct sk_buff_head *list = skb->list;
-
- if (list) {
- unsigned long flags;
-
- spin_lock_irqsave(&list->lock, flags);
- if (skb->list == list)
- __skb_unlink(skb, skb->list);
- spin_unlock_irqrestore(&list->lock, flags);
- }
-}
/* XXX: more streamlined implementation */
@@ -755,6 +634,7 @@
* so must be used with appropriate locks held only. The tail item is
* returned or %NULL if the list is empty.
*/
+extern struct sk_buff *skb_dequeue_tail(struct sk_buff_head *list);
static inline struct sk_buff *__skb_dequeue_tail(struct sk_buff_head *list)
{
struct sk_buff *skb = skb_peek_tail(list);
@@ -763,24 +643,6 @@
return skb;
}
-/**
- * skb_dequeue_tail - remove from the tail of the queue
- * @list: list to dequeue from
- *
- * Remove the tail of the list. The list lock is taken so the function
- * may be used safely with other locking list functions. The tail item is
- * returned or %NULL if the list is empty.
- */
-static inline struct sk_buff *skb_dequeue_tail(struct sk_buff_head *list)
-{
- unsigned long flags;
- struct sk_buff *result;
-
- spin_lock_irqsave(&list->lock, flags);
- result = __skb_dequeue_tail(list);
- spin_unlock_irqrestore(&list->lock, flags);
- return result;
-}
static inline int skb_is_nonlinear(const struct sk_buff *skb)
{
@@ -1012,21 +874,6 @@
}
/**
- * skb_queue_purge - empty a list
- * @list: list to empty
- *
- * Delete all buffers on an &sk_buff list. Each buffer is removed from
- * the list and one reference dropped. This function takes the list
- * lock and is atomic with respect to other list locking functions.
- */
-static inline void skb_queue_purge(struct sk_buff_head *list)
-{
- struct sk_buff *skb;
- while ((skb = skb_dequeue(list)) != NULL)
- kfree_skb(skb);
-}
-
-/**
* __skb_queue_purge - empty a list
* @list: list to empty
*
@@ -1034,6 +881,7 @@
* the list and one reference dropped. This function does not take the
* list lock and the caller must hold the relevant locks to use it.
*/
+extern void skb_queue_purge(struct sk_buff_head *list);
static inline void __skb_queue_purge(struct sk_buff_head *list)
{
struct sk_buff *skb;
diff -Nru a/net/core/skbuff.c b/net/core/skbuff.c
--- a/net/core/skbuff.c Tue Apr 27 10:40:45 2004
+++ b/net/core/skbuff.c Tue Apr 27 10:40:45 2004
@@ -1091,6 +1091,165 @@
}
}
+/**
+ * skb_dequeue - remove from the head of the queue
+ * @list: list to dequeue from
+ *
+ * Remove the head of the list. The list lock is taken so the function
+ * may be used safely with other locking list functions. The head item is
+ * returned or %NULL if the list is empty.
+ */
+
+struct sk_buff *skb_dequeue(struct sk_buff_head *list)
+{
+ unsigned long flags;
+ struct sk_buff *result;
+
+ spin_lock_irqsave(&list->lock, flags);
+ result = __skb_dequeue(list);
+ spin_unlock_irqrestore(&list->lock, flags);
+ return result;
+}
+
+/**
+ * skb_dequeue_tail - remove from the tail of the queue
+ * @list: list to dequeue from
+ *
+ * Remove the tail of the list. The list lock is taken so the function
+ * may be used safely with other locking list functions. The tail item is
+ * returned or %NULL if the list is empty.
+ */
+struct sk_buff *skb_dequeue_tail(struct sk_buff_head *list)
+{
+ unsigned long flags;
+ struct sk_buff *result;
+
+ spin_lock_irqsave(&list->lock, flags);
+ result = __skb_dequeue_tail(list);
+ spin_unlock_irqrestore(&list->lock, flags);
+ return result;
+}
+
+/**
+ * skb_queue_purge - empty a list
+ * @list: list to empty
+ *
+ * Delete all buffers on an &sk_buff list. Each buffer is removed from
+ * the list and one reference dropped. This function takes the list
+ * lock and is atomic with respect to other list locking functions.
+ */
+void skb_queue_purge(struct sk_buff_head *list)
+{
+ struct sk_buff *skb;
+ while ((skb = skb_dequeue(list)) != NULL)
+ kfree_skb(skb);
+}
+
+/**
+ * skb_queue_head - queue a buffer at the list head
+ * @list: list to use
+ * @newsk: buffer to queue
+ *
+ * Queue a buffer at the start of the list. This function takes the
+ * list lock and can be used safely with other locking &sk_buff functions
+ * safely.
+ *
+ * A buffer cannot be placed on two lists at the same time.
+ */
+void skb_queue_head(struct sk_buff_head *list, struct sk_buff *newsk)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&list->lock, flags);
+ __skb_queue_head(list, newsk);
+ spin_unlock_irqrestore(&list->lock, flags);
+}
+
+/**
+ * skb_queue_tail - queue a buffer at the list tail
+ * @list: list to use
+ * @newsk: buffer to queue
+ *
+ * Queue a buffer at the tail of the list. This function takes the
+ * list lock and can be used safely with other locking &sk_buff functions
+ * safely.
+ *
+ * A buffer cannot be placed on two lists at the same time.
+ */
+void skb_queue_tail(struct sk_buff_head *list, struct sk_buff *newsk)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&list->lock, flags);
+ __skb_queue_tail(list, newsk);
+ spin_unlock_irqrestore(&list->lock, flags);
+}
+/**
+ * skb_unlink - remove a buffer from a list
+ * @skb: buffer to remove
+ *
+ * Place a packet after a given packet in a list. The list locks are taken
+ * and this function is atomic with respect to other list locked calls
+ *
+ * Works even without knowing the list it is sitting on, which can be
+ * handy at times. It also means that THE LIST MUST EXIST when you
+ * unlink. Thus a list must have its contents unlinked before it is
+ * destroyed.
+ */
+void skb_unlink(struct sk_buff *skb)
+{
+ struct sk_buff_head *list = skb->list;
+
+ if (list) {
+ unsigned long flags;
+
+ spin_lock_irqsave(&list->lock, flags);
+ if (skb->list == list)
+ __skb_unlink(skb, skb->list);
+ spin_unlock_irqrestore(&list->lock, flags);
+ }
+}
+
+
+/**
+ * skb_append - append a buffer
+ * @old: buffer to insert after
+ * @newsk: buffer to insert
+ *
+ * Place a packet after a given packet in a list. The list locks are taken
+ * and this function is atomic with respect to other list locked calls.
+ * A buffer cannot be placed on two lists at the same time.
+ */
+
+void skb_append(struct sk_buff *old, struct sk_buff *newsk)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&old->list->lock, flags);
+ __skb_append(old, newsk);
+ spin_unlock_irqrestore(&old->list->lock, flags);
+}
+
+
+/**
+ * skb_insert - insert a buffer
+ * @old: buffer to insert before
+ * @newsk: buffer to insert
+ *
+ * Place a packet before a given packet in a list. The list locks are taken
+ * and this function is atomic with respect to other list locked calls
+ * A buffer cannot be placed on two lists at the same time.
+ */
+
+void skb_insert(struct sk_buff *old, struct sk_buff *newsk)
+{
+ unsigned long flags;
+
+ spin_lock_irqsave(&old->list->lock, flags);
+ __skb_insert(newsk, old->prev, old, old->list);
+ spin_unlock_irqrestore(&old->list->lock, flags);
+}
+
#if 0
/*
* Tune the memory allocator for a new MTU size.
@@ -1133,3 +1292,11 @@
EXPORT_SYMBOL(skb_pad);
EXPORT_SYMBOL(skb_realloc_headroom);
EXPORT_SYMBOL(skb_under_panic);
+EXPORT_SYMBOL(skb_dequeue);
+EXPORT_SYMBOL(skb_dequeue_tail);
+EXPORT_SYMBOL(skb_insert);
+EXPORT_SYMBOL(skb_queue_purge);
+EXPORT_SYMBOL(skb_queue_head);
+EXPORT_SYMBOL(skb_queue_tail);
+EXPORT_SYMBOL(skb_unlink);
+EXPORT_SYMBOL(skb_append);
On Tue, 27 Apr 2004 14:21:36 -0700
Stephen Hemminger <[email protected]> wrote:
> This takes the suggestion and makes all the locked skb_* functions non-inline.
> It showed a 3% performance improvement when doing a single TCP stream over 1G
> Ethernet between SMP machines. The test was the average of 10 iterations of
> iperf for 30 seconds, and the SUT was a 4-way Xeon. The HTTP performance
> difference was not noticeable (less than the std. deviation of the test runs).
Ok, let's give this a spin.
Applied, thanks.