Hi,
I am running a client/server test app over IPoIB in which the client sends
a certain amount of data to the server. When the transmission ends, the
server prints the bandwidth and how much data it received. I can see that
the server reports receiving only about 60% of what the client sent. However,
when I look at the server's interface counters before and after the
transmission, I see that it actually received all the data that the client
sent. This leads me to suspect that the networking layer somehow dropped
some of the data. One thing to note - the CPU is 100% busy at the receiver.
Could this be the reason? (The machine I am using has 2 dual cores - 4
CPUs.)
The second question is: how do I make the interrupts be serviced by all CPUs?
I tried through procfs as described by IRQ-affinity.txt. I can set
the mask to 0f and read it back to confirm, but after a few
seconds it reverts to 02 (which means only CPU 1).
One more thing - the device I am using is capable of generating MSI-X
interrupts.
Thanks in advance,
Eli
On Wed, 15 Nov 2006 00:15:47 +0200 (IST)
[email protected] wrote:
> Hi,
> I am running a client/server test app over IPOIB in which the client sends
> a certain amount of data to the server. When the transmittion ends, the
> server prints the bandwidth and how much data it received. I can see that
> the server reports it received about 60% that the client sent. However,
> when I look at the server's interface counters before and after the
> transmittion, I see that it actually received all the data that the client
> sent. This leads me to suspect that the networking layer somehow dropped
> some of the data. One thing to not - the CPU is 100% busy at the receiver.
> Could this be the reason (the machine I am using is 2 dual cores - 4
> CPUs).
If the receiver application can't keep up, UDP drops packets. The
receive-buffer-errors counter (UDP_MIB_RCVBUFERRORS) is incremented.
Don't expect flow control or reliable delivery; it's a datagram service!
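The counter Stephen mentions is visible from userspace via `netstat -su` or /proc/net/snmp. A minimal parser sketch follows; the Udp line layout below is a captured sample (RcvbufErrors only appears in newer kernels), not live output:

```python
# Sketch: read the UDP counters, including UDP_MIB_RCVBUFERRORS
# (exported as "RcvbufErrors" in /proc/net/snmp).  SAMPLE is a
# captured example; on a live box read the real file instead.
SAMPLE = """\
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 1234567 3 89 7654321 42 0
"""

def udp_counters(snmp_text):
    """Return the Udp: header/value rows of /proc/net/snmp as a dict."""
    lines = [l for l in snmp_text.splitlines() if l.startswith("Udp:")]
    header, values = lines[0].split()[1:], lines[1].split()[1:]
    return dict(zip(header, (int(v) for v in values)))

counters = udp_counters(SAMPLE)
print(counters["RcvbufErrors"])  # -> 42 for this sample

# On a real system:
#   counters = udp_counters(open("/proc/net/snmp").read())
# `netstat -su` reports the same numbers in human-readable form.
```

If RcvbufErrors climbs during the test while the interface counters stay clean, the drops are happening at the socket buffer, not on the wire.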
> The second question is: how do I make the interrupts be serviced by all CPUs?
> I tried through procfs as described by IRQ-affinity.txt. I can set
> the mask to 0f and read it back to confirm, but after a few
> seconds it reverts to 02 (which means only CPU 1).
Most likely, the user-level IRQ balance daemon (irqbalanced) is adjusting it.
>
> One more thing - the device I am using is capable of generating MSI-X
> interrupts.
>
Look at device capabilities with:
lspci -vv
--
Stephen Hemminger <[email protected]>
----- Original Message -----
From: <[email protected]>
To: <[email protected]>; <[email protected]>
Sent: Tuesday, November 14, 2006 5:15 PM
Subject: UDP packets loss
> Hi,
> I am running a client/server test app over IPoIB in which the client sends
> a certain amount of data to the server. When the transmission ends, the
> server prints the bandwidth and how much data it received. I can see that
> the server reports receiving only about 60% of what the client sent. However,
> when I look at the server's interface counters before and after the
> transmission, I see that it actually received all the data that the client
> sent. This leads me to suspect that the networking layer somehow dropped
> some of the data. One thing to note - the CPU is 100% busy at the receiver.
> Could this be the reason? (The machine I am using has 2 dual cores - 4
> CPUs.)
>
> The second question is: how do I make the interrupts be serviced by all CPUs?
> I tried through procfs as described by IRQ-affinity.txt. I can set
> the mask to 0f and read it back to confirm, but after a few
> seconds it reverts to 02 (which means only CPU 1).
>
> One more thing - the device I am using is capable of generating MSI-X
> interrupts.
>
> Thanks in advance,
> Eli
>
Yes. The packet counters show that the data was received by the interface.
However, the interface may be faster than the application that ultimately
receives the data, so the kernel eventually runs out of the buffers used to
store the data temporarily. When this happens, the kernel just drops packets.
Since UDP is not "reliable", it can't ask the sender to retransmit them when
resources become available again. If you need all the data, use a TCP/IP stream
protocol, in other words, a connection. That way you will get all the data,
even if you are writing it to a slow disk.
Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.24 (somewhere). IT removed email for
engineers!
New Book: http://www.AbominableFirebug.com
Thanks for the comments.
I actually use UDP because I am looking for ways to improve the
performance of IPoIB and I wanted to avoid TCP's flow control. I really
want to analyze this. Can you tell me more about irqbalanced? Where can I
find more info on how to control it? I would like my interrupts serviced by
all CPUs in a roughly equal manner. I mentioned MSI-X because the driver
already makes use of MSI-X and I thought this is relevant to interrupt
affinity.
> On Wed, 15 Nov 2006 00:15:47 +0200 (IST)
> [email protected] wrote:
>
>> Hi,
>> I am running a client/server test app over IPoIB in which the client sends
>> a certain amount of data to the server. When the transmission ends, the
>> server prints the bandwidth and how much data it received. I can see that
>> the server reports receiving only about 60% of what the client sent. However,
>> when I look at the server's interface counters before and after the
>> transmission, I see that it actually received all the data that the client
>> sent. This leads me to suspect that the networking layer somehow dropped
>> some of the data. One thing to note - the CPU is 100% busy at the receiver.
>> Could this be the reason? (The machine I am using has 2 dual cores - 4
>> CPUs.)
>
> If the receiver application can't keep up, UDP drops packets. The
> receive-buffer-errors counter (UDP_MIB_RCVBUFERRORS) is incremented.
>
> Don't expect flow control or reliable delivery; it's a datagram service!
>
>> The second question is: how do I make the interrupts be serviced by all CPUs?
>> I tried through procfs as described by IRQ-affinity.txt. I can set
>> the mask to 0f and read it back to confirm, but after a few
>> seconds it reverts to 02 (which means only CPU 1).
>
> Most likely, the user-level IRQ balance daemon (irqbalanced) is adjusting it.
>
>> One more thing - the device I am using is capable of generating MSI-X
>> interrupts.
>
> Look at device capabilities with:
>
> lspci -vv
>
> --
> Stephen Hemminger <[email protected]>
>
Stephen Hemminger wrote:
> On Wed, 15 Nov 2006 00:15:47 +0200 (IST)
> [email protected] wrote:
>> The second question is: how do I make the interrupts be serviced by all CPUs?
>> I tried through procfs as described by IRQ-affinity.txt. I can set
>> the mask to 0f and read it back to confirm, but after a few
>> seconds it reverts to 02 (which means only CPU 1).
>
> Most likely, the user-level IRQ balance daemon (irqbalanced) is adjusting it.
Having it bounce between CPUs would likely result in lower performance anyway: you
really want it bound to a single CPU to benefit from cache hits on the various
data structures involved in receiving the data from hardware, doing accounting, etc.
The userspace IRQ balance daemon attempts to keep network interrupts on the same CPU for
longer periods. The old, obsolete kernel-space daemon did exactly the opposite, completely
destroying network performance.
I'm not sure whether this is completely optimal on newer chips like Conroe with large
shared caches, though...
Cheers,
Auke
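To see whether an IRQ really is pinned to one CPU the way Auke describes, you can watch its per-CPU counts in /proc/interrupts. A small parser sketch, run here on a captured 4-CPU sample (the IRQ number 105 and the device name ib0 are invented for illustration):

```python
# Sketch: extract the per-CPU service counts for one IRQ from
# /proc/interrupts.  SAMPLE is a made-up 4-CPU excerpt; on a real
# system read /proc/interrupts itself.
SAMPLE = """\
           CPU0       CPU1       CPU2       CPU3
 105:         12    9934562          0          0   IO-APIC-level  ib0
"""

def irq_per_cpu(interrupts_text, irq):
    """Return [count_cpu0, count_cpu1, ...] for the given IRQ, or None."""
    lines = interrupts_text.splitlines()
    ncpus = len(lines[0].split())          # header row names the CPUs
    for line in lines[1:]:
        fields = line.split()
        if fields and fields[0] == "%d:" % irq:
            return [int(c) for c in fields[1:1 + ncpus]]
    return None

print(irq_per_cpu(SAMPLE, 105))  # -> [12, 9934562, 0, 0]: almost all on CPU1
```

A heavily skewed row like the sample's is exactly the "stuck on CPU1" symptom described earlier in the thread.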
On Wed, 15 Nov 2006 01:08:41 +0200 (IST)
[email protected] wrote:
> Thanks for the comments.
> I actually use UDP because I am looking for ways to improve the
> performance of IPoIB and I wanted to avoid TCP's flow control. I really
> want to analyze this. Can you tell me more about irqbalanced?
Look for info on irqbalance (it depends on which Linux distribution you
are using). You might not be running it at all; it is completely
optional. There is also a kernel-level IRQ balancer that may or
may not be configured.
> Where can I
> find more info on how to control it?
man irqbalance
Note: irqbalance has heuristics based on device names and driver names;
it might be worthwhile either to update the source and teach it about
InfiniBand, or to work with the existing heuristics (i.e., name your interrupt "eth0", "eth1", ...).
> I would like my interrupts serviced by
> all CPUs in a roughly equal manner. I mentioned MSI-X because the driver
> already makes use of MSI-X and I thought this is relevant to interrupt affinity.
MSI-X is not directly related to affinity, but with MSI-X you can have multiple
CPUs all working at once. The device needs to report the vectors it supports, and
the driver has to register a handler for each one.
Regular round-robin servicing of network IRQs is cache hostile, and that is why
irqbalance tries to keep them on the same processor.
On Wed, Nov 15, 2006 at 01:08:41AM +0200, [email protected] wrote:
> Thanks for the comments.
> I actually use UDP because I am looking for ways to improve the
> performance of IPoIB and I wanted to avoid TCP's flow control. I really
> want to analyze this. Can you tell me more about irqbalanced? Where can I
> find more info on how to control it? I would like my interrupts serviced by
> all CPUs in a roughly equal manner. I mentioned MSI-X because the driver
> already makes use of MSI-X and I thought this is relevant to interrupt affinity.
>
If you want complete control over which CPUs service which interrupts, just
turn irqbalance off (usually `service irqbalance stop`). Then use
/proc/irq/<irq_number>/smp_affinity to tune the CPU affinity for each interrupt.
That being said, as Auke and others have mentioned, servicing interrupts
on multiple CPUs leads to lower performance, not higher. Cache-line
bouncing will create greater latency for each interrupt you service
and slow you down overall. I assume these are gigabit interfaces? Your best
bet for improving throughput is (if the driver supports it) to tune your
interrupt-coalescing parameters so that you minimize the number of interrupts
you actually receive from the card.
Regards
Neil
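Neil's procfs recipe amounts to building a hex bitmask (one bit per CPU) and writing it to the IRQ's smp_affinity file. A small sketch; the IRQ number 105 in the comment is hypothetical, look yours up in /proc/interrupts:

```python
# Sketch: compute the hex mask that smp_affinity expects from a list
# of CPU numbers, then (as root, with irqbalance stopped) write it.
def affinity_mask(cpus):
    """One bit per CPU: [0,1,2,3] -> 'f'; [1] -> '2'."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return "%x" % mask

print(affinity_mask([0, 1, 2, 3]))  # -> f  (all four CPUs)
print(affinity_mask([1]))           # -> 2  (CPU1 only, as Eli saw)

# As root, for a hypothetical IRQ 105:
#   with open("/proc/irq/105/smp_affinity", "w") as f:
#       f.write(affinity_mask([0, 1, 2, 3]))
```

If irqbalance is still running it will rewrite the mask within seconds, which matches the 0f-reverting-to-02 behaviour reported at the start of the thread.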
> > On Wed, 15 Nov 2006 00:15:47 +0200 (IST)
> > [email protected] wrote:
> >
> >> Hi,
> >> I am running a client/server test app over IPoIB in which the client sends
> >> a certain amount of data to the server. When the transmission ends, the
> >> server prints the bandwidth and how much data it received. I can see that
> >> the server reports receiving only about 60% of what the client sent. However,
> >> when I look at the server's interface counters before and after the
> >> transmission, I see that it actually received all the data that the client
> >> sent. This leads me to suspect that the networking layer somehow dropped
> >> some of the data. One thing to note - the CPU is 100% busy at the receiver.
> >> Could this be the reason? (The machine I am using has 2 dual cores - 4
> >> CPUs.)
> >
> > If the receiver application can't keep up, UDP drops packets. The
> > receive-buffer-errors counter (UDP_MIB_RCVBUFERRORS) is incremented.
> >
> > Don't expect flow control or reliable delivery; it's a datagram service!
> >
> >> The second question is: how do I make the interrupts be serviced by all CPUs?
> >> I tried through procfs as described by IRQ-affinity.txt. I can set
> >> the mask to 0f and read it back to confirm, but after a few
> >> seconds it reverts to 02 (which means only CPU 1).
> >
> > Most likely, the user-level IRQ balance daemon (irqbalanced) is adjusting it.
> >
> >> One more thing - the device I am using is capable of generating MSI-X
> >> interrupts.
> >
> > Look at device capabilities with:
> >
> > lspci -vv
> >
> > --
> > Stephen Hemminger <[email protected]>
> >
> >
>
>
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/
Auke Kok wrote:
> Having it bounce between CPUs would likely result in lower
> performance anyway: you really want it bound to a single CPU to benefit
> from cache hits on the various data structures involved in receiving
> the data from hardware, doing accounting, etc.
Additionally: beware that you will likely get a fair amount of packet
reordering as well.
-John
> Hi,
> I am running a client/server test app over IPoIB in which the client sends
> a certain amount of data to the server. When the transmission ends, the
> server prints the bandwidth and how much data it received. I can see that
> the server reports receiving only about 60% of what the client sent. However,
> when I look at the server's interface counters before and after the
> transmission, I see that it actually received all the data that the client
> sent. This leads me to suspect that the networking layer somehow dropped
> some of the data. One thing to note - the CPU is 100% busy at the receiver.
> Could this be the reason? (The machine I am using has 2 dual cores - 4
> CPUs.)
I still have the following argument: the network and the network driver are
capable of transferring data at a high rate, and the networking stack is
unable to keep pace. If I used TCP, its flow control would probably
eventually slow the whole thing down to a rate that all parts can handle. But
is there a way to overcome this situation and avoid packet drops? If
there were, TCP would work at higher rates as well, no? Perhaps
increase buffer sizes? Maybe the kernel is just not designed to handle the
packet rates IPoIB can generate?
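The buffer-size idea can at least be tried per-socket without touching system config. A minimal sketch; note that on Linux the kernel doubles the requested value and caps it at net.core.rmem_max, and a bigger buffer only absorbs bursts - if the receiver is slower on average, drops resume once it fills:

```python
# Sketch: enlarge the UDP socket receive buffer from the application.
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)  # ask for 1 MiB
# What we get back is the kernel's doubled/capped effective size.
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
s.close()
print("effective rcvbuf:", effective)

# To raise the cap itself (as root):
#   sysctl -w net.core.rmem_max=8388608
```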
[email protected] wrote:
>> Hi,
>> I am running a client/server test app over IPOIB in which the client sends
>> a certain amount of data to the server. When the transmittion ends, the
>> server prints the bandwidth and how much data it received. I can see that
>> the server reports it received about 60% that the client sent. However,
>> when I look at the server's interface counters before and after the
>> transmittion, I see that it actually received all the data that the client
>> sent. This leads me to suspect that the networking layer somehow dropped
>> some of the data. One thing to not - the CPU is 100% busy at the receiver.
>> Could this be the reason (the machine I am using is 2 dual cores - 4
>> CPUs).
>
> I still have the following argumet: the network and the network driver are
> capable of transffering data at a high rate and the networking stack is
> unable to keep the pace. If I used TCP probably TCP's flow control would
> eventually slow the whole thing to a rate such all parts can handle. But
> is there a way to overcome this situation and to avoid packets drop? If
> this would happen then TCP would work at higher rates as well?? Perhaps
> increase buffers sizes? Maybe the kernel is not designed to handle packets
> rate like IPOIB can generate?
Increasing buffer sizes is not likely to help your problem. If the
receiving machine just can't keep up, you need flow control to avoid
loss. There clearly is an upper bound on the packet rate any given
receiver can handle, and you are hitting it.
BTW, TCP will be significantly faster than UDP because with UDP you
incur an extra full context switch on every packet.
-John
>
> BTW, TCP will be significantly faster than UDP because with UDP you
> incur an extra full context switch on every packet.
>
Could you elaborate on this a bit more? What kind of context switch?
From: [email protected]
Date: Thu, 16 Nov 2006 23:10:08 +0200 (IST)
> >
> > BTW, TCP will be significantly faster than UDP because with UDP you
> > incur an extra full context switch on every packet.
> >
>
> Could you elaborate on this a bit more? What kind of context switch?
TCP queues the data and takes care of all the sending, packetization,
etc. asynchronously. With UDP, on the other hand, every write()
results in a packet on the wire; packets are always emitted
synchronously in process context.
On Thu, 16 Nov 2006 20:51:37 +0200 (IST)
[email protected] wrote:
> eventually slow the whole thing down to a rate that all parts can handle. But
> is there a way to overcome this situation and avoid packet drops? If
> there were, TCP would work at higher rates as well, no? Perhaps
> increase buffer sizes?
Increasing buffer sizes can actually, paradoxically, make the situation
worse. Van Jacobson once claimed that those who do not understand TCP are
doomed to re-invent it.
If you have a very controlled environment, then there are alternative flow-control
approaches, including counting approaches, when you know the
underlying transport is basically reliable (or you can tolerate minor
loss). That is, roughly speaking, the equivalent of TCP with fixed windows,
knowing that the worst-case buffering is at the end points.
Alan
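A fixed-window scheme of the sort Alan describes can be sketched as application-level bookkeeping over UDP. Everything here (class and method names) is invented for illustration; the point is only the credit accounting:

```python
# Sketch: fixed-window flow control.  The sender may have at most
# `window` unacknowledged datagrams in flight; the receiver returns
# the highest sequence number it has consumed, which restores credit.
class FixedWindow:
    def __init__(self, window):
        self.window = window
        self.next_seq = 0   # next sequence number to send
        self.acked = -1     # highest sequence number acknowledged

    def can_send(self):
        # number in flight = next_seq - acked - 1
        return self.next_seq - self.acked - 1 < self.window

    def sent(self):
        self.next_seq += 1

    def ack(self, seq):
        self.acked = max(self.acked, seq)

w = FixedWindow(window=4)
sent = 0
while w.can_send():        # fill the window
    w.sent()
    sent += 1
print(sent)                # -> 4: sender must now wait for an ack
w.ack(0)
print(w.can_send())        # -> True: one credit returned
```

With the window sized to the receiver's worst-case buffering, the sender can never overrun the receive buffer, which is exactly the guarantee TCP's windowing provides.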
On Thu, Nov 16, 2006 at 08:51:37PM +0200, [email protected] wrote:
> > Hi,
> > I am running a client/server test app over IPoIB in which the client sends
> > a certain amount of data to the server. When the transmission ends, the
> > server prints the bandwidth and how much data it received. I can see that
> > the server reports receiving only about 60% of what the client sent. However,
> > when I look at the server's interface counters before and after the
> > transmission, I see that it actually received all the data that the client
> > sent. This leads me to suspect that the networking layer somehow dropped
> > some of the data. One thing to note - the CPU is 100% busy at the receiver.
> > Could this be the reason? (The machine I am using has 2 dual cores - 4
> > CPUs.)
>
> I still have the following argument: the network and the network driver are
> capable of transferring data at a high rate, and the networking stack is
> unable to keep pace. If I used TCP, its flow control would probably
> eventually slow the whole thing down to a rate that all parts can handle. But
> is there a way to overcome this situation and avoid packet drops? If
> there were, TCP would work at higher rates as well, no? Perhaps
> increase buffer sizes? Maybe the kernel is just not designed to handle the
> packet rates IPoIB can generate?
>
It sounds kind of like you're looking for a protocol like SCTP. SCTP is a bit
like a hybrid between TCP and UDP. It provides both a stream-oriented
transport and a sequenced-packet transport; the latter is like UDP, but with
sequence numbers that are visible to the userspace app, so in-order delivery
can be guaranteed and application-controlled packet loss can be enabled.
Neil
--
/***************************************************
*Neil Horman
*Software Engineer
*gpg keyid: 1024D / 0x92A74FA1 - http://pgp.mit.edu
***************************************************/