2004-11-19 21:39:00

by Jeff V. Merkey

Subject: Linux 2.6.9 pktgen module causes INIT process respawning and sickness


With pktgen.o configured to send 123 MB/s on a gigabit link, on a system
with pktgen set to the following parameters:

pgset "odev eth1"
pgset "pkt_size 1500"
pgset "count 0"
pgset "ipg 5000"
pgset "src_min 10.0.0.1"
pgset "src_max 10.0.0.254"
pgset "dst_min 192.168.0.1"
pgset "dst_max 192.168.0.254"
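
(For reference: pgset is just a thin wrapper that echoes a command
string into pktgen's /proc interface and checks the result. A minimal C
equivalent is sketched below; the /proc path is an assumption, since
older 2.6 kernels expose /proc/net/pktgen/pg0 while later ones use
per-thread and per-device files.)

#include <stdio.h>
#include <string.h>

/* Write one pktgen command, then read the /proc file back and print
 * the "Result:" line so configuration errors are visible. */
static int pgset(const char *procfile, const char *cmd)
{
    char line[256];
    FILE *f = fopen(procfile, "w");

    if (!f) {
        perror(procfile);
        return -1;
    }
    fprintf(f, "%s\n", cmd);
    fclose(f);

    f = fopen(procfile, "r");
    if (!f) {
        perror(procfile);
        return -1;
    }
    while (fgets(line, sizeof(line), f))
        if (strstr(line, "Result:"))
            fputs(line, stdout);
    fclose(f);
    return 0;
}

int main(void)
{
    const char *pg = "/proc/net/pktgen/pg0";  /* assumed path */

    pgset(pg, "odev eth1");
    pgset(pg, "pkt_size 1500");
    pgset(pg, "count 0");
    pgset(pg, "ipg 5000");
    return 0;
}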

After 37 hours of continual packet generation into a gigabit
regeneration tap device, the server system console will start to respawn
the INIT process about every 10-12 hours of continuous packet
generation.

As a side note, this module in Linux is extremely useful, and the "USE
WITH CAUTION" warnings are certainly well stated. The performance of
this tool is excellent.

Jeff


2004-11-19 22:00:12

by Jeff V. Merkey

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness


Additionally, when packet sizes of 64, 128, and 256 bytes are selected,
pktgen is unable to achieve > 500,000 pps (only 349,000 on my system).
A Smartbits generator can achieve over 1 million pps with 64 byte
packets on gigabit, so this is one performance issue for this app.
However, at 1500 and 1048 byte sizes, gigabit saturation is achievable.
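
(For calibration, the theoretical wire rates fall out of the frame size
plus ethernet's 8 byte preamble and 12 byte interframe gap; a quick
sketch, with the simplifying assumption that pktgen's pkt_size is the
full frame:)

#include <stdio.h>

/* Theoretical packets/second for gigabit ethernet at a given frame
 * size, counting the 8 byte preamble and 12 byte interframe gap that
 * occupy the wire alongside every frame. */
static double wire_pps(int frame_bytes)
{
    const double link_bps = 1e9;
    int on_wire = frame_bytes + 8 + 12;

    return link_bps / (on_wire * 8.0);
}

int main(void)
{
    printf("  64 byte frames: %8.0f pps\n", wire_pps(64));   /* ~1488095 */
    printf("1500 byte frames: %8.0f pps\n", wire_pps(1500)); /* ~82237 */
    return 0;
}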

Jeff

Jeff V. Merkey wrote:

>
> With pktgen.o configured to send 123 MB/s on a gigabit link, on a
> system with pktgen set to the following parameters:
>
> pgset "odev eth1"
> pgset "pkt_size 1500"
> pgset "count 0"
> pgset "ipg 5000"
> pgset "src_min 10.0.0.1"
> pgset "src_max 10.0.0.254"
> pgset "dst_min 192.168.0.1"
> pgset "dst_max 192.168.0.254"
>
> After 37 hours of continual packet generation into a gigabit
> regeneration tap device, the server system console will start to
> respawn the INIT process about every 10-12 hours of continuous packet
> generation.
>
> As a side note, this module in Linux is extremely useful, and the "USE
> WITH CAUTION" warnings are certainly well stated. The performance of
> this tool is excellent.
>
> Jeff
>

2004-11-22 03:48:22

by Lincoln Dale

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness

Jeff,

you're using commodity x86 hardware. what do you expect?

while the speed of PCs has increased significantly, there are still
significant bottlenecks when it comes to PCI bandwidth, PCI arbitration
efficiency & # of interrupts/second.
linux ain't bad -- but there are other OSes which still do slightly better
given equivalent hardware.

with a PC comes flexibility.
that won't match the speed of the FPGAs in a Spirent Smartbits, Agilent
RouterTester, IXIA et al ...


cheers,

lincoln.

At 09:06 AM 20/11/2004, Jeff V. Merkey wrote:

>Additionally, when packet sizes of 64, 128, and 256 bytes are selected,
>pktgen is unable to achieve > 500,000 pps (only 349,000 on my system).
>A Smartbits generator can achieve over 1 million pps with 64 byte packets
>on gigabit, so this is one performance issue for this app. However, at
>1500 and 1048 byte sizes, gigabit saturation is achievable.
>Jeff
>
>Jeff V. Merkey wrote:
>
>>
>>With pktgen.o configured to send 123 MB/s on a gigabit link, on a
>>system with pktgen set to the following parameters:
>>
>>pgset "odev eth1"
>>pgset "pkt_size 1500"
>>pgset "count 0"
>>pgset "ipg 5000"
>>pgset "src_min 10.0.0.1"
>>pgset "src_max 10.0.0.254"
>>pgset "dst_min 192.168.0.1"
>>pgset "dst_max 192.168.0.254"
>>
>>After 37 hours of continual packet generation into a gigabit
>>regeneration tap device, the server system console will start to
>>respawn the INIT process about every 10-12 hours of continuous packet
>>generation.
>>
>>As a side note, this module in Linux is extremely useful, and the "USE
>>WITH CAUTION" warnings are certainly well stated. The performance of
>>this tool is excellent.
>>
>>Jeff

2004-11-22 18:28:54

by Jeff V. Merkey

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness

Martin,

See the comments below. This test uses dual and quad adapters, but that
doesn't get around the poor design of dev_queue_xmit or the driver layer
for xmit packets, for the reasons explained below:

Jeff


Lincoln,

I've studied these types of problems for years, and I think it's
possible even for Linux. The problem with small packet sizes on x86
hardware is related to non-cacheable writes to the adapter ring buffer
for preloading of addresses. From my measurements, I have observed that
the increased memory write traffic increases latency to the point that
the OS is unable to receive data off the card at high enough rates.
Testing Linux against a Spirent Smartbits at about 300,000 packets per
second with 64 byte packets, about 80% of the packets get dropped at
1000 Mbps rates. It's true that Linux is simply incapable of generating
at these rates, but the reason is poor design at the xmit layer. You see
much better behavior at 1500 byte packet sizes, but that is because the
card doesn't have to preload as many addresses into the ring buffer,
since you are only dealing with 150,000 packets per second in the 1500
byte case, not millions as in the 64 byte case.

Linux uses polling (bad), and the tx queue does not feed packets back
to the adapter directly on tx cleaning of the queue via tx complete (or
terminal DMA count) interrupts; instead they go through a semaphore to
trigger the next send -- horribly broken for high speed communications.
They should just post the packets and allow tx complete interrupts to
feed them off the queues. The queue depths in qdisc are far too short
before Linux starts dropping packets internally. I've had to increase
the depth of tx_queue_len for some apps to work properly without
dropping all the skbs on the floor.
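
(A small sketch of one way to make that tx_queue_len change from user
space, via the SIOCSIFTXQLEN ioctl -- equivalent to "ifconfig eth1
txqueuelen 10000". The device name and queue depth are only examples,
and it needs root:)

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/sockios.h>

int main(void)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth1", IFNAMSIZ - 1);
    ifr.ifr_qlen = 10000;            /* deeper qdisc before drops */

    if (ioctl(fd, SIOCSIFTXQLEN, &ifr) < 0) {
        perror("SIOCSIFTXQLEN");
        close(fd);
        return 1;
    }
    close(fd);
    return 0;
}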

So how to get around this problem? At present, the design of the Intel
drivers allows all the ripe ring buffers to be reaped at once from a
single interrupt. This is very efficient on the RX side, and in fact,
with static tests, I have been able to program the Intel card to accept
64 byte packets at the maximum rate for gigabit saturation on Linux,
provided the ring buffers are loaded with static addresses. This
indicates the problem in the design is related to the preloading and
serializing memory behavior of Intel's architecture at the ring buffer
level on the card. This also means that Linux on current PC architecture
(and most OSes, for that matter) will not be able to sustain 10 gigabit
rates unless the packet sizes get larger and larger, due to the nature
of this problem. The solution for the card vendors is to implement the
ability to load a descriptor to the card once which contains the
addresses of all the ring buffers for a session of the card, and to reap
them in A / B lists: i.e. two active preload memory tables which contain
a listing of preload addresses for receive; when the card fills one
list, it switches to the second for receives, sends an interrupt, and
the ISR loads the next table into the card.
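
(To make the A / B idea concrete, here is a hypothetical sketch --
invented names and layout, not any real NIC's interface -- of what the
two preload tables and the ISR hand-off could look like:)

#include <stdint.h>

#define PRELOAD_ENTRIES 1024

/* One preload table: a batch of receive buffer DMA addresses handed to
 * the card in a single post, instead of one uncached descriptor write
 * per packet.  Entirely hypothetical. */
struct preload_table {
    uint64_t buf_addr[PRELOAD_ENTRIES];
    uint32_t count;
};

struct ab_rx_ring {
    struct preload_table table[2];   /* the A and B lists */
    uint32_t active;                 /* table the card is consuming */
};

/* ISR: the card exhausted one table and switched itself to the other;
 * the host now refills the exhausted table for the next switch. */
static void ab_rx_interrupt(struct ab_rx_ring *ring)
{
    uint32_t exhausted = ring->active;

    ring->active ^= 1;
    /* refill buf_addr[] of table[exhausted] with fresh buffers here,
     * then post the whole table to the card as one command */
    ring->table[exhausted].count = 0;
}

int main(void)
{
    struct ab_rx_ring ring = { .active = 0 };

    ab_rx_interrupt(&ring);          /* simulate one A -> B switch */
    return (int)ring.active;         /* now 1 */
}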

I see no other way for an OS to sustain high packet loading above
500,000 packets per second on Linux, or even come close to dealing with
small packets or full 10 gigabit ethernet, without such a model. The bus
speeds are actually fine for dealing with this on current hardware. The
problem is related to the serializing behavior of non-cacheable memory
references on IO mapped card memory, and this suggestion could be
implemented in Intel Gigabit and 10 GbE hardware with microcode and
minor changes to the DMA designs of their chipsets. It would allow every
OS to reach the performance levels of a Smartbits or even a Cisco router
without the need for custom hardware design.

My 2 cents.

Jeff






Lincoln Dale wrote:

> Jeff,
>
> you're using commodity x86 hardware. what do you expect?
>
> while the speed of PCs has increased significantly, there are still
> significant bottlenecks when it comes to PCI bandwidth, PCI arbitration
> efficiency & # of interrupts/second.
> linux ain't bad -- but there are other OSes which still do slightly
> better given equivalent hardware.
>
> with a PC comes flexibility.
> that won't match the speed of the FPGAs in a Spirent Smartbits,
> Agilent RouterTester, IXIA et al ...
>
> cheers,
>
> lincoln.



Martin Josefsson wrote:

>On Fri, 2004-11-19 at 23:06, Jeff V. Merkey wrote:
>
>
>>Additionally, when packet sizes of 64, 128, and 256 bytes are selected,
>>pktgen is unable to achieve > 500,000 pps (only 349,000 on my system).
>>A Smartbits generator can achieve over 1 million pps with 64 byte
>>packets on gigabit, so this is one performance issue for this app.
>>However, at 1500 and 1048 byte sizes, gigabit saturation is achievable.
>>
>>
>
>What hardware are you using? 349kpps is _low_ performance at 64 byte
>packets.
>
>Here you can see Robert's (the pktgen author) results when testing
>different e1000 NICs at different bus speeds. He also tested 2-port and
>4-port e1000 cards; the 4-port NICs have a PCI-X bridge...
>
>http://robur.slu.se/Linux/net-development/experiments/2004/040808-pktgen
>
>I get a lot higher than 349kpps with an e1000 desktop adapter running at
>32bit/66MHz.
>
>
>

2004-11-22 22:53:43

by Lincoln Dale

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness

Jeff,

At 04:06 AM 23/11/2004, Jeff V. Merkey wrote:
>I've studied these types of problems for years, and I think it's possible
>even for Linux.

so you have the source code -- if it's such a big deal for you, how
about you contribute the work to make this possible?

the fact is, large-packet-per-second generation fits into two categories:
(a) script kiddies / haxors who are interested in building DoS tools
(b) folks that spend too much time benchmarking.

for the (b) case, typically the PPS-generation is only part of it. getting
meaningful statistics on reordering (if any) as well as accurate latency
and ideally real-world traffic flows is important. there are specialized
tools out there to do this: Spirent, Ixia, Agilent et al make them.

>[..]
>I see no other way for an OS to sustain high packet loading above
>500,000 packets per second on Linux, or even come close to dealing with
>small packets or full 10 gigabit ethernet, without such a model.

10GbE NICs are an entirely different beast from 1GbE.
as you pointed out, with real-world packet sizes today, one can sustain
wire-rate 1GbE today (same holds true for 2Gbps Fibre Channel also).

i wouldn't call pushing minimum-packet-size @ 1GbE (which is a 46 byte
payload, 84 bytes on the wire once you count preamble and interframe
gap, btw) "real world". and it's 1.488M packets/second.

>The bus speeds are actually fine for dealing with this on current hardware.

it's fine when you have meaningful interrupt coalescing going on &
large packets to DMA.
it fails when you have inefficient DMA (small transfers) with the
overhead of setting up & tearing down the DMA and the associated
arbitration overhead.
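
(fwiw, the coalescing knobs are reachable from user space through the
ethtool ioctl; a sketch below. ETHTOOL_GCOALESCE / ETHTOOL_SCOALESCE are
standard ioctl commands, but whether a particular driver honors them is
driver-dependent, and the device name and frame count are just
examples.)

#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
    struct ethtool_coalesce ec;
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0) {
        perror("socket");
        return 1;
    }
    memset(&ec, 0, sizeof(ec));
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth1", IFNAMSIZ - 1);
    ifr.ifr_data = (char *)&ec;

    ec.cmd = ETHTOOL_GCOALESCE;        /* read current settings */
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
        perror("ETHTOOL_GCOALESCE");
        return 1;
    }
    printf("rx-usecs=%u rx-frames=%u\n",
           ec.rx_coalesce_usecs, ec.rx_max_coalesced_frames);

    ec.cmd = ETHTOOL_SCOALESCE;        /* batch more frames per irq */
    ec.rx_max_coalesced_frames = 64;
    if (ioctl(fd, SIOCETHTOOL, &ifr) < 0)
        perror("ETHTOOL_SCOALESCE");
    return 0;
}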



cheers,

lincoln.

2004-11-23 00:28:49

by Jeff V. Merkey

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness

Lincoln Dale wrote:

> Jeff,
>
> At 04:06 AM 23/11/2004, Jeff V. Merkey wrote:
>
>> I've studied these types of problems for years, and I think it's
>> possible even for Linux.
>
>
> so you have the source code -- if it's such a big deal for you, how
> about you contribute the work to make this possible?


Bryan Sparks says no to open sourcing this code in Linux. Sorry -- I
asked. I am allowed to open source any modifications to public kernel
sources like dev.c, since we have an obligation to do so. I will provide
the source code enhancements for the kernel to anyone who purchases our
Linux based appliances and asks for the source code (so says Bryan
Sparks). You can issue a purchase request to Bryan Sparks
([email protected]) if you want any source code changes for the Linux
kernel.

>
> the fact is, large-packet-per-second generation fits into two categories:
> (a) script kiddies / haxors who are interested in building DoS tools
> (b) folks that spend too much time benchmarking.
>
> for the (b) case, typically the PPS-generation is only part of it.
> getting meaningful statistics on reordering (if any) as well as
> accurate latency and ideally real-world traffic flows is important.
> there are specialized tools out there to do this: Spirent, Ixia,
> Agilent et al make them.


There are about four pages of listings of open source tools and scripts
that do this -- we support all of them.

>> [..]
>> I see no other way for an OS to sustain high packet loading above
>> 500,000 packets per second on Linux, or even come close to dealing
>> with small packets or full 10 gigabit ethernet, without such a model.
>
>
> 10GbE NICs are an entirely different beast from 1GbE.
> as you pointed out, with real-world packet sizes today, one can
> sustain wire-rate 1GbE today (same holds true for 2Gbps Fibre Channel
> also).
>
> i wouldn't call pushing minimum-packet-size @ 1GbE (which is a 46 byte
> payload, 84 bytes on the wire once you count preamble and interframe
> gap, btw) "real world". and it's 1.488M packets/second.
>
>
I agree. I have also noticed that Cisco routers are not even able to
withstand these rates with 64 byte packets without dropping them, so I
agree this is not real world. It is useful testing, however, to
determine the limits and bottlenecks of where things break.

>> The bus speeds are actually fine for dealing with this on current
>> hardware.
>
>
> it's fine when you have meaningful interrupt coalescing going on &
> large packets to DMA.
> it fails when you have inefficient DMA (small transfers) with the
> overhead of setting up & tearing down the DMA and the associated
> arbitration overhead.
>
>

I can sustain full line rate gigabit on two adapters at the same time,
with a 12 CLK interpacket gap time and 0 dropped packets at 64 byte
sizes, from a Smartbits into Linux, provided the adapter ring buffer is
loaded with static addresses. This demonstrates that it is possible to
sustain 64 byte packet rates at full line rate with current DMA
architectures on 400 MHz buses with Linux (which means it will handle
any network loading scenario). The bottleneck, from my measurements,
appears to be the overhead of serializing writes to the adapter ring
buffer IO memory. The current drivers also perform interrupt coalescing
very well with Linux. What's needed is a method for submission of ring
buffer entries that can be sent in large scatter-gather lists rather
than one at a time. Ring buffers exhibit sequential behavior, so this
method should work well to support 1 GbE and 10 GbE at full line rate
with small packet sizes.
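
To illustrate the submission model I mean, here is a hypothetical
sketch (invented device registers, not any real driver's API)
contrasting per-packet descriptor writes with staging a scatter-gather
burst and issuing one doorbell write per burst:

#include <stdint.h>

#define RING_SIZE 256

struct tx_desc {
    uint64_t addr;    /* DMA address of the packet buffer */
    uint16_t len;     /* frame length in bytes */
};

/* Per-packet style: each frame costs one write to uncached I/O memory,
 * which serializes the CPU against the bus. */
static void post_one(volatile uint32_t *doorbell, uint32_t tail)
{
    *doorbell = tail;
}

/* Batched style: stage a whole burst of descriptors in ordinary
 * cacheable memory, then publish it with a single uncached doorbell
 * write, amortizing the serialization cost over the burst. */
static uint32_t post_burst(struct tx_desc *ring, const struct tx_desc *pkts,
                           int n, volatile uint32_t *doorbell, uint32_t tail)
{
    int i;

    for (i = 0; i < n; i++)                  /* cheap, cacheable stores */
        ring[(tail + i) % RING_SIZE] = pkts[i];
    tail = (tail + n) % RING_SIZE;
    *doorbell = tail;                        /* one I/O write per burst */
    return tail;
}

int main(void)
{
    static struct tx_desc ring[RING_SIZE];
    struct tx_desc pkts[4] = { { 0x1000, 64 }, { 0x2000, 64 },
                               { 0x3000, 64 }, { 0x4000, 64 } };
    uint32_t fake_doorbell = 0;              /* stands in for a register */

    post_one((volatile uint32_t *)&fake_doorbell, 0);
    return (int)post_burst(ring, pkts, 4,
                           (volatile uint32_t *)&fake_doorbell, 0);
}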

Jeff


>
> cheers,
>
> lincoln.
>

2004-11-23 00:56:15

by Jeff V. Merkey

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness

Jeff V. Merkey wrote:

>
> Bryan Sparks says no to open sourcing this code in Linux. Sorry -- I
> asked. I am allowed to open source any modifications to public kernel
> sources like dev.c, since we have an obligation to do so. I will
> provide the source code enhancements for the kernel to anyone who
> purchases our Linux based appliances and asks for the source code (so
> says Bryan Sparks). You can issue a purchase request to Bryan Sparks
> ([email protected]) if you want any source code changes for the Linux
> kernel.
>
Lincoln,

Needless to say, we are not open sourcing any of our proprietary
technology in the appliances, just the changes to the core Linux kernel
files as required by the GPL, just to clarify. It comes as a patch to
linux-2.6.9 and does not include the appliance core systems.

Jeff

2004-11-23 01:30:57

by Lincoln Dale

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness

At 12:06 PM 23/11/2004, Jeff V. Merkey wrote:
>>Bryan Sparks says no to open sourcing this code in Linux. Sorry -- I
>>asked. I am allowed to open source any modifications to public kernel
>>sources like dev.c, since we have an obligation to do so. I will provide
>>the source code enhancements for the kernel to anyone who purchases our
>>Linux based appliances and asks for the source code (so says Bryan
>>Sparks). You can issue a purchase request to Bryan Sparks
>>([email protected]) if you want any source code changes for the Linux
>>kernel.
>
>Needless to say, we are not open sourcing any of our proprietary
>technology in the appliances, just the changes to the core Linux kernel
>files as required by the GPL, just to clarify. It comes as a patch to
>linux-2.6.9 and does not include the appliance core systems.

got it - much clearer.

fair enough.


cheers,

lincoln.

2004-11-23 01:28:43

by Lincoln Dale

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness

At 11:36 AM 23/11/2004, Jeff V. Merkey wrote:
>>>I've studied these types of problems for years, and I think it's
>>>possible even for Linux.
>>
>>so you have the source code -- if it's such a big deal for you, how
>>about you contribute the work to make this possible?
>
>Bryan Sparks says no to open sourcing this code in Linux. Sorry -- I
>asked. I am allowed to open source any modifications to public kernel
>sources like dev.c, since we have an obligation to do so. I will provide
>the source code enhancements for the kernel to anyone who purchases our
>Linux based appliances and asks for the source code (so says Bryan
>Sparks). You can issue a purchase request to Bryan Sparks
>([email protected]) if you want any source code changes for the Linux
>kernel.

LOL. in wonderland again?

>>the fact is, large-packet-per-second generation fits into two categories:
>>(a) script kiddies / haxors who are interested in building DoS tools
>>(b) folks that spend too much time benchmarking.
>>
>>for the (b) case, typically the PPS-generation is only part of it.
>>getting meaningful statistics on reordering (if any) as well as accurate
>>latency and ideally real-world traffic flows is important. there are
>>specialized tools out there to do this: Spirent, Ixia, Agilent et al make them.
>
>There are about four pages of listings of open source tools and scripts
>that do this -- we support all of them.

so you're creating a packet-generation tool?
you mentioned already that you had to increase queue depths up to some
large number. doesn't sound like a very useful packet-generation tool if
you're internally having to buffer >60K packets ...
LOL.

>>i wouldn't call pushing minimum-packet-size @ 1GbE (which is a 46 byte
>>payload, 84 bytes on the wire btw) "real world". and it's 1.488M
>>packets/second.
>I agree. I have also noticed that Cisco routers are not even able to
>withstand these rates with 64 byte packets without dropping them, so I
>agree this is not real world. It is useful testing, however, to
>determine the limits and bottlenecks of where things break.

Cisco software-based routers? sure ...
however, if you had an application which required wire-rate minimum-sized
frames, then a software-based router wouldn't really be your platform of
choice.

hint: go look at EANTC's testing of GbE and 10GbE L3 switches.

there's public test data of 10GbE with 10,000-line ACLs for both IPv4 &
IPv6-based L3 switching.



cheers,

lincoln.

2004-11-22 17:25:57

by Martin Josefsson

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness

On Fri, 2004-11-19 at 23:06, Jeff V. Merkey wrote:
> Additionally, when packet sizes of 64, 128, and 256 bytes are selected,
> pktgen is unable to achieve > 500,000 pps (only 349,000 on my system).
> A Smartbits generator can achieve over 1 million pps with 64 byte
> packets on gigabit, so this is one performance issue for this app.
> However, at 1500 and 1048 byte sizes, gigabit saturation is achievable.

What hardware are you using? 349kpps is _low_ performance at 64 byte
packets.

Here you can see Robert's (the pktgen author) results when testing
different e1000 NICs at different bus speeds. He also tested 2-port and
4-port e1000 cards; the 4-port NICs have a PCI-X bridge...

http://robur.slu.se/Linux/net-development/experiments/2004/040808-pktgen

I get a lot higher than 349kpps with an e1000 desktop adapter running at
32bit/66MHz.

--
/Martin



2004-11-22 16:55:35

by Jeff V. Merkey

Subject: Re: Linux 2.6.9 pktgen module causes INIT process respawning and sickness


Lincoln,

I've studied these types of problems for years, and I think it's
possible even for Linux. The problem with small packet sizes on x86
hardware is related to non-cacheable writes to the adapter ring buffer
for preloading of addresses. From my measurements, I have observed that
the increased memory write traffic increases latency to the point that
the OS is unable to receive data off the card at high enough rates.
Testing Linux against a Spirent Smartbits at about 300,000 packets per
second with 64 byte packets, about 80% of the packets get dropped at
1000 Mbps rates. It's true that Linux is simply incapable of generating
at these rates, but the reason is poor design at the xmit layer. You see
much better behavior at 1500 byte packet sizes, but that is because the
card doesn't have to preload as many addresses into the ring buffer,
since you are only dealing with 150,000 packets per second in the 1500
byte case, not millions as in the 64 byte case.

Linux uses polling (bad), and the tx queue does not feed packets back
to the adapter directly on tx cleaning of the queue via tx complete (or
terminal DMA count) interrupts; instead they go through a semaphore to
trigger the next send -- horribly broken for high speed communications.
They should just post the packets and allow tx complete interrupts to
feed them off the queues. The queue depths in qdisc are far too short
before Linux starts dropping packets internally. I've had to increase
the depth of tx_queue_len for some apps to work properly without
dropping all the skbs on the floor.

So how to get around this problem? At present, the design of the Intel
drivers allows all the ripe ring buffers to be reaped at once from a
single interrupt. This is very efficient on the RX side, and in fact,
with static tests, I have been able to program the Intel card to accept
64 byte packets at the maximum rate for gigabit saturation on Linux,
provided the ring buffers are loaded with static addresses. This
indicates the problem in the design is related to the preloading and
serializing memory behavior of Intel's architecture at the ring buffer
level on the card. This also means that Linux on current PC architecture
(and most OSes, for that matter) will not be able to sustain 10 gigabit
rates unless the packet sizes get larger and larger, due to the nature
of this problem. The solution for the card vendors is to implement the
ability to load a descriptor to the card once which contains the
addresses of all the ring buffers for a session of the card, and to reap
them in A / B lists: i.e. two active preload memory tables which contain
a listing of preload addresses for receive; when the card fills one
list, it switches to the second for receives, sends an interrupt, and
the ISR loads the next table into the card.

I see no other way for an OS to sustain high packet loading above
500,000 packets per second on Linux, or even come close to dealing with
small packets or full 10 gigabit ethernet, without such a model. The bus
speeds are actually fine for dealing with this on current hardware. The
problem is related to the serializing behavior of non-cacheable memory
references on IO mapped card memory, and this suggestion could be
implemented in Intel Gigabit and 10 GbE hardware with microcode and
minor changes to the DMA designs of their chipsets. It would allow every
OS to reach the performance levels of a Smartbits or even a Cisco router
without the need for custom hardware design.

My 2 cents.

Jeff






Lincoln Dale wrote:

> Jeff,
>
> you're using commodity x86 hardware. what do you expect?
>
> while the speed of PCs has increased significantly, there are still
> significant bottlenecks when it comes to PCI bandwidth, PCI
> arbitration efficiency & # of interrupts/second.
> linux ain't bad -- but there are other OSes which still do slightly
> better given equivalent hardware.
>
> with a PC comes flexibility.
> that won't match the speed of the FPGAs in a Spirent Smartbits,
> Agilent RouterTester, IXIA et al ...
>
> cheers,
>
> lincoln.
>
> At 09:06 AM 20/11/2004, Jeff V. Merkey wrote:
>
>> Additionally, when packet sizes of 64, 128, and 256 bytes are
>> selected, pktgen is unable to achieve > 500,000 pps (only 349,000 on
>> my system).
>> A Smartbits generator can achieve over 1 million pps with 64 byte
>> packets on gigabit, so this is one performance issue for this app.
>> However, at 1500 and 1048 byte sizes, gigabit saturation is
>> achievable.
>> Jeff
>>
>> Jeff V. Merkey wrote:
>>
>>>
>>> With pktgen.o configured to send 123 MB/s on a gigabit link, on a
>>> system with pktgen set to the following parameters:
>>>
>>> pgset "odev eth1"
>>> pgset "pkt_size 1500"
>>> pgset "count 0"
>>> pgset "ipg 5000"
>>> pgset "src_min 10.0.0.1"
>>> pgset "src_max 10.0.0.254"
>>> pgset "dst_min 192.168.0.1"
>>> pgset "dst_max 192.168.0.254"
>>>
>>> After 37 hours of continual packet generation into a gigabit
>>> regeneration tap device, the server system console will start to
>>> respawn the INIT process about every 10-12 hours of continuous
>>> packet generation.
>>>
>>> As a side note, this module in Linux is extremely useful, and the
>>> "USE WITH CAUTION" warnings are certainly well stated. The
>>> performance of this tool is excellent.
>>>
>>> Jeff
>>
>
>