2002-06-27 18:14:44

by Hurwitz Justin W.

[permalink] [raw]
Subject: zero-copy networking & a performance drop


Please Cc me in replies. I'm an archive lurker.

Howdy-

I've been working to isolate (and fix ;) a receive-side performance
bottleneck in the IP implementation used over the Quadrics Interconnect.
For those "not in the know," Quadrics is a high-speed (3.2 Gbit), low-latency
proprietary interconnect, used primarily in supercomputing
applications. It provides a shared memory architecture, and most of the
communication is handled by the NICs.

Previous tests have shown that we can transmit IP packets easily at around
1.4 Gbit, but we can only receive at about 0.9 Gbit. We suspect there is a
memory copy somewhere, either in the Quadrics IP driver (covered by an NDA,
sorry) or in the IP stack after netif_rx() is called. I've looked at the
driver, and on an (admittedly cursory) inspection it looks fine.

Now, here is my direct question: has a zero-copy TCP stack been introduced
after 2.4.3 (which we're running)? I believe the answer is yes, but I've
not been able to find direct confirmation. If the answer is yes, does
anything special need to be done (in terms of allocating/working with
skbs, or passing the packets up to higher levels) in order to use the
zero-copy implementation?
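
(To make the question concrete: my rough understanding -- only a guess pieced
together from reading netdevice.h, not something I've verified -- is that the
zero-copy TX path keys off feature flags the driver advertises in its
net_device. Something like the following in a hypothetical driver init,
using the stock 2.4 NETIF_F_* flags; the function name is made up:)

#include <linux/netdevice.h>   /* struct net_device, NETIF_F_* */

/* sketch only: advertise what the hardware can do so the stack may
 * build scatter/gather skbs and skip software checksumming on tx */
static int qip_dev_init(struct net_device *dev)
{
        dev->features |= NETIF_F_SG;        /* accepts paged (nonlinear) skbs */
        dev->features |= NETIF_F_HW_CSUM;   /* card checksums on transmit */
        dev->features |= NETIF_F_HIGHDMA;   /* can DMA from high memory */
        return 0;
}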

From what I've read/determined, fixing this performance drop might be as
simple as upgrading the kernel and recompiling our IP module. Unfortunately,
I can't play around with our test cluster; it's become a production
cluster, so even upgrading kernel versions will
be a hard sell. I need to be certain that what I want to do is the Right
Thing(tm) before I propose it.

For the curious (and possibly advice-offering): neither fragmentation nor
checksums should be an issue with this hardware. The IP data is sent via
the hardware's link-level protocol, which performs fragmentation and error
checking/correction transparently to the OS (which, btw, is sometimes a real
pain).
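
(If it matters: since the hardware has already verified the data, the driver
can tell the stack so before netif_rx(), by marking the skb with the standard
CHECKSUM_UNNECESSARY value. Roughly like this -- an illustration only, not
the NDA'd driver source, and the function name is invented:)

#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/if_ether.h>    /* ETH_P_IP */

/* illustrative receive path: hand a hardware-verified packet up the stack */
static void qip_deliver(struct net_device *dev, struct sk_buff *skb)
{
        skb->dev = dev;
        skb->protocol = htons(ETH_P_IP);        /* IP-only interconnect */
        /* the link level already error-checked/corrected the data,
         * so ask the stack not to verify checksums in software */
        skb->ip_summed = CHECKSUM_UNNECESSARY;
        netif_rx(skb);
}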

Cheers,
--Gus


2002-06-27 19:51:21

by Nivedita Singhvi

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop

> Now, here is my direct question: has a zero-copy TCP stack
> been introduced after 2.4.3 (which we're running)? I believe
> the answer is yes, but I've not been able to find direct confirmation.
> If the answer is yes, does anything special need to be done
> (in terms of allocating/working with skbs, or passing the packets
> up to higher levels) in order to use the zero-copy implementation?

Yes, in 2.4.4. You should upgrade to that if possible, or better yet,
to a much more recent version.

That said, rx has been slower than sends in most of our testing
too.


thanks,
Nivedita


2002-06-27 21:06:22

by Bernd Eckenfels

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop

In article <[email protected]> you wrote:
> Previous tests have shown that we can transmit IP packets easily at around
> 1.4 Gbit, but we can only receive at about 0.9 Gbit. We suspect there is a
> memory copy somewhere, either in the Quadrics IP driver (covered by an NDA,
> sorry) or in the IP stack after netif_rx() is called. I've looked at the
> driver, and on an (admittedly cursory) inspection it looks fine.

You have to sign an NDA for hardware which behaves slower than expected, and
then you seek help in the open community? I would ask you to bother the
manufacturer, so they know about this problem and can solve it. After all, if
you have to pay for it you should actually USE their service.

AFAIK only a few paths like sendfile() support zero-copy IP.
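
For example, the usual way an application hits that path is sendfile(2) on an
already-connected socket. A minimal userspace sketch (error handling trimmed,
function name invented):

/* minimal sketch: push a whole file out a connected socket via
 * sendfile(2), which uses the kernel's zero-copy TX path on 2.4.4+ */
#include <sys/sendfile.h>
#include <sys/stat.h>

static int send_whole_file(int sock, int fd)
{
        struct stat st;
        off_t off = 0;

        if (fstat(fd, &st) < 0)
                return -1;
        while (off < st.st_size) {
                /* sendfile advances 'off' by the number of bytes sent */
                if (sendfile(sock, fd, &off, st.st_size - off) <= 0)
                        return -1;
        }
        return 0;
}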

But on the other hand, if the card works by memory sharing perhaps TCP/IP is
simply the wrong API to speak to that device?

Greetings
Bernd

2002-06-27 22:05:41

by Hurwitz Justin W.

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop

On Thu, 27 Jun 2002, Nivedita Singhvi wrote:

[ snip ]

> That said, rx has been slower than sends in most of our testing
> too.

Is this a documented/explained phenomenon? Or are you and I the only
people experiencing it? Do we have any idea as to its cause (or is it
inherent architecturally)?

Cheers,
--Gus


2002-06-27 22:31:22

by Nivedita Singhvi

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop

On Thu, 27 Jun 2002, Hurwitz Justin W. wrote:

> On Thu, 27 Jun 2002, Nivedita Singhvi wrote:
>
> [ snip ]
>
> > That said, rx has been slower than sends in most of our testing
> > too.
>
> Is this a documented/explained phenomenon? Or are you and I the only
> people experiencing it? Do we have any idea as to its cause (or is it
> inherent architecturally)?
>
> Cheers,
> --Gus


Well, briefly, completely speculatively, and possibly unhelpfully,

- rx side processing can involve more work (the stack length
is simply longer) and so can legitimately take longer.
This is especially true when options and out-of-order
packets are involved, and TCP fast path processing
on the rx side isn't taken (see the sketch after this
list). (I had done a breakdown of this based on some
profiles last year, but don't have that at the moment)

- rx side reassembly could cause longer delays in the
case of fragmentation

- the scheduler comes slightly more into play on the rx
side for TCP, maybe because we can put stuff on the backlog
or prequeue queues (waiting for a recvmsg()) (??). This is,
again, very off the cuff and based on some profiles
I had seen on the send/rx side with the rx side scheduler
showing up higher, and without having investigated
further at the time.. (long time ago, don't quote me, etc.)
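
(Re the fast path in the first point: I mean the header-prediction check near
the top of tcp_rcv_established(). Paraphrased from memory rather than quoted,
so treat it as a sketch, not the exact 2.4 code:)

/* paraphrase of the rx fast-path gate: only in-order, no-surprises
 * segments whose header matches tp->pred_flags take the short path */
if ((tcp_flag_word(th) & TCP_HP_BITS) == tp->pred_flags &&
    TCP_SKB_CB(skb)->seq == tp->rcv_nxt) {
        /* fast path: trimmed processing, possibly checksum-and-copy
         * of the data straight into the waiting user buffer */
} else {
        /* slow path: full option parsing, PAWS checks, out-of-order
         * queueing, etc. -- this is where rx gets expensive */
}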


there are possibly many different scenarios here, and
I'm probably missing the most obvious causes...


thanks,
Nivedita

2002-06-28 02:57:19

by Andi Kleen

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop


Nivedita Singhvi <[email protected]> writes:
>
> there are possibly many different scenario's here, and
> I'm probably missing the most obvious causes...

There is one problem with the TCP csum-copy-to-user RX implementation. When the
fast path misses (the process not scheduling in time for some reason) the
remaining packet is taken care of by the delack timer. This adds considerable
latency to the ACK generation (worst case 1/50s), because the stack does not
send the ACK earlier when 2*rcvmss of data has been received; this can be
visible as latency to user space in protocols that send lots of small messages.

csum-copy-to-user only makes sense when the NIC doesn't support hardware
checksumming, otherwise it is better to just queue and do a normal copy
and avoid these latencies.

I'm using this patch (should apply to any 2.4.4+ and 2.5). It essentially
disables most of the RX user context TCP for NICs with hardware checksums
(except for the usual processing as part of socket lock). IMHO the user
context code (prequeue etc.) is not too useful because of the latencies it
adds and it would be best to drop it. Most NICs should have hardware
checksumming these days and those that don't are likely slow enough (old
Realtek) to not need any special hacks.

With that patch it also makes even more sense to go for a SSE optimized
copy-to-user to get more speed out of networking.

Regarding the RX slowdown: I think there was some slowdown in chatroom
when the zero-copy TX stack was introduced. chatroom is a horrible
benchmark in itself, but the stack work should not have slowed it down.
It's possible that it is fixed by this patch too; I haven't checked.

-Andi

diff -urN linux-2.4.18.tmp/net/ipv4/tcp_ipv4.c linux-2.4.18.SuSE/net/ipv4/tcp_ipv4.c
--- linux-2.4.18.tmp/net/ipv4/tcp_ipv4.c Mon Apr 15 14:43:40 2002
+++ linux-2.4.18.SuSE/net/ipv4/tcp_ipv4.c Mon Apr 15 16:19:52 2002
@@ -1767,7 +1775,7 @@
bh_lock_sock(sk);
ret = 0;
if (!sk->lock.users) {
- if (!tcp_prequeue(sk, skb))
+ if (skb->ip_summed != CHECKSUM_NONE || !tcp_prequeue(sk, skb))
ret = tcp_v4_do_rcv(sk, skb);
} else
sk_add_backlog(sk, skb);

2002-06-28 02:59:13

by Hurwitz Justin W.

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop


In a mail that wasn't CC'd to me Bernd said:
>
> In article <[email protected]> you wrote:
>> Previous tests have shown that we can transmit IP packets easily at around
>> 1.4 Gbit, but we can only receive at about 0.9 Gbit. We suspect there is a
>> memory copy somewhere, either in the Quadrics IP driver (covered by an NDA,
>> sorry) or in the IP stack after netif_rx() is called. I've looked at the
>> driver, and on an (admittedly cursory) inspection it looks fine.
>
> You have to sign an NDA for hardware which behaves slower than expected, and
> then you seek help in the open community? I would ask you to bother the
> manufacturer, so they know about this problem and can solve it. After all, if
> you have to pay for it you should actually USE their service.
>
> AFAIK only a few paths like sendfile() support zero-copy IP.
>
> But on the other hand, if the card works by memory sharing perhaps TCP/IP is
> simply the wrong API to speak to that device?
>

Your mail calls for a reply, to defend both my work and the hardware that I'm
working on (and the company behind it). No offense is intended, and none was
taken. In a community such as this, where the immediate reaction (which I
share) to the letters "NDA" is disgust, a bit of defense, where it is
appropriate, is a worthwhile reminder that they are occasionally
legitimate.

There are three points in your mail to which I need to reply: the NDA, the
performance of the hardware (esp. wrt TCP/IP), and my requesting the help
of the open source community.

I agree that NDAs are dangerous, often ill-conceived, and usually counter-
productive devices of intellectual monopoly (e.g., Broadcom's refusal to
share Tigon3 information). That said, they can serve a legitimate purpose,
especially when you are not talking about commodity products. Quadrics
products are not commodity; I'd be surprised if anyone reading this has
Quadrics hardware in any general-use (commodity) machine. The material
covered under the NDA (of which the IP driver is a trivial part)
represents an important part of Quadrics' unique product line. They do
have legitimate reason to keep this code out of their competitors' hands;
we are not their competitors, and they have given us access to their code,
provided that we do not share it with their competitors.

You are very right in saying that TCP/IP is the wrong API for this
hardware. We achieve significantly better performance with MPI. But we
have applications that could significantly benefit from having TCP/IP
available; being able to run TCP/IP on Quadrics-based clusters will open
up new possibilities for us, hence the research interest.

Another consequence of your point is that Quadrics does not have a direct
interest in developing TCP/IP for their hardware; it is not part of their
design for the hardware, and they cannot be expected to provide a top
notch driver to do what we want just because we want it. Their hardware
does what it is supposed to do; we're trying to get it to do the
unexpected. Initial research shows that we should be able to get twice the
TCP/IP performance out of this hardware that we currently do. Quadrics
has helped us get this far, despite the fact that they did not design
their hardware for this purpose; that's much more than I expect from most
companies.

Which brings me to my work. I am not trying to get the open source
community's help to make a proprietary driver work. As I said before,
Quadrics' driver does look to be of good quality; I do not believe that
the problem necessarily lies there. I am trying to determine why there is
so large a disparity between send and receive performance when using this
driver in the Linux kernel. As indicated in a previous reply to my
question, this asymmetry might be inherent to the kernel. If it is, my job
becomes trying to eliminate or reduce this asymmetry. That work, if
successful, will be returned to the kernel. As you certainly know, TCP only
performs as well as the slower side of the connection; reducing asymmetry
could offer significant performance gains.

Hope this clarifies my intent, dispels any suggestion that the Quadrics
hardware is underperforming, and justifies the NDA :) It's a lot to do.
I _am_ an open source developer (I learned how to program while hacking
SCSI drivers in the kernel's pre-1.0 days), and I do believe that
Quadrics' NDA is legitimate and that they are not the bad guys.

Cheers,
--Gus

PS- You are right about sendfile() and zero-copy IP.

2002-06-28 13:50:05

by Ingo Oeser

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop

On Thu, Jun 27, 2002 at 03:33:33PM -0700, Nivedita Singhvi wrote:
> - rx side processing can involve more work (stack length
> is simply longer) and so can legitimately take longer.
> This is especially true when options and out-of-order
> packets are involved, and TCP fast path processing
> on the rx side isn't taken. (I had done a breakdown
> of this based on some profiles last year, but don't
> have that at the moment)

Yup, I think this is really true. Look at all the checking alone.

Remember: we accept data from an untrusted source (the network) which
has lots of control information encoded in it, much of it
optional.

-> This involves a lot of "parsing" (for binary streams,
decoding might be better) of a complex language (TCP/IP ;-))
with many optional elements (read: lots of branches in the
language tree).

On sending data, all the information is trusted, because we already
checked it as the user set it. With sendfile, even the data is
trusted and already mapped (because we just paged it in).

If we take this into account, rx MUST always be slower, or tx
isn't really optimized yet.
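
(To make "all the checking" concrete, this is roughly the sanity work
ip_rcv() does on every inbound packet before TCP even sees it -- a
paraphrase, not the literal source:)

/* rough paraphrase of the ip_rcv() header checks run on every rx packet */
if (skb->len < sizeof(struct iphdr) || skb->len < iph->ihl * 4)
        goto drop;                              /* truncated header/options */
if (iph->ihl < 5 || iph->version != 4)
        goto drop;                              /* malformed header */
if (ip_fast_csum((unsigned char *)iph, iph->ihl) != 0)
        goto drop;                              /* bad header checksum */
if (skb->len < ntohs(iph->tot_len))
        goto drop;                              /* short packet */
/* ...then route lookup, option parsing, possible defragmentation, and
 * only afterwards the hand-off to tcp_v4_rcv(), which checks some more. */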

Regards

Ingo Oeser
--
Science is what we can tell a computer. Art is everything else. --- D.E.Knuth

2002-06-28 15:02:35

by Nivedita Singhvi

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop

On 28 Jun 2002, Andi Kleen wrote:

>
> Nivedita Singhvi <[email protected]> writes:
> >
> > there are possibly many different scenario's here, and
> > I'm probably missing the most obvious causes...
>
> There is one problem with the TCP csum-copy to user RX implementation.
> When the fast path misses (process not scheduling in time for some
> reason) the remaining packet is taken care of by the delack timer.
> This adds considerable latency to the ACK generation (worst case 1/50s),
> because the stack does not send the ACK earlier when
> 2*rcvmss of data has been received; this can be visible as latency to user space
> in protocols that send lots of small messages.

Yes!! Delayed and _very_ delayed acks were killing performance
for certain workloads (unidirectional, small-packet sends).

The current stack is (naturally) optimized for heavy continuous
TCP traffic, and shows up well in benchmarks like the netperf TCP
stream tests, which blast full-sized packets out (well, msg
size permitting), and SpecWeb99.

> csum-copy-to-user only makes sense when the NIC doesn't support hardware
> checksumming, otherwise it is better to just queue and do a normal copy
> and avoid these latencies.
>
> I'm using this patch (should apply to any 2.4.4+ and 2.5). It essentially
> disables most of the RX user context TCP for NICs with hardware checksums
> (except for the usual processing as part of socket lock). IMHO the user
> context code (prequeue etc.) is not too useful because of the latencies it
> adds and it would be best to drop it. Most NICs should have hardware
> checksumming these days and those that don't are likely slow enough (old
> Realtek) to not need any special hacks.

Thanks, will try it out (back in the office July 8).

> With that patch it also makes even more sense to go for a SSE optimized
> copy-to-user to get more speed out of networking.
>
> Regarding the RX slowdown: I think there was some slowdown in chatroom
> when the zero-copy TX stack was introduced. chatroom is a horrible
> benchmark in itself, but the stack work should not have slowed it down.
> It's possible that it is fixed by this patch too; I haven't checked.
>
> -Andi

I don't recall seeing a large number of delayed acks in
the profile (I've only seen one profile, though). Possibly a large
number of threads sending out packets? (Although the delayed ack
count is incremented only when the timer goes off, so
only full delayed acks count. Partial delays can also
add up when we schedule them very frequently)...

thanks,
Nivedita


2002-06-28 16:40:48

by Hurwitz Justin W.

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop

On Fri, 28 Jun 2002, Ingo Oeser wrote:

> Yup, I think this is really true. Look at all the checking alone.
>
> Remember: we accept data from an untrusted source (the network) which
> has lots of control information encoded in it, much of it
> optional.
>
> -> This involves a lot of "parsing" (for binary streams,
> decoding might be better) of a complex language (TCP/IP ;-))
> with many optional elements (read: lots of branches in the
> language tree).
>
> On sending data, all the information is trusted, because we already
> checked it as the user set it. With sendfile, even the data is
> trusted and already mapped (because we just paged it in).
>
> If we take this into account, rx MUST always be slower, or tx
> isn't really optimized yet.

Indeed, the above does make sense. But what about when the receive side
is trustworthy, too? This is obviously not the normal case, but with
Quadrics I think that it is. Since Quadrics is a shared memory
architecture, with all of the processing taken care of on the cards, we
should be able to reliably trust the receive side as much as the send
side. Unless I'm misunderstanding your use of trust?

If I am right on this, how much of an overhaul would it be to implement,
for instance, a NETIF_RX_TRUSTED flag in the net_device struct to force a
receive side fast path? I don't expect this to bring the receive side even
with the transmit side, but right now we (and I have heard the same from
others) are running at ~70% of the transmit side on the receive side,
which seems to leave a good margin for improvement.
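
To make that concrete, something like the following is all I have in mind
(a pure strawman: NETIF_RX_TRUSTED doesn't exist, the bit value is arbitrary,
and I haven't thought through whether honouring it would even be safe):

/* strawman -- NETIF_RX_TRUSTED is my invention, not an existing flag */
#define NETIF_RX_TRUSTED        0x1000  /* hypothetical dev->features bit */

/* driver init: the interconnect guarantees integrity of what it delivers */
dev->features |= NETIF_RX_TRUSTED;

/* rx path: a stack honouring the flag could then, e.g., skip software
 * checksum verification and stay on the fast path */
if (skb->dev->features & NETIF_RX_TRUSTED)
        skb->ip_summed = CHECKSUM_UNNECESSARY;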

This seems like it could be useful not just for the special case of
Quadrics (and other shared memory architectures). If, for instance, cards
begin to do more TCP processing in hardware (or we modify the firmware to
do this for us, a la the AceNIC), it would be nice to be able to skip that
work in the kernel. This is, of course, a quick idea that's just popped
into my head, and is probably inherently impossible or foolish :)

--Gus

2002-06-29 17:02:20

by Nivedita Singhvi

[permalink] [raw]
Subject: Re: zero-copy networking & a performance drop

> > On sending data, all the information is trusted, because we already
> > checked it as the user set it. With sendfile, even the data is
> > trusted and already mapped (because we just paged it in).
> >
> > If we take this into account, rx MUST always be slower, or tx
> > isn't really optimized yet.
>
> Indeed, the above does make sense. But what about when the receive side
> is trustworthy, too? This is obviously not the normal case, but with
> Quadrics I think that it is. Since Quadrics is a shared memory
> architecture, with all of the processing taken care of on the cards, we
> should be able to reliably trust the receive side as much as the send
> side. Unless I'm misunderstanding your use of trust?

Perhaps a little bit - the work the rx processing has to do
involves determining the validity of the packet as well as
demultiplexing at each protocol stage. You will always
have to at least test for a valid packet, parse options, find
the recipients, etc. If you're saying you have offloaded
the TCP/IP protocol stack onto the cards, that's a whole different
issue.


> If I am right on this, how much of an overhaul would it be to implement,
> for instance, a NETIF_RX_TRUSTED flag in the net_device struct to force a
> receive side fast path? I don't expect this to bring the receive side even
> with the transmit side, but right now we (and I have heard the same from
> others) are running at ~70% of the transmit side on the receive side,
> which seems to leave a good margin for improvement.

I'm no maintainer, but I would at least say whoa! Not sure if we
have a problem, not sure where the problems are, and not sure
if there are problems in determining whether there is a problem :)

fer'nstance:
- if the rx code path is, say, 1000 lines and the tx
stack length is 700 lines of code, then I'd expect
the tx time to be 70% of the rx time. A difference
in the stack length is not a problem by itself,
given that they have different functions to perform.

- in reality, there is no such clean concept,
right? rx processing sends acks, interrupts happen,
and a whole bunch of profiling and lockmeter
data can end up pretty misleading. Do you
have pure unidirectional data transfer? What
applications are you running? How are you
measuring things, where are you measuring things,
and what are the skewing factors?

- and I'm not even a benchmarking or performance person,
wait till you get done talking to them :)


> This seems like it could be useful not just for the special case of
> Quadrics (and other shared memory architectures). If, for instance, cards
> begin to do more TCP processing in hardware (or we modify the firmware to
> do this for us, a la the AceNIC), it would be nice to be able to skip that
> work in the kernel. This is, of course, a quick idea that's just popped
> into my head, and is probably inherently impossible or foolish :)

That's a question of whether the stack is modified to take advantage
of what the hardware can do, and, as I said above, that's a rather bigger
and different issue.

> --Gus

thanks,
Nivedita

2002-07-01 23:24:46

by Dax Kelson

[permalink] [raw]
Subject: Is ff:00:00:00:00:00 a broadcast frame?

In the "earlier" 2.4 kernels (those that shipped/errata with RHL 7.1
~2.4.6), an ethernet frame destined to ff:00:00:00:00:00 was not
processed and passed up the stack.

Now, with "current" 2.4 kernels (RHL 7.2 errata, and RHL 7.3
2.4.9-2.4.18), the same frame IS processed and passed up the stack.

The hardware is identical, the NICs are 3c905C.

Is this an intentional optimization, or a bug?

Here is an ARP request encapsulated in a "bogus" ethernet frame (look at
the layer 2 destination). A RHL 7.3 (2.4.18) system will respond (see
Frame 2 below) to this frame; an older RHL 7.1 (2.4.6) system will not
respond.

Frame 1 (42 on wire, 42 captured)
Arrival Time: Jul 1, 2002 17:16:35.565996000
Time delta from previous packet: 1.009672000 seconds
Time relative to first packet: 1.009833000 seconds
Frame Number: 3
Packet Length: 42 bytes
Capture Length: 42 bytes
Ethernet II
Destination: ff:00:00:00:00:00 (ff:00:00:00:00:00)
Source: ff:00:00:00:00:00 (ff:00:00:00:00:00)
Type: ARP (0x0806)
Address Resolution Protocol (request)
Hardware type: Ethernet (0x0001)
Protocol type: IP (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: request (0x0001)
Sender MAC address: 00:01:03:de:56:a4 (00:01:03:de:56:a4)
Sender IP address: 10.100.0.8 (10.100.0.8)
Target MAC address: 00:00:00:00:00:00 (00:00:00:00:00:00)
Target IP address: 10.100.0.10 (10.100.0.10)

Frame 2 (60 on wire, 60 captured)
Arrival Time: Jul 1, 2002 17:16:35.566146000
Time delta from previous packet: 0.000150000 seconds
Time relative to first packet: 1.009983000 seconds
Frame Number: 4
Packet Length: 60 bytes
Capture Length: 60 bytes
Ethernet II
Destination: 00:01:03:de:56:a4 (00:01:03:de:56:a4)
Source: 00:01:03:de:57:37 (00:01:03:de:57:37)
Type: ARP (0x0806)
Trailer: 00000000000000000000000000000000...
Address Resolution Protocol (reply)
Hardware type: Ethernet (0x0001)
Protocol type: IP (0x0800)
Hardware size: 6
Protocol size: 4
Opcode: reply (0x0002)
Sender MAC address: 00:01:03:de:57:37 (00:01:03:de:57:37)
Sender IP address: 10.100.0.10 (10.100.0.10)
Target MAC address: 00:01:03:de:56:a4 (00:01:03:de:56:a4)
Target IP address: 10.100.0.8 (10.100.0.8)

2002-07-02 06:45:36

by Matti Aarnio

[permalink] [raw]
Subject: Re: Is ff:00:00:00:00:00 a broadcast frame?

On Mon, Jul 01, 2002 at 05:27:11PM -0600, Dax Kelson wrote:
> In the "earlier" 2.4 kernels (those that shipped/errata with RHL 7.1
> ~2.4.6), an ethernet frame destined to ff:00:00:00:00:00 was not
> processed and passed up the stack.
>
> Now, with "current" 2.4 kernels (RHL 7.2 errata, and RHL 7.3
> 2.4.9-2.4.18), the same frame IS processed and passed up the stack.

That IEEE 802 MAC address is NOT a broadcast address.
Broadcast would be one with all bits set: ff:ff:ff:ff:ff:ff

A frame sent to the above-mentioned MAC address is a multicast frame,
because bit 0x01 in the first octet is set. (All broadcast frames are
also multicast frames.)
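
(In code, the distinction is just that low bit; this is roughly how
eth_type_trans() classifies an incoming frame, paraphrased:)

/* rough paraphrase of the classification in eth_type_trans() */
if (eth->h_dest[0] & 0x01) {                    /* I/G bit set: group address */
        if (memcmp(eth->h_dest, dev->broadcast, ETH_ALEN) == 0)
                skb->pkt_type = PACKET_BROADCAST;  /* all 48 bits set */
        else
                skb->pkt_type = PACKET_MULTICAST;  /* e.g. ff:00:00:00:00:00 */
}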

> The hardware is identical, the NICs are 3c905C.
>
> Is this an intentional optimization/bug?
>
> Here is an ARP request encapsulated in a "bogus" ethernet frame (look at
> the layer 2 destination). A RHL 7.3 (2.4.18) system will respond (see
> Frame 2 below) to this frame, an older RHL 7.1 (2.4.6) system will not
> respond.

Neither should respond. IPv4 ARP is defined using only broadcast
frames.

> Frame 1 (42 on wire, 42 captured)
> Arrival Time: Jul 1, 2002 17:16:35.565996000
> Time delta from previous packet: 1.009672000 seconds
> Time relative to first packet: 1.009833000 seconds
> Frame Number: 3
> Packet Length: 42 bytes
> Capture Length: 42 bytes
> Ethernet II
> Destination: ff:00:00:00:00:00 (ff:00:00:00:00:00)
> Source: ff:00:00:00:00:00 (ff:00:00:00:00:00)
> Type: ARP (0x0806)
> Address Resolution Protocol (request)
> Hardware type: Ethernet (0x0001)
> Protocol type: IP (0x0800)
> Hardware size: 6
> Protocol size: 4
> Opcode: request (0x0001)
> Sender MAC address: 00:01:03:de:56:a4 (00:01:03:de:56:a4)
> Sender IP address: 10.100.0.8 (10.100.0.8)
> Target MAC address: 00:00:00:00:00:00 (00:00:00:00:00:00)
> Target IP address: 10.100.0.10 (10.100.0.10)
>
> Frame 2 (60 on wire, 60 captured)
> Arrival Time: Jul 1, 2002 17:16:35.566146000
> Time delta from previous packet: 0.000150000 seconds
> Time relative to first packet: 1.009983000 seconds
> Frame Number: 4
> Packet Length: 60 bytes
> Capture Length: 60 bytes
> Ethernet II
> Destination: 00:01:03:de:56:a4 (00:01:03:de:56:a4)
> Source: 00:01:03:de:57:37 (00:01:03:de:57:37)
> Type: ARP (0x0806)
> Trailer: 00000000000000000000000000000000...
> Address Resolution Protocol (reply)
> Hardware type: Ethernet (0x0001)
> Protocol type: IP (0x0800)
> Hardware size: 6
> Protocol size: 4
> Opcode: reply (0x0002)
> Sender MAC address: 00:01:03:de:57:37 (00:01:03:de:57:37)
> Sender IP address: 10.100.0.10 (10.100.0.10)
> Target MAC address: 00:01:03:de:56:a4 (00:01:03:de:56:a4)
> Target IP address: 10.100.0.8 (10.100.0.8)