2007-05-01 11:47:14

by Milton Miller

[permalink] [raw]
Subject: [PATCH] e100 rx: or s and el bits

In commit d52df4a35af569071fda3f4eb08e47cc7023f094, the description
talks about emulating another driver by setting addtional bits and
the being unable to test when submitted. Seeing the & operator to
set more bits made me suspicious, and indeed the bits are defined
in positive logic:

cb_s = 0x4000,
cb_el = 0x8000,

So anding those together would be 0. I'm guessing they should
be or'd, but don't have hardware here to test, much like the
committed patch. In fact, I'll let someone else do the compile
test too. I'll update the comment.

(It looks like the end of list and s bits would not be set
in the template nor cleared when linking in recieve skbs, so
as long as the kernel can keep up with the 100Mb card we
wouldn't see the adapter go off the linked list, possibly
explaining any successful use of this patch written against
2.6.14).

Signed-off-by: Milton Miller <[email protected]>


--- linux-2.6/drivers/net/e100.c.orig 2007-05-01 05:19:17.000000000 -0500
+++ linux-2.6/drivers/net/e100.c 2007-05-01 05:22:14.000000000 -0500
@@ -947,7 +947,7 @@ static void e100_get_defaults(struct nic
((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i));

/* Template for a freshly allocated RFD */
- nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s);
+ nic->blank_rfd.command = cpu_to_le16(cb_el | cb_s);
nic->blank_rfd.rbd = 0xFFFFFFFF;
nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);

@@ -1769,13 +1769,13 @@ static int e100_rx_alloc_skb(struct nic
}

/* Link the RFD to end of RFA by linking previous RFD to
- * this one, and clearing EL bit of previous. */
+ * this one, and clearing EL and S bit of previous. */
if(rx->prev->skb) {
struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data;
put_unaligned(cpu_to_le32(rx->dma_addr),
(u32 *)&prev_rfd->link);
wmb();
- prev_rfd->command &= ~cpu_to_le16(cb_el & cb_s);
+ prev_rfd->command &= ~cpu_to_le16(cb_el | cb_s);
pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr,
sizeof(struct rfd), PCI_DMA_TODEVICE);
}


2007-05-01 15:01:04

by David Acker

[permalink] [raw]
Subject: Re: [PATCH] e100 rx: or s and el bits

Milton Miller wrote:
> In commit d52df4a35af569071fda3f4eb08e47cc7023f094, the description
> talks about emulating another driver by setting addtional bits and
> the being unable to test when submitted. Seeing the & operator to
> set more bits made me suspicious, and indeed the bits are defined
> in positive logic:
>
> cb_s = 0x4000,
> cb_el = 0x8000,
>
> So anding those together would be 0. I'm guessing they should
> be or'd, but don't have hardware here to test, much like the
> committed patch. In fact, I'll let someone else do the compile
> test too. I'll update the comment.
>

I wonder if this worked for me because the hardware also spun on the
link field being NULL? Since the RU base is also set to 0, the
calculated physical address would be 0 as well. I would imagine if the
hardware tried to read/write to very low addresses across PCI, there
would be issues. I will retest with a small receive pool to try to hit
the problem.

It seems to apply to a pretty recent git pull from linus's tree. I
manually merged this into the 2.6.18.4 kernel we are using. With the
original in kernel driver (just EL bit, no S bit), I had two tests that
would always end in horrible memory corruption on a PXA255 based system.
One is a 12 hour bidirectional TCP test using iperf with the ethernet
port sending packets to a wireless card and vice versa. The other is a
similar configuration running a 12 hour UDP test sending 20
megabits/second in each direction. Even through the original S-bit
patch seems broken, we have had days of continuous traffic through it
without issue where previously we could never go more than 6 hours.

I will let folks know how it goes. In the UDP test, the ethernet side
often gets ahead of the available buffers due to CPU and PCI usage by
the wireless card and its driver.

I will also run these tests with the new patch and with a smaller
receive pool (default is 256) to make the pool run out more often.


-Ack

2007-05-01 15:21:38

by Kok, Auke

[permalink] [raw]
Subject: Re: [PATCH] e100 rx: or s and el bits

Milton Miller wrote:
> In commit d52df4a35af569071fda3f4eb08e47cc7023f094, the description
> talks about emulating another driver by setting addtional bits and
> the being unable to test when submitted. Seeing the & operator to
> set more bits made me suspicious, and indeed the bits are defined
> in positive logic:
>
> cb_s = 0x4000,
> cb_el = 0x8000,
>
> So anding those together would be 0. I'm guessing they should
> be or'd, but don't have hardware here to test, much like the
> committed patch. In fact, I'll let someone else do the compile
> test too. I'll update the comment.
>
> (It looks like the end of list and s bits would not be set
> in the template nor cleared when linking in recieve skbs, so
> as long as the kernel can keep up with the 100Mb card we
> wouldn't see the adapter go off the linked list, possibly
> explaining any successful use of this patch written against
> 2.6.14).

yes, that definately doesn't seem right. I'll try to run this and watch out for
David Ackers report as well.

Thanks,

Auke


>
> Signed-off-by: Milton Miller <[email protected]>
>
>
> --- linux-2.6/drivers/net/e100.c.orig 2007-05-01 05:19:17.000000000 -0500
> +++ linux-2.6/drivers/net/e100.c 2007-05-01 05:22:14.000000000 -0500
> @@ -947,7 +947,7 @@ static void e100_get_defaults(struct nic
> ((nic->mac >= mac_82558_D101_A4) ? cb_cid : cb_i));
>
> /* Template for a freshly allocated RFD */
> - nic->blank_rfd.command = cpu_to_le16(cb_el & cb_s);
> + nic->blank_rfd.command = cpu_to_le16(cb_el | cb_s);
> nic->blank_rfd.rbd = 0xFFFFFFFF;
> nic->blank_rfd.size = cpu_to_le16(VLAN_ETH_FRAME_LEN);
>
> @@ -1769,13 +1769,13 @@ static int e100_rx_alloc_skb(struct nic
> }
>
> /* Link the RFD to end of RFA by linking previous RFD to
> - * this one, and clearing EL bit of previous. */
> + * this one, and clearing EL and S bit of previous. */
> if(rx->prev->skb) {
> struct rfd *prev_rfd = (struct rfd *)rx->prev->skb->data;
> put_unaligned(cpu_to_le32(rx->dma_addr),
> (u32 *)&prev_rfd->link);
> wmb();
> - prev_rfd->command &= ~cpu_to_le16(cb_el & cb_s);
> + prev_rfd->command &= ~cpu_to_le16(cb_el | cb_s);
> pci_dma_sync_single_for_device(nic->pdev, rx->prev->dma_addr,
> sizeof(struct rfd), PCI_DMA_TODEVICE);
> }

2007-05-02 20:20:07

by David Acker

[permalink] [raw]
Subject: Re: [PATCH] e100 rx: or s and el bits

David Acker wrote:
> Milton Miller wrote:
>> In commit d52df4a35af569071fda3f4eb08e47cc7023f094, the description
>> talks about emulating another driver by setting addtional bits and
>> the being unable to test when submitted. Seeing the & operator to
>> set more bits made me suspicious, and indeed the bits are defined
>> in positive logic:
>>
>> cb_s = 0x4000,
>> cb_el = 0x8000,
>>
>> So anding those together would be 0. I'm guessing they should
>> be or'd, but don't have hardware here to test, much like the
>> committed patch. In fact, I'll let someone else do the compile
>> test too. I'll update the comment.
>>
>
> I wonder if this worked for me because the hardware also spun on the
> link field being NULL? Since the RU base is also set to 0, the
> calculated physical address would be 0 as well. I would imagine if the
> hardware tried to read/write to very low addresses across PCI, there
> would be issues. I will retest with a small receive pool to try to hit
> the problem.
> I will also run these tests with the new patch and with a smaller
> receive pool (default is 256) to make the pool run out more often.

So far my testing has shown both the original and the new version of the
S-bit patch work in that no corruption seemed to occur over long term
runs. The previous S-bit patch may have only worked due to something
specific about how my PCI companion chip handles I/O to low memory
addresses (from dereferencing a link address of 0). Perhaps the e100
handles the NULL link as well, but given that the manual does not seem
to state what happens when the hardware encounters a buffer with a link
of 0, I think Milton's fix is the proper way to do it. The old eepro
driver did set both bits although it did it with a hardcoded constant.

I will continue testing with slab debug on but that will take longer.
Has anyone tried this on other platforms?
-Ack

2007-05-04 21:42:27

by David Acker

[permalink] [raw]
Subject: Re: [PATCH] e100 rx: or s and el bits

David Acker wrote:
> So far my testing has shown both the original and the new version of the
> S-bit patch work in that no corruption seemed to occur over long term
> runs.

I spoke too soon. Further testing has not gone well. If I use the
default settings for CPU saver and drop the receive pool down to 16
buffers I can cause problems with various forms of the patch. With the
original S-bit patch I can get:
e100: eth0: e100_update_stats: exec cuc_dump_reset failed

At this point sends and receives are failing and their counters are not
changing. When a raw socket opened on the port tries to send it gets
errno 11, Resource temporarily unavailable.

When this error occurred, I added a reset call through the tx timeout
schedule_work(&nic->tx_timeout_task);

This made the port come back but obviously this is somewhat of a kludge
and eventually, the entire system hangs or get an oops from memory
corruption.

Apparently my earlier tests got lucky with a large receive pool size.
The only way the original patch was stable for me was by disabling CPU
saver, and setting the NAPI weight (default of 16) as large as the
default receive pool size (256) so that all buffers were claimed and
reallocated in one NAPI call. None of this should be needed so there is
definitely a problem with the original patch.

The updated patch produced a different issue. We got an RNR interrupt
indicating the receive unit got ahead of the software. The S-bit patch
removed any handling of this issue as it assumed the hardware would spin
on the sbit. Apparently if both the S-bit and the EL-bit are set on the
same RFD, it follows the EL-bit handling. Printing the stat/ack and
status bytes on the RNR interrupts I get:
status
01001000 = 0x48 = RUS of 0010 = No Resources, CUS of 01 = Suspended

stat/ack
01010000 = 0x50 = FR, RNR
or
00010000 = 0x10 = RNR

Notice that the RUS went into No Resources and not suspended. Thus
clearing the S-bit does not wake it up; it needs a new start command.
I could not find documentation that states that the S-bit need only be
cleared to take the RU out of suspended state. Before the S-bit patch
the driver tried to track this need but that version of the driver
didn't work for me either. By the way, I am using, "Intel 8255x 10/100
Mbps Ethernet Controller Family, Open Source Software Developer Manual,
January 2006" as my documentation.

This got me looking at just how in the world this worked on the old
eepro100 driver. It had another difference; it did not reap the last rx
buffer in the chain. It set a postponed bit and then picked it up on
the next interrupt after more buffers had been allocated. It then
noticed that the RU was in a suspended or no resources state and did a
softreset.

I don't believe this avoid the last buffer trick really fixes the race.
Imagine the following:
1. 4 buffers in receive pool, all freshly allocated
2. Hardware consumes 3 buffers
3. Software processes 3 buffers, begins to allocate new buffers
4. Hardware writes status bits into buffer 4 while software updates link
and command word bits in buffer 4. They share a cache line and corrupt
each other.

This appears to be possible with any of the versions of this driver I
have seen. The problem is one of packet ownership. Once the driver
gives a list of buffers to hardware, hardware owns them all. The driver
can not safely change these buffers. Sadly, this means that the idea of
the driver "staying ahead" of the hardware such that the hardware never
runs out of resources will not work here. Once the driver gives the
hardware a packet with S or EL bits set, it must let the hardware
encounter the packet and return it to software.

I think the driver needs to protect the last entry in the ring by
putting the S-bit on the entry before it. The first time the driver
allocates a block of packets, it writes a new S-bit out on the next to
last packet. As buffers complete it allocates more packets in the chain
but does not set a new S-bit since the old one will stop hardware. It
can not clear the old S-bit because the driver does not own the buffer,
hardware does. After processing the s-bit packet the hardware will
interrupt with a stat/ack of RNR and RUS of suspended.
When software processes a packet with an old S-bit it allocates new
buffers and sets the s-bit on the new next to last packet.

The above case changes now:
1. 4 buffers numbered 1-4 in a receive pool, all freshly allocated.
S-bit is on buffer 3.
2. Hardware consumes 3 buffers, hits S-bit, RNR interrupts
3. Software processes 3 buffers, begins to allocate new buffers
4. Software sends resume once buffers are allocated, S-bit is on buffer 2.
5. Hardware gets resume. When it processed buffer 3, it saved the link
to buffer 4 and thus resumes at buffer 4.


Here is a different flow where the software stays ahead:
1. 4 buffers numbered 1-4 in a receive pool, all freshly allocated.
S-bit is on buffer 3.
2. Hardware consumes 2 buffers (1, 2).
3. Software processes buffers 1, 2, begins to allocate new buffers
4. Software buffers 1, 2 are allocated
5. Hardware consumes 1 buffer (#3) and hits S-bit, RNR interrupts.
6. Software consumes 1 buffer, (#3) and finds the old S-bit. It
allocates a new buffer 3 and sets the S-bit on buffer 2.
7. Software sends resume, hardware continues at buffer 4.

In this setup, software will send a resume command every RING_SIZE
packets. RNR interrupts will also occur every RING_SIZE packets. When
hardware is faster than software, it will process RING_SIZE packets, RNR
interrupt and wait for software to process all of them. When software
is faster then hardware, hardware will still process RING_SIZE packets
before interrupting but software will only need to allocate 1 packet or
so before sending the resume so hardware will wait much less time.

This will probably slow things down since on a fast CPU, software will
normally stay ahead of the hardware and the only PCI operations from the
driver would be interrupt acks. With this change, we have PCI
operations every 256 packets. I don't see how else to do this in a safe
way on ARM (at least PXA255).

I am testing this over the weekend with a 16-buffer receive pool. If
all goes well, I will send a patch early next week. It will basically
back out the S-bit patch and then make the changes noted above.

-Ack