LinuxLists.cc - [PATCH] e1000: Reset rx ring index on receive overrun

2012-05-17 23:01:49

Subject: [PATCH] e1000: Reset rx ring index on receive overrun

At high traffic rate, the rx ring may get completely filled before we
manage to consume it. After it is filled, the kernel and device indexes
are not synchronized any more, so we have to reset them, otherwise the
kernel will be stuck waiting for the wrong slot to be filled.

Signed-off-by: Samuel Thibault <[email protected]>

---
This is just a patch suggestion, I'm not an expert in network drivers, I
leave to actual driver authors to bake a better version.

diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 37caa88..77c8dbc 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -3759,6 +3759,21 @@ static irqreturn_t e1000_intr(int irq, void *data)
if (unlikely(test_bit(__E1000_DOWN, &adapter->flags)))
return IRQ_HANDLED;

+ if (unlikely(icr & E1000_ICR_RXO)) {
+ /* Receive Overrun */
+ u32 rctl;
+ int i;
+ rctl = er32(RCTL);
+ ew32(RCTL, rctl & ~E1000_RCTL_EN);
+ for (i = 0; i < adapter->num_rx_queues; i++) {
+ memset(adapter->rx_ring[i].desc, 0, adapter->rx_ring[i].size);
+ adapter->rx_ring[i].next_to_clean = 0;
+ }
+ ew32(RDH, 0);
+ ew32(RCTL, rctl);
+ adapter->netdev->stats.rx_fifo_errors++;
+ }
+
if (unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
hw->get_link_status = 1;
/* guard against interrupt when we're going down */

2012-05-17 23:22:36

by Dave, Tushar N

[permalink] [raw]

Subject: RE: [PATCH] e1000: Reset rx ring index on receive overrun

I am interested in to see if you have actual test case and more importantly test data that shows that kernel and device indexes are not synchronized any more.

-Tushar

>-----Original Message-----
>From: [email protected] [mailto:[email protected]]
>On Behalf Of Samuel Thibault
>Sent: Thursday, May 17, 2012 4:02 PM
>To: Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce W; Wyborny,
>Carolyn; Skidmore, Donald C; Rose, Gregory V; Waskiewicz Jr, Peter P;
>Duyck, Alexander H; Ronciak, John; David S. Miller; Jiri Pirko; Dean
>Nelson; [email protected]; [email protected]
>Cc: [email protected]
>Subject: [PATCH] e1000: Reset rx ring index on receive overrun
>
>At high traffic rate, the rx ring may get completely filled before we
>manage to consume it. After it is filled, the kernel and device indexes
>are not synchronized any more, so we have to reset them, otherwise the
>kernel will be stuck waiting for the wrong slot to be filled.
>
>Signed-off-by: Samuel Thibault <[email protected]>
>
>---
>This is just a patch suggestion, I'm not an expert in network drivers, I
>leave to actual driver authors to bake a better version.
>
>diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c
>b/drivers/net/ethernet/intel/e1000/e1000_main.c
>index 37caa88..77c8dbc 100644
>--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
>+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
>@@ -3759,6 +3759,21 @@ static irqreturn_t e1000_intr(int irq, void *data)
> if (unlikely(test_bit(__E1000_DOWN, &adapter->flags)))
> return IRQ_HANDLED;
>
>+ if (unlikely(icr & E1000_ICR_RXO)) {
>+ /* Receive Overrun */
>+ u32 rctl;
>+ int i;
>+ rctl = er32(RCTL);
>+ ew32(RCTL, rctl & ~E1000_RCTL_EN);
>+ for (i = 0; i < adapter->num_rx_queues; i++) {
>+ memset(adapter->rx_ring[i].desc, 0, adapter-
>>rx_ring[i].size);
>+ adapter->rx_ring[i].next_to_clean = 0;
>+ }
>+ ew32(RDH, 0);
>+ ew32(RCTL, rctl);
>+ adapter->netdev->stats.rx_fifo_errors++;
>+ }
>+
> if (unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
> hw->get_link_status = 1;
> /* guard against interrupt when we're going down */
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in the
>body of a message to [email protected] More majordomo info at
>http://vger.kernel.org/majordomo-info.html

2012-05-17 23:28:26

by Samuel Thibault

[permalink] [raw]

Subject: Re: [PATCH] e1000: Reset rx ring index on receive overrun

Dave, Tushar N, le Thu 17 May 2012 23:22:31 +0000, a ?crit :
> I am interested in to see if you have actual test case and more importantly test data that shows that kernel and device indexes are not synchronized any more.

Well, it's not with an actual physical device, but with the kvm
emulation. Printing indexes from the clean_rx handler, I'm getting:

(status linux index/total size/device index)

status 7 2/256/19
status 7 3/256/19
...
status 7 18/256/19
status 7 19/256/19
status 7 20/256/19
status 7 21/256/19
...
since the status is still 7, linux continues on.

...
status 7 254/256/19
status 7 255/256/19
status 7 0/256/19
status 7 1/256/19
status 0 2/256/19

here it stops, and on next interrupts,

status 0 2/256/20

status 0 2/256/21
etc. and no progress is made any more, until

status 0 2/256/253
status 0 2/256/254

and it gets stuck there:

status 0 2/256/254
status 0 2/256/254
status 0 2/256/254

Samuel

2012-05-17 23:31:28

by Samuel Thibault

[permalink] [raw]

Subject: Re: [PATCH] e1000: Reset rx ring index on receive overrun

Samuel Thibault, le Fri 18 May 2012 01:28:21 +0200, a ?crit :
> Dave, Tushar N, le Thu 17 May 2012 23:22:31 +0000, a ?crit :
> > I am interested in to see if you have actual test case and more importantly test data that shows that kernel and device indexes are not synchronized any more.
>
> Well, it's not with an actual physical device, but with the kvm
> emulation.

BTW, it also happens easily when request_irq takes some time to
complete: since we enable E1000_TCTL_EN before that, the card can have
time to fill the ring before irqs are processed.

Samuel

2012-05-18 00:04:10

by Jesse Brandeburg

[permalink] [raw]

Subject: Re: [PATCH] e1000: Reset rx ring index on receive overrun

On Thu, 17 May 2012, Samuel Thibault wrote:

> Samuel Thibault, le Fri 18 May 2012 01:28:21 +0200, a ?crit :
> > Dave, Tushar N, le Thu 17 May 2012 23:22:31 +0000, a ?crit :
> > > I am interested in to see if you have actual test case and more
> > > importantly test data that shows that kernel and device indexes are
> > > not synchronized any more.
> >
> > Well, it's not with an actual physical device, but with the kvm
> > emulation.
>
> BTW, it also happens easily when request_irq takes some time to
> complete: since we enable E1000_TCTL_EN before that, the card can have
> time to fill the ring before irqs are processed.

I think there may well be a bug in the implementation in kvm. The
hardware doesn't have this bug.

2012-05-18 00:12:10

by Samuel Thibault

[permalink] [raw]

Subject: Re: [PATCH] e1000: Reset rx ring index on receive overrun

Brandeburg, Jesse, le Thu 17 May 2012 17:04:04 -0700, a ?crit :
> On Thu, 17 May 2012, Samuel Thibault wrote:
>
> > Samuel Thibault, le Fri 18 May 2012 01:28:21 +0200, a ?crit :
> > > Dave, Tushar N, le Thu 17 May 2012 23:22:31 +0000, a ?crit :
> > > > I am interested in to see if you have actual test case and more
> > > > importantly test data that shows that kernel and device indexes are
> > > > not synchronized any more.
> > >
> > > Well, it's not with an actual physical device, but with the kvm
> > > emulation.
> >
> > BTW, it also happens easily when request_irq takes some time to
> > complete: since we enable E1000_TCTL_EN before that, the card can have
> > time to fill the ring before irqs are processed.
>
> I think there may well be a bug in the implementation in kvm. The
> hardware doesn't have this bug.

How does it avoid filling the ring? What is the purpose of the RXO flag
if it does avoid them?

Samuel

2012-05-18 00:26:49

by Jesse Brandeburg

[permalink] [raw]

Subject: Re: [PATCH] e1000: Reset rx ring index on receive overrun

On Thu, 17 May 2012, Samuel Thibault wrote:
> Brandeburg, Jesse, le Thu 17 May 2012 17:04:04 -0700, a ?crit :
> > > BTW, it also happens easily when request_irq takes some time to
> > > complete: since we enable E1000_TCTL_EN before that, the card can have
> > > time to fill the ring before irqs are processed.
> >
> > I think there may well be a bug in the implementation in kvm. The
> > hardware doesn't have this bug.
>
> How does it avoid filling the ring? What is the purpose of the RXO flag
> if it does avoid them?

RXO is only used to let the driver know "information" that the rx fifo is
overflowing. As it turns out the flag isn't very useful and none of our
drivers currently use it for anything.

If the hardware fills all the available receive *descriptors* then the
hardware's RDH (head) register will advance all the way to the RDT (tail)
value. The tail always points one past the rx descriptors available to
hardware to use. RDH==RDT means there are no software provided
descriptors in the ring available to be used by the hardware. Our drivers
typically allow for 1-2 descriptors to be unused in the ring to help avoid
any confusion.

Hope this helps,
Jesse

2012-05-18 13:51:44

by Samuel Thibault

[permalink] [raw]

Subject: e1000 rx emulation bug (Was: [PATCH] e1000: Reset rx ring index on receive overrun)

Hello,

There seems to be a bug in qemu in the e1000 emulation, which triggers
an issue with the Linux driver.

What happens in Linux is the following:

- e1000_open
- e1000_configure
- e1000_setup_rctl
- enables E1000_RCTL_EN
- e1000_configure_rx
- sets RDT/RDH to 0
- alloc_rx_buf
- pushes buffers to the ring

with bad luck, or on high traffic of small packets, what is observed is
that between setting RDT/RDH and pushing buffers, the ring fills up in
qemu. Here is what happens there on the qemu side:

- e1000_receive
- e1000_has_rxbufs
- total_size <= s->rxbuf_size (because it's small)
return s->mac_reg[RDH] != s->mac_reg[RDT] || !s->check_rxov;

although RDH == RDT == 0, it returns 1, because since RDT/RDH have
just been set to 0, set_rdt has cleared check_rxov. e1000_receive
thus believes there is room, and proceeds with filling the ring.
Unfortunately, since no buffer was pushed, desc.buffer_addr is NULL, and
thus the do loop skips all these nul rx descriptors of the ring, but
marking each of them with E1000_RXD_STAT_DD, and eventually wrapping
around. From then on, since check_rxov has been set by the do loop,
nothing more is pushed, until the linux driver pushes buffers to the
ring. qemu can then fill some descriptors, and Linux read them, but
since the whole ring was filled with E1000_RXD_STAT_DD, Linux goes on
reading, and thus gets completely desynchronized with the device.

That raises two questions:

- what is the role of the check_rxov flag? Is hardware really allowed
to push in some cases, even when RDH==RDT? Removing it makes things
work just fine.
- BTW, when skipping a descriptor because of NULL address, does
E1000_RXD_STAT_DD have to be set?

Samuel