There is a race condition in e1000 driver.
It enables HW receive before RX rings initalization.
In case of specific timing this may lead to host memory corruption
due to DMA write to arbitrary memory location.
Following patch fixes this issue by reordering initialization steps.
Other Intel network drivers does not seem to have this issue.
Dmitry Fleytman (1):
RX initialization sequence fixed - enable RX after corresponding ring
initialization only
drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
drivers/net/ethernet/intel/e1000/e1000_main.c | 18 ++++++++++++++++--
2 files changed, 21 insertions(+), 6 deletions(-)
--
1.7.9.5
Reported-by: Chris Webb <[email protected]>
Reported-by: Richard Davies <[email protected]>
Signed-off-by: Dmitry Fleytman <[email protected]>
---
drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
drivers/net/ethernet/intel/e1000/e1000_main.c | 18 ++++++++++++++++--
2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
index 9089d00..ebcce7a 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_ethtool.c
@@ -1091,10 +1091,6 @@ static int e1000_setup_desc_rings(struct e1000_adapter *adapter)
ew32(RDLEN, rxdr->size);
ew32(RDH, 0);
ew32(RDT, 0);
- rctl = E1000_RCTL_EN | E1000_RCTL_BAM | E1000_RCTL_SZ_2048 |
- E1000_RCTL_LBM_NO | E1000_RCTL_RDMTS_HALF |
- (hw->mc_filter_type << E1000_RCTL_MO_SHIFT);
- ew32(RCTL, rctl);
for (i = 0; i < rxdr->count; i++) {
struct e1000_rx_desc *rx_desc = E1000_RX_DESC(*rxdr, i);
@@ -1115,6 +1111,11 @@ static int e1000_setup_desc_rings(struct e1000_adapter *adapter)
memset(skb->data, 0x00, skb->len);
}
+ rctl = E1000_RCTL_EN | E1000_RCTL_BAM | E1000_RCTL_SZ_2048 |
+ E1000_RCTL_LBM_NO | E1000_RCTL_RDMTS_HALF |
+ (hw->mc_filter_type << E1000_RCTL_MO_SHIFT);
+ ew32(RCTL, rctl);
+
return 0;
err_nomem:
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 222bfaf..01a4ad9 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -118,6 +118,7 @@ static int e1000_open(struct net_device *netdev);
static int e1000_close(struct net_device *netdev);
static void e1000_configure_tx(struct e1000_adapter *adapter);
static void e1000_configure_rx(struct e1000_adapter *adapter);
+static void e1000_enable_rx(struct e1000_adapter *adapter);
static void e1000_setup_rctl(struct e1000_adapter *adapter);
static void e1000_clean_all_tx_rings(struct e1000_adapter *adapter);
static void e1000_clean_all_rx_rings(struct e1000_adapter *adapter);
@@ -404,6 +405,7 @@ static void e1000_configure(struct e1000_adapter *adapter)
adapter->alloc_rx_buf(adapter, ring,
E1000_DESC_UNUSED(ring));
}
+ e1000_enable_rx(adapter);
}
int e1000_up(struct e1000_adapter *adapter)
@@ -1928,8 +1930,19 @@ static void e1000_configure_rx(struct e1000_adapter *adapter)
rxcsum &= ~E1000_RXCSUM_TUOFL;
ew32(RXCSUM, rxcsum);
}
+}
+
+/**
+ * e1000_enable_rx - Enable receive in HW
+ * @adapter: board private structure
+ *
+ * Inform HW that SW is ready for incoming packets indications
+ **/
- /* Enable Receives */
+static void e1000_enable_rx(struct e1000_adapter *adapter)
+{
+ struct e1000_hw *hw = &adapter->hw;
+ u32 rctl = er32(RCTL);
ew32(RCTL, rctl | E1000_RCTL_EN);
}
@@ -2196,6 +2209,7 @@ static void e1000_leave_82542_rst(struct e1000_adapter *adapter)
struct e1000_rx_ring *ring = &adapter->rx_ring[0];
e1000_configure_rx(adapter);
adapter->alloc_rx_buf(adapter, ring, E1000_DESC_UNUSED(ring));
+ e1000_enable_rx(adapter);
}
}
@@ -5010,7 +5024,7 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool *enable_wake)
rctl |= E1000_RCTL_MPE;
/* enable receives in the hardware */
- ew32(RCTL, rctl | E1000_RCTL_EN);
+ e1000_enable_rx(adapter);
if (hw->mac_type >= e1000_82540) {
ctrl = er32(CTRL);
--
1.7.9.5
On Sun, 2012-10-14 at 19:19 +0200, Dmitry Fleytman wrote:
> Reported-by: Chris Webb <[email protected]>
> Reported-by: Richard Davies <[email protected]>
>
> Signed-off-by: Dmitry Fleytman <[email protected]>
> ---
> drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
> drivers/net/ethernet/intel/e1000/e1000_main.c | 18
> ++++++++++++++++--
> 2 files changed, 21 insertions(+), 6 deletions(-)
I will add it to my queue. Thanks!
Thanks, Jeff
According to what we see this problem is very old and exists in stable
branches at least back to 2.6.x.
Since we see this problem a lot on our servers and we cannot recompile
some kernels we use, we are
highly interested in backporting of this patch to older versions to
make it merged into stable distributions.
Is there any chance you could commit it in stable series?
Can we help you with this (i.e. prepare backported patches, etc.)?
Please, advise.
Thanks in advance,
Dmitry Fleytman
On Mon, Oct 15, 2012 at 7:52 AM, Jeff Kirsher
<[email protected]> wrote:
>
> On Sun, 2012-10-14 at 19:19 +0200, Dmitry Fleytman wrote:
> > Reported-by: Chris Webb <[email protected]>
> > Reported-by: Richard Davies <[email protected]>
> >
> > Signed-off-by: Dmitry Fleytman <[email protected]>
> > ---
> > drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
> > drivers/net/ethernet/intel/e1000/e1000_main.c | 18
> > ++++++++++++++++--
> > 2 files changed, 21 insertions(+), 6 deletions(-)
>
> I will add it to my queue. Thanks!
--
Dmitry Fleytman
Technology Expert and Consultant,
Daynix Computing Ltd.
Cell: +972-54-2819481
Skype: dmitry.fleytman
On 10/14/2012 10:19 AM, Dmitry Fleytman wrote:
> There is a race condition in e1000 driver.
> It enables HW receive before RX rings initalization.
> In case of specific timing this may lead to host memory corruption
> due to DMA write to arbitrary memory location.
> Following patch fixes this issue by reordering initialization steps.
>
> Other Intel network drivers does not seem to have this issue.
>
> Dmitry Fleytman (1):
> RX initialization sequence fixed - enable RX after corresponding ring
> initialization only
>
> drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
> drivers/net/ethernet/intel/e1000/e1000_main.c | 18 ++++++++++++++++--
> 2 files changed, 21 insertions(+), 6 deletions(-)
>
What device was it you saw this issue with? The reason why I ask is
because I suspect this change should cause most of our e1000 hardware to
lock up since normally if you allocate buffers and then enable Rx it
will mean the ring was not updated and it will treat it as if there are
no buffers available.
Thanks,
Alex
Hello, Alex
Originally this bug was reported for virtual machines running on top
of QEMU/KVM.
After patch preparation I've tested it on physical e1000 card and it
worked fine.
However, it could be I've missed something, as I see now other Intel
drivers (e1000e, ixgb, etc.)
use the same sequence (RX enable and then ring allocate), so I'm
starting to suspect that this is
the correct behavior.
If you confirm this is the way HW works, the this patch should be
ignored. This is pure QEMU bug and we'll fix it there.
Thanks,
Dmitry.
On Mon, Oct 15, 2012 at 8:53 PM, Alexander Duyck
<[email protected]> wrote:
> On 10/14/2012 10:19 AM, Dmitry Fleytman wrote:
>> There is a race condition in e1000 driver.
>> It enables HW receive before RX rings initalization.
>> In case of specific timing this may lead to host memory corruption
>> due to DMA write to arbitrary memory location.
>> Following patch fixes this issue by reordering initialization steps.
>>
>> Other Intel network drivers does not seem to have this issue.
>>
>> Dmitry Fleytman (1):
>> RX initialization sequence fixed - enable RX after corresponding ring
>> initialization only
>>
>> drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
>> drivers/net/ethernet/intel/e1000/e1000_main.c | 18 ++++++++++++++++--
>> 2 files changed, 21 insertions(+), 6 deletions(-)
>>
>
> What device was it you saw this issue with? The reason why I ask is
> because I suspect this change should cause most of our e1000 hardware to
> lock up since normally if you allocate buffers and then enable Rx it
> will mean the ring was not updated and it will treat it as if there are
> no buffers available.
>
> Thanks,
>
> Alex
>
--
Dmitry Fleytman
Technology Expert and Consultant,
Daynix Computing Ltd.
Cell: +972-54-2819481
Skype: dmitry.fleytman
Hello Dmitry,
My concern is that on many of our parts the behavior is to initialize
both the head and tail to 0, enable Rx for either the ring or device
depending on the queue configuration, and then allocate buffers and bump
tail to indicate that the new buffers are present. The reason behind
enabling Rx and bumping tail is because that signals the DMA engine to
start fetching buffers. In my experience most of our hardware will
ignore the tail bump if it is done first and the Rx is enabled.
With both head and tail at the same value it should not be possible for
any of the devices to start a DMA. This is probably what you should be
checking for in fixing QEMU/KVM as it may be incorrectly assuming it can
fetch the descriptor pointed to by tail.
We have your patch in our queue and can test to verify my assumptions
are correct. If they are we will let you know and reject the patch.
Thanks,
Alex
On 10/15/2012 12:44 PM, Dmitry Fleytman wrote:
> Hello, Alex
>
> Originally this bug was reported for virtual machines running on top
> of QEMU/KVM.
> After patch preparation I've tested it on physical e1000 card and it
> worked fine.
>
> However, it could be I've missed something, as I see now other Intel
> drivers (e1000e, ixgb, etc.)
> use the same sequence (RX enable and then ring allocate), so I'm
> starting to suspect that this is
> the correct behavior.
>
> If you confirm this is the way HW works, the this patch should be
> ignored. This is pure QEMU bug and we'll fix it there.
>
> Thanks,
> Dmitry.
>
> On Mon, Oct 15, 2012 at 8:53 PM, Alexander Duyck
> <[email protected]> wrote:
>> On 10/14/2012 10:19 AM, Dmitry Fleytman wrote:
>>> There is a race condition in e1000 driver.
>>> It enables HW receive before RX rings initalization.
>>> In case of specific timing this may lead to host memory corruption
>>> due to DMA write to arbitrary memory location.
>>> Following patch fixes this issue by reordering initialization steps.
>>>
>>> Other Intel network drivers does not seem to have this issue.
>>>
>>> Dmitry Fleytman (1):
>>> RX initialization sequence fixed - enable RX after corresponding ring
>>> initialization only
>>>
>>> drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
>>> drivers/net/ethernet/intel/e1000/e1000_main.c | 18 ++++++++++++++++--
>>> 2 files changed, 21 insertions(+), 6 deletions(-)
>>>
>> What device was it you saw this issue with? The reason why I ask is
>> because I suspect this change should cause most of our e1000 hardware to
>> lock up since normally if you allocate buffers and then enable Rx it
>> will mean the ring was not updated and it will treat it as if there are
>> no buffers available.
>>
>> Thanks,
>>
>> Alex
Hello, Alex
Many thanks for clarification. I think your assumption is
correct and this is exactly what needs to be fixed in QEMU.
Is there any publicly available specification for Intel devices that
explains their operation in such a great details?
Dmitry.
On Mon, Oct 15, 2012 at 10:03 PM, Alexander Duyck
<[email protected]> wrote:
> Hello Dmitry,
>
> My concern is that on many of our parts the behavior is to initialize
> both the head and tail to 0, enable Rx for either the ring or device
> depending on the queue configuration, and then allocate buffers and bump
> tail to indicate that the new buffers are present. The reason behind
> enabling Rx and bumping tail is because that signals the DMA engine to
> start fetching buffers. In my experience most of our hardware will
> ignore the tail bump if it is done first and the Rx is enabled.
>
> With both head and tail at the same value it should not be possible for
> any of the devices to start a DMA. This is probably what you should be
> checking for in fixing QEMU/KVM as it may be incorrectly assuming it can
> fetch the descriptor pointed to by tail.
>
> We have your patch in our queue and can test to verify my assumptions
> are correct. If they are we will let you know and reject the patch.
>
> Thanks,
>
> Alex
>
>
>
> On 10/15/2012 12:44 PM, Dmitry Fleytman wrote:
>> Hello, Alex
>>
>> Originally this bug was reported for virtual machines running on top
>> of QEMU/KVM.
>> After patch preparation I've tested it on physical e1000 card and it
>> worked fine.
>>
>> However, it could be I've missed something, as I see now other Intel
>> drivers (e1000e, ixgb, etc.)
>> use the same sequence (RX enable and then ring allocate), so I'm
>> starting to suspect that this is
>> the correct behavior.
>>
>> If you confirm this is the way HW works, the this patch should be
>> ignored. This is pure QEMU bug and we'll fix it there.
>>
>> Thanks,
>> Dmitry.
>>
>> On Mon, Oct 15, 2012 at 8:53 PM, Alexander Duyck
>> <[email protected]> wrote:
>>> On 10/14/2012 10:19 AM, Dmitry Fleytman wrote:
>>>> There is a race condition in e1000 driver.
>>>> It enables HW receive before RX rings initalization.
>>>> In case of specific timing this may lead to host memory corruption
>>>> due to DMA write to arbitrary memory location.
>>>> Following patch fixes this issue by reordering initialization steps.
>>>>
>>>> Other Intel network drivers does not seem to have this issue.
>>>>
>>>> Dmitry Fleytman (1):
>>>> RX initialization sequence fixed - enable RX after corresponding ring
>>>> initialization only
>>>>
>>>> drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
>>>> drivers/net/ethernet/intel/e1000/e1000_main.c | 18 ++++++++++++++++--
>>>> 2 files changed, 21 insertions(+), 6 deletions(-)
>>>>
>>> What device was it you saw this issue with? The reason why I ask is
>>> because I suspect this change should cause most of our e1000 hardware to
>>> lock up since normally if you allocate buffers and then enable Rx it
>>> will mean the ring was not updated and it will treat it as if there are
>>> no buffers available.
>>>
>>> Thanks,
>>>
>>> Alex
>
--
Dmitry Fleytman
Technology Expert and Consultant,
Daynix Computing Ltd.
Cell: +972-54-2819481
Skype: dmitry.fleytman
The datasheets for most of the parts are available at:
http://developer.intel.com/products/ethernet/resource.htm
You just need to select one of the parts supported by e1000 and select
either the datasheet or software developers manual depending on the part.
Thanks,
Alex
On 10/15/2012 01:20 PM, Dmitry Fleytman wrote:
> Hello, Alex
>
> Many thanks for clarification. I think your assumption is
> correct and this is exactly what needs to be fixed in QEMU.
>
> Is there any publicly available specification for Intel devices that
> explains their operation in such a great details?
>
> Dmitry.
>
>
> On Mon, Oct 15, 2012 at 10:03 PM, Alexander Duyck
> <[email protected]> wrote:
>> Hello Dmitry,
>>
>> My concern is that on many of our parts the behavior is to initialize
>> both the head and tail to 0, enable Rx for either the ring or device
>> depending on the queue configuration, and then allocate buffers and bump
>> tail to indicate that the new buffers are present. The reason behind
>> enabling Rx and bumping tail is because that signals the DMA engine to
>> start fetching buffers. In my experience most of our hardware will
>> ignore the tail bump if it is done first and the Rx is enabled.
>>
>> With both head and tail at the same value it should not be possible for
>> any of the devices to start a DMA. This is probably what you should be
>> checking for in fixing QEMU/KVM as it may be incorrectly assuming it can
>> fetch the descriptor pointed to by tail.
>>
>> We have your patch in our queue and can test to verify my assumptions
>> are correct. If they are we will let you know and reject the patch.
>>
>> Thanks,
>>
>> Alex
>>
>>
>>
>> On 10/15/2012 12:44 PM, Dmitry Fleytman wrote:
>>> Hello, Alex
>>>
>>> Originally this bug was reported for virtual machines running on top
>>> of QEMU/KVM.
>>> After patch preparation I've tested it on physical e1000 card and it
>>> worked fine.
>>>
>>> However, it could be I've missed something, as I see now other Intel
>>> drivers (e1000e, ixgb, etc.)
>>> use the same sequence (RX enable and then ring allocate), so I'm
>>> starting to suspect that this is
>>> the correct behavior.
>>>
>>> If you confirm this is the way HW works, the this patch should be
>>> ignored. This is pure QEMU bug and we'll fix it there.
>>>
>>> Thanks,
>>> Dmitry.
>>>
>>> On Mon, Oct 15, 2012 at 8:53 PM, Alexander Duyck
>>> <[email protected]> wrote:
>>>> On 10/14/2012 10:19 AM, Dmitry Fleytman wrote:
>>>>> There is a race condition in e1000 driver.
>>>>> It enables HW receive before RX rings initalization.
>>>>> In case of specific timing this may lead to host memory corruption
>>>>> due to DMA write to arbitrary memory location.
>>>>> Following patch fixes this issue by reordering initialization steps.
>>>>>
>>>>> Other Intel network drivers does not seem to have this issue.
>>>>>
>>>>> Dmitry Fleytman (1):
>>>>> RX initialization sequence fixed - enable RX after corresponding ring
>>>>> initialization only
>>>>>
>>>>> drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
>>>>> drivers/net/ethernet/intel/e1000/e1000_main.c | 18 ++++++++++++++++--
>>>>> 2 files changed, 21 insertions(+), 6 deletions(-)
>>>>>
>>>> What device was it you saw this issue with? The reason why I ask is
>>>> because I suspect this change should cause most of our e1000 hardware to
>>>> lock up since normally if you allocate buffers and then enable Rx it
>>>> will mean the ring was not updated and it will treat it as if there are
>>>> no buffers available.
>>>>
>>>> Thanks,
>>>>
>>>> Alex
>
>
On 10/14/2012 07:19 PM, Dmitry Fleytman wrote:
> There is a race condition in e1000 driver.
> It enables HW receive before RX rings initalization.
> In case of specific timing this may lead to host memory corruption
> due to DMA write to arbitrary memory location.
> Following patch fixes this issue by reordering initialization steps.
>
> Other Intel network drivers does not seem to have this issue.
>
> Dmitry Fleytman (1):
> RX initialization sequence fixed - enable RX after corresponding ring
> initialization only
>
> drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
> drivers/net/ethernet/intel/e1000/e1000_main.c | 18 ++++++++++++++++--
> 2 files changed, 21 insertions(+), 6 deletions(-)
>
Would it be at all possible to copy netdev on networking-related
discussions?
Jeff Kirsher wrote:
> Dmitry Fleytman wrote:
> > Reported-by: Chris Webb <[email protected]>
> > Reported-by: Richard Davies <[email protected]>
> >
> > Signed-off-by: Dmitry Fleytman <[email protected]>
> > ---
> > drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
> > drivers/net/ethernet/intel/e1000/e1000_main.c | 18
> > ++++++++++++++++--
> > 2 files changed, 21 insertions(+), 6 deletions(-)
>
> I will add it to my queue. Thanks!
Hi Jeff,
I hope it was already clear from the following discussion - this patch
turned out to be a qemu-kvm bug and you do not need to apply it.
Dmitry - please confirm.
Richard.
Exactly. This patch is incorrect and should be ignored.
Dmitry.
On Fri, Oct 19, 2012 at 9:19 PM, Richard Davies
<[email protected]> wrote:
> Jeff Kirsher wrote:
>> Dmitry Fleytman wrote:
>> > Reported-by: Chris Webb <[email protected]>
>> > Reported-by: Richard Davies <[email protected]>
>> >
>> > Signed-off-by: Dmitry Fleytman <[email protected]>
>> > ---
>> > drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
>> > drivers/net/ethernet/intel/e1000/e1000_main.c | 18
>> > ++++++++++++++++--
>> > 2 files changed, 21 insertions(+), 6 deletions(-)
>>
>> I will add it to my queue. Thanks!
>
> Hi Jeff,
>
> I hope it was already clear from the following discussion - this patch
> turned out to be a qemu-kvm bug and you do not need to apply it.
>
> Dmitry - please confirm.
>
> Richard.
--
Dmitry Fleytman
Technology Expert and Consultant,
Daynix Computing Ltd.
Cell: +972-54-2819481
Skype: dmitry.fleytman
On Fri, 2012-10-19 at 20:19 +0100, Richard Davies wrote:
> Jeff Kirsher wrote:
> > Dmitry Fleytman wrote:
> > > Reported-by: Chris Webb <[email protected]>
> > > Reported-by: Richard Davies <[email protected]>
> > >
> > > Signed-off-by: Dmitry Fleytman <[email protected]>
> > > ---
> > > drivers/net/ethernet/intel/e1000/e1000_ethtool.c | 9 +++++----
> > > drivers/net/ethernet/intel/e1000/e1000_main.c | 18
> > > ++++++++++++++++--
> > > 2 files changed, 21 insertions(+), 6 deletions(-)
> >
> > I will add it to my queue. Thanks!
>
> Hi Jeff,
>
> I hope it was already clear from the following discussion - this patch
> turned out to be a qemu-kvm bug and you do not need to apply it.
>
> Dmitry - please confirm.
>
> Richard.
I have dropped the patch from my queue, thanks guys!