Under memory pressure, enetc_refill_rx_ring() may fail, and when called
during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not
checked for.
An extreme case of memory pressure will result in exactly zero buffers
being allocated for the RX ring, and in such a case it is expected that
hardware drops all RX packets due to lack of buffers.
This does not happen, because the reset-default value of the consumer
and produces index is 0, and this makes the ENETC think that all buffers
have been initialized and that it owns them (when in reality none were).
The hardware guide explains this best:
| Configure the receive ring producer index register RBaPIR with a value
| of 0. The producer index is initially configured by software but owned
| by hardware after the ring has been enabled. Hardware increments the
| index when a frame is received which may consume one or more BDs.
| Hardware is not allowed to increment the producer index to match the
| consumer index since it is used to indicate an empty condition. The ring
| can hold at most RBLENR[LENGTH]-1 received BDs.
|
| Configure the receive ring consumer index register RBaCIR. The
| consumer index is owned by software and updated during operation of the
| of the BD ring by software, to indicate that any receive data occupied
| in the BD has been processed and it has been prepared for new data.
| - If consumer index and producer index are initialized to the same
| value, it indicates that all BDs in the ring have been prepared and
| hardware owns all of the entries.
| - If consumer index is initialized to producer index plus N, it would
| indicate N BDs have been prepared. Note that hardware cannot start if
| only a single buffer is prepared due to the restrictions described in
| (2).
| - Software may write consumer index to match producer index anytime
| while the ring is operational to indicate all received BDs prior have
| been processed and new BDs prepared for hardware.
Normally, the value of rx_ring->rcir (consumer index) is brought in sync
with the rx_ring->next_to_use software index, but this only happens if
page allocation ever succeeded.
When PI==CI==0, the hardware appears to receive frames and write them to
DMA address 0x0 (?!), then set the READY bit in the BD.
The enetc_clean_rx_ring() function (and its XDP derivative) is naturally
not prepared to handle such a condition. It will attempt to process
those frames using the rx_swbd structure associated with index i of the
RX ring, but that structure is not fully initialized (enetc_new_page()
does all of that). So what happens next is undefined behavior.
To operate using no buffer, we must initialize the CI to PI + 1, which
will block the hardware from advancing the CI any further, and drop
everything.
The issue was seen while adding support for zero-copy AF_XDP sockets,
where buffer memory comes from user space, which can even decide to
supply no buffers at all (example: "xdpsock --txonly"). However, the bug
is present also with the network stack code, even though it would take a
very determined person to trigger a page allocation failure at the
perfect time (a series of ifup/ifdown under memory pressure should
eventually reproduce it given enough retries).
Fixes: d4fd0404c1c9 ("enetc: Introduce basic PF and VF ENETC ethernet drivers")
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Claudiu Manoil <[email protected]>
---
v2->v3:
- Jakub points out that dma_alloc_coherent() returns zeroized memory, so
v1 was completely wrong and v2 contained an unnecessary change. Remove
the changes made to enetc_dma_alloc_bdr() and rewrite the commit
message with this in mind.
v1->v2:
- also add an initial value for the RX ring consumer index, to be used
when page allocation fails
- update the commit message
- deliberately not pick up Claudiu's review tag due to the above changes
v1 at:
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
v2 at:
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
drivers/net/ethernet/freescale/enetc/enetc.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 54bc92fc6bf0..f8c06c3f9464 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2090,7 +2090,12 @@ static void enetc_setup_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
else
enetc_rxbdr_wr(hw, idx, ENETC_RBBSR, ENETC_RXB_DMA_SIZE);
+ /* Also prepare the consumer index in case page allocation never
+ * succeeds. In that case, hardware will never advance producer index
+ * to match consumer index, and will drop all frames.
+ */
enetc_rxbdr_wr(hw, idx, ENETC_RBPIR, 0);
+ enetc_rxbdr_wr(hw, idx, ENETC_RBCIR, 1);
/* enable Rx ints by setting pkt thr to 1 */
enetc_rxbdr_wr(hw, idx, ENETC_RBICR0, ENETC_RBICR0_ICEN | 0x1);
--
2.34.1
Hello:
This patch was applied to netdev/net.git (master)
by Jakub Kicinski <[email protected]>:
On Thu, 27 Oct 2022 21:29:25 +0300 you wrote:
> Under memory pressure, enetc_refill_rx_ring() may fail, and when called
> during the enetc_open() -> enetc_setup_rxbdr() procedure, this is not
> checked for.
>
> An extreme case of memory pressure will result in exactly zero buffers
> being allocated for the RX ring, and in such a case it is expected that
> hardware drops all RX packets due to lack of buffers.
>
> [...]
Here is the summary with links:
- [v3,net] net: enetc: survive memory pressure without crashing
https://git.kernel.org/netdev/net/c/84ce1ca3fe9e
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html