The premature loop termination check makes sense only in case of the
jump to read_again where the count may have been updated. But
read_again did not include the check.
Fixes: ec222003bd94 ("net: stmmac: Prepare to add Split Header support")
Signed-off-by: Jochen Henneberg <[email protected]>
---
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index e4902a7bb61e..ea51c7c93101 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -5221,10 +5221,10 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
len = 0;
}
+read_again:
if (count >= limit)
break;
-read_again:
buf1_len = 0;
buf2_len = 0;
entry = next_entry;
--
2.39.2
On Thu, Mar 16, 2023 at 08:59:39AM +0100, Jochen Henneberg wrote:
> The premature loop termination check makes sense only in case of the
> jump to read_again where the count may have been updated. But
> read_again did not include the check.
>
> Fixes: ec222003bd94 ("net: stmmac: Prepare to add Split Header support")
> Signed-off-by: Jochen Henneberg <[email protected]>
> ---
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index e4902a7bb61e..ea51c7c93101 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -5221,10 +5221,10 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
> len = 0;
> }
>
> +read_again:
> if (count >= limit)
> break;
>
> -read_again:
> buf1_len = 0;
> buf2_len = 0;
> entry = next_entry;
> --
> 2.39.2
>
LGTM, thanks.
Reviewed-by: Piotr Raczynski <[email protected]>
On Thu, 16 Mar 2023 08:59:39 +0100 Jochen Henneberg wrote:
> The premature loop termination check makes sense only in case of the
> jump to read_again where the count may have been updated. But
> read_again did not include the check.
>
> Fixes: ec222003bd94 ("net: stmmac: Prepare to add Split Header support")
> Signed-off-by: Jochen Henneberg <[email protected]>
> ---
> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index e4902a7bb61e..ea51c7c93101 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -5221,10 +5221,10 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
> len = 0;
> }
>
> +read_again:
> if (count >= limit)
> break;
Are you sure? Can you provide more detailed analysis?
Do you observe a problem / error in real life or is this theoretical?
As far as I can tell only path which jumps to read_again after doing
count++ is via the drain_data jump, but I can't tell how it's
discarding subsequent segments in that case..
> -read_again:
> buf1_len = 0;
> buf2_len = 0;
> entry = next_entry;
Jakub Kicinski <[email protected]> writes:
> On Thu, 16 Mar 2023 08:59:39 +0100 Jochen Henneberg wrote:
>> The premature loop termination check makes sense only in case of the
>> jump to read_again where the count may have been updated. But
>> read_again did not include the check.
>>
>> Fixes: ec222003bd94 ("net: stmmac: Prepare to add Split Header support")
>> Signed-off-by: Jochen Henneberg <[email protected]>
>> ---
>> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> index e4902a7bb61e..ea51c7c93101 100644
>> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
>> @@ -5221,10 +5221,10 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue)
>> len = 0;
>> }
>>
>> +read_again:
>> if (count >= limit)
>> break;
>
> Are you sure? Can you provide more detailed analysis?
> Do you observe a problem / error in real life or is this theoretical?
This is theoretical, I was hunting another bug and just stumbled over
the check which is, I think you agree, pointless right now. I did not
try to force execute that code path.
>
> As far as I can tell only path which jumps to read_again after doing
> count++ is via the drain_data jump, but I can't tell how it's
> discarding subsequent segments in that case..
>
>> -read_again:
>> buf1_len = 0;
>> buf2_len = 0;
>> entry = next_entry;
Correct. The read_again is triggered in case that the segment is not the
last segment of the frame:
if (likely(status & rx_not_ls))
goto read_again;
So in case there is no skb (queue error) it will keep increasing count
until the last segment has been found with released device DMA
ownership. So skb will not change while the goto loop is running, the
only thing that will change is that subsequent segments release device
DMA ownership. The dirty buffers are then cleaned up from
stmmac_rx_refill().
I think the driver code is really hard to read I have planned to cleanup
things later, however, this patch simply tries to prevent us from
returning a value greater than limit which could happen and would
definitely be wrong.
On Sat, 18 Mar 2023 09:38:12 +0100 Jochen Henneberg wrote:
> > Are you sure? Can you provide more detailed analysis?
> > Do you observe a problem / error in real life or is this theoretical?
>
> This is theoretical, I was hunting another bug and just stumbled over
> the check which is, I think you agree, pointless right now. I did not
> try to force execute that code path.
If you have the HW it's definitely worth doing. There is a fault
injection infra in Linus which allows to fail memory allocations.
Or you can just make a little patch to the driver to fake failing
every 1000th allocation.
> > As far as I can tell only path which jumps to read_again after doing
> > count++ is via the drain_data jump, but I can't tell how it's
> > discarding subsequent segments in that case..
> >
> >> -read_again:
> >> buf1_len = 0;
> >> buf2_len = 0;
> >> entry = next_entry;
>
> Correct. The read_again is triggered in case that the segment is not the
> last segment of the frame:
>
> if (likely(status & rx_not_ls))
> goto read_again;
>
> So in case there is no skb (queue error) it will keep increasing count
> until the last segment has been found with released device DMA
> ownership. So skb will not change while the goto loop is running, the
> only thing that will change is that subsequent segments release device
> DMA ownership. The dirty buffers are then cleaned up from
> stmmac_rx_refill().
To be clear - I'm only looking at stmmac_rx(), that ZC one is even more
confusing.
Your patch makes sense, but I think it's not enough to make this code
work in case of memory allocation failure. AFAIU the device supports
scatter - i.e. receiving a single frame in multiple chunks. Each time
thru the loop we process one (or two?) chunks. But the code uses
skb == NULL to decide whether it's the first chunk or not. So in case
of memory allocation error it will treat the second chunk as the first
(since skb will be NULL) and we'll get a malformed frame with missing
chunks sent to the stack. The driver should discard the entire frame
on failure..
> I think the driver code is really hard to read I have planned to cleanup
> things later, however, this patch simply tries to prevent us from
> returning a value greater than limit which could happen and would
> definitely be wrong.
Jakub Kicinski <[email protected]> writes:
> On Sat, 18 Mar 2023 09:38:12 +0100 Jochen Henneberg wrote:
>> > Are you sure? Can you provide more detailed analysis?
>> > Do you observe a problem / error in real life or is this theoretical?
>>
>> This is theoretical, I was hunting another bug and just stumbled over
>> the check which is, I think you agree, pointless right now. I did not
>> try to force execute that code path.
>
> If you have the HW it's definitely worth doing. There is a fault
> injection infra in Linus which allows to fail memory allocations.
> Or you can just make a little patch to the driver to fake failing
> every 1000th allocation.
>
I have the hardware available and will do the check.
>> > As far as I can tell only path which jumps to read_again after doing
>> > count++ is via the drain_data jump, but I can't tell how it's
>> > discarding subsequent segments in that case..
>> >
>> >> -read_again:
>> >> buf1_len = 0;
>> >> buf2_len = 0;
>> >> entry = next_entry;
>>
>> Correct. The read_again is triggered in case that the segment is not the
>> last segment of the frame:
>>
>> if (likely(status & rx_not_ls))
>> goto read_again;
>>
>> So in case there is no skb (queue error) it will keep increasing count
>> until the last segment has been found with released device DMA
>> ownership. So skb will not change while the goto loop is running, the
>> only thing that will change is that subsequent segments release device
>> DMA ownership. The dirty buffers are then cleaned up from
>> stmmac_rx_refill().
>
> To be clear - I'm only looking at stmmac_rx(), that ZC one is even more
> confusing.
>
> Your patch makes sense, but I think it's not enough to make this code
> work in case of memory allocation failure. AFAIU the device supports
> scatter - i.e. receiving a single frame in multiple chunks. Each time
> thru the loop we process one (or two?) chunks. But the code uses
> skb == NULL to decide whether it's the first chunk or not. So in case
> of memory allocation error it will treat the second chunk as the first
> (since skb will be NULL) and we'll get a malformed frame with missing
> chunks sent to the stack. The driver should discard the entire frame
> on failure..
>
Understood. However, this forces me to read the code and datahseet very
carefully to understand the details. I will do that, however, it will
take me some time.
For the ST and Synopsys people:
I could imagine that you would be able to fix this much faster than
I can, so if they want to work on this please let me know so I don't
waste my time on doing double work.
>> I think the driver code is really hard to read I have planned to cleanup
>> things later, however, this patch simply tries to prevent us from
>> returning a value greater than limit which could happen and would
>> definitely be wrong.
On Mon, 20 Mar 2023 10:04:54 +0100 Jochen Henneberg wrote:
> For the ST and Synopsys people:
> I could imagine that you would be able to fix this much faster than
> I can, so if they want to work on this please let me know so I don't
> waste my time on doing double work.
Don't hold your breath, we haven't heard from any of the maintainers
in 2 years :(
The drivers for CoTS IPs are really not great in general, I'm guessing
delivering solid code is both difficult for them (given customer
parametrization of each instance) and hard to fit into their business
process :(
Jakub Kicinski <[email protected]> writes:
> On Mon, 20 Mar 2023 10:04:54 +0100 Jochen Henneberg wrote:
>> For the ST and Synopsys people:
>> I could imagine that you would be able to fix this much faster than
>> I can, so if they want to work on this please let me know so I don't
>> waste my time on doing double work.
>
> Don't hold your breath, we haven't heard from any of the maintainers
> in 2 years :(
>
> The drivers for CoTS IPs are really not great in general, I'm guessing
> delivering solid code is both difficult for them (given customer
> parametrization of each instance) and hard to fit into their business
> process :(
Thanks for your response. I will try to figure out the issue and work on
them, however, be patient as I can only limited time on this.