2023-07-06 12:09:36

by Souradeep Chakrabarti

[permalink] [raw]
Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive



>-----Original Message-----
>From: Alexander Lobakin <[email protected]>
>Sent: Thursday, July 6, 2023 5:09 PM
>To: Souradeep Chakrabarti <[email protected]>; souradeep
>chakrabarti <[email protected]>
>Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
><[email protected]>; [email protected]; Dexuan Cui
><[email protected]>; [email protected]; [email protected];
>[email protected]; [email protected]; Long Li <[email protected]>; Ajay
>Sharma <[email protected]>; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected]; [email protected];
>[email protected]; [email protected];
>[email protected]
>Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
>when host is unresponsive
>
>From: Souradeep Chakrabarti <[email protected]>
>Date: Thu, 6 Jul 2023 10:41:03 +0000
>
>>
>>
>>> -----Original Message-----
>>> From: Alexander Lobakin <[email protected]>
>>> Sent: Wednesday, July 5, 2023 8:06 PM
>
>[...]
>
>>>>> 120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>>>>> iterations. I know usleep_range() often is much less precise, but still.
>>>>> Do you really need that much time? Has this been measured during
>>>>> the tests that it can take up to 120 seconds or is it just some
>>>>> random value that "should be enough"?
>>>>> If you really need 120 seconds, I'd suggest using a timer / delayed
>>>>> work instead of wasting resources.
>>>> Here the intent is not waiting for 120 seconds, rather than avoid
>>>> continue checking the pending_sends of each tx queues for an
>>>> indefinite time,
>>> before freeing sk_buffs.
>>>> The pending_sends can only get decreased for a tx queue, if
>>>> mana_poll_tx_cq() gets called for a completion notification and
>>>> increased by
>>> xmit.
>>>>
>>>> In this particular bug, apc->port_is_up is not set to false, causing
>>>> xmit to keep increasing the pending_sends for the queue and
>>>> mana_poll_tx_cq() not getting called for the queue.
>>>>
>>>> If we see the comment in the function mana_dealloc_queues(), it mentions
>it :
>>>>
>>>> 2346 /* No packet can be transmitted now since apc->port_is_up is false.
>>>> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
>>>> 2348 * a txq because it may not timely see apc->port_is_up being cleared
>>>> 2349 * to false, but it doesn't matter since mana_start_xmit() drops any
>>>> 2350 * new packets due to apc->port_is_up being false.
>>>>
>>>> The value 120 seconds has been decided here based on maximum number
>>>> of queues
>>>
>>> This is quite opposite to what you're saying above. How should I
>>> connect these
>>> two:
>>>
>>> Here the intent is not waiting for 120 seconds
>>>
>>> +
>>>
>>> The value 120 seconds has been decided here based on maximum number
>>> of queues
>>>
>>> ?
>>> Can cleaning the Tx queues really last for 120 seconds?
>>> My understanding is that timeouts need to be sensible and not go to
>>> the outer space. What is the medium value you got during the tests?
>>>
>> For each queue each takes few milli second, in a normal condition. So
>> based on maximum number of allowed queues for our h/w it won't go
>> beyond a sec.
>> The 120s only happens rarely during some NIC HW issue -unexpected.
>> So this timeout will only trigger in a very rare scenario.
>
>So set the timeout to 2 seconds if it makes no difference?
It can go near 120 seconds in a very rare MANA h/w scenario. That normally won't happen.
But during that scenario, we may need 120 seconds.
>
>>>> are allowed in this specific hardware, it is a safe assumption.
>>>>>
>>>>>>
>>>>>> + for (i = 0; i < apc->num_queues; i++) {
>>>>>> + txq = &apc->tx_qp[i].txq;
>>>>>> + cq = &apc->tx_qp[i].tx_cq;
>[...]
>
>Thanks,
>Olek


2023-07-06 12:31:35

by Alexander Lobakin

[permalink] [raw]
Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive

From: Souradeep Chakrabarti <[email protected]>
Date: Thu, 6 Jul 2023 11:43:58 +0000

>
>
>> -----Original Message-----
>> From: Alexander Lobakin <[email protected]>
>> Sent: Thursday, July 6, 2023 5:09 PM
>> To: Souradeep Chakrabarti <[email protected]>; souradeep
>> chakrabarti <[email protected]>
>> Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
>> <[email protected]>; [email protected]; Dexuan Cui
>> <[email protected]>; [email protected]; [email protected];
>> [email protected]; [email protected]; Long Li <[email protected]>; Ajay
>> Sharma <[email protected]>; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]
>> Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
>> when host is unresponsive
>>
>> From: Souradeep Chakrabarti <[email protected]>
>> Date: Thu, 6 Jul 2023 10:41:03 +0000
>>
>>>
>>>
>>>> -----Original Message-----
>>>> From: Alexander Lobakin <[email protected]>
>>>> Sent: Wednesday, July 5, 2023 8:06 PM
>>
>> [...]
>>
>>>>>> 120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
>>>>>> iterations. I know usleep_range() often is much less precise, but still.
>>>>>> Do you really need that much time? Has this been measured during
>>>>>> the tests that it can take up to 120 seconds or is it just some
>>>>>> random value that "should be enough"?
>>>>>> If you really need 120 seconds, I'd suggest using a timer / delayed
>>>>>> work instead of wasting resources.
>>>>> Here the intent is not waiting for 120 seconds, rather than avoid
>>>>> continue checking the pending_sends of each tx queues for an
>>>>> indefinite time,
>>>> before freeing sk_buffs.
>>>>> The pending_sends can only get decreased for a tx queue, if
>>>>> mana_poll_tx_cq() gets called for a completion notification and
>>>>> increased by
>>>> xmit.
>>>>>
>>>>> In this particular bug, apc->port_is_up is not set to false, causing
>>>>> xmit to keep increasing the pending_sends for the queue and
>>>>> mana_poll_tx_cq() not getting called for the queue.
>>>>>
>>>>> If we see the comment in the function mana_dealloc_queues(), it mentions
>> it :
>>>>>
>>>>> 2346 /* No packet can be transmitted now since apc->port_is_up is false.
>>>>> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-enable
>>>>> 2348 * a txq because it may not timely see apc->port_is_up being cleared
>>>>> 2349 * to false, but it doesn't matter since mana_start_xmit() drops any
>>>>> 2350 * new packets due to apc->port_is_up being false.
>>>>>
>>>>> The value 120 seconds has been decided here based on maximum number
>>>>> of queues
>>>>
>>>> This is quite opposite to what you're saying above. How should I
>>>> connect these
>>>> two:
>>>>
>>>> Here the intent is not waiting for 120 seconds
>>>>
>>>> +
>>>>
>>>> The value 120 seconds has been decided here based on maximum number
>>>> of queues
>>>>
>>>> ?
>>>> Can cleaning the Tx queues really last for 120 seconds?
>>>> My understanding is that timeouts need to be sensible and not go to
>>>> the outer space. What is the medium value you got during the tests?
>>>>
>>> For each queue each takes few milli second, in a normal condition. So
>>> based on maximum number of allowed queues for our h/w it won't go
>>> beyond a sec.
>>> The 120s only happens rarely during some NIC HW issue -unexpected.
>>> So this timeout will only trigger in a very rare scenario.
>>
>> So set the timeout to 2 seconds if it makes no difference?
> It can go near 120 seconds in a very rare MANA h/w scenario. That normally won't happen.
> But during that scenario, we may need 120 seconds.

This waiting loop is needed to let the pending Tx packets be sent. If
they weren't sent in 1 second, it most likely makes no sense already
whether they will be sent at all or not -- the destination host won't
wait for them for so long.
You say that it may happen only in case of HW issue. If so, I assume you
need to fix it some way, e.g. do a HW reset or so? If so, why bother
waiting for Tx completions if Tx is hung? You free all skbs later either
way, so there are no leaks.

>>
>>>>> are allowed in this specific hardware, it is a safe assumption.
>>>>>>
>>>>>>>
>>>>>>> + for (i = 0; i < apc->num_queues; i++) {
>>>>>>> + txq = &apc->tx_qp[i].txq;
>>>>>>> + cq = &apc->tx_qp[i].tx_cq;
>> [...]
>>
>> Thanks,
>> Olek

Thanks,
Olek

2023-07-06 14:43:39

by Haiyang Zhang

[permalink] [raw]
Subject: RE: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive



> -----Original Message-----
> From: Alexander Lobakin <[email protected]>
> Sent: Thursday, July 6, 2023 7:49 AM
> To: Souradeep Chakrabarti <[email protected]>; souradeep
> chakrabarti <[email protected]>
> Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
> <[email protected]>; [email protected]; Dexuan Cui
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; Long Li <[email protected]>; Ajay
> Sharma <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]
> Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
> when host is unresponsive
>
> From: Souradeep Chakrabarti <[email protected]>
> Date: Thu, 6 Jul 2023 11:43:58 +0000
>
> >
> >
> >> -----Original Message-----
> >> From: Alexander Lobakin <[email protected]>
> >> Sent: Thursday, July 6, 2023 5:09 PM
> >> To: Souradeep Chakrabarti <[email protected]>; souradeep
> >> chakrabarti <[email protected]>
> >> Cc: KY Srinivasan <[email protected]>; Haiyang Zhang
> >> <[email protected]>; [email protected]; Dexuan Cui
> >> <[email protected]>; [email protected]; [email protected];
> >> [email protected]; [email protected]; Long Li <[email protected]>;
> Ajay
> >> Sharma <[email protected]>; [email protected];
> >> [email protected]; [email protected];
> [email protected];
> >> [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]
> >> Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload
> >> when host is unresponsive
> >>
> >> From: Souradeep Chakrabarti <[email protected]>
> >> Date: Thu, 6 Jul 2023 10:41:03 +0000
> >>
> >>>
> >>>
> >>>> -----Original Message-----
> >>>> From: Alexander Lobakin <[email protected]>
> >>>> Sent: Wednesday, July 5, 2023 8:06 PM
> >>
> >> [...]
> >>
> >>>>>> 120 seconds by 2 msec step is 60000 iterations, by 1 msec is 120000
> >>>>>> iterations. I know usleep_range() often is much less precise, but still.
> >>>>>> Do you really need that much time? Has this been measured during
> >>>>>> the tests that it can take up to 120 seconds or is it just some
> >>>>>> random value that "should be enough"?
> >>>>>> If you really need 120 seconds, I'd suggest using a timer / delayed
> >>>>>> work instead of wasting resources.
> >>>>> Here the intent is not waiting for 120 seconds, rather than avoid
> >>>>> continue checking the pending_sends of each tx queues for an
> >>>>> indefinite time,
> >>>> before freeing sk_buffs.
> >>>>> The pending_sends can only get decreased for a tx queue, if
> >>>>> mana_poll_tx_cq() gets called for a completion notification and
> >>>>> increased by
> >>>> xmit.
> >>>>>
> >>>>> In this particular bug, apc->port_is_up is not set to false, causing
> >>>>> xmit to keep increasing the pending_sends for the queue and
> >>>>> mana_poll_tx_cq() not getting called for the queue.
> >>>>>
> >>>>> If we see the comment in the function mana_dealloc_queues(), it
> mentions
> >> it :
> >>>>>
> >>>>> 2346 /* No packet can be transmitted now since apc->port_is_up is
> false.
> >>>>> 2347 * There is still a tiny chance that mana_poll_tx_cq() can re-
> enable
> >>>>> 2348 * a txq because it may not timely see apc->port_is_up being
> cleared
> >>>>> 2349 * to false, but it doesn't matter since mana_start_xmit() drops
> any
> >>>>> 2350 * new packets due to apc->port_is_up being false.
> >>>>>
> >>>>> The value 120 seconds has been decided here based on maximum
> number
> >>>>> of queues
> >>>>
> >>>> This is quite opposite to what you're saying above. How should I
> >>>> connect these
> >>>> two:
> >>>>
> >>>> Here the intent is not waiting for 120 seconds
> >>>>
> >>>> +
> >>>>
> >>>> The value 120 seconds has been decided here based on maximum number
> >>>> of queues
> >>>>
> >>>> ?
> >>>> Can cleaning the Tx queues really last for 120 seconds?
> >>>> My understanding is that timeouts need to be sensible and not go to
> >>>> the outer space. What is the medium value you got during the tests?
> >>>>
> >>> For each queue each takes few milli second, in a normal condition. So
> >>> based on maximum number of allowed queues for our h/w it won't go
> >>> beyond a sec.
> >>> The 120s only happens rarely during some NIC HW issue -unexpected.
> >>> So this timeout will only trigger in a very rare scenario.
> >>
> >> So set the timeout to 2 seconds if it makes no difference?
> > It can go near 120 seconds in a very rare MANA h/w scenario. That normally
> won't happen.
> > But during that scenario, we may need 120 seconds.
>
> This waiting loop is needed to let the pending Tx packets be sent. If
> they weren't sent in 1 second, it most likely makes no sense already
> whether they will be sent at all or not -- the destination host won't
> wait for them for so long.
> You say that it may happen only in case of HW issue. If so, I assume you
> need to fix it some way, e.g. do a HW reset or so? If so, why bother
> waiting for Tx completions if Tx is hung? You free all skbs later either
> way, so there are no leaks.

At that point, we don't actually care if the pending packets are sent or not.
But if we free the queues too soon, and the HW is slow for unexpected
reasons, a delayed completion notice will DMA into the freed memory and
cause corruption. That's why we have a longer waiting time.

Souradeep, you may double check with hostnet team to see what's the
best waiting time to ensure no more HW activities.

Thanks,
- Haiyang

2023-07-10 18:39:35

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [EXTERNAL] Re: [PATCH V4 net] net: mana: Fix MANA VF unload when host is unresponsive

On Thu, Jul 06, 2023 at 01:54:35PM +0000, Haiyang Zhang wrote:

> > This waiting loop is needed to let the pending Tx packets be sent. If
> > they weren't sent in 1 second, it most likely makes no sense already
> > whether they will be sent at all or not -- the destination host won't
> > wait for them for so long.
> > You say that it may happen only in case of HW issue. If so, I assume you
> > need to fix it some way, e.g. do a HW reset or so? If so, why bother
> > waiting for Tx completions if Tx is hung? You free all skbs later either
> > way, so there are no leaks.
>
> At that point, we don't actually care if the pending packets are sent or not.
> But if we free the queues too soon, and the HW is slow for unexpected
> reasons, a delayed completion notice will DMA into the freed memory and
> cause corruption. That's why we have a longer waiting time.

Aieiiie that is a horrible HW design to not have a strong fence of DMA.

"just wait and hope the HW doesn't UAF the kernel with DMA" is really
awful.

Jason