We need to ensure the hardware is powered when we transmit a packet.
But if it's not, we can't block to wait for it. So asynchronously
request power in ipa_start_xmit(), and only proceed if the return
value indicates the power state is active.
If the hardware is not active, a runtime resume request will have
been initiated. In that case, stop the network stack from further
transmit attempts until the resume completes. Return NETDEV_TX_BUSY,
to retry sending the packet once the queue is restarted.
If the power request returns an error (other than -EINPROGRESS,
which just means a resume requested elsewhere isn't complete), just
drop the packet.
Signed-off-by: Alex Elder <[email protected]>
---
drivers/net/ipa/ipa_modem.c | 30 +++++++++++++++++++++++++++++-
1 file changed, 29 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ipa/ipa_modem.c b/drivers/net/ipa/ipa_modem.c
index 0a3b034614b61..aa1b483d9f7db 100644
--- a/drivers/net/ipa/ipa_modem.c
+++ b/drivers/net/ipa/ipa_modem.c
@@ -106,6 +106,7 @@ static int ipa_start_xmit(struct sk_buff *skb, struct net_device *netdev)
struct ipa_endpoint *endpoint;
struct ipa *ipa = priv->ipa;
u32 skb_len = skb->len;
+ struct device *dev;
int ret;
if (!skb_len)
@@ -115,7 +116,31 @@ static int ipa_start_xmit(struct sk_buff *skb, struct net_device *netdev)
if (endpoint->data->qmap && skb->protocol != htons(ETH_P_MAP))
goto err_drop_skb;
+ /* The hardware must be powered for us to transmit */
+ dev = &ipa->pdev->dev;
+ ret = pm_runtime_get(dev);
+ if (ret < 1) {
+ /* If a resume won't happen, just drop the packet */
+ if (ret < 0 && ret != -EINPROGRESS) {
+ pm_runtime_put_noidle(dev);
+ goto err_drop_skb;
+ }
+
+ /* No power (yet). Stop the network stack from transmitting
+ * until we're resumed; ipa_modem_resume() arranges for the
+ * TX queue to be started again.
+ */
+ netif_stop_queue(netdev);
+
+ (void)pm_runtime_put(dev);
+
+ return NETDEV_TX_BUSY;
+ }
+
ret = ipa_endpoint_skb_tx(endpoint, skb);
+
+ (void)pm_runtime_put(dev);
+
if (ret) {
if (ret != -E2BIG)
return NETDEV_TX_BUSY;
@@ -201,7 +226,10 @@ void ipa_modem_suspend(struct net_device *netdev)
*
* Re-enable transmit on the modem network device. This is called
* in (power management) work queue context, scheduled when resuming
- * the modem.
+ * the modem. We can't enable the queue directly in ipa_modem_resume()
+ * because transmits restart the instant the queue is awakened; but the
+ * device power state won't be ACTIVE until *after* ipa_modem_resume()
+ * returns.
*/
static void ipa_modem_wake_queue_work(struct work_struct *work)
{
--
2.27.0
On Thu, 12 Aug 2021 14:50:33 -0500 Alex Elder wrote:
> + /* The hardware must be powered for us to transmit */
> + dev = &ipa->pdev->dev;
> + ret = pm_runtime_get(dev);
> + if (ret < 1) {
> + /* If a resume won't happen, just drop the packet */
> + if (ret < 0 && ret != -EINPROGRESS) {
> + pm_runtime_put_noidle(dev);
> + goto err_drop_skb;
> + }
This is racy, what if the pm work gets scheduled on another CPU and
calls wake right here (i.e. before you call netif_stop_queue())?
The queue may never get woken up?
> + /* No power (yet). Stop the network stack from transmitting
> + * until we're resumed; ipa_modem_resume() arranges for the
> + * TX queue to be started again.
> + */
> + netif_stop_queue(netdev);
> +
> + (void)pm_runtime_put(dev);
> +
> + return NETDEV_TX_BUSY;
On 8/13/21 7:46 PM, Jakub Kicinski wrote:
> On Thu, 12 Aug 2021 14:50:33 -0500 Alex Elder wrote:
>> + /* The hardware must be powered for us to transmit */
>> + dev = &ipa->pdev->dev;
>> + ret = pm_runtime_get(dev);
>> + if (ret < 1) {
>> + /* If a resume won't happen, just drop the packet */
>> + if (ret < 0 && ret != -EINPROGRESS) {
>> + pm_runtime_put_noidle(dev);
>> + goto err_drop_skb;
>> + }
>
> This is racy, what if the pm work gets scheduled on another CPU and
> calls wake right here (i.e. before you call netif_stop_queue())?
> The queue may never get woken up?
I haven't been seeing this happen but I think you may be right.
I did think about this race, but I think I was relying on the
PM work queue to somehow avoid the problem. I need to think
about this again after a good night's sleep. I might need
to add an atomic flag or something.
-Alex
>> + /* No power (yet). Stop the network stack from transmitting
>> + * until we're resumed; ipa_modem_resume() arranges for the
>> + * TX queue to be started again.
>> + */
>> + netif_stop_queue(netdev);
>> +
>> + (void)pm_runtime_put(dev);
>> +
>> + return NETDEV_TX_BUSY;
On Fri, 13 Aug 2021 21:25:23 -0500 Alex Elder wrote:
> > This is racy, what if the pm work gets scheduled on another CPU and
> > calls wake right here (i.e. before you call netif_stop_queue())?
> > The queue may never get woken up?
>
> I haven't been seeing this happen but I think you may be right.
>
> I did think about this race, but I think I was relying on the
> PM work queue to somehow avoid the problem. I need to think
> about this again after a good night's sleep. I might need
> to add an atomic flag or something.
Maybe add a spin lock? Seems like the whole wake up path will be
expensive enough for a spin lock to be in the noise. You can always
add complexity later.
On 8/16/21 9:15 AM, Jakub Kicinski wrote:
> On Fri, 13 Aug 2021 21:25:23 -0500 Alex Elder wrote:
>>> This is racy, what if the pm work gets scheduled on another CPU and
>>> calls wake right here (i.e. before you call netif_stop_queue())?
>>> The queue may never get woken up?
>>
>> I haven't been seeing this happen but I think you may be right.
>>
>> I did think about this race, but I think I was relying on the
>> PM work queue to somehow avoid the problem. I need to think
>> about this again after a good night's sleep. I might need
>> to add an atomic flag or something.
>
> Maybe add a spin lock? Seems like the whole wake up path will be
> expensive enough for a spin lock to be in the noise. You can always
> add complexity later.
Exactly what I just decided after trying to work out a
clever way without using a spinlock... I'll be sending
out a fix today. Thanks.
-Alex
On 8/16/21 9:20 AM, Alex Elder wrote:
> On 8/16/21 9:15 AM, Jakub Kicinski wrote:
>> On Fri, 13 Aug 2021 21:25:23 -0500 Alex Elder wrote:
>>>> This is racy, what if the pm work gets scheduled on another CPU and
>>>> calls wake right here (i.e. before you call netif_stop_queue())?
>>>> The queue may never get woken up?
>>>
>>> I haven't been seeing this happen but I think you may be right.
>>>
>>> I did think about this race, but I think I was relying on the
>>> PM work queue to somehow avoid the problem. I need to think
>>> about this again after a good night's sleep. I might need
>>> to add an atomic flag or something.
>>
>> Maybe add a spin lock? Seems like the whole wake up path will be
>> expensive enough for a spin lock to be in the noise. You can always
>> add complexity later.
>
> Exactly what I just decided after trying to work out a
> clever way without using a spinlock... I'll be sending
> out a fix today. Thanks.
I'm finding this isn't an easy problem to solve (or even think
about). While I ponder the best course of action I'm going
to send out another series (i.e., *before* I send a fix for
this issue) because I'd like to get everything I have out
for review this week. I *will* address this potential race
one way or another, possibly later today.
-Alex
>
> -Alex
>
On Mon, 16 Aug 2021 12:56:40 -0500 Alex Elder wrote:
> I'm finding this isn't an easy problem to solve (or even think
> about). While I ponder the best course of action I'm going
> to send out another series (i.e., *before* I send a fix for
> this issue) because I'd like to get everything I have out
> for review this week. I *will* address this potential race
> one way or another, possibly later today.
Alright, thanks for the heads up.