Subject: Re: [PATCH v2] soc: qcom: rpmh-rsc: Sleep waiting for tcs slots to be free
To: Doug Anderson
Cc: Stephen Boyd, Andy Gross, Bjorn Andersson, Stephen Boyd, LKML,
    linux-arm-msm, Maulik Shah, Lina Iyer, Rajendra Nayak
References: <20200724211711.810009-1-sboyd@kernel.org>
From: Stanimir Varbanov
Message-ID: <45552419-7fe2-200c-c277-afbb22559d0e@linaro.org>
Date: Tue, 28 Jul 2020 21:22:12 +0300
Hi Doug,

On 7/28/20 5:48 PM, Doug Anderson wrote:
> Hi,
>
> On Sun, Jul 26, 2020 at 2:44 AM Stanimir Varbanov
> wrote:
>>
>> Hi Stephen,
>>
>> On 7/25/20 12:17 AM, Stephen Boyd wrote:
>>> From: Stephen Boyd
>>>
>>> The busy loop in rpmh_rsc_send_data() is written with the assumption
>>> that the udelay will be preempted by the tcs_tx_done() irq handler when
>>> the TCS slots are all full. This doesn't hold true when the calling
>>> thread is an irqthread and the tcs_tx_done() irq is also an irqthread.
>>> That's because kernel irqthreads are SCHED_FIFO and thus need to
>>> voluntarily give up priority by calling into the scheduler so that other
>>> threads can run.
>>>
>>> I see RCU stalls when I boot with irqthreads on the kernel commandline
>>> because the modem remoteproc driver is trying to send an rpmh async
>>> message from an irqthread that needs to give up the CPU for the rpmh
>>> irqthread to run and clear out tcs slots.
>>>
>>>  rcu: INFO: rcu_preempt self-detected stall on CPU
>>>  rcu: 0-....: (1 GPs behind) idle=402/1/0x4000000000000002 softirq=2108/2109 fqs=4920
>>>   (t=21016 jiffies g=2933 q=590)
>>>  Task dump for CPU 0:
>>>  irq/11-smp2p    R  running task        0   148      2 0x00000028
>>>  Call trace:
>>>   dump_backtrace+0x0/0x154
>>>   show_stack+0x20/0x2c
>>>   sched_show_task+0xfc/0x108
>>>   dump_cpu_task+0x44/0x50
>>>   rcu_dump_cpu_stacks+0xa4/0xf8
>>>   rcu_sched_clock_irq+0x7dc/0xaa8
>>>   update_process_times+0x30/0x54
>>>   tick_sched_handle+0x50/0x64
>>>   tick_sched_timer+0x4c/0x8c
>>>   __hrtimer_run_queues+0x21c/0x36c
>>>   hrtimer_interrupt+0xf0/0x22c
>>>   arch_timer_handler_phys+0x40/0x50
>>>   handle_percpu_devid_irq+0x114/0x25c
>>>   __handle_domain_irq+0x84/0xc4
>>>   gic_handle_irq+0xd0/0x178
>>>   el1_irq+0xbc/0x180
>>>   save_return_addr+0x18/0x28
>>>   return_address+0x54/0x88
>>>   preempt_count_sub+0x40/0x88
>>>   _raw_spin_unlock_irqrestore+0x4c/0x6c
>>>   ___ratelimit+0xd0/0x128
>>>   rpmh_rsc_send_data+0x24c/0x378
>>>   __rpmh_write+0x1b0/0x208
>>>   rpmh_write_async+0x90/0xbc
>>>   rpmhpd_send_corner+0x60/0x8c
>>>   rpmhpd_aggregate_corner+0x8c/0x124
>>>   rpmhpd_set_performance_state+0x8c/0xbc
>>>   _genpd_set_performance_state+0xdc/0x1b8
>>>   dev_pm_genpd_set_performance_state+0xb8/0xf8
>>>   q6v5_pds_disable+0x34/0x60 [qcom_q6v5_mss]
>>>   qcom_msa_handover+0x38/0x44 [qcom_q6v5_mss]
>>>   q6v5_handover_interrupt+0x24/0x3c [qcom_q6v5]
>>>   handle_nested_irq+0xd0/0x138
>>>   qcom_smp2p_intr+0x188/0x200
>>>   irq_thread_fn+0x2c/0x70
>>>   irq_thread+0xfc/0x14c
>>>   kthread+0x11c/0x12c
>>>   ret_from_fork+0x10/0x18
>>>
>>> This busy loop naturally lends itself to using a wait queue so that each
>>> thread that tries to send a message will sleep waiting on the waitqueue
>>> and only be woken up when a free slot is available. This should make
>>> things more predictable too because the scheduler will be able to sleep
>>> tasks that are waiting on a free tcs instead of the busy loop we
>>> currently have today.
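
The wait-queue pattern being described looks roughly like the sketch
below. To be clear, this is only an illustrative sketch for readers of
this thread, not the actual diff: tcs_wait and claim_tcs_for_req() are
the names from the v2 changelog further down, while the locking shape
and everything around it is assumed/simplified (needs <linux/wait.h>).

/* rpmh-internal.h: per-controller waitqueue, woken when a TCS frees up */
	wait_queue_head_t tcs_wait;

/* probe: */
	init_waitqueue_head(&drv->tcs_wait);

/* rpmh_rsc_send_data(), replacing the udelay() busy loop: */
	spin_lock_irqsave(&drv->lock, flags);
	/*
	 * Sleep until a slot can be claimed; drv->lock is dropped while
	 * we sleep so the tx-done irq handler can run and free slots.
	 */
	wait_event_lock_irq(drv->tcs_wait,
			    (tcs_id = claim_tcs_for_req(drv, tcs, msg)) >= 0,
			    drv->lock);
	/* ... program and trigger the claimed TCS ... */
	spin_unlock_irqrestore(&drv->lock, flags);

/* tcs_tx_done(), after dropping drv->lock ("Move wake_up() outside of
 * the spinlock" in the changelog): */
	wake_up(&drv->tcs_wait);
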
>>>
>>> Cc: Douglas Anderson
>>> Cc: Maulik Shah
>>> Cc: Lina Iyer
>>> Signed-off-by: Stephen Boyd
>>> ---
>>>
>>> Changes in v2:
>>>  * Document tcs_wait
>>>  * Move wake_up() outside of the spinlock
>>>  * Document claim_tcs_for_req()
>>>
>>>  drivers/soc/qcom/rpmh-internal.h |   4 ++
>>>  drivers/soc/qcom/rpmh-rsc.c      | 115 +++++++++++++++----------------
>>>  2 files changed, 58 insertions(+), 61 deletions(-)
>>
>> This also fixes an issue related to TCS busy, seen with the Venus driver
>> with these [1] patches applied.
>>
>> Tested-by: Stanimir Varbanov
>>
>> --
>> regards,
>> Stan
>>
>> [1] https://lkml.org/lkml/2020/7/23/394
>
> It worries me that you say that this fixes any issues for you.
> Specifically I don't see how this could fix anything except:
>
> 1. Fix the problem with irqthreads, which is how Stephen originally
> found this.
>
> 2. Fix things to be a little more efficient.
>
> 3. Avoid the small handful of messages that show up during normal
> usage that look like:
>
>   TCS Busy, retrying RPMH message send: addr=...
>
> I'm guessing you're referring to #3. Is that correct?

Yes, correct. After Stephen's comment on the Venus DVFS series that he
dropped the message, I realized why I no longer see the "TCS Busy"
warning.

>
> If so, you might want to double-check to confirm that you aren't
> totally spamming the RPMh bus with your patch series. I found that
> when I was seeing a lot of "TCS Busy, retrying RPMH message send"
> spammed to the console, it was a sign that the code was being
> really inefficient.

The runtime_suspend in the venus driver will not happen too often, but
dev_pm_opp_set_rate could be called on every decoded/encoded frame (we
have so-called dynamic clock scaling depending on the input bitstream).
In any case, you gave me more material to think about.

>
> Specifically, the code to add interconnect bandwidth and OPP to the SPI
> drivers would run at "runtime_suspend" and "runtime_resume". For our
> SPI drivers this meant that they were running after every single
> transfer, and they were quite slow. If your code is doing something
> similar then you probably have a problem.
>
> The problem was fixed by adding an autosuspend delay.
>
> References (from linux-next):
>
> 8592eb959ad4 spi: spi-qcom-qspi: Set an autosuspend delay of 250 ms
> cfdab2cd85ec spi: spi-geni-qcom: Set an autosuspend delay of 250 ms
>
> Also if your code is adjusting its clock rate too frequently it can
> now trigger similar problems. References:
>
> 2124331411a1 spi: spi-qcom-qspi: Avoid clock setting if not needed
> 5f219524ec6e spi: spi-geni-qcom: Set the clock properly at runtime resume
> 68890e20942b spi: spi-geni-qcom: Avoid clock setting if not needed
>
> -Doug

--
regards,
Stan
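
For anyone wanting to try the autosuspend-delay approach Doug describes
above, the pattern looks roughly like the sketch below. This is only an
illustration, not code from any of the referenced commits: my_probe()
and my_transfer_done() are made-up names, and 250 ms is simply the value
those spi commits use.

#include <linux/platform_device.h>
#include <linux/pm_runtime.h>

#define MY_AUTOSUSPEND_DELAY_MS	250

static int my_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;

	/*
	 * Don't runtime-suspend immediately after every transfer/frame;
	 * wait for MY_AUTOSUSPEND_DELAY_MS of idleness first, so the
	 * expensive suspend/resume work isn't redone for back-to-back
	 * requests.
	 */
	pm_runtime_set_autosuspend_delay(dev, MY_AUTOSUSPEND_DELAY_MS);
	pm_runtime_use_autosuspend(dev);
	pm_runtime_enable(dev);

	return 0;
}

static void my_transfer_done(struct device *dev)
{
	/* Use the autosuspend variants so the delay is honoured. */
	pm_runtime_mark_last_busy(dev);
	pm_runtime_put_autosuspend(dev);
}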