2022-06-28 15:12:29

by Alex Xu (Hello71)

Subject: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

Excerpts from Paul E. McKenney's message of June 28, 2022 12:12 am:
> On Mon, Jun 27, 2022 at 09:50:53PM -0400, Alex Xu (Hello71) wrote:
>> Ah, I see. I have selected the default value for
>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, but that is 20 if ANDROID. I am not
>> using Android; I'm not sure there exist Android devices with AMD GPUs.
>> However, I have set CONFIG_ANDROID=y in order to use
>> ANDROID_BINDER_IPC=m for emulation.
>>
>> In general, I think CONFIG_ANDROID is not a reliable method for
>> detecting if the kernel is for an Android device; for example, Fedora
>> sets CONFIG_ANDROID, but (AFAIK) its kernel is not intended for use with
>> Android userspace.
>>
>> On the other hand, it's not clear to me why the value 20 should be for
>> Android only anyways. If, as you say in
>> https://lore.kernel.org/lkml/20220216195508.GM4285@paulmck-ThinkPad-P17-Gen-1/,
>> it is related to the size of the system, perhaps some other heuristic
>> would be more appropriate.
>
> It is related to the fact that quite a few Android guys want these
> 20-millisecond short-timeout expedited RCU CPU stall warnings, but no one
> else does. Not yet anyway.
>
> And let's face it, the intent and purpose of CONFIG_ANDROID=y is extremely
> straightforward and unmistakeable. So perhaps people not running Android
> devices but wanting a little bit of the Android functionality should do
> something other than setting CONFIG_ANDROID=y in their .config files. Me,
> I am surprised that it took this long for something like this to bite you.
>
> But just out of curiosity, what would you suggest instead?

Both Debian and Fedora set CONFIG_ANDROID, specifically for binder. If
major distro vendors are consistently making this "mistake", then
perhaps the problem is elsewhere.

In my own opinion, assuming that binderfs means Android vendor is not a
good assumption. The ANDROID help says:

> Enable support for various drivers needed on the Android platform

It doesn't say "Enable only if building an Android device", or "Enable
only if you are Google". Isn't the traditional Linux philosophy a
collection of pieces to be assembled, without gratuitous hidden
dependencies? For example, [0] removes the unnecessary Android
dependency; it doesn't block the whole thing with "depends on ANDROID".

It seems to me that the proper way to set some configuration for Android
kernels is or should be to ask the Android kernel config maintainers,
not to set it based on an upstream kernel option. There is, after all,
no CONFIG_FEDORA or CONFIG_UBUNTU or CONFIG_HANNAH_MONTANA.

WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as
rcu, there to see if suspends are "frequent". This seems dubious for the
same reasons.

I wonder if it might be time to retire CONFIG_ANDROID: the only
remaining driver covered is binder, which originates from Android.
Like ufs-qcom, binder is no longer used exclusively on Android
devices; it is also used for Android device emulators, which might be
used on Android-like mobile devices, or might not.

My understanding is that both Android and upstream kernel developers
intend to add no more Android-specific drivers, so binder should be the
only one covered for the foreseeable future.

> For that matter, why the private reply?

Mail client issues, not intentional. Lists re-added, plus Android,
WireGuard, and random.

Thanks,
Alex.

[0] https://lore.kernel.org/all/[email protected]/


2022-06-28 15:29:17

by Jason A. Donenfeld

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

Hi Alex,

On Tue, Jun 28, 2022 at 11:02:40AM -0400, Alex Xu (Hello71) wrote:
> WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as
> rcu, there to see if suspends are "frequent". This seems dubious for the
> same reasons.

I'd be happy to take a patch in WireGuard and random.c to get rid of the
CONFIG_ANDROID usage, if you can conduct an analysis and conclude this
won't break anything inadvertently.

Jason

2022-06-28 19:02:18

by Paul E. McKenney

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

On Tue, Jun 28, 2022 at 11:02:40AM -0400, Alex Xu (Hello71) wrote:
> Excerpts from Paul E. McKenney's message of June 28, 2022 12:12 am:
> > On Mon, Jun 27, 2022 at 09:50:53PM -0400, Alex Xu (Hello71) wrote:
> >> Ah, I see. I have selected the default value for
> >> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT, but that is 20 if ANDROID. I am not
> >> using Android; I'm not sure there exist Android devices with AMD GPUs.
> >> However, I have set CONFIG_ANDROID=y in order to use
> >> ANDROID_BINDER_IPC=m for emulation.
> >>
> >> In general, I think CONFIG_ANDROID is not a reliable method for
> >> detecting if the kernel is for an Android device; for example, Fedora
> >> sets CONFIG_ANDROID, but (AFAIK) its kernel is not intended for use with
> >> Android userspace.
> >>
> >> On the other hand, it's not clear to me why the value 20 should be for
> >> Android only anyways. If, as you say in
> >> https://lore.kernel.org/lkml/20220216195508.GM4285@paulmck-ThinkPad-P17-Gen-1/,
> >> it is related to the size of the system, perhaps some other heuristic
> >> would be more appropriate.
> >
> > It is related to the fact that quite a few Android guys want these
> > 20-millisecond short-timeout expedited RCU CPU stall warnings, but no one
> > else does. Not yet anyway.
> >
> > And let's face it, the intent and purpose of CONFIG_ANDROID=y is extremely
> > straightforward and unmistakeable. So perhaps people not running Android
> > devices but wanting a little bit of the Android functionality should do
> > something other than setting CONFIG_ANDROID=y in their .config files. Me,
> > I am surprised that it took this long for something like this to bite you.
> >
> > But just out of curiosity, what would you suggest instead?
>
> Both Debian and Fedora set CONFIG_ANDROID, specifically for binder. If
> major distro vendors are consistently making this "mistake", then
> perhaps the problem is elsewhere.
>
> In my own opinion, assuming that binderfs means Android vendor is not a
> good assumption. The ANDROID help says:
>
> > Enable support for various drivers needed on the Android platform
>
> It doesn't say "Enable only if building an Android device", or "Enable
> only if you are Google". Isn't the traditional Linux philosophy a
> collection of pieces to be assembled, without gratuitous hidden
> dependencies? For example, [0] removes the unnecessary Android
> dependency, it doesn't block the whole thing with "depends on ANDROID".
>
> It seems to me that the proper way to set some configuration for Android
> kernels is or should be to ask the Android kernel config maintainers,
> not to set it based on an upstream kernel option. There is, after all,
> no CONFIG_FEDORA or CONFIG_UBUNTU or CONFIG_HANNAH_MONTANA.
>
> WireGuard and random also use CONFIG_ANDROID in a similar "proxy" way as
> rcu, there to see if suspends are "frequent". This seems dubious for the
> same reasons.
>
> I wonder if it might be time to retire CONFIG_ANDROID: the only
> remaining driver covered is binder, which originates from Android but
> is no longer used exclusively on Android systems. Like ufs-qcom, binder
> is no longer used exclusively on Android devices; it is also used for
> Android device emulators, which might be used on Android-like mobile
> devices, or might not.
>
> My understanding is that both Android and upstream kernel developers
> intend to add no more Android-specific drivers, so binder should be the
> only one covered for the foreseeable future.

Thank you for the perspective, but you never did suggest an alternative.

So here is what I suggest given the current setup:

config RCU_EXP_CPU_STALL_TIMEOUT
	int "Expedited RCU CPU stall timeout in milliseconds"
	depends on RCU_STALL_COMMON
	range 0 21000
	default 20 if ANDROID
	default 0 if !ANDROID
	help
	  If a given expedited RCU grace period extends more than the
	  specified number of milliseconds, a CPU stall warning is printed.
	  If the RCU grace period persists, additional CPU stall warnings
	  are printed at more widely spaced intervals. A value of zero
	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
	  seconds to milliseconds.

The default, and only the default, is controlled by ANDROID.
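
For readers following along, the rule in the help text above can be modeled in a few lines of C. This is an illustrative sketch only, not the kernel's implementation; the function name effective_exp_stall_timeout_ms and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Illustrative model (not kernel code) of the help text:
 * a nonzero expedited timeout is used as given, in milliseconds;
 * zero means "use RCU_CPU_STALL_TIMEOUT converted from seconds
 * to milliseconds".
 */
int effective_exp_stall_timeout_ms(int exp_timeout_ms, int stall_timeout_s)
{
	/* The Kconfig entry restricts the value to the range 0..21000. */
	assert(exp_timeout_ms >= 0 && exp_timeout_ms <= 21000);
	assert(stall_timeout_s > 0);

	if (exp_timeout_ms == 0)
		return stall_timeout_s * 1000; /* fall back to the normal timeout */
	return exp_timeout_ms;
}
```

Assuming RCU_CPU_STALL_TIMEOUT is 21 seconds (a value not stated in this thread), passing 0 yields 21000 and passing 20 yields 20, which is why the ANDROID default produces warnings so much earlier.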

All you need to do to get the previous behavior is to add something like
this to your defconfig file:

CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000

Any reason why this will not work for you?

> > For that matter, why the private reply?
>
> Mail client issues, not intentional. Lists re-added, plus Android,
> WireGuard, and random.

Thank you!

Thanx, Paul

> Thanks,
> Alex.
>
> [0] https://lore.kernel.org/all/[email protected]/

2022-06-28 20:17:00

by Alex Xu (Hello71)

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> All you need to do to get the previous behavior is to add something like
> this to your defconfig file:
>
> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
>
> Any reason why this will not work for you?

As far as I know, I do not require any particular RCU debugging features
intended for developers; as an individual user and distro maintainer, I
would like to select the option corresponding to "emit errors for
unexpected conditions which should be reported upstream", not "emit
debugging information for development purposes".

Therefore, I think 0 is a suitable setting for me and most ordinary
(not tightly controlled) distributions. My concern is that other users
and distro maintainers will also have confusion about what value to set
and whether the warnings are important, since the help text does not say
anything about Android, and "make oldconfig" does not indicate that the
default value is different for Android.

My suggestion is that the default be set to 0, and if a non-zero value
is appropriate for Android, that should be communicated to the Android
developers, not made conditional on CONFIG_ANDROID.

Thanks,
Alex.

2022-06-28 20:18:29

by Uladzislau Rezki

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > All you need to do to get the previous behavior is to add something like
> > this to your defconfig file:
> >
> > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> >
> > Any reason why this will not work for you?
>
> As far as I know, I do not require any particular RCU debugging features
> intended for developers; as an individual user and distro maintainer, I
> would like to select the option corresponding to "emit errors for
> unexpected conditions which should be reported upstream", not "emit
> debugging information for development purposes".
>
Sorry, but we need to make some assumption; to me, CONFIG_ANDROID
indicates that a kernel runs on an Android-like device. When you enable
this option on your specific box, it is assumed that some Android-related
code is also activated on your device, which may lead to side effects.

>
> Therefore, I think 0 is a suitable setting for me and most ordinary
> (not tightly controlled) distributions. My concern is that other users
> and distro maintainers will also have confusion about what value to set
> and whether the warnings are important, since the help text does not say
> anything about Android, and "make oldconfig" does not indicate that the
> default value is different for Android.
>
<snip>
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 9b64e55d4f61..ced0d1f7c675 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -94,7 +94,8 @@ config RCU_EXP_CPU_STALL_TIMEOUT
 	  If the RCU grace period persists, additional CPU stall warnings
 	  are printed at more widely spaced intervals. A value of zero
 	  says to use the RCU_CPU_STALL_TIMEOUT value converted from
-	  seconds to milliseconds.
+	  seconds to milliseconds. If CONFIG_ANDROID is set for non-Android
+	  platform and you unsure, set the RCU_EXP_CPU_STALL_TIMEOUT to zero.

 config RCU_TRACE
 	bool "Enable tracing for RCU"
<snip>

Will it work for you?

--
Uladzislau Rezki

2022-07-04 11:39:12

by Christian König

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

Hi guys,

Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
>> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
>>> All you need to do to get the previous behavior is to add something like
>>> this to your defconfig file:
>>>
>>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
>>>
>>> Any reason why this will not work for you?

Sorry for jumping in so late; I was on vacation for a week.

Well, when any RCU grace period is longer than 20 ms and amdgpu is in the
backtrace, my educated guess is that we messed up some timeout waiting for
the hw.

We usually wait only a few us, but it may be that somebody is waiting for
ms instead.

So there are some todos here as far as I can see, and it would be helpful
to get a cleaner backtrace if possible.

Regards,
Christian.

2022-07-06 18:08:25

by Paul E. McKenney

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
> Hello.
>
> On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> > Hi guys,
> >
> > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > > All you need to do to get the previous behavior is to add something like
> > > > > this to your defconfig file:
> > > > >
> > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > > >
> > > > > Any reason why this will not work for you?
> >
> > sorry for jumping in so later, I was on vacation for a week.
> >
> > Well when any RCU period is longer than 20ms and amdgpu in the backtrace my
> > educated guess is that we messed up some timeout waiting for the hw.
> >
> > We usually do wait a few us, but it can be that somebody is waiting for ms
> > instead.
> >
> > So there are some todos here as far as I can see and It would be helpful to
> > get a cleaner backtrace if possible.
> >
> Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
> will not have any dependencies on the CONFIG_ANDROID anymore:
>
> https://lkml.org/lkml/2022/6/29/756

But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
wish. Setting this option to 20 will get you the behavior previously
obtained by setting the now-defunct ANDROID Kconfig option.

Thanx, Paul

2022-07-06 18:21:45

by Uladzislau Rezki

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

Hello.

On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> Hi guys,
>
> Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > All you need to do to get the previous behavior is to add something like
> > > > this to your defconfig file:
> > > >
> > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > >
> > > > Any reason why this will not work for you?
>
> sorry for jumping in so later, I was on vacation for a week.
>
> Well when any RCU period is longer than 20ms and amdgpu in the backtrace my
> educated guess is that we messed up some timeout waiting for the hw.
>
> We usually do wait a few us, but it can be that somebody is waiting for ms
> instead.
>
> So there are some todos here as far as I can see and It would be helpful to
> get a cleaner backtrace if possible.
>
Actually, it looks like CONFIG_ANDROID is going to be removed, so
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will no longer have any dependency on
CONFIG_ANDROID:

https://lkml.org/lkml/2022/6/29/756

--
Uladzislau Rezki

2022-07-06 18:38:15

by Uladzislau Rezki

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote:
> On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
> > Hello.
> >
> > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> > > Hi guys,
> > >
> > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > > > All you need to do to get the previous behavior is to add something like
> > > > > > this to your defconfig file:
> > > > > >
> > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > > > >
> > > > > > Any reason why this will not work for you?
> > >
> > > sorry for jumping in so later, I was on vacation for a week.
> > >
> > > Well when any RCU period is longer than 20ms and amdgpu in the backtrace my
> > > educated guess is that we messed up some timeout waiting for the hw.
> > >
> > > We usually do wait a few us, but it can be that somebody is waiting for ms
> > > instead.
> > >
> > > So there are some todos here as far as I can see and It would be helpful to
> > > get a cleaner backtrace if possible.
> > >
> > Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
> > will not have any dependencies on the CONFIG_ANDROID anymore:
> >
> > https://lkml.org/lkml/2022/6/29/756
>
> But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
> wish. Setting this option to 20 will get you the behavior previously
> obtained by setting the now-defunct ANDROID Kconfig option.
>
Right. Or via a boot parameter. So for us it is not a big issue :)

--
Uladzislau Rezki

2022-07-06 21:33:36

by Paul E. McKenney

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote:
> On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
> > > Hello.
> > >
> > > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> > > > Hi guys,
> > > >
> > > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > > > > All you need to do to get the previous behavior is to add something like
> > > > > > > this to your defconfig file:
> > > > > > >
> > > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > > > > >
> > > > > > > Any reason why this will not work for you?
> > > >
> > > > sorry for jumping in so later, I was on vacation for a week.
> > > >
> > > > Well when any RCU period is longer than 20ms and amdgpu in the backtrace my
> > > > educated guess is that we messed up some timeout waiting for the hw.
> > > >
> > > > We usually do wait a few us, but it can be that somebody is waiting for ms
> > > > instead.
> > > >
> > > > So there are some todos here as far as I can see and It would be helpful to
> > > > get a cleaner backtrace if possible.
> > > >
> > > Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
> > > will not have any dependencies on the CONFIG_ANDROID anymore:
> > >
> > > https://lkml.org/lkml/2022/6/29/756
> >
> > But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
> > wish. Setting this option to 20 will get you the behavior previously
> > obtained by setting the now-defunct ANDROID Kconfig option.
> >
> Right. Or over boot parameter. So for us it is not a big issue :)

Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now
tuning in. ;-)
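
On the kernel command line this would be rcupdate.rcu_exp_cpu_stall_timeout=20 to get the 20-millisecond behavior discussed above. To confirm the value on a running system, here is a minimal sketch in C, assuming the parameter is also exposed at /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout (an assumption not stated in this thread). The helper name read_int_param is hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one integer from a sysfs-style file.
 * Returns 0 on success and stores the value in *out; returns -1 if
 * the file cannot be opened or does not start with an integer.
 */
int read_int_param(const char *path, int *out)
{
	FILE *f;
	int ok;

	assert(path != NULL && out != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = (fscanf(f, "%d", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A caller would pass the assumed sysfs path and treat a return value of -1 as "parameter not available on this kernel" rather than as an error in the setting itself.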

Thanx, Paul

2022-07-07 07:35:29

by Christian König

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

Am 06.07.22 um 22:42 schrieb Paul E. McKenney:
> On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote:
>> On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote:
>>> On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
>>>> Hello.
>>>>
>>>> On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
>>>>> Hi guys,
>>>>>
>>>>> Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
>>>>>>> Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
>>>>>>>> All you need to do to get the previous behavior is to add something like
>>>>>>>> this to your defconfig file:
>>>>>>>>
>>>>>>>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
>>>>>>>>
>>>>>>>> Any reason why this will not work for you?
>>>>> sorry for jumping in so later, I was on vacation for a week.
>>>>>
>>>>> Well when any RCU period is longer than 20ms and amdgpu in the backtrace my
>>>>> educated guess is that we messed up some timeout waiting for the hw.
>>>>>
>>>>> We usually do wait a few us, but it can be that somebody is waiting for ms
>>>>> instead.
>>>>>
>>>>> So there are some todos here as far as I can see and It would be helpful to
>>>>> get a cleaner backtrace if possible.
>>>>>
>>>> Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
>>>> will not have any dependencies on the CONFIG_ANDROID anymore:
>>>>
>>>> https://lkml.org/lkml/2022/6/29/756
>>> But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
>>> wish. Setting this option to 20 will get you the behavior previously
>>> obtained by setting the now-defunct ANDROID Kconfig option.
>>>
>> Right. Or over boot parameter. So for us it is not a big issue :)
> Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now
> tuning in. ;-)

I was just about to write a response asking for that :)

Thanks, I will suggest that our QA add this parameter while running some
tests.

Regards,
Christian.

>
> Thanx, Paul

2022-07-07 13:59:10

by Paul E. McKenney

Subject: Re: CONFIG_ANDROID (was: rcu_sched detected expedited stalls in amdgpu after suspend)

On Thu, Jul 07, 2022 at 09:30:39AM +0200, Christian König wrote:
> Am 06.07.22 um 22:42 schrieb Paul E. McKenney:
> > On Wed, Jul 06, 2022 at 08:09:49PM +0200, Uladzislau Rezki wrote:
> > > On Wed, Jul 06, 2022 at 10:58:36AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Jul 06, 2022 at 07:48:20PM +0200, Uladzislau Rezki wrote:
> > > > > Hello.
> > > > >
> > > > > On Mon, Jul 04, 2022 at 01:30:50PM +0200, Christian König wrote:
> > > > > > Hi guys,
> > > > > >
> > > > > > Am 28.06.22 um 22:11 schrieb Uladzislau Rezki:
> > > > > > > > Excerpts from Paul E. McKenney's message of June 28, 2022 2:54 pm:
> > > > > > > > > All you need to do to get the previous behavior is to add something like
> > > > > > > > > this to your defconfig file:
> > > > > > > > >
> > > > > > > > > CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=21000
> > > > > > > > >
> > > > > > > > > Any reason why this will not work for you?
> > > > > > sorry for jumping in so later, I was on vacation for a week.
> > > > > >
> > > > > > Well when any RCU period is longer than 20ms and amdgpu in the backtrace my
> > > > > > educated guess is that we messed up some timeout waiting for the hw.
> > > > > >
> > > > > > We usually do wait a few us, but it can be that somebody is waiting for ms
> > > > > > instead.
> > > > > >
> > > > > > So there are some todos here as far as I can see and It would be helpful to
> > > > > > get a cleaner backtrace if possible.
> > > > > >
> > > > > Actually CONFIG_ANDROID looks like is going to be removed, so the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
> > > > > will not have any dependencies on the CONFIG_ANDROID anymore:
> > > > >
> > > > > https://lkml.org/lkml/2022/6/29/756
> > > > But you can set the RCU_EXP_CPU_STALL_TIMEOUT Kconfig option, if you
> > > > wish. Setting this option to 20 will get you the behavior previously
> > > > obtained by setting the now-defunct ANDROID Kconfig option.
> > > >
> > > Right. Or over boot parameter. So for us it is not a big issue :)
> > Specifically rcupdate.rcu_exp_cpu_stall_timeout, for those just now
> > tuning in. ;-)
>
> I was just about to write a response asking for that :)
>
> Thanks, I will suggest to our QA to add this parameter while doing some
> tests.

Very good! Please let me know how it goes.

Thanx, Paul