On 20. 11. 2022. 18:50, Paul E. McKenney wrote:
> On Sun, Nov 20, 2022 at 04:49:37PM +0100, Mirsad Goran Todorovac wrote:
>> On 18. 11. 2022. 16:55, Paul E. McKenney wrote:
>>> On Fri, Nov 18, 2022 at 06:11:16AM +0000, Phillip Lougher wrote:
>>>> On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
>>>>> Hi,
>>>>>
>>>>> While trying to bisect, I've found another bug that predated the
>>>>> introduction of squashfs_readahead(), but it has
>>>>> a common denominator in squashfs_decompress() and squashfs_xz_uncompress().
>>>>
>>>> Wrong, the stall is happening in the XZ decompressor library, which
>>>> is *not* in Squashfs.
>>>>
>>>> This reported stall in the decompressor code is likely a symptom of you
>>>> deliberately thrashing your system. When the system thrashes everything
>>>> starts to happen very slowly, and the system will spend a lot of
>>>> its time doing page I/O, and the CPU will spend a lot of time in
>>>> any CPU intensive code like the XZ decompressor library.
>>>>
>>>> So the fact the stall is being hit here is a symptom and not
>>>> a cause. The decompressor code is likely running slowly due to
>>>> thrashing and waiting on paged-out buffers. This is not indicative
>>>> of any bug, merely a system running slowly due to overload.
>>>>
>>>> As I said, this is not a Squashfs issue, because the code when the
>>>> stall takes place isn't in Squashfs.
>>>>
>>>> The people responsible for the rcu code should have a lot more insight
>>>> about what happens when the system is thrashing, and how this will
>>>> throw up false positives. In this I believe this is an instance of
>>>> perfectly correct code running slowly due to thrashing incorrectly
>>>> being flagged as looping.
>>>>
>>>> CC'ing Paul E. McKenney <[email protected]>
>>>>
>>>>> Frankly, I need an advice on how to handle a situation like this.
>>>>>
>>>>> Obviously, the best place to test the RCU problem with
>>>>> squashfs_readahead() is where it is introduced + Phillip's patches
>>>>> [012]/3.
>>>>>
>>>>> Thanks,
>>>>> Mirsad
>>>>>
>>>>> P.S.
>>>>>
>>>>> This is excerpt from dmesg log attached:
>>>>>
>>>>> [ 25.338169] rcu: INFO: rcu_preempt detected expedited stalls on
>>>>> CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
>>>
>>> The "6 jiffies" leads me to believe that this kernel was built with
>>> CONFIG_ANDROID=y, but also with the short expedited grace-period commit,
>>> that is, with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT set to 20 milliseconds
>>> and with HZ set to something like 250. This means that the kernel
>>> will complain if an RCU read-side critical section exceeeds about 20
>>> milliseconds. This is a fine thing if you are actually running on an
>>> Android, but a bit extreme for general usage.
>>>
>>> If my guesses are correct, please apply this commit from mainline:
>>>
>>> 1045a06724f3 ("remove CONFIG_ANDROID")
>>>
>>> With this commit in place, CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will default
>>> to 21000 (that is, 21 seconds) and things will be much better.
>
> Please apply the commit called out above and rerun your test.
>
> But if you still cannot bring yourself to do that, more interspersed
> below...
>
>> Dear Paul,
>>
>> Actually, I was able to reproduce and bisect the commit that started causing
>> described stalls (I have already explained how 5.10 didn't show any under the
>> exactly same saved widows and tabs set.)
>>
>> # bad: [1e57930e9f4083ad5854ab6eadffe790a8167fb4] Merge tag
>> 'rcu.2022.05.19a' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
>> git bisect bad 1e57930e9f4083ad5854ab6eadffe790a8167fb4
>> # good: [f993aed406eaf968ba3867a76bb46c95336a33d0] Merge tag 'block-5.18-2022-05-18' of git://git.kernel.dk/linux-block
>> git bisect good f993aed406eaf968ba3867a76bb46c95336a33d0
>> # good: [bf2431021c8cfe52c866d7bf640ced03a35ebe91] Merge tag
>> 'efi-next-for-v5.19' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
>> git bisect good bf2431021c8cfe52c866d7bf640ced03a35ebe91
>> # good: [be05ee54378d451e1d60196921566d6087f3079f] Merge branches
>> 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b',
>> 'rcu-tasks.2022.04.11b', 'srcu.2022.
>> 05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD
>> git bisect good be05ee54378d451e1d60196921566d6087f3079f
>> # good: [023033fe343cdf2ba83ab762f8de69241c7fc086] tools/nolibc/types: define PATH_MAX and MAXPATHLEN
>> git bisect good 023033fe343cdf2ba83ab762f8de69241c7fc086
>> # good: [96980b833a21c6dc29d0dfdc8f211fb8a10256a7] tools/nolibc/string: do not use __builtin_strlen() at -O0
>> git bisect good 96980b833a21c6dc29d0dfdc8f211fb8a10256a7
>> # good: [0e0ff638400be8f497a35b51a4751fd823f6bd6a] tools/nolibc/stdlib: Implement `malloc()`, `calloc()`, `realloc()` and `free()`
>> git bisect good 0e0ff638400be8f497a35b51a4751fd823f6bd6a
>> # good: [b2f02e9cdaade3ce565e6973ac313d2f814447f2] Merge tag
>> 'lkmm.2022.05.20a' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
>> git bisect good b2f02e9cdaade3ce565e6973ac313d2f814447f2
>> # good: [9621fbee44df940e2e1b94b0676460a538dffefa] rcu: Move expedited grace period (GP) work to RT kthread_worker
>> git bisect good 9621fbee44df940e2e1b94b0676460a538dffefa
>> # bad: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
>> git bisect bad ce13389053a347aa9f8ffbfda2238352536e15c9
>> # first bad commit: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
>
> More precisely, this commit:
>
> 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
>
> The idea was to cause Android-based systems to have 20-millisecond
> expedited RCU CPU stall warnings, but to leave other kernels at the
> traditional 21-second value. Again, the "6 jiffies" below indicates
> that you are running a kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
> and CONFIG_ANDROID=y.
>
> Except that there were a lot of non-Android systems (including Debian
> and Fedora) that have nothing to do with Android, but that nevertheless
> build their kernels with CONFIG_ANDROID=y, which no doubt seemed like
> a good idea at the time.
>
> For more information, please see https://lwn.net/Articles/899743/
>
> The upshot of the discussion was this same commit that I pointed out
> earlier that removes CONFIG_ANDROID completely:
>
> 1045a06724f3 ("remove CONFIG_ANDROID")
>
> Again, please apply this commit and rerun your test.
>
>> The bug looks like this:
>>
>> [ 57.876820] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-... } 6 jiffies s: 545 root: 0x1/.
>
> Yes, again "6 jiffies" which is roughly the 20 milliseconds set for
> kernels built with CONFIG_ANDROID=y and HZ=250 that have this commit:
>
> 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
>
> But not this commit:
>
> 1045a06724f3 ("remove CONFIG_ANDROID")
>
> Yet again, please apply this commit and rerun your test.
>
>> [ 57.876854] rcu: blocking rcu_node structures (internal RCU debug):
>> [ 57.876862] Task dump for CPU 0:
>> [ 57.876867] task:snap-store state:R running task stack: 0 pid: 2678 ppid: 2230 flags:0x00004008
>> [ 57.876883] Call Trace:
>> [ 57.876888] <TASK>
>> [ 57.876893] ? _raw_spin_lock_irqsave+0x93/0xf0
>> [ 57.876909] ? sysvec_apic_timer_interrupt+0x89/0x90
>> [ 57.876920] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
>> [ 57.876930] ? sysvec_apic_timer_interrupt+0x89/0x90
>> [ 57.876939] ? sysvec_apic_timer_interrupt+0x89/0x90
>> [ 57.876948] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
>> [ 57.876959] ? lzma_len+0x1fb/0x3b0
>> [ 57.876971] ? dict_repeat+0xb3/0x120
>> [ 57.876981] ? lzma_main+0x943/0x1260
>> [ 57.876995] ? lzma2_lzma+0x2a4/0x3b0
>> [ 57.877007] ? xz_dec_lzma2_run+0x11f/0xb90
>> [ 57.877018] ? crc32_le_base+0x186/0x250
>> [ 57.877032] ? xz_dec_run+0x346/0x11e0
>> [ 57.877046] ? squashfs_xz_uncompress+0x135/0x330
>> [ 57.877060] ? lzo_init+0xd0/0xd0
>> [ 57.877069] ? squashfs_decompress+0x80/0xb0
>> [ 57.877079] ? squashfs_read_data+0x1eb/0x920
>> [ 57.877090] ? __filemap_get_folio+0x37e/0x540
>> [ 57.877104] ? squashfs_bio_read.isra.2+0x2a0/0x2a0
>> [ 57.877118] ? squashfs_readpage_block+0x493/0x950
>> [ 57.877131] ? squashfs_readpage+0xa52/0x1140
>> [ 57.877143] ? xas_nomem+0x2f/0x100
>> [ 57.877154] ? squashfs_copy_cache+0x360/0x360
>> [ 57.877164] ? folio_wait_bit+0x3c0/0x3c0
>> [ 57.877175] ? __kasan_check_read+0x11/0x20
>> [ 57.877186] ? pagevec_add_and_need_flush+0x8c/0xa0
>> [ 57.877197] ? folio_add_lru+0x42/0x70
>> [ 57.877206] ? filemap_add_folio+0xd4/0x130
>> [ 57.877216] ? add_to_page_cache_locked+0xa0/0xa0
>> [ 57.877228] ? filemap_read_folio.isra.62+0xbe/0x4d0
>> [ 57.877238] ? __kasan_check_write+0x14/0x20
>> [ 57.877248] ? filemap_page_mkwrite+0x6a0/0x6a0
>> [ 57.877260] ? filemap_add_folio+0x130/0x130
>> [ 57.877269] ? try_charge_memcg+0x4cc/0xa90
>> [ 57.877281] ? filemap_fault+0xc9a/0xe50
>> [ 57.877293] ? generic_file_read_iter+0x230/0x230
>> [ 57.877305] ? __rcu_read_unlock+0x55/0x260
>> [ 57.877317] ? __do_fault+0x7c/0x1b0
>> [ 57.877329] ? __handle_mm_fault+0x173b/0x1970
>> [ 57.877340] ? copy_page_range+0x1890/0x1890
>> [ 57.877356] ? handle_mm_fault+0x11b/0x380
>> [ 57.877366] ? do_user_addr_fault+0x258/0x810
>> [ 57.877380] ? exc_page_fault+0x60/0xe0
>> [ 57.877389] ? asm_exc_page_fault+0x8/0x30
>> [ 57.877398] ? asm_exc_page_fault+0x1e/0x30
>> [ 57.877410] </TASK>
>>
>> Please find complete dmesg, my lshw and bisect attached.
>>
>> I am not competent to estimate how this commit causes this kind of problems,
>> and whether stalls up to 20 jiffies are worth eradicating.
>
> The Android folks believe that they are on Android platforms, as you
> can see from the commit log on:
>
> 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
>
> But this would be excessive on many other platforms, especially those
> with large quantities of CPUs, devices, or memory.
>
>> Hope this helps.
>
> Likewise!
>
> Thanx, Paul
Dear Mr. McKenney,
And what about the Mr. Robert Elliott's observation about calling conf_recshed()?
> How big can these readahead sizes be? Should one of the loops include
> cond_resched() calls?
That is IMHO better than allowing 21000 milisecond stalls on a core (or more of them).
I don't think it is correct to stay in kernel mode for more than an timer unit
without yielding the CPU. It creates stalls in multimedia and audio (chirps like on scratched
CD-ROMs). This is especially noticeable with a KASAN build.
Since Firefox and most snaps are using squashfs as compressed ROFS, the Firefox appears
to perform poorer since snaps are introduced than Chrome.
IMHO, if we want something like realtime and multimedia processing (which is the specific
area of my research), it seems that anything trying to hold processor for 21000 ms (21 secs)
is either buggy or deliberately malicious. 20 ms is quite enough of work for a threat
in one allotted timeslot.
I do not agree with Mr. Lougher's observation that I am thrashing my laptop. I think that
a system has to endure stress and torture testing. I was raised on Digital MicroVAX systems
on Ultrix which compiled lab at a time in memory that would today sound funny. :)
Besides, this is the very idea behind the MG-LRU algorithm commit, to test eviction of
memory pages in the system with heavy load and low on memory.
I will probably test your commits, but now I have to do my own evening ritual, unwinding,
and knowledge and memory consolidation (called "sleep").
I appreciate your lots of commits on the kernel.org and I hope I do not sound like
I am thinking you are a village idiot :(
I am trying to adhere to the Code of Conduct with mutual respect and politeness.
I know that the Linux kernel is about 30 million lines by now, and by the security experts
we should expect 30,000 bugs in such a solid piece of written code (one per thousand of
lines). Only Mr. Thorsten mentioned 950 unresolved in the "open" list.
Knowing all of this is difficult, but I still believe in open source and open systems
interconnected.
Of course, I always remember a proverb "Who hath despised the day of the small beginnings?"
Hope this helps. My $0.02.
Thank you,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
On Sun, Nov 20, 2022 at 07:36:21PM +0100, Mirsad Goran Todorovac wrote:
> On 20. 11. 2022. 18:50, Paul E. McKenney wrote:
> > On Sun, Nov 20, 2022 at 04:49:37PM +0100, Mirsad Goran Todorovac wrote:
> > > On 18. 11. 2022. 16:55, Paul E. McKenney wrote:
> > > > On Fri, Nov 18, 2022 at 06:11:16AM +0000, Phillip Lougher wrote:
> > > > > On 17/11/2022 23:05, Mirsad Goran Todorovac wrote:
> > > > > > Hi,
> > > > > >
> > > > > > While trying to bisect, I've found another bug that predated the
> > > > > > introduction of squashfs_readahead(), but it has
> > > > > > a common denominator in squashfs_decompress() and squashfs_xz_uncompress().
> > > > >
> > > > > Wrong, the stall is happening in the XZ decompressor library, which
> > > > > is *not* in Squashfs.
> > > > >
> > > > > This reported stall in the decompressor code is likely a symptom of you
> > > > > deliberately thrashing your system. When the system thrashes everything
> > > > > starts to happen very slowly, and the system will spend a lot of
> > > > > its time doing page I/O, and the CPU will spend a lot of time in
> > > > > any CPU intensive code like the XZ decompressor library.
> > > > >
> > > > > So the fact the stall is being hit here is a symptom and not
> > > > > a cause. The decompressor code is likely running slowly due to
> > > > > thrashing and waiting on paged-out buffers. This is not indicative
> > > > > of any bug, merely a system running slowly due to overload.
> > > > >
> > > > > As I said, this is not a Squashfs issue, because the code when the
> > > > > stall takes place isn't in Squashfs.
> > > > >
> > > > > The people responsible for the rcu code should have a lot more insight
> > > > > about what happens when the system is thrashing, and how this will
> > > > > throw up false positives. In this I believe this is an instance of
> > > > > perfectly correct code running slowly due to thrashing incorrectly
> > > > > being flagged as looping.
> > > > >
> > > > > CC'ing Paul E. McKenney <[email protected]>
> > > > >
> > > > > > Frankly, I need an advice on how to handle a situation like this.
> > > > > >
> > > > > > Obviously, the best place to test the RCU problem with
> > > > > > squashfs_readahead() is where it is introduced + Phillip's patches
> > > > > > [012]/3.
> > > > > >
> > > > > > Thanks,
> > > > > > Mirsad
> > > > > >
> > > > > > P.S.
> > > > > >
> > > > > > This is excerpt from dmesg log attached:
> > > > > >
> > > > > > [ 25.338169] rcu: INFO: rcu_preempt detected expedited stalls on
> > > > > > CPUs/tasks: { 4-... } 6 jiffies s: 373 root: 0x10/.
> > > >
> > > > The "6 jiffies" leads me to believe that this kernel was built with
> > > > CONFIG_ANDROID=y, but also with the short expedited grace-period commit,
> > > > that is, with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT set to 20 milliseconds
> > > > and with HZ set to something like 250. This means that the kernel
> > > > will complain if an RCU read-side critical section exceeeds about 20
> > > > milliseconds. This is a fine thing if you are actually running on an
> > > > Android, but a bit extreme for general usage.
> > > >
> > > > If my guesses are correct, please apply this commit from mainline:
> > > >
> > > > 1045a06724f3 ("remove CONFIG_ANDROID")
> > > >
> > > > With this commit in place, CONFIG_RCU_EXP_CPU_STALL_TIMEOUT will default
> > > > to 21000 (that is, 21 seconds) and things will be much better.
> >
> > Please apply the commit called out above and rerun your test.
> >
> > But if you still cannot bring yourself to do that, more interspersed
> > below...
> >
> > > Dear Paul,
> > >
> > > Actually, I was able to reproduce and bisect the commit that started causing
> > > described stalls (I have already explained how 5.10 didn't show any under the
> > > exactly same saved widows and tabs set.)
> > >
> > > # bad: [1e57930e9f4083ad5854ab6eadffe790a8167fb4] Merge tag
> > > 'rcu.2022.05.19a' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
> > > git bisect bad 1e57930e9f4083ad5854ab6eadffe790a8167fb4
> > > # good: [f993aed406eaf968ba3867a76bb46c95336a33d0] Merge tag 'block-5.18-2022-05-18' of git://git.kernel.dk/linux-block
> > > git bisect good f993aed406eaf968ba3867a76bb46c95336a33d0
> > > # good: [bf2431021c8cfe52c866d7bf640ced03a35ebe91] Merge tag
> > > 'efi-next-for-v5.19' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
> > > git bisect good bf2431021c8cfe52c866d7bf640ced03a35ebe91
> > > # good: [be05ee54378d451e1d60196921566d6087f3079f] Merge branches
> > > 'docs.2022.04.20a', 'fixes.2022.04.20a', 'nocb.2022.04.11b',
> > > 'rcu-tasks.2022.04.11b', 'srcu.2022.
> > > 05.03a', 'torture.2022.04.11b', 'torture-tasks.2022.04.20a' and 'torturescript.2022.04.20a' into HEAD
> > > git bisect good be05ee54378d451e1d60196921566d6087f3079f
> > > # good: [023033fe343cdf2ba83ab762f8de69241c7fc086] tools/nolibc/types: define PATH_MAX and MAXPATHLEN
> > > git bisect good 023033fe343cdf2ba83ab762f8de69241c7fc086
> > > # good: [96980b833a21c6dc29d0dfdc8f211fb8a10256a7] tools/nolibc/string: do not use __builtin_strlen() at -O0
> > > git bisect good 96980b833a21c6dc29d0dfdc8f211fb8a10256a7
> > > # good: [0e0ff638400be8f497a35b51a4751fd823f6bd6a] tools/nolibc/stdlib: Implement `malloc()`, `calloc()`, `realloc()` and `free()`
> > > git bisect good 0e0ff638400be8f497a35b51a4751fd823f6bd6a
> > > # good: [b2f02e9cdaade3ce565e6973ac313d2f814447f2] Merge tag
> > > 'lkmm.2022.05.20a' of
> > > git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
> > > git bisect good b2f02e9cdaade3ce565e6973ac313d2f814447f2
> > > # good: [9621fbee44df940e2e1b94b0676460a538dffefa] rcu: Move expedited grace period (GP) work to RT kthread_worker
> > > git bisect good 9621fbee44df940e2e1b94b0676460a538dffefa
> > > # bad: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
> > > git bisect bad ce13389053a347aa9f8ffbfda2238352536e15c9
> > > # first bad commit: [ce13389053a347aa9f8ffbfda2238352536e15c9] Merge branch 'exp.2022.05.11a' into HEAD
> >
> > More precisely, this commit:
> >
> > 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
> >
> > The idea was to cause Android-based systems to have 20-millisecond
> > expedited RCU CPU stall warnings, but to leave other kernels at the
> > traditional 21-second value. Again, the "6 jiffies" below indicates
> > that you are running a kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
> > and CONFIG_ANDROID=y.
> >
> > Except that there were a lot of non-Android systems (including Debian
> > and Fedora) that have nothing to do with Android, but that nevertheless
> > build their kernels with CONFIG_ANDROID=y, which no doubt seemed like
> > a good idea at the time.
> >
> > For more information, please see https://lwn.net/Articles/899743/
> >
> > The upshot of the discussion was this same commit that I pointed out
> > earlier that removes CONFIG_ANDROID completely:
> >
> > 1045a06724f3 ("remove CONFIG_ANDROID")
> >
> > Again, please apply this commit and rerun your test.
> >
> > > The bug looks like this:
> > >
> > > [ 57.876820] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 0-... } 6 jiffies s: 545 root: 0x1/.
> >
> > Yes, again "6 jiffies" which is roughly the 20 milliseconds set for
> > kernels built with CONFIG_ANDROID=y and HZ=250 that have this commit:
> >
> > 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
> >
> > But not this commit:
> >
> > 1045a06724f3 ("remove CONFIG_ANDROID")
> >
> > Yet again, please apply this commit and rerun your test.
> >
> > > [ 57.876854] rcu: blocking rcu_node structures (internal RCU debug):
> > > [ 57.876862] Task dump for CPU 0:
> > > [ 57.876867] task:snap-store state:R running task stack: 0 pid: 2678 ppid: 2230 flags:0x00004008
> > > [ 57.876883] Call Trace:
> > > [ 57.876888] <TASK>
> > > [ 57.876893] ? _raw_spin_lock_irqsave+0x93/0xf0
> > > [ 57.876909] ? sysvec_apic_timer_interrupt+0x89/0x90
> > > [ 57.876920] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
> > > [ 57.876930] ? sysvec_apic_timer_interrupt+0x89/0x90
> > > [ 57.876939] ? sysvec_apic_timer_interrupt+0x89/0x90
> > > [ 57.876948] ? asm_sysvec_apic_timer_interrupt+0x12/0x20
> > > [ 57.876959] ? lzma_len+0x1fb/0x3b0
> > > [ 57.876971] ? dict_repeat+0xb3/0x120
> > > [ 57.876981] ? lzma_main+0x943/0x1260
> > > [ 57.876995] ? lzma2_lzma+0x2a4/0x3b0
> > > [ 57.877007] ? xz_dec_lzma2_run+0x11f/0xb90
> > > [ 57.877018] ? crc32_le_base+0x186/0x250
> > > [ 57.877032] ? xz_dec_run+0x346/0x11e0
> > > [ 57.877046] ? squashfs_xz_uncompress+0x135/0x330
> > > [ 57.877060] ? lzo_init+0xd0/0xd0
> > > [ 57.877069] ? squashfs_decompress+0x80/0xb0
> > > [ 57.877079] ? squashfs_read_data+0x1eb/0x920
> > > [ 57.877090] ? __filemap_get_folio+0x37e/0x540
> > > [ 57.877104] ? squashfs_bio_read.isra.2+0x2a0/0x2a0
> > > [ 57.877118] ? squashfs_readpage_block+0x493/0x950
> > > [ 57.877131] ? squashfs_readpage+0xa52/0x1140
> > > [ 57.877143] ? xas_nomem+0x2f/0x100
> > > [ 57.877154] ? squashfs_copy_cache+0x360/0x360
> > > [ 57.877164] ? folio_wait_bit+0x3c0/0x3c0
> > > [ 57.877175] ? __kasan_check_read+0x11/0x20
> > > [ 57.877186] ? pagevec_add_and_need_flush+0x8c/0xa0
> > > [ 57.877197] ? folio_add_lru+0x42/0x70
> > > [ 57.877206] ? filemap_add_folio+0xd4/0x130
> > > [ 57.877216] ? add_to_page_cache_locked+0xa0/0xa0
> > > [ 57.877228] ? filemap_read_folio.isra.62+0xbe/0x4d0
> > > [ 57.877238] ? __kasan_check_write+0x14/0x20
> > > [ 57.877248] ? filemap_page_mkwrite+0x6a0/0x6a0
> > > [ 57.877260] ? filemap_add_folio+0x130/0x130
> > > [ 57.877269] ? try_charge_memcg+0x4cc/0xa90
> > > [ 57.877281] ? filemap_fault+0xc9a/0xe50
> > > [ 57.877293] ? generic_file_read_iter+0x230/0x230
> > > [ 57.877305] ? __rcu_read_unlock+0x55/0x260
> > > [ 57.877317] ? __do_fault+0x7c/0x1b0
> > > [ 57.877329] ? __handle_mm_fault+0x173b/0x1970
> > > [ 57.877340] ? copy_page_range+0x1890/0x1890
> > > [ 57.877356] ? handle_mm_fault+0x11b/0x380
> > > [ 57.877366] ? do_user_addr_fault+0x258/0x810
> > > [ 57.877380] ? exc_page_fault+0x60/0xe0
> > > [ 57.877389] ? asm_exc_page_fault+0x8/0x30
> > > [ 57.877398] ? asm_exc_page_fault+0x1e/0x30
> > > [ 57.877410] </TASK>
> > >
> > > Please find complete dmesg, my lshw and bisect attached.
> > >
> > > I am not competent to estimate how this commit causes this kind of problems,
> > > and whether stalls up to 20 jiffies are worth eradicating.
> >
> > The Android folks believe that they are on Android platforms, as you
> > can see from the commit log on:
> >
> > 28b3ae426598 ("rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT")
> >
> > But this would be excessive on many other platforms, especially those
> > with large quantities of CPUs, devices, or memory.
> >
> > > Hope this helps.
> >
> > Likewise!
> >
> > Thanx, Paul
>
> Dear Mr. McKenney,
>
> And what about the Mr. Robert Elliott's observation about calling conf_recshed()?
>
> > How big can these readahead sizes be? Should one of the loops include
> > cond_resched() calls?
>
> That is IMHO better than allowing 21000 milisecond stalls on a core (or more of them).
>
> I don't think it is correct to stay in kernel mode for more than an timer unit
> without yielding the CPU. It creates stalls in multimedia and audio (chirps like on scratched
> CD-ROMs). This is especially noticeable with a KASAN build.
>
> Since Firefox and most snaps are using squashfs as compressed ROFS, the Firefox appears
> to perform poorer since snaps are introduced than Chrome.
>
> IMHO, if we want something like realtime and multimedia processing (which is the specific
> area of my research), it seems that anything trying to hold processor for 21000 ms (21 secs)
> is either buggy or deliberately malicious. 20 ms is quite enough of work for a threat
> in one allotted timeslot.
>
> I do not agree with Mr. Lougher's observation that I am thrashing my laptop. I think that
> a system has to endure stress and torture testing. I was raised on Digital MicroVAX systems
> on Ultrix which compiled lab at a time in memory that would today sound funny. :)
I personally think that it would be great if you were to work to decrease
the Linux kernel's latency. Doing so would not be fixing a regression,
but I personally would welcome it. Others might have different opinions,
but please do CC me on any resulting patches.
And I will see your MicroVAX and raise you a videogame written on a
PDP-12 whose fastest instruction executed in 1.6 microseconds (-not-
nanoseconds!). ;-)
You can can see a couple of people playing the game on a PDP-12 in
a computer museum: https://www.rcsri.org/collection/pdp-12/
> Besides, this is the very idea behind the MG-LRU algorithm commit, to test eviction of
> memory pages in the system with heavy load and low on memory.
>
> I will probably test your commits, but now I have to do my own evening ritual, unwinding,
> and knowledge and memory consolidation (called "sleep").
And yes, sleep is often one of the best debugging tools available.
> I appreciate your lots of commits on the kernel.org and I hope I do not sound like
> I am thinking you are a village idiot :(
>
> I am trying to adhere to the Code of Conduct with mutual respect and politeness.
Skepticism is not necessarily a bad thing, especially given that I
am not immune from errors and confusion. Me, I just thought you were
forcefully reporting the regression, so I forcefully pointed you at the
fix for that regression.
Again, I have absolutely no objection to your improving the kernel's
response time.
> I know that the Linux kernel is about 30 million lines by now, and by the security experts
> we should expect 30,000 bugs in such a solid piece of written code (one per thousand of
> lines). Only Mr. Thorsten mentioned 950 unresolved in the "open" list.
At least 30,000 bugs, of which we know of maybe 950. ;-)
> Knowing all of this is difficult, but I still believe in open source and open systems
> interconnected.
If it was easy, where would be the challenge?
> Of course, I always remember a proverb "Who hath despised the day of the small beginnings?"
>
> Hope this helps. My $0.02.
I think we are good. ;-)
Thanx, Paul
> Thank you,
> Mirsad
>
> --
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu
> --
> System engineer
> Faculty of Graphic Arts | Academy of Fine Arts
> University of Zagreb, Republic of Croatia
> The European Union
>
P.S.
Sorry, Paul, the patches did not apply on the mainline "master" or "v6.1-rc5" nor "v6.0.9".
Here is the session log:
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git checkout v6.1-rc5
Note: checking out 'v6.1-rc5'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by performing another checkout.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -b with the checkout command again. Example:
git checkout -b <new-branch-name>
HEAD is now at 094226ad94f4 Linux 6.1-rc5
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git apply ../remove-android.patch
error: patch failed: drivers/Makefile:176
error: drivers/Makefile: patch does not apply
error: patch failed: drivers/android/Kconfig:1
error: drivers/android/Kconfig: patch does not apply
error: patch failed: kernel/configs/android-base.config:7
error: kernel/configs/android-base.config: patch does not apply
error: patch failed: kernel/rcu/Kconfig.debug:86
error: kernel/rcu/Kconfig.debug: patch does not apply
error: patch failed: tools/testing/selftests/filesystems/binderfs/config:1
error: tools/testing/selftests/filesystems/binderfs/config: patch does not apply
error: patch failed: tools/testing/selftests/sync/config:1
error: tools/testing/selftests/sync/config: patch does not apply
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git apply ../introduce-rcu-stall-timeout.patch
error: patch failed: Documentation/admin-guide/kernel-parameters.txt:4893
error: Documentation/admin-guide/kernel-parameters.txt: patch does not apply
error: patch failed: kernel/rcu/Kconfig.debug:91
error: kernel/rcu/Kconfig.debug: patch does not apply
error: patch failed: kernel/rcu/rcu.h:210
error: kernel/rcu/rcu.h: patch does not apply
error: patch failed: kernel/rcu/tree_exp.h:496
error: kernel/rcu/tree_exp.h: patch does not apply
error: patch failed: kernel/rcu/tree_stall.h:25
error: kernel/rcu/tree_stall.h: patch does not apply
error: patch failed: kernel/rcu/update.c:506
error: kernel/rcu/update.c: patch does not apply
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git checkout v6.0.9
Checking out files: 100% (12048/12048), done.
Previous HEAD position was 094226ad94f4 Linux 6.1-rc5
HEAD is now at be8b93b5cc7d Linux 6.0.9
You have mail in /var/mail/mtodorov
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git apply ../remove-android.patch
error: patch failed: drivers/Makefile:176
error: drivers/Makefile: patch does not apply
error: patch failed: drivers/android/Kconfig:1
error: drivers/android/Kconfig: patch does not apply
error: patch failed: kernel/configs/android-base.config:7
error: kernel/configs/android-base.config: patch does not apply
error: patch failed: kernel/rcu/Kconfig.debug:86
error: kernel/rcu/Kconfig.debug: patch does not apply
error: patch failed: tools/testing/selftests/filesystems/binderfs/config:1
error: tools/testing/selftests/filesystems/binderfs/config: patch does not apply
error: patch failed: tools/testing/selftests/sync/config:1
error: tools/testing/selftests/sync/config: patch does not apply
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git apply ../introduce-rcu-stall-timeout.patch
error: patch failed: Documentation/admin-guide/kernel-parameters.txt:4893
error: Documentation/admin-guide/kernel-parameters.txt: patch does not apply
error: patch failed: kernel/rcu/Kconfig.debug:91
error: kernel/rcu/Kconfig.debug: patch does not apply
error: patch failed: kernel/rcu/rcu.h:210
error: kernel/rcu/rcu.h: patch does not apply
error: patch failed: kernel/rcu/tree_exp.h:496
error: kernel/rcu/tree_exp.h: patch does not apply
error: patch failed: kernel/rcu/tree_stall.h:25
error: kernel/rcu/tree_stall.h: patch does not apply
error: patch failed: kernel/rcu/update.c:506
error: kernel/rcu/update.c: patch does not apply
mtodorov@domac:~/linux/kernel/linux_stable_build_b$
Now I really got to try to catch some sleep because I work my day job in the morning.
I must admit that I am not enthusiastic about these patches for I think they only mask
the problem of stalls: removing reports from the logs, but cores will still lock up.
Besides, I also hoped to develop multimedia for Android devices, too, as a part of our
research, to academically justify my work on the Linux kernel bug bisection and learning
about the kernel proper.
I still need to figure out a way to present my results to the academic community, to
have it justified and to have our University more visible and ranked on the world list. ;-)
Thanks,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
On 20. 11. 2022. 20:21, Paul E. McKenney wrote:
>> And what about the Mr. Robert Elliott's observation about calling conf_recshed()?
>>
>>> How big can these readahead sizes be? Should one of the loops include
>>> cond_resched() calls?
>>
>> That is IMHO better than allowing 21000 milisecond stalls on a core (or more of them).
>>
>> I don't think it is correct to stay in kernel mode for more than an timer unit
>> without yielding the CPU. It creates stalls in multimedia and audio (chirps like on scratched
>> CD-ROMs). This is especially noticeable with a KASAN build.
>>
>> Since Firefox and most snaps are using squashfs as compressed ROFS, the Firefox appears
>> to perform poorer since snaps are introduced than Chrome.
>>
>> IMHO, if we want something like realtime and multimedia processing (which is the specific
>> area of my research), it seems that anything trying to hold processor for 21000 ms (21 secs)
>> is either buggy or deliberately malicious. 20 ms is quite enough of work for a threat
>> in one allotted timeslot.
>>
>> I do not agree with Mr. Lougher's observation that I am thrashing my laptop. I think that
>> a system has to endure stress and torture testing. I was raised on Digital MicroVAX systems
>> on Ultrix which compiled lab at a time in memory that would today sound funny. :)
>
> I personally think that it would be great if you were to work to decrease
> the Linux kernel's latency. Doing so would not be fixing a regression,
> but I personally would welcome it. Others might have different opinions,
> but please do CC me on any resulting patches.
>
> And I will see your MicroVAX and raise you a videogame written on a
> PDP-12 whose fastest instruction executed in 1.6 microseconds (-not-
> nanoseconds!). ;-)
I'm afraid that I would lose in Far Cry miserably if my cores
decided to all lock up for 21 secs. :-(
> You can can see a couple of people playing the game on a PDP-12 in
> a computer museum: https://www.rcsri.org/collection/pdp-12/
>
>> Besides, this is the very idea behind the MG-LRU algorithm commit, to test eviction of
>> memory pages in the system with heavy load and low on memory.
>>
>> I will probably test your commits, but now I have to do my own evening ritual, unwinding,
>> and knowledge and memory consolidation (called "sleep").
>
> And yes, sleep is often one of the best debugging tools available.
>
>> I appreciate your lots of commits on the kernel.org and I hope I do not sound like
>> I am thinking you are a village idiot :(
>>
>> I am trying to adhere to the Code of Conduct with mutual respect and politeness.
>
> Skepticism is not necessarily a bad thing, especially given that I
> am not immune from errors and confusion. Me, I just thought you were
> forcefully reporting the regression, so I forcefully pointed you at the
> fix for that regression.
>
> Again, I have absolutely no objection to your improving the kernel's
> response time.
This is at present just the wishful thinking, as I lack your 30 years of
experience with the kernel and RCU update system. I am only beginning to realise
why it is more efficient than the traditional locking, and IMHO it should
avoid locking up cores instead of increasing the number of complaints.
But even if the Linux kernel source is magically "memory mapped" into my
mind, I still do not see how it could be done. My Linux kernel learning curve
had not yet got that up, but I have no doubts that it is designed by
Intelligent Designers who are very witty people, and not village idiots ;-)
>> I know that the Linux kernel is about 30 million lines by now, and by the security experts
>> we should expect 30,000 bugs in such a solid piece of written code (one per thousand of
>> lines). Only Mr. Thorsten mentioned 950 unresolved in the "open" list.
>
> At least 30,000 bugs, of which we know of maybe 950. ;-)
So I need no point in banning the kernel from screaming to logs that it had
core stalls that needed a physical NMI to recover from, or they would potentially
last much longer.
>> Knowing all of this is difficult, but I still believe in open source and open systems
>> interconnected.
>
> If it was easy, where would be the challenge?
AFAIK, the point I was taught in life was obedience, not overcoming challenges.
>> Of course, I always remember a proverb "Who hath despised the day of the small beginnings?"
>>
>> Hope this helps. My $0.02.
>
> I think we are good. ;-)
Yes, you guys do an amasing job of keeping 30 million lines of code organised
and making some sense. I will cut the smalltalk as I know you are a busy man.
If I make a progress to actually produce any patches fixing these lockups and
stalls, I will be sure to include you into CC: as you requested.
Have a nice day!
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
On Mon, Nov 21, 2022 at 05:04:36AM +0100, Mirsad Goran Todorovac wrote:
> On 20. 11. 2022. 20:21, Paul E. McKenney wrote:
> > > And what about the Mr. Robert Elliott's observation about calling conf_recshed()?
> > >
> > > > How big can these readahead sizes be? Should one of the loops include
> > > > cond_resched() calls?
> > >
> > > That is IMHO better than allowing 21000 milisecond stalls on a core (or more of them).
> > >
> > > I don't think it is correct to stay in kernel mode for more than an timer unit
> > > without yielding the CPU. It creates stalls in multimedia and audio (chirps like on scratched
> > > CD-ROMs). This is especially noticeable with a KASAN build.
> > >
> > > Since Firefox and most snaps are using squashfs as compressed ROFS, the Firefox appears
> > > to perform poorer since snaps are introduced than Chrome.
> > >
> > > IMHO, if we want something like realtime and multimedia processing (which is the specific
> > > area of my research), it seems that anything trying to hold processor for 21000 ms (21 secs)
> > > is either buggy or deliberately malicious. 20 ms is quite enough of work for a threat
> > > in one allotted timeslot.
> > >
> > > I do not agree with Mr. Lougher's observation that I am thrashing my laptop. I think that
> > > a system has to endure stress and torture testing. I was raised on Digital MicroVAX systems
> > > on Ultrix which compiled lab at a time in memory that would today sound funny. :)
> >
> > I personally think that it would be great if you were to work to decrease
> > the Linux kernel's latency. Doing so would not be fixing a regression,
> > but I personally would welcome it. Others might have different opinions,
> > but please do CC me on any resulting patches.
> >
> > And I will see your MicroVAX and raise you a videogame written on a
> > PDP-12 whose fastest instruction executed in 1.6 microseconds (-not-
> > nanoseconds!). ;-)
>
> I'm afraid that I would lose in Far Cry miserably if my cores
> decided to all lock up for 21 secs. :-(
Agreed, 21 seconds is an improvement over the earlier 60 seconds, but
still a very long time. Me, I come from DYNIX/ptx, where the equivalent
to the RCU CPU stall warning was 1.5 seconds. On the other hand, it
is also the case that DYNIX/ptx had nowhere near the variety of drivers
and subsystems, nor did it scale anywhere near as far as Linux does today.
But you only need one CPU to lock up for 21 seconds to get an RCU CPU
stall warning, not all of them. ;-)
> > You can can see a couple of people playing the game on a PDP-12 in
> > a computer museum: https://www.rcsri.org/collection/pdp-12/
> >
> > > Besides, this is the very idea behind the MG-LRU algorithm commit, to test eviction of
> > > memory pages in the system with heavy load and low on memory.
> > >
> > > I will probably test your commits, but now I have to do my own evening ritual, unwinding,
> > > and knowledge and memory consolidation (called "sleep").
> >
> > And yes, sleep is often one of the best debugging tools available.
> >
> > > I appreciate your lots of commits on the kernel.org and I hope I do not sound like
> > > I am thinking you are a village idiot :(
> > >
> > > I am trying to adhere to the Code of Conduct with mutual respect and politeness.
> >
> > Skepticism is not necessarily a bad thing, especially given that I
> > am not immune from errors and confusion. Me, I just thought you were
> > forcefully reporting the regression, so I forcefully pointed you at the
> > fix for that regression.
> >
> > Again, I have absolutely no objection to your improving the kernel's
> > response time.
>
> This is at present just the wishful thinking, as I lack your 30 years of
> experience with the kernel and RCU update system. I am only beginning to realise
> why it is more efficient than the traditional locking, and IMHO it should
> avoid locking up cores instead of increasing the number of complaints.
Just to set the record straight, RCU does not normally lock up any of
the cores. Instead, RCU detects that cores have been locked up.
Give or take the occasional bug in RCU, of course!
> But even if the Linux kernel source is magically "memory mapped" into my
> mind, I still do not see how it could be done. My Linux kernel learning curve
> had not yet got that up, but I have no doubts that it is designed by
> Intelligent Designers who are very witty people, and not village idiots ;-)
There is the school of thought that claims that the Linux kernel is
driven by evolutionary forces rather than intelligent design. And as
we all know, evolutionary forces are driven by random changes, which
absolutely anyone could make.
And one approach is to take a less aggressive RCU CPU stall timeout,
say reducing from 21 seconds to (say) 15 seconds instead of all the
way down to 20 milliseconds. This could allow you to ease into the
latency-reduction work.
Alternatively, consider that response time is a property of the
entire system plus the environment that it runs in. So I suspect that
the Android folks are accompanying that 20-millisecond timeout with
some restrictions on what the on-phone workloads are permitted to do.
Maybe ask the Android guys what those restrictions are and loosen them
slightly, again allowing you to ease into the latency-reduction work.
> > > I know that the Linux kernel is about 30 million lines by now, and by the security experts
> > > we should expect 30,000 bugs in such a solid piece of written code (one per thousand of
> > > lines). Only Mr. Thorsten mentioned 950 unresolved in the "open" list.
> >
> > At least 30,000 bugs, of which we know of maybe 950. ;-)
>
> So I need no point in banning the kernel from screaming to logs that it had
> core stalls that needed a physical NMI to recover from, or they would potentially
> last much longer.
Sometimes an NMI does get the CPUs back on track. Sometimes the RCU CPU
stall warning is a symptom of the CPU having gotten too old and failing.
Most often, though, it is a sign of some sort of lockup, a too-long
RCU read-side critical section, or as Robert Elliot noted, the lack of
a cond_resched().
But please keep in mind that cond_resched() helps only in kernels built
with CONFIG_PREEMPTION=n.
> > > Knowing all of this is difficult, but I still believe in open source and open systems
> > > interconnected.
> >
> > If it was easy, where would be the challenge?
>
> AFAIK, the point I was taught in life was obedience, not overcoming challenges.
Perhaps early in life I was ordered to overcome challenges? If so, then
my overcoming them would be a matter of obedience. ;-)
> > > Of course, I always remember a proverb "Who hath despised the day of the small beginnings?"
> > >
> > > Hope this helps. My $0.02.
> >
> > I think we are good. ;-)
>
> Yes, you guys do an amasing job of keeping 30 million lines of code organised
> and making some sense. I will cut the smalltalk as I know you are a busy man.
> If I make a progress to actually produce any patches fixing these lockups and
> stalls, I will be sure to include you into CC: as you requested.
Looking forward to seeing what you come up with!
Thanx, Paul
On 22. 11. 2022. 03:07, Paul E. McKenney wrote:
>> I'm afraid that I would lose in Far Cry miserably if my cores
>> decided to all lock up for 21 secs. :-(
>
> Agreed, 21 seconds is an improvement over the earlier 60 seconds, but
> still a very long time. Me, I come from DYNIX/ptx, where the equivalent
> to the RCU CPU stall warning was 1.5 seconds. On the other hand, it
> is also the case that DYNIX/ptx had nowhere near the variety of drivers
> and subsystems, nor did it scale anywhere near as far as Linux does today.
>
> But you only need one CPU to lock up for 21 seconds to get an RCU CPU
> stall warning, not all of them. ;-)
I can recall an occasion or a couple of them where the entire X Window system
had been unresponsive for quite a number of seconds that sometimes made me reset
the Ubuntu box.
I have the good news: the patches did not apply because they were already applied
in the mainline tree:
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git log | grep -C5 28b3ae426598
Signed-off-by: Borislav Petkov <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Acked-by: Florian Weimer <[email protected]>
Link: https://lore.kernel.org/r/898932fe61db6a9d61bc2458fa2f6049f1ca9f5c.1652290558.git.luto@kernel.org
commit 28b3ae426598e722cf5d5ab9cc7038791b955a56
Author: Uladzislau Rezki <[email protected]>
Date: Wed Feb 16 14:52:09 2022 +0100
rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git log | grep -C5 1045a06724f3
Somehow kernel-doc complains here about strong markup, but
we really don't need the [] so just remove that.
Signed-off-by: Johannes Berg <[email protected]>
commit 1045a06724f322ed61f1ffb994427c7bdbe64647
Author: Christoph Hellwig <[email protected]>
Date: Wed Jun 29 17:01:02 2022 +0200
remove CONFIG_ANDROID
mtodorov@domac:~/linux/kernel/linux_stable_build_b$
>> This is at present just the wishful thinking, as I lack your 30 years of
>> experience with the kernel and RCU update system. I am only beginning to realise
>> why it is more efficient than the traditional locking, and IMHO it should
>> avoid locking up cores instead of increasing the number of complaints.
>
> Just to set the record straight, RCU does not normally lock up any of
> the cores. Instead, RCU detects that cores have been locked up.
>
> Give or take the occasional bug in RCU, of course!
Currently, I cannot be the judge of that, for I can't seem to understand how the
magic of RCU works., how it is implemented. There's more homework to be done ;-)
>> But even if the Linux kernel source is magically "memory mapped" into my
>> mind, I still do not see how it could be done. My Linux kernel learning curve
>> had not yet got that up, but I have no doubts that it is designed by
>> Intelligent Designers who are very witty people, and not village idiots ;-)
>
> There is the school of thought that claims that the Linux kernel is
> driven by evolutionary forces rather than intelligent design. And as
> we all know, evolutionary forces are driven by random changes, which
> absolutely anyone could make.
Give or take the rate of improbability where a bunch of monkeys randomly typing
would produce a working Linux kernel source would be about a couple of working
sources in a space of 96^30,000,00 (something like 10^300,000,000), it is comparable
to the probability of random coming of the first simplest DNA into the existence
from the amino acid primordial soup.
(Not that many atoms in the Universe - 10^82, you'd need an awful lot of wasted
multiverses with no even single cell life and certainly no working Linux kernels.)
> And one approach is to take a less aggressive RCU CPU stall timeout,
> say reducing from 21 seconds to (say) 15 seconds instead of all the
> way down to 20 milliseconds. This could allow you to ease into the
> latency-reduction work.
>
> Alternatively, consider that response time is a property of the
> entire system plus the environment that it runs in. So I suspect that
> the Android folks are accompanying that 20-millisecond timeout with
> some restrictions on what the on-phone workloads are permitted to do.
> Maybe ask the Android guys what those restrictions are and loosen them
> slightly, again allowing you to ease into the latency-reduction work.
Good point.
> Sometimes an NMI does get the CPUs back on track. Sometimes the RCU CPU
> stall warning is a symptom of the CPU having gotten too old and failing.
> Most often, though, it is a sign of some sort of lockup, a too-long
> RCU read-side critical section, or as Robert Elliot noted, the lack of
> a cond_resched().
>
> But please keep in mind that cond_resched() helps only in kernels built
> with CONFIG_PREEMPTION=n.
I have bad news that 6.1-r6 is still affected with squashfs_xz_uncompress bug, despite having both of your fixes
(as visible in above command's output -- double checked):
[ 91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.
[ 91.065694] rcu: blocking rcu_node structures (internal RCU debug):
[ 91.065704] Sending NMI from CPU 5 to CPUs 3:
[ 91.065721] NMI backtrace for cpu 3
[ 91.065730] CPU: 3 PID: 2829 Comm: snap-store Not tainted 6.1.0-rc6 #1
[ 91.065741] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
[ 91.065746] RIP: 0010:__asan_load4+0x0/0xa0
[ 91.065764] Code: 9e c0 84 c0 75 e1 5d c3 cc cc cc cc 48 c1 e8 03 80 3c 10 00 75 e9 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00
00 00 0f 1f 00 <55> 48 89 e5 48 8b 4d 08 48 83 ff fb 77 64 eb 0f 0f 1f 00 48 b8 00
[ 91.065771] RSP: 0000:ffff8881388ef140 EFLAGS: 00000246
[ 91.065779] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff9be992fd
[ 91.065785] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff888125500004
[ 91.065789] RBP: ffff8881388ef1e0 R08: 0000000000000001 R09: ffffed1024aa0de8
[ 91.065794] R10: ffff888125506f39 R11: ffffed1024aa0de7 R12: 0000000001067db0
[ 91.065799] R13: ffff888125500000 R14: 00000000014fe803 R15: ffff888125502112
[ 91.065804] FS: 00007fdec50ab180(0000) GS:ffff888257180000(0000) knlGS:0000000000000000
[ 91.065810] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 91.065815] CR2: 00007fdeb7cb6260 CR3: 000000011a436005 CR4: 0000000000770ee0
[ 91.065820] PKRU: 55555554
[ 91.065823] Call Trace:
[ 91.065826] <TASK>
[ 91.065829] ? lzma_main+0x37a/0x1260
[ 91.065845] lzma2_lzma+0x2b9/0x430
[ 91.065857] xz_dec_lzma2_run+0x11f/0xb90
[ 91.065867] ? __asan_load4+0x55/0xa0
[ 91.065880] xz_dec_run+0x346/0x11f0
[ 91.065892] squashfs_xz_uncompress+0x196/0x370
[ 91.065905] ? lzo_uncompress+0x400/0x400
[ 91.065913] squashfs_decompress+0x88/0xd0
[ 91.065923] squashfs_read_data+0x1e5/0x900
[ 91.065930] ? __create_object+0x4ae/0x560
[ 91.065942] ? squashfs_bio_read.isra.3+0x230/0x230
[ 91.065951] ? __kasan_kmalloc+0xb6/0xc0
[ 91.065961] ? squashfs_page_actor_init_special+0x1a6/0x210
[ 91.065972] squashfs_readahead+0xaa3/0xe80
[ 91.065985] ? squashfs_fill_page+0x190/0x190
[ 91.065993] ? __filemap_add_folio+0x3a1/0x680
[ 91.066003] ? dio_warn_stale_pagecache.part.67+0x90/0x90
[ 91.066012] read_pages+0x122/0x540
[ 91.066023] ? file_ra_state_init+0x60/0x60
[ 91.066032] ? filemap_add_folio+0xd4/0x140
[ 91.066040] ? folio_alloc+0x1b/0x50
[ 91.066051] page_cache_ra_unbounded+0x1e6/0x280
[ 91.066064] do_page_cache_ra+0x7c/0x90
[ 91.066074] page_cache_ra_order+0x393/0x400
[ 91.066087] ondemand_readahead+0x2f1/0x4e0
[ 91.066098] page_cache_async_ra+0x8b/0xa0
[ 91.066106] filemap_fault+0x742/0x1490
[ 91.066113] ? __folio_memcg_unlock+0x35/0x80
[ 91.066124] ? read_cache_page_gfp+0x90/0x90
[ 91.066132] ? filemap_map_pages+0x28e/0xc60
[ 91.066145] __do_fault+0x76/0x1b0
[ 91.066154] do_fault+0x1c6/0x680
[ 91.066163] __handle_mm_fault+0x89a/0x1310
[ 91.066173] ? copy_page_range+0x1b20/0x1b20
[ 91.066181] ? mt_find+0x189/0x330
[ 91.066190] ? mas_next_entry+0xa80/0xa80
[ 91.066204] handle_mm_fault+0x11b/0x390
[ 91.066213] do_user_addr_fault+0x258/0x860
[ 91.066225] exc_page_fault+0x64/0xf0
[ 91.066235] asm_exc_page_fault+0x27/0x30
[ 91.066245] RIP: 0033:0x7fdeb7a1e541
[ 91.066252] Code: 11 44 b8 e0 0f 11 44 b8 f0 0f 11 04 b8 48 83 c7 40 48 83 c6 f8 75 92 48 85 d2 74 2d 4c 01 d7 49 8d 04 b9 48 83
c0 10 48 f7 da <0f> 28 05 18 7d 29 00 0f 1f 84 00 00 00 00 00 0f 11 40 f0 0f 11 00
[ 91.066259] RSP: 002b:00007fff46f77b60 EFLAGS: 00010293
[ 91.066265] RAX: 000055fbe1a57c30 RBX: 00007fff46f77d18 RCX: 000055fbe1a57c20
[ 91.066270] RDX: fffffffffffffffb RSI: 0000000000000005 RDI: 0000000000000000
[ 91.066274] RBP: 000055fbe16563b0 R08: 0000000000000028 R09: 000055fbe1a57c20
[ 91.066279] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000002b
[ 91.066283] R13: 000055fbe1656408 R14: 00007fff46f77b80 R15: 0000000000000000
[ 91.066292] </TASK>
(This is apparently only visible in CONFIG_KASAN=y build.)
>> Yes, you guys do an amasing job of keeping 30 million lines of code organised
>> and making some sense. I will cut the smalltalk as I know you are a busy man.
>> If I make a progress to actually produce any patches fixing these lockups and
>> stalls, I will be sure to include you into CC: as you requested.
>
> Looking forward to seeing what you come up with!
There will have to be a lot of homework to catch up with to before I'd be able to do anything
sensible. :)
Thanks,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
On Wed, Nov 23, 2022 at 10:14:07AM +0100, Mirsad Goran Todorovac wrote:
> On 22. 11. 2022. 03:07, Paul E. McKenney wrote:
>
> > > I'm afraid that I would lose in Far Cry miserably if my cores
> > > decided to all lock up for 21 secs. :-(
> >
> > Agreed, 21 seconds is an improvement over the earlier 60 seconds, but
> > still a very long time. Me, I come from DYNIX/ptx, where the equivalent
> > to the RCU CPU stall warning was 1.5 seconds. On the other hand, it
> > is also the case that DYNIX/ptx had nowhere near the variety of drivers
> > and subsystems, nor did it scale anywhere near as far as Linux does today.
> >
> > But you only need one CPU to lock up for 21 seconds to get an RCU CPU
> > stall warning, not all of them. ;-)
>
> I can recall an occasion or a couple of them where the entire X Window system
> had been unresponsive for quite a number of seconds that sometimes made me reset
> the Ubuntu box.
>
> I have the good news: the patches did not apply because they were already applied
> in the mainline tree:
>
> mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git log | grep -C5 28b3ae426598
> Signed-off-by: Borislav Petkov <[email protected]>
> Reviewed-by: Kees Cook <[email protected]>
> Acked-by: Florian Weimer <[email protected]>
> Link: https://lore.kernel.org/r/898932fe61db6a9d61bc2458fa2f6049f1ca9f5c.1652290558.git.luto@kernel.org
>
> commit 28b3ae426598e722cf5d5ab9cc7038791b955a56
> Author: Uladzislau Rezki <[email protected]>
> Date: Wed Feb 16 14:52:09 2022 +0100
>
> rcu: Introduce CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
>
> mtodorov@domac:~/linux/kernel/linux_stable_build_b$ git log | grep -C5 1045a06724f3
> Somehow kernel-doc complains here about strong markup, but
> we really don't need the [] so just remove that.
>
> Signed-off-by: Johannes Berg <[email protected]>
>
> commit 1045a06724f322ed61f1ffb994427c7bdbe64647
> Author: Christoph Hellwig <[email protected]>
> Date: Wed Jun 29 17:01:02 2022 +0200
>
> remove CONFIG_ANDROID
>
> mtodorov@domac:~/linux/kernel/linux_stable_build_b$
OK, very good.
But your CONFIG_RCU_EXP_CPU_STALL_TIMEOUT seems to be about 20.
As in 20 milliseconds.
Is that intentional?
> > > This is at present just the wishful thinking, as I lack your 30 years of
> > > experience with the kernel and RCU update system. I am only beginning to realise
> > > why it is more efficient than the traditional locking, and IMHO it should
> > > avoid locking up cores instead of increasing the number of complaints.
> >
> > Just to set the record straight, RCU does not normally lock up any of
> > the cores. Instead, RCU detects that cores have been locked up.
> >
> > Give or take the occasional bug in RCU, of course!
>
> Currently, I cannot be the judge of that, for I can't seem to understand how the
> magic of RCU works., how it is implemented. There's more homework to be done ;-)
The high-level specification is dead simple. Calls synchronize_rcu()
must wait for any CPU or task that has already executed rcu_read_lock()
to reach the corresponding rcu_read_unlock(). For more requirements:
Documentation/RCU/Design/Requirements/Requirements.rst
> > > But even if the Linux kernel source is magically "memory mapped" into my
> > > mind, I still do not see how it could be done. My Linux kernel learning curve
> > > had not yet got that up, but I have no doubts that it is designed by
> > > Intelligent Designers who are very witty people, and not village idiots ;-)
> >
> > There is the school of thought that claims that the Linux kernel is
> > driven by evolutionary forces rather than intelligent design. And as
> > we all know, evolutionary forces are driven by random changes, which
> > absolutely anyone could make.
>
> Give or take the rate of improbability where a bunch of monkeys randomly typing
> would produce a working Linux kernel source would be about a couple of working
> sources in a space of 96^30,000,00 (something like 10^300,000,000), it is comparable
> to the probability of random coming of the first simplest DNA into the existence
> from the amino acid primordial soup.
>
> (Not that many atoms in the Universe - 10^82, you'd need an awful lot of wasted
> multiverses with no even single cell life and certainly no working Linux kernels.)
Just apply continuous integration to each monkey's output. Just like
was done to the strands of amino acids.
(Sorry, couldn't resist!)
> > And one approach is to take a less aggressive RCU CPU stall timeout,
> > say reducing from 21 seconds to (say) 15 seconds instead of all the
> > way down to 20 milliseconds. This could allow you to ease into the
> > latency-reduction work.
> >
> > Alternatively, consider that response time is a property of the
> > entire system plus the environment that it runs in. So I suspect that
> > the Android folks are accompanying that 20-millisecond timeout with
> > some restrictions on what the on-phone workloads are permitted to do.
> > Maybe ask the Android guys what those restrictions are and loosen them
> > slightly, again allowing you to ease into the latency-reduction work.
>
> Good point.
> > Sometimes an NMI does get the CPUs back on track. Sometimes the RCU CPU
> > stall warning is a symptom of the CPU having gotten too old and failing.
> > Most often, though, it is a sign of some sort of lockup, a too-long
> > RCU read-side critical section, or as Robert Elliot noted, the lack of
> > a cond_resched().
> >
> > But please keep in mind that cond_resched() helps only in kernels built
> > with CONFIG_PREEMPTION=n.
>
> I have bad news that 6.1-r6 is still affected with squashfs_xz_uncompress bug, despite having both of your fixes
> (as visible in above command's output -- double checked):
>
> [ 91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.
If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
this still happen?
For that
> [ 91.065694] rcu: blocking rcu_node structures (internal RCU debug):
> [ 91.065704] Sending NMI from CPU 5 to CPUs 3:
> [ 91.065721] NMI backtrace for cpu 3
> [ 91.065730] CPU: 3 PID: 2829 Comm: snap-store Not tainted 6.1.0-rc6 #1
> [ 91.065741] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
> [ 91.065746] RIP: 0010:__asan_load4+0x0/0xa0
> [ 91.065764] Code: 9e c0 84 c0 75 e1 5d c3 cc cc cc cc 48 c1 e8 03 80 3c
> 10 00 75 e9 5d c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 <55>
> 48 89 e5 48 8b 4d 08 48 83 ff fb 77 64 eb 0f 0f 1f 00 48 b8 00
> [ 91.065771] RSP: 0000:ffff8881388ef140 EFLAGS: 00000246
> [ 91.065779] RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff9be992fd
> [ 91.065785] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff888125500004
> [ 91.065789] RBP: ffff8881388ef1e0 R08: 0000000000000001 R09: ffffed1024aa0de8
> [ 91.065794] R10: ffff888125506f39 R11: ffffed1024aa0de7 R12: 0000000001067db0
> [ 91.065799] R13: ffff888125500000 R14: 00000000014fe803 R15: ffff888125502112
> [ 91.065804] FS: 00007fdec50ab180(0000) GS:ffff888257180000(0000) knlGS:0000000000000000
> [ 91.065810] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 91.065815] CR2: 00007fdeb7cb6260 CR3: 000000011a436005 CR4: 0000000000770ee0
> [ 91.065820] PKRU: 55555554
> [ 91.065823] Call Trace:
> [ 91.065826] <TASK>
> [ 91.065829] ? lzma_main+0x37a/0x1260
> [ 91.065845] lzma2_lzma+0x2b9/0x430
> [ 91.065857] xz_dec_lzma2_run+0x11f/0xb90
> [ 91.065867] ? __asan_load4+0x55/0xa0
> [ 91.065880] xz_dec_run+0x346/0x11f0
> [ 91.065892] squashfs_xz_uncompress+0x196/0x370
> [ 91.065905] ? lzo_uncompress+0x400/0x400
> [ 91.065913] squashfs_decompress+0x88/0xd0
> [ 91.065923] squashfs_read_data+0x1e5/0x900
> [ 91.065930] ? __create_object+0x4ae/0x560
> [ 91.065942] ? squashfs_bio_read.isra.3+0x230/0x230
> [ 91.065951] ? __kasan_kmalloc+0xb6/0xc0
> [ 91.065961] ? squashfs_page_actor_init_special+0x1a6/0x210
> [ 91.065972] squashfs_readahead+0xaa3/0xe80
> [ 91.065985] ? squashfs_fill_page+0x190/0x190
> [ 91.065993] ? __filemap_add_folio+0x3a1/0x680
> [ 91.066003] ? dio_warn_stale_pagecache.part.67+0x90/0x90
> [ 91.066012] read_pages+0x122/0x540
> [ 91.066023] ? file_ra_state_init+0x60/0x60
> [ 91.066032] ? filemap_add_folio+0xd4/0x140
> [ 91.066040] ? folio_alloc+0x1b/0x50
> [ 91.066051] page_cache_ra_unbounded+0x1e6/0x280
> [ 91.066064] do_page_cache_ra+0x7c/0x90
> [ 91.066074] page_cache_ra_order+0x393/0x400
> [ 91.066087] ondemand_readahead+0x2f1/0x4e0
> [ 91.066098] page_cache_async_ra+0x8b/0xa0
> [ 91.066106] filemap_fault+0x742/0x1490
> [ 91.066113] ? __folio_memcg_unlock+0x35/0x80
> [ 91.066124] ? read_cache_page_gfp+0x90/0x90
> [ 91.066132] ? filemap_map_pages+0x28e/0xc60
> [ 91.066145] __do_fault+0x76/0x1b0
> [ 91.066154] do_fault+0x1c6/0x680
> [ 91.066163] __handle_mm_fault+0x89a/0x1310
> [ 91.066173] ? copy_page_range+0x1b20/0x1b20
> [ 91.066181] ? mt_find+0x189/0x330
> [ 91.066190] ? mas_next_entry+0xa80/0xa80
> [ 91.066204] handle_mm_fault+0x11b/0x390
> [ 91.066213] do_user_addr_fault+0x258/0x860
> [ 91.066225] exc_page_fault+0x64/0xf0
> [ 91.066235] asm_exc_page_fault+0x27/0x30
> [ 91.066245] RIP: 0033:0x7fdeb7a1e541
> [ 91.066252] Code: 11 44 b8 e0 0f 11 44 b8 f0 0f 11 04 b8 48 83 c7 40 48
> 83 c6 f8 75 92 48 85 d2 74 2d 4c 01 d7 49 8d 04 b9 48 83 c0 10 48 f7 da <0f>
> 28 05 18 7d 29 00 0f 1f 84 00 00 00 00 00 0f 11 40 f0 0f 11 00
> [ 91.066259] RSP: 002b:00007fff46f77b60 EFLAGS: 00010293
> [ 91.066265] RAX: 000055fbe1a57c30 RBX: 00007fff46f77d18 RCX: 000055fbe1a57c20
> [ 91.066270] RDX: fffffffffffffffb RSI: 0000000000000005 RDI: 0000000000000000
> [ 91.066274] RBP: 000055fbe16563b0 R08: 0000000000000028 R09: 000055fbe1a57c20
> [ 91.066279] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000002b
> [ 91.066283] R13: 000055fbe1656408 R14: 00007fff46f77b80 R15: 0000000000000000
> [ 91.066292] </TASK>
>
> (This is apparently only visible in CONFIG_KASAN=y build.)
And KASAN does add some overhead.
So maybe the value of CONFIG_RCU_EXP_CPU_STALL_TIMEOUT needs to be a
function of CONFIG_KASAN in your build.
> > > Yes, you guys do an amasing job of keeping 30 million lines of code organised
> > > and making some sense. I will cut the smalltalk as I know you are a busy man.
> > > If I make a progress to actually produce any patches fixing these lockups and
> > > stalls, I will be sure to include you into CC: as you requested.
> >
> > Looking forward to seeing what you come up with!
>
> There will have to be a lot of homework to catch up with to before I'd be able to do anything
> sensible. :)
Doing the homework up front can be a good thing! ;-)
Thanx, Paul
> Thanks,
> Mirsad
>
> --
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu
> --
> System engineer
> Faculty of Graphic Arts | Academy of Fine Arts
> University of Zagreb, Republic of Croatia
> The European Union
>
> -----Original Message-----
> From: Paul E. McKenney <[email protected]>
> Sent: Wednesday, November 23, 2022 8:44 AM
> Subject: Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls
> in squashfs_readahead())
>
> On Wed, Nov 23, 2022 at 10:14:07AM +0100, Mirsad Goran Todorovac wrote:
...
> > commit 1045a06724f322ed61f1ffb994427c7bdbe64647
> > Author: Christoph Hellwig <[email protected]>
> > Date: Wed Jun 29 17:01:02 2022 +0200
> >
> > remove CONFIG_ANDROID
>
> But your CONFIG_RCU_EXP_CPU_STALL_TIMEOUT seems to be about 20.
> As in 20 milliseconds.
>
> Is that intentional?
...
> > > And one approach is to take a less aggressive RCU CPU stall timeout,
> > > say reducing from 21 seconds to (say) 15 seconds instead of all the
> > > way down to 20 milliseconds. This could allow you to ease into the
> > > latency-reduction work.
> > >
> > > Alternatively, consider that response time is a property of the
> > > entire system plus the environment that it runs in. So I suspect that
> > > the Android folks are accompanying that 20-millisecond timeout with
> > > some restrictions on what the on-phone workloads are permitted to do.
> > > Maybe ask the Android guys what those restrictions are and loosen them
> > > slightly, again allowing you to ease into the latency-reduction work.
Since there is no longer a kernel default of 20 ms for Android since
commit 1045a06724f3 ("remove CONFIG_ANDROID"), it might be appropriate
to drop that from stallwarn.rst. Something like:
diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
index e38c587067fc..d86a8b47504f 100644
--- a/Documentation/RCU/stallwarn.rst
+++ b/Documentation/RCU/stallwarn.rst
@@ -168,10 +168,11 @@ CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
the expedited grace period. This parameter defines the period
of time that RCU will wait from the beginning of an expedited
- grace period until it issues an RCU CPU stall warning. This time
- period is normally 20 milliseconds on Android devices. A zero
- value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
- after conversion to milliseconds.
+ grace period until it issues an RCU CPU stall warning.
+
+ A zero value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be
+ used, after conversion to milliseconds, limited to a maximum of
+ 21 seconds.
This configuration parameter may be changed at runtime via the
/sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, however
diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
index 1b0c41d490f0..4477eeb8a54f 100644
--- a/kernel/rcu/Kconfig.debug
+++ b/kernel/rcu/Kconfig.debug
@@ -93,7 +93,7 @@ config RCU_EXP_CPU_STALL_TIMEOUT
If the RCU grace period persists, additional CPU stall warnings
are printed at more widely spaced intervals. A value of zero
says to use the RCU_CPU_STALL_TIMEOUT value converted from
- seconds to milliseconds.
+ seconds to milliseconds, limited to a maximum of 21 seconds.
> > > But please keep in mind that cond_resched() helps only in kernels
> > > built with CONFIG_PREEMPTION=n.
I think they also help with
CONFIG_PREEMPT_VOLUNTARY=y
which is used by Fedora and RHEL. For example, adding more of those in
the crypto tcrypt test module eliminated some RCU stalls when running
speed tests (that send lots of data through the algorithms).
> > I have bad news that 6.1-r6 is still affected with
> > squashfs_xz_uncompress bug, despite having both of your fixes
> > (as visible in above command's output -- double checked):
> >
> > [ 91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.
>
> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> this still happen?
BTW, you don't need to rebuild the kernel to change those parameters; they're
module parameters, so can be modified on the kernel command line (if needed
during boot) and sysfs (if only needed after boot).
For sysfs the syntax is:
#!/bin/bash
# set rcu timeouts to specified values
echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
On Wed, Nov 23, 2022 at 04:08:03PM +0000, Elliott, Robert (Servers) wrote:
> > -----Original Message-----
> > From: Paul E. McKenney <[email protected]>
> > Sent: Wednesday, November 23, 2022 8:44 AM
> > Subject: Re: BUG: BISECTED: in squashfs_xz_uncompress() (Was: RCU stalls
> > in squashfs_readahead())
> >
> > On Wed, Nov 23, 2022 at 10:14:07AM +0100, Mirsad Goran Todorovac wrote:
> ...
>
> > > commit 1045a06724f322ed61f1ffb994427c7bdbe64647
> > > Author: Christoph Hellwig <[email protected]>
> > > Date: Wed Jun 29 17:01:02 2022 +0200
> > >
> > > remove CONFIG_ANDROID
> >
> > But your CONFIG_RCU_EXP_CPU_STALL_TIMEOUT seems to be about 20.
> > As in 20 milliseconds.
> >
> > Is that intentional?
> ...
> > > > And one approach is to take a less aggressive RCU CPU stall timeout,
> > > > say reducing from 21 seconds to (say) 15 seconds instead of all the
> > > > way down to 20 milliseconds. This could allow you to ease into the
> > > > latency-reduction work.
> > > >
> > > > Alternatively, consider that response time is a property of the
> > > > entire system plus the environment that it runs in. So I suspect that
> > > > the Android folks are accompanying that 20-millisecond timeout with
> > > > some restrictions on what the on-phone workloads are permitted to do.
> > > > Maybe ask the Android guys what those restrictions are and loosen them
> > > > slightly, again allowing you to ease into the latency-reduction work.
>
> Since there is no longer a kernel default of 20 ms for Android since
> commit 1045a06724f3 ("remove CONFIG_ANDROID"), it might be appropriate
> to drop that from stallwarn.rst. Something like:
Fair point, except that the Android guys asked for this. In fact, the
Android guys supplied the patches. I like the traceability.
> diff --git a/Documentation/RCU/stallwarn.rst b/Documentation/RCU/stallwarn.rst
> index e38c587067fc..d86a8b47504f 100644
> --- a/Documentation/RCU/stallwarn.rst
> +++ b/Documentation/RCU/stallwarn.rst
> @@ -168,10 +168,11 @@ CONFIG_RCU_EXP_CPU_STALL_TIMEOUT
> Same as the CONFIG_RCU_CPU_STALL_TIMEOUT parameter but only for
> the expedited grace period. This parameter defines the period
> of time that RCU will wait from the beginning of an expedited
> - grace period until it issues an RCU CPU stall warning. This time
> - period is normally 20 milliseconds on Android devices. A zero
> - value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be used,
> - after conversion to milliseconds.
> + grace period until it issues an RCU CPU stall warning.
> +
> + A zero value causes the CONFIG_RCU_CPU_STALL_TIMEOUT value to be
> + used, after conversion to milliseconds, limited to a maximum of
> + 21 seconds.
>
> This configuration parameter may be changed at runtime via the
> /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout, however
> diff --git a/kernel/rcu/Kconfig.debug b/kernel/rcu/Kconfig.debug
> index 1b0c41d490f0..4477eeb8a54f 100644
> --- a/kernel/rcu/Kconfig.debug
> +++ b/kernel/rcu/Kconfig.debug
> @@ -93,7 +93,7 @@ config RCU_EXP_CPU_STALL_TIMEOUT
> If the RCU grace period persists, additional CPU stall warnings
> are printed at more widely spaced intervals. A value of zero
> says to use the RCU_CPU_STALL_TIMEOUT value converted from
> - seconds to milliseconds.
> + seconds to milliseconds, limited to a maximum of 21 seconds.
>
> > > > But please keep in mind that cond_resched() helps only in kernels
> > > > built with CONFIG_PREEMPTION=n.
>
> I think they also help with
> CONFIG_PREEMPT_VOLUNTARY=y
> which is used by Fedora and RHEL. For example, adding more of those in
> the crypto tcrypt test module eliminated some RCU stalls when running
> speed tests (that send lots of data through the algorithms).
>
> > > I have bad news that 6.1-r6 is still affected with
> > > squashfs_xz_uncompress bug, despite having both of your fixes
> > > (as visible in above command's output -- double checked):
> > >
> > > [ 91.065659] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 3-.... } 6 jiffies s: 621 root: 0x8/.
> >
> > If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> > this still happen?
>
> BTW, you don't need to rebuild the kernel to change those parameters; they're
> module parameters, so can be modified on the kernel command line (if needed
> during boot) and sysfs (if only needed after boot).
>
> For sysfs the syntax is:
> #!/bin/bash
> # set rcu timeouts to specified values
> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
Excellent point, thank you!
I hope that this makes Mirsad's life easier, perhaps featuring less time
waiting for kernel builds and reboots. ;-)
Thanx, Paul
On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
>>> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
>>> this still happen?
>>
>> BTW, you don't need to rebuild the kernel to change those parameters; they're
>> module parameters, so can be modified on the kernel command line (if needed
>> during boot) and sysfs (if only needed after boot).
>>
>> For sysfs the syntax is:
>> #!/bin/bash
>> # set rcu timeouts to specified values
>> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
>> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
>> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
>> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
>
> Excellent point, thank you!
>
> I hope that this makes Mirsad's life easier, perhaps featuring less time
> waiting for kernel builds and reboots. ;-)
Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
could be executed, as second 14 of the boot process:
[ 14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
[ 14.320064] rcu: blocking rcu_node structures (internal RCU debug):
[ 14.320070] Sending NMI from CPU 1 to CPUs 7:
[ 14.320081] NMI backtrace for cpu 7
[ 14.320085] CPU: 7 PID: 532 Comm: modprobe Not tainted 6.1.0-rc6 #1
[ 14.320090] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
[ 14.320093] RIP: 0010:find_kallsyms_symbol+0x1ba/0x2e0
[ 14.320103] Code: 24 02 84 c0 74 24 3c 2e 48 8b 55 a0 74 1c 48 39 55 b8 0f 82 f8 00 00 00 48 3b 55 c0 0f 86 ee 00 00 00 48 89 55
c0 44 89 6d ac <41> 83 c5 01 48 83
c3 18 45 39 fd 0f 85 2d ff ff ff 8b 45 ac 4c 8b
[ 14.320107] RSP: 0018:ffff88811ab5f4f0 EFLAGS: 00000202
[ 14.320113] RAX: 0000000000000000 RBX: ffffffffc1d1429e RCX: ffffffffb1c378e5
[ 14.320116] RDX: ffffc900066e11e0 RSI: dffffc0000000000 RDI: ffffffffc1d142a0
[ 14.320120] RBP: ffff88811ab5f560 R08: 0000000000000000 R09: ffff88811ab5f670
[ 14.320123] R10: ffff88811ab5f90f R11: ffffed102356bf21 R12: ffffffffc1d33bd9
[ 14.320126] R13: 0000000000000171 R14: 000000000000005f R15: 00000000000014ed
[ 14.320128] FS: 00007f878ce39000(0000) GS:ffff888257380000(0000) knlGS:0000000000000000
[ 14.320132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 14.320135] CR2: 00007f13ac19cae0 CR3: 000000010c73e001 CR4: 0000000000770ee0
[ 14.320138] PKRU: 55555554
[ 14.320140] Call Trace:
[ 14.320142] <TASK>
[ 14.320145] ? __SCT__tp_func_raid56_scrub_read_recover+0x8/0x8 [btrfs]
[ 14.320318] ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
[ 14.320485] ? btrfs_show_options+0x860/0x860 [btrfs]
[ 14.320647] ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
[ 14.320816] module_address_lookup+0x7a/0xc0
[ 14.320822] ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
[ 14.320987] kallsyms_lookup_buildid+0x172/0x200
[ 14.320993] ? do_kernel_range_flush+0x60/0x60
[ 14.321002] kallsyms_lookup+0x14/0x20
[ 14.321007] test_for_valid_rec.isra.38+0x95/0x110
[ 14.321015] ? remove_direct_functions_hash.isra.37+0x1c0/0x1c0
[ 14.321021] ? _find_next_bit+0x42/0xe0
[ 14.321037] ? __kasan_check_write+0x14/0x20
[ 14.321043] ? mutex_lock+0x8f/0xe0
[ 14.321049] ? __mutex_lock_slowpath+0x20/0x20
[ 14.321056] ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
[ 14.321220] ftrace_module_enable+0x15f/0x460
[ 14.321228] load_module+0x323e/0x3800
[ 14.321233] ? __kernel_read+0x304/0x450
[ 14.321246] ? module_frob_arch_sections+0x40/0x40
[ 14.321252] ? security_kernel_post_read_file+0x6d/0x80
[ 14.321259] ? __kasan_check_write+0x14/0x20
[ 14.321264] ? kernel_read_file+0x20a/0x490
[ 14.321270] ? __kasan_check_read+0x11/0x20
[ 14.321275] ? __fget_light+0xb6/0x220
[ 14.321283] __do_sys_finit_module+0x131/0x1d0
[ 14.321288] ? __do_sys_finit_module+0x131/0x1d0
[ 14.321293] ? __ia32_sys_init_module+0x50/0x50
[ 14.321297] ? vm_mmap_pgoff+0x1a8/0x240
[ 14.321307] ? __kasan_check_write+0x14/0x20
[ 14.321311] ? fput+0x1d/0xd0
[ 14.321317] __x64_sys_finit_module+0x43/0x50
[ 14.321322] do_syscall_64+0x58/0x80
[ 14.321330] ? exit_to_user_mode_prepare+0x32/0x190
[ 14.321337] ? syscall_exit_to_user_mode+0x1b/0x30
[ 14.321343] ? do_syscall_64+0x67/0x80
[ 14.321348] ? fpregs_assert_state_consistent+0x5e/0x70
[ 14.321356] ? exit_to_user_mode_prepare+0x32/0x190
[ 14.321362] ? syscall_exit_to_user_mode+0x1b/0x30
[ 14.321367] ? do_syscall_64+0x67/0x80
[ 14.321372] ? syscall_exit_to_user_mode+0x1b/0x30
[ 14.321377] ? do_syscall_64+0x67/0x80
[ 14.321382] ? do_syscall_64+0x67/0x80
[ 14.321388] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 14.321397] RIP: 0033:0x7f878c51ea3d
[ 14.321401] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89
f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
[ 14.321405] RSP: 002b:00007ffd078d4c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 14.321410] RAX: ffffffffffffffda RBX: 0000558c0d8830a0 RCX: 00007f878c51ea3d
[ 14.321414] RDX: 0000000000000000 RSI: 0000558c0d6fdcd2 RDI: 0000000000000005
[ 14.321417] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000002
[ 14.321419] R10: 0000000000000005 R11: 0000000000000246 R12: 0000558c0d6fdcd2
[ 14.321422] R13: 0000558c0d883060 R14: 0000558c0d8824f0 R15: 0000558c0d88f5d0
[ 14.321428] </TASK>
Probably something sensible should be set in the case of KASAN build. This example of stall
apparently has nothing to do with squashfs_readahead().
Thanks,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
> On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
>
> > > > If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> > > > this still happen?
> > >
> > > BTW, you don't need to rebuild the kernel to change those parameters; they're
> > > module parameters, so can be modified on the kernel command line (if needed
> > > during boot) and sysfs (if only needed after boot).
> > >
> > > For sysfs the syntax is:
> > > #!/bin/bash
> > > # set rcu timeouts to specified values
> > > echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> > > echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
> > > echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
> > > grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
> >
> > Excellent point, thank you!
> >
> > I hope that this makes Mirsad's life easier, perhaps featuring less time
> > waiting for kernel builds and reboots. ;-)
>
> Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
> could be executed, as second 14 of the boot process:
>
> [ 14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
> [ 14.320064] rcu: blocking rcu_node structures (internal RCU debug):
> [ 14.320070] Sending NMI from CPU 1 to CPUs 7:
> [ 14.320081] NMI backtrace for cpu 7
> [ 14.320085] CPU: 7 PID: 532 Comm: modprobe Not tainted 6.1.0-rc6 #1
> [ 14.320090] Hardware name: LENOVO 82H8/LNVNB161216, BIOS GGCN49WW 07/21/2022
> [ 14.320093] RIP: 0010:find_kallsyms_symbol+0x1ba/0x2e0
> [ 14.320103] Code: 24 02 84 c0 74 24 3c 2e 48 8b 55 a0 74 1c 48 39 55 b8
> 0f 82 f8 00 00 00 48 3b 55 c0 0f 86 ee 00 00 00 48 89 55 c0 44 89 6d ac <41>
> 83 c5 01 48 83
> c3 18 45 39 fd 0f 85 2d ff ff ff 8b 45 ac 4c 8b
> [ 14.320107] RSP: 0018:ffff88811ab5f4f0 EFLAGS: 00000202
> [ 14.320113] RAX: 0000000000000000 RBX: ffffffffc1d1429e RCX: ffffffffb1c378e5
> [ 14.320116] RDX: ffffc900066e11e0 RSI: dffffc0000000000 RDI: ffffffffc1d142a0
> [ 14.320120] RBP: ffff88811ab5f560 R08: 0000000000000000 R09: ffff88811ab5f670
> [ 14.320123] R10: ffff88811ab5f90f R11: ffffed102356bf21 R12: ffffffffc1d33bd9
> [ 14.320126] R13: 0000000000000171 R14: 000000000000005f R15: 00000000000014ed
> [ 14.320128] FS: 00007f878ce39000(0000) GS:ffff888257380000(0000) knlGS:0000000000000000
> [ 14.320132] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 14.320135] CR2: 00007f13ac19cae0 CR3: 000000010c73e001 CR4: 0000000000770ee0
> [ 14.320138] PKRU: 55555554
> [ 14.320140] Call Trace:
> [ 14.320142] <TASK>
> [ 14.320145] ? __SCT__tp_func_raid56_scrub_read_recover+0x8/0x8 [btrfs]
> [ 14.320318] ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
> [ 14.320485] ? btrfs_show_options+0x860/0x860 [btrfs]
> [ 14.320647] ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
> [ 14.320816] module_address_lookup+0x7a/0xc0
> [ 14.320822] ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
> [ 14.320987] kallsyms_lookup_buildid+0x172/0x200
> [ 14.320993] ? do_kernel_range_flush+0x60/0x60
> [ 14.321002] kallsyms_lookup+0x14/0x20
> [ 14.321007] test_for_valid_rec.isra.38+0x95/0x110
> [ 14.321015] ? remove_direct_functions_hash.isra.37+0x1c0/0x1c0
> [ 14.321021] ? _find_next_bit+0x42/0xe0
> [ 14.321037] ? __kasan_check_write+0x14/0x20
> [ 14.321043] ? mutex_lock+0x8f/0xe0
> [ 14.321049] ? __mutex_lock_slowpath+0x20/0x20
> [ 14.321056] ? btrfs_free_reserved_bytes+0xe0/0xe0 [btrfs]
> [ 14.321220] ftrace_module_enable+0x15f/0x460
> [ 14.321228] load_module+0x323e/0x3800
> [ 14.321233] ? __kernel_read+0x304/0x450
> [ 14.321246] ? module_frob_arch_sections+0x40/0x40
> [ 14.321252] ? security_kernel_post_read_file+0x6d/0x80
> [ 14.321259] ? __kasan_check_write+0x14/0x20
> [ 14.321264] ? kernel_read_file+0x20a/0x490
> [ 14.321270] ? __kasan_check_read+0x11/0x20
> [ 14.321275] ? __fget_light+0xb6/0x220
> [ 14.321283] __do_sys_finit_module+0x131/0x1d0
> [ 14.321288] ? __do_sys_finit_module+0x131/0x1d0
> [ 14.321293] ? __ia32_sys_init_module+0x50/0x50
> [ 14.321297] ? vm_mmap_pgoff+0x1a8/0x240
> [ 14.321307] ? __kasan_check_write+0x14/0x20
> [ 14.321311] ? fput+0x1d/0xd0
> [ 14.321317] __x64_sys_finit_module+0x43/0x50
> [ 14.321322] do_syscall_64+0x58/0x80
> [ 14.321330] ? exit_to_user_mode_prepare+0x32/0x190
> [ 14.321337] ? syscall_exit_to_user_mode+0x1b/0x30
> [ 14.321343] ? do_syscall_64+0x67/0x80
> [ 14.321348] ? fpregs_assert_state_consistent+0x5e/0x70
> [ 14.321356] ? exit_to_user_mode_prepare+0x32/0x190
> [ 14.321362] ? syscall_exit_to_user_mode+0x1b/0x30
> [ 14.321367] ? do_syscall_64+0x67/0x80
> [ 14.321372] ? syscall_exit_to_user_mode+0x1b/0x30
> [ 14.321377] ? do_syscall_64+0x67/0x80
> [ 14.321382] ? do_syscall_64+0x67/0x80
> [ 14.321388] entry_SYSCALL_64_after_hwframe+0x63/0xcd
> [ 14.321397] RIP: 0033:0x7f878c51ea3d
> [ 14.321401] Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89
> f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff
> 73 01 c3 48 8b 0d c3 a3 0f 00 f7 d8 64 89 01 48
> [ 14.321405] RSP: 002b:00007ffd078d4c88 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> [ 14.321410] RAX: ffffffffffffffda RBX: 0000558c0d8830a0 RCX: 00007f878c51ea3d
> [ 14.321414] RDX: 0000000000000000 RSI: 0000558c0d6fdcd2 RDI: 0000000000000005
> [ 14.321417] RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000002
> [ 14.321419] R10: 0000000000000005 R11: 0000000000000246 R12: 0000558c0d6fdcd2
> [ 14.321422] R13: 0000558c0d883060 R14: 0000558c0d8824f0 R15: 0000558c0d88f5d0
> [ 14.321428] </TASK>
>
> Probably something sensible should be set in the case of KASAN build. This example of stall
> apparently has nothing to do with squashfs_readahead().
Can't have everything, I guess!
How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
Again, mainline defaults to 21000.
Thanx, Paul
On 24. 11. 2022. 18:19, Paul E. McKenney wrote:
> On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
>> On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
>>
>>>>> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
>>>>> this still happen?
>>>>
>>>> BTW, you don't need to rebuild the kernel to change those parameters; they're
>>>> module parameters, so can be modified on the kernel command line (if needed
>>>> during boot) and sysfs (if only needed after boot).
>>>>
>>>> For sysfs the syntax is:
>>>> #!/bin/bash
>>>> # set rcu timeouts to specified values
>>>> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
>>>> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
>>>> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
>>>> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
>>>
>>> Excellent point, thank you!
>>>
>>> I hope that this makes Mirsad's life easier, perhaps featuring less time
>>> waiting for kernel builds and reboots. ;-)
>>
>> Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
>> could be executed, as second 14 of the boot process:
>>
>> [ 14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
>> [ 14.320064] rcu: blocking rcu_node structures (internal RCU debug):
...
>> Probably something sensible should be set in the case of KASAN build. This example of stall
>> apparently has nothing to do with squashfs_readahead().
>
> Can't have everything, I guess!
>
> How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
> Again, mainline defaults to 21000.
Did just that, and so far there is no modprobe stall in second 14 of boot at least. Looks good.
Probably it is too early to say anything in general before more uptime and stress load.
BTW, the 20 for CONFIG_RCU_EXP_STALL_TIMEOUT wasn't my invention, but it comes from generic
Ubuntu stock kernel (but without KASAN or KMEMLEAK config options):
# grep STALL /boot/config-5.19.5-051905-generic
CONFIG_RCU_STALL_COMMON=y
# CONFIG_HEADERS_INSTALL is not set
CONFIG_RCU_CPU_STALL_TIMEOUT=60
CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
#
Thanks,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union
On 24/11/2022 18:04, Mirsad Goran Todorovac wrote:
> On 24. 11. 2022. 18:19, Paul E. McKenney wrote:
>> On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
>>> On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
>>>
>>>>>> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
>>>>>> this still happen?
>>>>>
>>>>> BTW, you don't need to rebuild the kernel to change those
>>>>> parameters; they're
>>>>> module parameters, so can be modified on the kernel command line
>>>>> (if needed
>>>>> during boot) and sysfs (if only needed after boot).
>>>>>
>>>>> For sysfs the syntax is:
>>>>> #!/bin/bash
>>>>> # set rcu timeouts to specified values
>>>>> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
>>>>> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
>>>>> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
>>>>> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
>>>>
>>>> Excellent point, thank you!
>>>>
>>>> I hope that this makes Mirsad's life easier, perhaps featuring less
>>>> time
>>>> waiting for kernel builds and reboots. ;-)
>>>
>>> Unfortunately, the first stall and NMI occurs before any system
>>> script or setting a /sys/module/rcupdate/parameters/*
>>> could be executed, as second 14 of the boot process:
>>>
>>> [ 14.320045] rcu: INFO: rcu_preempt detected expedited stalls on
>>> CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
>>> [ 14.320064] rcu: blocking rcu_node structures (internal RCU debug):
>
> ...
>
>>> Probably something sensible should be set in the case of KASAN build.
>>> This example of stall
>>> apparently has nothing to do with squashfs_readahead().
>>
>> Can't have everything, I guess!
>>
>> How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
>> Again, mainline defaults to 21000.
>
> Did just that, and so far there is no modprobe stall in second 14 of
> boot at least. Looks good.
> Probably it is too early to say anything in general before more uptime
> and stress load.
>
> BTW, the 20 for CONFIG_RCU_EXP_STALL_TIMEOUT wasn't my invention, but it
> comes from generic
> Ubuntu stock kernel (but without KASAN or KMEMLEAK config options):
>
> # grep STALL /boot/config-5.19.5-051905-generic
> CONFIG_RCU_STALL_COMMON=y
> # CONFIG_HEADERS_INSTALL is not set
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 > #
That has been raised as a bug, and a fix has been committed.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1991951
>
> Thanks,
> Mirsad
>
> --
> Mirsad Goran Todorovac
> Sistem inženjer
> Grafički fakultet | Akademija likovnih umjetnosti
> Sveučilište u Zagrebu
On Thu, Nov 24, 2022 at 07:04:13PM +0100, Mirsad Goran Todorovac wrote:
> On 24. 11. 2022. 18:19, Paul E. McKenney wrote:
> > On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
> > > On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
> > >
> > > > > > If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
> > > > > > this still happen?
> > > > >
> > > > > BTW, you don't need to rebuild the kernel to change those parameters; they're
> > > > > module parameters, so can be modified on the kernel command line (if needed
> > > > > during boot) and sysfs (if only needed after boot).
> > > > >
> > > > > For sysfs the syntax is:
> > > > > #!/bin/bash
> > > > > # set rcu timeouts to specified values
> > > > > echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
> > > > > echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
> > > > > echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
> > > > > grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
> > > >
> > > > Excellent point, thank you!
> > > >
> > > > I hope that this makes Mirsad's life easier, perhaps featuring less time
> > > > waiting for kernel builds and reboots. ;-)
> > >
> > > Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
> > > could be executed, as second 14 of the boot process:
> > >
> > > [ 14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
> > > [ 14.320064] rcu: blocking rcu_node structures (internal RCU debug):
>
> ...
>
> > > Probably something sensible should be set in the case of KASAN build. This example of stall
> > > apparently has nothing to do with squashfs_readahead().
> >
> > Can't have everything, I guess!
> >
> > How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
> > Again, mainline defaults to 21000.
>
> Did just that, and so far there is no modprobe stall in second 14 of boot at least. Looks good.
> Probably it is too early to say anything in general before more uptime and stress load.
>
> BTW, the 20 for CONFIG_RCU_EXP_STALL_TIMEOUT wasn't my invention, but it comes from generic
> Ubuntu stock kernel (but without KASAN or KMEMLEAK config options):
>
> # grep STALL /boot/config-5.19.5-051905-generic
> CONFIG_RCU_STALL_COMMON=y
> # CONFIG_HEADERS_INSTALL is not set
> CONFIG_RCU_CPU_STALL_TIMEOUT=60
> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20
> #
Gotta admit, those Ubuntu folks do not lack for bravery!!! Even if they
don't also enable KASAN or KMEMLEAK by default. ;-)
Thanx, Paul
On 24. 11. 2022. 20:32, Phillip Lougher wrote:
> On 24/11/2022 18:04, Mirsad Goran Todorovac wrote:
>> On 24. 11. 2022. 18:19, Paul E. McKenney wrote:
>>> On Thu, Nov 24, 2022 at 06:06:13PM +0100, Mirsad Goran Todorovac wrote:
>>>> On 23. 11. 2022. 20:09, Paul E. McKenney wrote:
>>>>
>>>>>>> If you build with (say) CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200, does
>>>>>>> this still happen?
>>>>>>
>>>>>> BTW, you don't need to rebuild the kernel to change those parameters; they're
>>>>>> module parameters, so can be modified on the kernel command line (if needed
>>>>>> during boot) and sysfs (if only needed after boot).
>>>>>>
>>>>>> For sysfs the syntax is:
>>>>>> #!/bin/bash
>>>>>> # set rcu timeouts to specified values
>>>>>> echo 60 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout
>>>>>> echo 21000 > /sys/module/rcupdate/parameters/rcu_exp_cpu_stall_timeout
>>>>>> echo 600000 > /sys/module/rcupdate/parameters/rcu_task_stall_timeout
>>>>>> grep -Hs . /sys/module/rcupdate/parameters/rcu_*_timeout
>>>>>
>>>>> Excellent point, thank you!
>>>>>
>>>>> I hope that this makes Mirsad's life easier, perhaps featuring less time
>>>>> waiting for kernel builds and reboots. ;-)
>>>>
>>>> Unfortunately, the first stall and NMI occurs before any system script or setting a /sys/module/rcupdate/parameters/*
>>>> could be executed, as second 14 of the boot process:
>>>>
>>>> [ 14.320045] rcu: INFO: rcu_preempt detected expedited stalls on CPUs/tasks: { 7-.... } 6 jiffies s: 105 root: 0x80/.
>>>> [ 14.320064] rcu: blocking rcu_node structures (internal RCU debug):
>>
>> ...
>>
>>>> Probably something sensible should be set in the case of KASAN build. This example of stall
>>>> apparently has nothing to do with squashfs_readahead().
>>>
>>> Can't have everything, I guess!
>>>
>>> How about building your kernel with CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=200?
>>> Again, mainline defaults to 21000.
>>
>> Did just that, and so far there is no modprobe stall in second 14 of boot at least. Looks good.
>> Probably it is too early to say anything in general before more uptime and stress load.
>>
>> BTW, the 20 for CONFIG_RCU_EXP_STALL_TIMEOUT wasn't my invention, but it comes from generic
>> Ubuntu stock kernel (but without KASAN or KMEMLEAK config options):
>>
>> # grep STALL /boot/config-5.19.5-051905-generic
>> CONFIG_RCU_STALL_COMMON=y
>> # CONFIG_HEADERS_INSTALL is not set
>> CONFIG_RCU_CPU_STALL_TIMEOUT=60
>> CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 > #
>
> That has been raised as a bug, and a fix has been committed.
>
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1991951
P.S.
As for the comment that I am thrashing my systems, I now test activated MG-LRU kernel option
with 6.1-rc8 build and it functions much better, with no multimedia lags or chirps, even
with only 130/8192 MiB free and 5/10 GiB in swap area.
I am running basically the same load of simultaneously opened Firefox, Chrome and Thunderbird
windows.
However, I have set CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=0
The conclusion is that the squashfs isn't blocking, but 6 to 8 jiffies were not enough to
complete the operation, so other CPUs issued NMIs. With longer timeout, it is evident that
it was a longer operation due to KASAN build and not a lockup.
So I think I have to apologise to have wasted so much of your time with a false alarm.
To summarise, the culprit was obviously the CONFIG_RCU_EXP_CPU_STALL_TIMEOUT=20 setting
from the Ubuntu mainline kernel stock, which I unsuspectedly copied into my build and
made a recommended "make olddefconfig".
Thanks,
Mirsad
--
Mirsad Goran Todorovac
Sistem inženjer
Grafički fakultet | Akademija likovnih umjetnosti
Sveučilište u Zagrebu
--
System engineer
Faculty of Graphic Arts | Academy of Fine Arts
University of Zagreb, Republic of Croatia
The European Union