2023-09-12 15:18:01

by Paul E. McKenney

Subject: Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible

On Tue, Sep 12, 2023 at 10:29:30AM -0400, Liam R. Howlett wrote:
> * Liam R. Howlett <[email protected]> [230912 09:56]:
> > * Paul E. McKenney <[email protected]> [230912 06:00]:
> > > On Tue, Sep 12, 2023 at 10:34:44AM +0200, Geert Uytterhoeven wrote:
> > > > Hi Paul,
> > > >
> > > > On Tue, Sep 12, 2023 at 10:30 AM Paul E. McKenney <[email protected]> wrote:
> > > > > On Tue, Sep 12, 2023 at 10:23:37AM +0200, Geert Uytterhoeven wrote:
> > > > > > On Tue, Sep 12, 2023 at 10:14 AM Paul E. McKenney <[email protected]> wrote:
> > > > > > > On Mon, Sep 11, 2023 at 07:54:52PM -0400, Liam R. Howlett wrote:
> > > > > > > > * Paul E. McKenney <[email protected]> [230906 14:03]:
> > > > > > > > > On Wed, Sep 06, 2023 at 01:29:54PM -0400, Liam R. Howlett wrote:
> > > > > > > > > > * Paul E. McKenney <[email protected]> [230906 13:24]:
> > > > > > > > > > > On Wed, Sep 06, 2023 at 11:23:25AM -0400, Liam R. Howlett wrote:
> > > > > > > > > > > > (Adding Paul & Shanker to Cc list.. please see below for why)
> > > > > > > > > > > >
> > > > > > > > > > > > Apologies on the late response, I was away and have been struggling to
> > > > > > > > > > > > get a working PPC32 test environment.
> > > > > > > > > > > >
> > > > > > > > > > > > * Geert Uytterhoeven <[email protected]> [230829 12:42]:
> > > > > > > > > > > > > Hi Liam,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, 18 Aug 2023, Liam R. Howlett wrote:
> > > > > > > > > > > > > > The current implementation of append may cause duplicate data and/or
> > > > > > > > > > > > > > incorrect ranges to be returned to a reader during an update. Although
> > > > > > > > > > > > > > this has not been reported or seen, disable the append write operation
> > > > > > > > > > > > > > while the tree is in rcu mode out of an abundance of caution.
> > > > > > > > > > > >
> > > > > > > > > > > > ...
> > > > > > > > > > > > > >
> > > > > > > >
> > > > > > > > ...
> > > > > > > >
> > > > > > > > > > > > > RCU-related configs:
> > > > > > > > > > > > >
> > > > > > > > > > > > > $ grep RCU .config
> > > > > > > > > > > > > # RCU Subsystem
> > > > > > > > > > > > > CONFIG_TINY_RCU=y
> > > > > > >
> > > > > > > I must have been asleep last time I looked at this. I was looking at
> > > > > > > Tree RCU. Please accept my apologies for my lapse. :-/
> > > > > > >
> > > > > > > However, Tiny RCU's call_rcu() also avoids enabling IRQs, so I would
> > > > > > > have said the same thing, albeit after looking at a lot less RCU code.
> > > > > > >
> > > > > > > TL;DR:
> > > > > > >
> > > > > > > 1. Try making the __setup_irq() function's call to mutex_lock()
> > > > > > > instead be as follows:
> > > > > > >
> > > > > > >     if (!mutex_trylock(&desc->request_mutex))
> > > > > > >         mutex_lock(&desc->request_mutex);
> > > > > > >
> > > > > > > This might fail if __setup_irq() has other dependencies on a
> > > > > > > fully operational scheduler.
>
> This changes where the interrupts become enabled, but doesn't stop it
> from happening. It still throws a WARN after init_IRQ(). I suspect it
> is not the way to proceed, as there are probably many places in both
> common and arch-specific code that would need to be changed. Once
> TIF_NEED_RESCHED is set, all of those checks would need to be altered.

Thank you for trying it!

> I think we either need to set the boot thread to !idle, keep call_rcu()
> from setting TIF_NEED_RESCHED (how was this working before? Do we need RCU
> for the IRQs?), or alter the boot order (note this is NOT arch or
> platform code here).
>
> I don't like any of these. I'd like another option, please?

My favorite is to move the interrupt enabling later, but Michael Ellerman
would know better than would I about the feasibility of this.

Thanx, Paul

> > > > > > >
> > > > > > > 2. Move that ppc32 call to __setup_irq() much later, most definitely
> > > > > > > after interrupts have been enabled and the scheduler is fully
> > > > > > > operational. Invoking mutex_lock() before that time is not a
> > > > > > > good idea. ;-)
> > > > > >
> > > > > > There is no call to __setup_irq() from arch/powerpc/?
> > > > >
> > > > > Glad it is not just me, given that I didn't see a direct call, either. So
> > > > > later in this email, I asked Liam to put a WARN_ON_ONCE(irqs_disabled())
> > > > > just before that mutex_lock() in __setup_irq().
>
> Oh, and also:
> arch/powerpc/platforms/powermac/setup.c: .init_IRQ = pmac_pic_init,
>
> >
> > I had already found that this is the mutex lock that is enabling them.
> > I surrounded the mutex lock with checks to confirm that interrupts were
> > not enabled before it but were enabled after it. Here are the findings:
> >
> > kernel/irq/manage.c:1587 __setup_irq:
> > [ 0.000000] [c0e65ec0] [c00e9b00] __setup_irq+0x6c4/0x840 (unreliable)
> > [ 0.000000] [c0e65ef0] [c00e9d74] request_threaded_irq+0xf8/0x1f4
> > [ 0.000000] [c0e65f20] [c0c27168] pmac_pic_init+0x204/0x5f8
> > [ 0.000000] [c0e65f80] [c0c1f544] init_IRQ+0xac/0x12c
> > [ 0.000000] [c0e65fa0] [c0c1cad0] start_kernel+0x544/0x6d4
> >
> > Note that your line numbers will differ slightly due to my debug code.
> > This is the WARN _after_ the mutex lock.
> >
> > > > >
> > > > > Either way, invoking mutex_lock() early in boot before interrupts have
> > > > > been enabled is a bad idea. ;-)
> > > >
> > > > I'll add that WARN_ON_ONCE() too, and will report back later today...
> > >
> > > Thank you, looking forward to hearing the outcome!
> > >
> > > > > > Note that there are (possibly different) issues seen on ppc32 and on arm32
> > > > > > (Renesas RZ/A in particular, but not on other Renesas ARM systems).
> > > > > >
> > > > > > I saw an issue on arm32 with cfeb6ae8bcb96ccf, but not with cfeb6ae8bcb96ccf^.
> > > > > > Other people saw an issue on ppc32 with both cfeb6ae8bcb96ccf and
> > > > > > cfeb6ae8bcb96ccf^.
> > > > >
> > > > > I look forward to hearing what is the issue in both cases.
> > > >
> > > > For RZ/A, my problem report is
> > > > https://lore.kernel.org/all/[email protected]/
> > >
> > > Thank you, Geert!
> > >
> > > Huh. Is that patch you reverted causing Maple Tree or related code
> > > to attempt to acquire mutexes in early boot before interrupts have
> > > been enabled?
> > >
> > > If that added WARN_ON_ONCE() doesn't trigger early, another approach
> > > would be to put it at the beginning of mutex_lock(). Or for that matter
> > > at the beginning of might_sleep().
> >
> > Yeah, I put many WARN() calls through the code and tracked down where
> > TIF_NEED_RESCHED was set: call_rcu() in tiny.c.
> >
> >
> > So my findings summarized:
> >
> > 1. My change to the maple tree makes call_rcu() more likely during early boot.
> > 2. The initial (boot) thread is always set up as the idle task.
> > 3. Tiny RCU's call_rcu() sets TIF_NEED_RESCHED because is_idle_task(current)
> >    is true.
> > 4. init_IRQ() takes a mutex lock, which enables interrupts because
> >    TIF_NEED_RESCHED is set.
> >
> > I don't know which of these things is "wrong".
> >
> > I also looked into the mtmsr register but decided to consult you lot
> > about my findings in hopes that someone with more knowledge of the
> > platform or early boot would alleviate the pain so that I could context
> > switch or sleep :) I mean, an mtmsr bug seems like a leap even for the
> > issues I create..
> >
> > Regards,
> > Liam
> >
> >
> > --
> > maple-tree mailing list
> > [email protected]
> > https://lists.infradead.org/mailman/listinfo/maple-tree


2023-09-12 22:58:14

by Christophe Leroy

Subject: Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible



On 12/09/2023 at 17:08, Paul E. McKenney wrote:
> On Tue, Sep 12, 2023 at 10:29:30AM -0400, Liam R. Howlett wrote:
>> ...
>
> My favorite is to move the interrupt enabling later, but Michael Ellerman
> would know better than would I about the feasibility of this.
>

I dug into it a bit more. It looks like IRQs get enabled by the call to
cond_resched() in the loop in vm_area_alloc_pages(), which is called
from powerpc's init_IRQ() function when allocating IRQ stacks.

And IRQ stacks definitely need to be allocated before IRQs get enabled, so
something is wrong here, isn't it?

Christophe

2023-09-13 05:21:30

by Liam R. Howlett

Subject: Re: [PATCH v2 1/2] maple_tree: Disable mas_wr_append() when other readers are possible

* Christophe Leroy <[email protected]> [230912 11:27]:
>
>
> On 12/09/2023 at 17:08, Paul E. McKenney wrote:
> > On Tue, Sep 12, 2023 at 10:29:30AM -0400, Liam R. Howlett wrote:
> >> ...
> >
> > My favorite is to move the interrupt enabling later, but Michael Ellerman
> > would know better than would I about the feasibility of this.
> >
>
> I dug into it a bit more. It looks like IRQs get enabled by the call to
> cond_resched() in the loop in vm_area_alloc_pages(), which is called
> from powerpc's init_IRQ() function when allocating IRQ stacks.

This is another location where the process will enable IRQs because
TIF_NEED_RESCHED was set.

>
> And IRQ stacks definitely need to be allocated before IRQs get enabled, so
> something is wrong here, isn't it?