Hi Roger,
On Tue, Oct 27, 2015 at 11:37:03AM +0200, Roger Quadros wrote:
> On 26/10/15 23:23, Brian Norris wrote:
> > I'm not too familiar with OMAP platforms, and I might have missed out on
> > prior discussions/context, so please forgive if I'm asking silly or old
> > questions here.
>
> No worries at all.
>
> >
> > On Fri, Sep 18, 2015 at 05:53:22PM +0300, Roger Quadros wrote:
> >> - Remove NAND IRQ handling from omap-gpmc driver, share the GPMC IRQ
> >> with the omap2-nand driver and handle NAND IRQ events in the NAND driver.
> >> This causes performance increase when using prefetch-irq mode.
> >> 30% increase in read, 17% increase in write in prefetch-irq mode.
> >
> > Have you pinpointed the exact causes for the performance increase, or
> > can you give an educated guess? AIUI, you're reducing the number of
> > interrupts needed for NAND prefetch mode, but you're also removing a bit
> > of abstraction and implementing hooks that look awfully like the
> > existing abstractions:
> >
> > + int (*nand_irq_enable)(enum gpmc_nand_irq irq);
> > + int (*nand_irq_disable)(enum gpmc_nand_irq irq);
> > + void (*nand_irq_clear)(enum gpmc_nand_irq irq);
> > + u32 (*nand_irq_status)(void);
> >
> > That's not really a problem if there's a good reason for them (brcmnand
> > implements similar hooks because of quirks in the implementation of
> > interrupts across various BRCM SoCs, and it's not worth writing irqchip
> > drivers for those cases). I'm mainly curious for an explanation.
>
> I have both implementations with me. My guess is that the 20% performance
> gain is due to absence of irqchip/irqdomain translation code.
> I haven't investigated further though.
I don't have much context for whether this makes sense or not. According
to your tests, you're getting ~800K interrupts over ~15 seconds. So
should you start noticing performance hits due to abstraction at 53K
interrupts per second?
But anyway, I'm not sure that completely answered my question. My
question was whether you were removing the irqchip code solely for
performance reasons, or are there others?
> Another concern I have is that I'm not using any locking around
> gpmc_nand_irq_enable/disable(). Could this pose problems in multiple NAND
> use cases? My understanding is that it should not as the controller access
> is serialized between multiple NAND chips.
Right, if you're touching just a NAND interrupt, and it's only used by a
single instance of this NAND controller, then the NAND controller
serialization code will handle this for you.
> However I do need to add some locking as the GPMC_IRQENABLE register is shared
> between NAND and GPMC driver.
>
> NOTE: We are not using prefetch-irq mode for any of the OMAP boards because
> of lesser performance than prefetch-polled mode. So if the less performance
> for an unused mode is a lesser concern compared to cleaner code then
> I can resend this with the irqdomain implementation.
>
> Below are performance logs of irqdomain vs hooks.
>
> --
> cheers,
> -roger
>
> test logs.
[snip]
Brian
Hi Brian,
On 30/11/15 21:54, Brian Norris wrote:
> Hi Roger,
>
> On Tue, Oct 27, 2015 at 11:37:03AM +0200, Roger Quadros wrote:
>> On 26/10/15 23:23, Brian Norris wrote:
>>> I'm not too familiar with OMAP platforms, and I might have missed out on
>>> prior discussions/context, so please forgive if I'm asking silly or old
>>> questions here.
>>
>> No worries at all.
>>
>>>
>>> On Fri, Sep 18, 2015 at 05:53:22PM +0300, Roger Quadros wrote:
>>>> - Remove NAND IRQ handling from omap-gpmc driver, share the GPMC IRQ
>>>> with the omap2-nand driver and handle NAND IRQ events in the NAND driver.
>>>> This causes performance increase when using prefetch-irq mode.
>>>> 30% increase in read, 17% increase in write in prefetch-irq mode.
>>>
>>> Have you pinpointed the exact causes for the performance increase, or
>>> can you give an educated guess? AIUI, you're reducing the number of
>>> interrupts needed for NAND prefetch mode, but you're also removing a bit
>>> of abstraction and implementing hooks that look awfully like the
>>> existing abstractions:
>>>
>>> + int (*nand_irq_enable)(enum gpmc_nand_irq irq);
>>> + int (*nand_irq_disable)(enum gpmc_nand_irq irq);
>>> + void (*nand_irq_clear)(enum gpmc_nand_irq irq);
>>> + u32 (*nand_irq_status)(void);
>>>
>>> That's not really a problem if there's a good reason for them (brcmnand
>>> implements similar hooks because of quirks in the implementation of
>>> interrupts across various BRCM SoCs, and it's not worth writing irqchip
>>> drivers for those cases). I'm mainly curious for an explanation.
>>
>> I have both implementations with me. My guess is that the 20% performance
>> gain is due to absence of irqchip/irqdomain translation code.
>> I haven't investigated further though.
>
> I don't have much context for whether this makes sense or not. According
> to your tests, you're getting ~800K interrupts over ~15 seconds. So
> should you start noticing performance hits due to abstraction at 53K
> interrupts per second?
Yes, this was my understanding.
>
> But anyway, I'm not sure that completely answered my question. My
> question was whether you were removing the irqchip code solely for
> performance reasons, or are there others?
Yes. Only for performance reasons.
>
>> Another concern I have is that I'm not using any locking around
>> gpmc_nand_irq_enable/disable(). Could this pose problems in multiple NAND
>> use cases? My understanding is that it should not as the controller access
>> is serialized between multiple NAND chips.
>
> Right, if you're touching just a NAND interrupt, and it's only used by a
> single instance of this NAND controller, then the NAND controller
> serialization code will handle this for you.
OK.
cheers,
-roger
Hi Roger,
On Tue, Dec 01, 2015 at 04:41:16PM +0200, Roger Quadros wrote:
> On 30/11/15 21:54, Brian Norris wrote:
> > On Tue, Oct 27, 2015 at 11:37:03AM +0200, Roger Quadros wrote:
> >> On 26/10/15 23:23, Brian Norris wrote:
> >>> On Fri, Sep 18, 2015 at 05:53:22PM +0300, Roger Quadros wrote:
> >>>> - Remove NAND IRQ handling from omap-gpmc driver, share the GPMC IRQ
> >>>> with the omap2-nand driver and handle NAND IRQ events in the NAND driver.
> >>>> This causes performance increase when using prefetch-irq mode.
> >>>> 30% increase in read, 17% increase in write in prefetch-irq mode.
> >>>
> >>> Have you pinpointed the exact causes for the performance increase, or
> >>> can you give an educated guess? AIUI, you're reducing the number of
> >>> interrupts needed for NAND prefetch mode, but you're also removing a bit
> >>> of abstraction and implementing hooks that look awfully like the
> >>> existing abstractions:
> >>>
> >>> + int (*nand_irq_enable)(enum gpmc_nand_irq irq);
> >>> + int (*nand_irq_disable)(enum gpmc_nand_irq irq);
> >>> + void (*nand_irq_clear)(enum gpmc_nand_irq irq);
> >>> + u32 (*nand_irq_status)(void);
> >>>
> >>> That's not really a problem if there's a good reason for them (brcmnand
> >>> implements similar hooks because of quirks in the implementation of
> >>> interrupts across various BRCM SoCs, and it's not worth writing irqchip
> >>> drivers for those cases). I'm mainly curious for an explanation.
> >>
> >> I have both implementations with me. My guess is that the 20% performance
> >> gain is due to absence of irqchip/irqdomain translation code.
> >> I haven't investigated further though.
> >
> > I don't have much context for whether this makes sense or not. According
> > to your tests, you're getting ~800K interrupts over ~15 seconds. So
> > should you start noticing performance hits due to abstraction at 53K
> > interrupts per second?
>
> Yes, this was my understanding.
Am I computing wrong, or is that a pretty insane rate of interrupts?
> > But anyway, I'm not sure that completely answered my question. My
> > question was whether you were removing the irqchip code solely for
> > performance reasons, or are there others?
>
> Yes. Only for performance reasons.
Hmm, that's not my favorite answer. I'd prefer that more analysis was
done here before scrapping irqchip...
But maybe that's not too bad. It seems like your patch set overall is a
net positive for disentangling some of arch/ and drivers/.
I'll take another pass over your patch set, but if things are looking
better, how do you expect to merge this? There are significant portions
that touch at least 2 or 3 different subsystem trees, AFAICT.
Brian
Brian,
On 02/12/15 08:56, Brian Norris wrote:
> Hi Roger,
>
> On Tue, Dec 01, 2015 at 04:41:16PM +0200, Roger Quadros wrote:
>> On 30/11/15 21:54, Brian Norris wrote:
>>> On Tue, Oct 27, 2015 at 11:37:03AM +0200, Roger Quadros wrote:
>>>> On 26/10/15 23:23, Brian Norris wrote:
>>>>> On Fri, Sep 18, 2015 at 05:53:22PM +0300, Roger Quadros wrote:
>>>>>> - Remove NAND IRQ handling from omap-gpmc driver, share the GPMC IRQ
>>>>>> with the omap2-nand driver and handle NAND IRQ events in the NAND driver.
>>>>>> This causes performance increase when using prefetch-irq mode.
>>>>>> 30% increase in read, 17% increase in write in prefetch-irq mode.
>>>>>
>>>>> Have you pinpointed the exact causes for the performance increase, or
>>>>> can you give an educated guess? AIUI, you're reducing the number of
>>>>> interrupts needed for NAND prefetch mode, but you're also removing a bit
>>>>> of abstraction and implementing hooks that look awfully like the
>>>>> existing abstractions:
>>>>>
>>>>> + int (*nand_irq_enable)(enum gpmc_nand_irq irq);
>>>>> + int (*nand_irq_disable)(enum gpmc_nand_irq irq);
>>>>> + void (*nand_irq_clear)(enum gpmc_nand_irq irq);
>>>>> + u32 (*nand_irq_status)(void);
>>>>>
>>>>> That's not really a problem if there's a good reason for them (brcmnand
>>>>> implements similar hooks because of quirks in the implementation of
>>>>> interrupts across various BRCM SoCs, and it's not worth writing irqchip
>>>>> drivers for those cases). I'm mainly curious for an explanation.
>>>>
>>>> I have both implementations with me. My guess is that the 20% performance
>>>> gain is due to absence of irqchip/irqdomain translation code.
>>>> I haven't investigated further though.
>>>
>>> I don't have much context for whether this makes sense or not. According
>>> to your tests, you're getting ~800K interrupts over ~15 seconds. So
>>> should you start noticing performance hits due to abstraction at 53K
>>> interrupts per second?
>>
>> Yes, this was my understanding.
>
> Am I computing wrong, or is that a pretty insane rate of interrupts?
I don't have the test board with me right now and so can't tell you
for sure if the mtd tests took 15 seconds or more.
I can try it out on a different board that I have and let you know
for sure about how many interrupts we get per second.
>
>>> But anyway, I'm not sure that completely answered my question. My
>>> question was whether you were removing the irqchip code solely for
>>> performance reasons, or are there others?
>>
>> Yes. Only for performance reasons.
>
> Hmm, that's not my favorite answer. I'd prefer that more analysis was
> done here before scrapping irqchip...
I agree. We could retain the irqchip model till we have more satisfying
analysis.
>
> But maybe that's not too bad. It seems like your patch set overall is a
> net positive for disentangling some of arch/ and drivers/.
:)
>
> I'll take another pass over your patch set, but if things are looking
> better, how do you expect to merge this? There are significant portions
> that touch at least 2 or 3 different subsystem trees, AFAICT.
Tony could create an immutable branch with all the dts and memory changes.
You could base the mtd changes on top of that?
cheers,
-roger
* Roger Quadros <[email protected]> [151201 21:13]:
> On 02/12/15 08:56, Brian Norris wrote:
> >
> > I'll take another pass over your patch set, but if things are looking
> > better, how do you expect to merge this? There are significant portions
> > that touch at least 2 or 3 different subsystem trees, AFAICT.
>
> Tony could create an immutable branch with all the dts and memory changes.
> You could base the mtd changes on top of that?
If we all agree on that it will be immutable Roger can set up the branch
with our acks and we can all merge it in as needed. I believe v4.4-rc1
should work as a base for us all?
Regards,
Tony
On Wed, Dec 02, 2015 at 07:03:17AM -0800, Tony Lindgren wrote:
> * Roger Quadros <[email protected]> [151201 21:13]:
> > On 02/12/15 08:56, Brian Norris wrote:
> > >
> > > I'll take another pass over your patch set, but if things are looking
> > > better, how do you expect to merge this? There are significant portions
> > > that touch at least 2 or 3 different subsystem trees, AFAICT.
> >
> > Tony could create an immutable branch with all the dts and memory changes.
> > You could base the mtd changes on top of that?
>
> If we all agree on that it will be immutable Roger can set up the branch
> with our acks and we can all merge it in as needed. I believe v4.4-rc1
> should work as a base for us all?
There are oustanding comments about the NAND ready/busy GPIO naming in
patch 18, which I'd like addressed. I'll re-review the rest before the
end of the day (West Coast U.S.A.). I'm not sure my acks are worth much
beyond the MTD stuff.
Regarding branches, 4.4-rc1 is fine with me.
Regards,
Brian
Hi Roger,
On Wed, Dec 02, 2015 at 10:42:12AM +0530, Roger Quadros wrote:
> On 02/12/15 08:56, Brian Norris wrote:
> > On Tue, Dec 01, 2015 at 04:41:16PM +0200, Roger Quadros wrote:
> >> On 30/11/15 21:54, Brian Norris wrote:
> >>> But anyway, I'm not sure that completely answered my question. My
> >>> question was whether you were removing the irqchip code solely for
> >>> performance reasons, or are there others?
> >>
> >> Yes. Only for performance reasons.
> >
> > Hmm, that's not my favorite answer. I'd prefer that more analysis was
> > done here before scrapping irqchip...
>
> I agree. We could retain the irqchip model till we have more satisfying
> analysis.
I won't insist, though if it's not too ugly/horrible to do so, I think
that would make sense. I'll leave it as your call.
Brian
* Brian Norris <[email protected]> [151202 10:14]:
> On Wed, Dec 02, 2015 at 07:03:17AM -0800, Tony Lindgren wrote:
> > * Roger Quadros <[email protected]> [151201 21:13]:
> > > On 02/12/15 08:56, Brian Norris wrote:
> > > >
> > > > I'll take another pass over your patch set, but if things are looking
> > > > better, how do you expect to merge this? There are significant portions
> > > > that touch at least 2 or 3 different subsystem trees, AFAICT.
> > >
> > > Tony could create an immutable branch with all the dts and memory changes.
> > > You could base the mtd changes on top of that?
> >
> > If we all agree on that it will be immutable Roger can set up the branch
> > with our acks and we can all merge it in as needed. I believe v4.4-rc1
> > should work as a base for us all?
>
> There are oustanding comments about the NAND ready/busy GPIO naming in
> patch 18, which I'd like addressed. I'll re-review the rest before the
> end of the day (West Coast U.S.A.). I'm not sure my acks are worth much
> beyond the MTD stuff.
OK, I'm happy with the gpmc related parts.
> Regarding branches, 4.4-rc1 is fine with me.
OK
Thanks,
Tony