From: Tao Ren <[email protected]>
This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:
improve vhub port irq handling"): for_each_set_bit() is replaced with
simple for() loop because for() loop runs faster on ASPEED BMC.
Signed-off-by: Tao Ren <[email protected]>
---
drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------
drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++
2 files changed, 6 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c
index cdf96911e4b1..be7bb64e3594 100644
--- a/drivers/usb/gadget/udc/aspeed-vhub/core.c
+++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c
@@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)
/* Handle device interrupts */
if (istat & vhub->port_irq_mask) {
- unsigned long bitmap = istat;
- int offset = VHUB_IRQ_DEV1_BIT;
- int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;
-
- for_each_set_bit_from(offset, &bitmap, size) {
- i = offset - VHUB_IRQ_DEV1_BIT;
- ast_vhub_dev_irq(&vhub->ports[i].dev);
+ for (i = 0; i < vhub->max_ports; i++) {
+ if (istat & VHUB_DEV_IRQ(i))
+ ast_vhub_dev_irq(&vhub->ports[i].dev);
}
}
diff --git a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h
index 2e5a1ef14a75..87a5dea12d3c 100644
--- a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h
+++ b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h
@@ -67,6 +67,9 @@
#define VHUB_IRQ_HUB_EP0_SETUP (1 << 0)
#define VHUB_IRQ_ACK_ALL 0x1ff
+/* Downstream device IRQ mask. */
+#define VHUB_DEV_IRQ(n) (VHUB_IRQ_DEVICE1 << (n))
+
/* SW reset reg */
#define VHUB_SW_RESET_EP_POOL (1 << 9)
#define VHUB_SW_RESET_DMA_CONTROLLER (1 << 8)
--
2.17.1
Hi Ben,
I sent out the follow-on patch per Greg's suggestion, and the purpose is
to include the latest version (v4) of the original commit. As v4 was
already Acked-by you, can I add the tag for this patch? Or are you
willing to Ack it again?
Cheers,
Tao
On Wed, May 27, 2020 at 06:11:54PM -0700, [email protected] wrote:
> From: Tao Ren <[email protected]>
>
> This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:
> improve vhub port irq handling"): for_each_set_bit() is replaced with
> simple for() loop because for() loop runs faster on ASPEED BMC.
>
> Signed-off-by: Tao Ren <[email protected]>
> ---
> drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------
> drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++
> 2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c
> index cdf96911e4b1..be7bb64e3594 100644
> --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c
> +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c
> @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)
>
> /* Handle device interrupts */
> if (istat & vhub->port_irq_mask) {
> - unsigned long bitmap = istat;
> - int offset = VHUB_IRQ_DEV1_BIT;
> - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;
> -
> - for_each_set_bit_from(offset, &bitmap, size) {
> - i = offset - VHUB_IRQ_DEV1_BIT;
> - ast_vhub_dev_irq(&vhub->ports[i].dev);
> + for (i = 0; i < vhub->max_ports; i++) {
> + if (istat & VHUB_DEV_IRQ(i))
> + ast_vhub_dev_irq(&vhub->ports[i].dev);
> }
> }
>
> diff --git a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h
> index 2e5a1ef14a75..87a5dea12d3c 100644
> --- a/drivers/usb/gadget/udc/aspeed-vhub/vhub.h
> +++ b/drivers/usb/gadget/udc/aspeed-vhub/vhub.h
> @@ -67,6 +67,9 @@
> #define VHUB_IRQ_HUB_EP0_SETUP (1 << 0)
> #define VHUB_IRQ_ACK_ALL 0x1ff
>
> +/* Downstream device IRQ mask. */
> +#define VHUB_DEV_IRQ(n) (VHUB_IRQ_DEVICE1 << (n))
> +
> /* SW reset reg */
> #define VHUB_SW_RESET_EP_POOL (1 << 9)
> #define VHUB_SW_RESET_DMA_CONTROLLER (1 << 8)
> --
> 2.17.1
>
Hi,
[email protected] writes:
> From: Tao Ren <[email protected]>
>
> This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:
> improve vhub port irq handling"): for_each_set_bit() is replaced with
> simple for() loop because for() loop runs faster on ASPEED BMC.
>
> Signed-off-by: Tao Ren <[email protected]>
> ---
> drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------
> drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++
> 2 files changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c
> index cdf96911e4b1..be7bb64e3594 100644
> --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c
> +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c
> @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)
>
> /* Handle device interrupts */
> if (istat & vhub->port_irq_mask) {
> - unsigned long bitmap = istat;
> - int offset = VHUB_IRQ_DEV1_BIT;
> - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;
> -
> - for_each_set_bit_from(offset, &bitmap, size) {
> - i = offset - VHUB_IRQ_DEV1_BIT;
> - ast_vhub_dev_irq(&vhub->ports[i].dev);
> + for (i = 0; i < vhub->max_ports; i++) {
> + if (istat & VHUB_DEV_IRQ(i))
> + ast_vhub_dev_irq(&vhub->ports[i].dev);
how have you measured your statement above? for_each_set_bit() does
exactly what you did. Unless your architecture has an instruction which
helps finds the next set bit (like cls on ARM), which, then, makes it
much faster.
--
balbi
On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote:
>
> Hi,
>
> [email protected] writes:
> > From: Tao Ren <[email protected]>
> >
> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:
> > improve vhub port irq handling"): for_each_set_bit() is replaced with
> > simple for() loop because for() loop runs faster on ASPEED BMC.
> >
> > Signed-off-by: Tao Ren <[email protected]>
> > ---
> > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------
> > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++
> > 2 files changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c
> > index cdf96911e4b1..be7bb64e3594 100644
> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c
> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c
> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)
> >
> > /* Handle device interrupts */
> > if (istat & vhub->port_irq_mask) {
> > - unsigned long bitmap = istat;
> > - int offset = VHUB_IRQ_DEV1_BIT;
> > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;
> > -
> > - for_each_set_bit_from(offset, &bitmap, size) {
> > - i = offset - VHUB_IRQ_DEV1_BIT;
> > - ast_vhub_dev_irq(&vhub->ports[i].dev);
> > + for (i = 0; i < vhub->max_ports; i++) {
> > + if (istat & VHUB_DEV_IRQ(i))
> > + ast_vhub_dev_irq(&vhub->ports[i].dev);
>
> how have you measured your statement above? for_each_set_bit() does
> exactly what you did. Unless your architecture has an instruction which
> helps finds the next set bit (like cls on ARM), which, then, makes it
> much faster.
I did some testing and result shows for() loop runs faster than
for_each_set_bit() loop. Please refer to details below (discussion with
Benjamin in the original patch) and kindly let me know your suggestions.
> On Mon, 2020-04-06 at 23:02 -0700, Tao Ren wrote:
> > I ran some testing on my ast2400 and ast2500 BMC and looks like the
> > for() loop runs faster than for_each_set_bit_from() loop in my
> > environment. I'm not sure if something needs to be revised in my test
> > code, but please kindly share your suggestions:
> >
> > I use get_cycles() to calculate execution time of 2 different loops, and
> > ast_vhub_dev_irq() is replaced with barrier() to avoid "noise"; below
> > are the results:
> >
> > - when downstream port number is 5 and only 1 irq bit is set, it takes
> > ~30 cycles to finish for_each_set_bit() loop, and 20-25 cycles to
> > finish the for() loop.
> >
> > - if downstream port number is 5 and all 5 bits are set, then
> > for_each_set_bit() loop takes ~50 cycles and for() loop takes ~25
> > cycles.
> >
> > - when I increase downsteam port number to 16 and set 1 irq bit, the
> > for_each_set_bit() loop takes ~30 cycles and for() loop takes 25
> > cycles. It's a little surprise to me because I thought for() loop
> > would cost 60+ cycles (3 times of the value when port number is 5).
> >
> > - if downstream port number is 16 and all irq status bits are set,
> > then for_each_set_bit() loop takes 60-70 cycles and for() loop takes
> > 30+ cycles.
Cheers,
Tao
Hi,
Tao Ren <[email protected]> writes:
> On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote:
>>
>> Hi,
>>
>> [email protected] writes:
>> > From: Tao Ren <[email protected]>
>> >
>> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:
>> > improve vhub port irq handling"): for_each_set_bit() is replaced with
>> > simple for() loop because for() loop runs faster on ASPEED BMC.
>> >
>> > Signed-off-by: Tao Ren <[email protected]>
>> > ---
>> > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------
>> > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++
>> > 2 files changed, 6 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c
>> > index cdf96911e4b1..be7bb64e3594 100644
>> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c
>> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c
>> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)
>> >
>> > /* Handle device interrupts */
>> > if (istat & vhub->port_irq_mask) {
>> > - unsigned long bitmap = istat;
>> > - int offset = VHUB_IRQ_DEV1_BIT;
>> > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;
>> > -
>> > - for_each_set_bit_from(offset, &bitmap, size) {
>> > - i = offset - VHUB_IRQ_DEV1_BIT;
>> > - ast_vhub_dev_irq(&vhub->ports[i].dev);
>> > + for (i = 0; i < vhub->max_ports; i++) {
>> > + if (istat & VHUB_DEV_IRQ(i))
>> > + ast_vhub_dev_irq(&vhub->ports[i].dev);
>>
>> how have you measured your statement above? for_each_set_bit() does
>> exactly what you did. Unless your architecture has an instruction which
>> helps finds the next set bit (like cls on ARM), which, then, makes it
>> much faster.
>
> I did some testing and result shows for() loop runs faster than
> for_each_set_bit() loop. Please refer to details below (discussion with
> Benjamin in the original patch) and kindly let me know your suggestions.
no strong feelings, just surprised that you're already worried about
20~40 cycles of cpu time ;-)
Patch applied for next merge window
--
balbi
Hi,
Tao Ren <[email protected]> writes:
> On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote:
>>
>> Hi,
>>
>> [email protected] writes:
>> > From: Tao Ren <[email protected]>
>> >
>> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:
>> > improve vhub port irq handling"): for_each_set_bit() is replaced with
>> > simple for() loop because for() loop runs faster on ASPEED BMC.
>> >
>> > Signed-off-by: Tao Ren <[email protected]>
>> > ---
>> > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------
>> > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++
>> > 2 files changed, 6 insertions(+), 7 deletions(-)
>> >
>> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c
>> > index cdf96911e4b1..be7bb64e3594 100644
>> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c
>> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c
>> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)
>> >
>> > /* Handle device interrupts */
>> > if (istat & vhub->port_irq_mask) {
>> > - unsigned long bitmap = istat;
>> > - int offset = VHUB_IRQ_DEV1_BIT;
>> > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;
>> > -
>> > - for_each_set_bit_from(offset, &bitmap, size) {
>> > - i = offset - VHUB_IRQ_DEV1_BIT;
>> > - ast_vhub_dev_irq(&vhub->ports[i].dev);
>> > + for (i = 0; i < vhub->max_ports; i++) {
>> > + if (istat & VHUB_DEV_IRQ(i))
>> > + ast_vhub_dev_irq(&vhub->ports[i].dev);
>>
>> how have you measured your statement above? for_each_set_bit() does
>> exactly what you did. Unless your architecture has an instruction which
>> helps finds the next set bit (like cls on ARM), which, then, makes it
>> much faster.
>
> I did some testing and result shows for() loop runs faster than
> for_each_set_bit() loop. Please refer to details below (discussion with
> Benjamin in the original patch) and kindly let me know your suggestions.
no strong feelings, just surprised that you're already worried about
20~40 cycles of cpu time ;-)
Patch applied for next merge window
--
balbi
Hi,
Tao Ren <[email protected]> writes:
>> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c
>> > index cdf96911e4b1..be7bb64e3594 100644
>> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c
>> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c
>> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)
>> >
>> > /* Handle device interrupts */
>> > if (istat & vhub->port_irq_mask) {
>> > - unsigned long bitmap = istat;
>> > - int offset = VHUB_IRQ_DEV1_BIT;
>> > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;
>> > -
>> > - for_each_set_bit_from(offset, &bitmap, size) {
>> > - i = offset - VHUB_IRQ_DEV1_BIT;
>> > - ast_vhub_dev_irq(&vhub->ports[i].dev);
>> > + for (i = 0; i < vhub->max_ports; i++) {
>> > + if (istat & VHUB_DEV_IRQ(i))
>> > + ast_vhub_dev_irq(&vhub->ports[i].dev);
>>
>> how have you measured your statement above? for_each_set_bit() does
>> exactly what you did. Unless your architecture has an instruction which
>> helps finds the next set bit (like cls on ARM), which, then, makes it
>> much faster.
>
> I did some testing and result shows for() loop runs faster than
> for_each_set_bit() loop. Please refer to details below (discussion with
> Benjamin in the original patch) and kindly let me know your
> suggestions.
no strong feelings, just surprised you're already worried about 20~40
cycles of cpu time ;-)
patch queued for next merge window
--
balbi
On Mon, Aug 31, 2020 at 12:54:57PM +0300, Felipe Balbi wrote:
>
> Hi,
>
> Tao Ren <[email protected]> writes:
> > On Mon, Aug 17, 2020 at 04:49:32PM +0300, Felipe Balbi wrote:
> >>
> >> Hi,
> >>
> >> [email protected] writes:
> >> > From: Tao Ren <[email protected]>
> >> >
> >> > This is a follow-on patch for commit a23be4ed8f48 ("usb: gadget: aspeed:
> >> > improve vhub port irq handling"): for_each_set_bit() is replaced with
> >> > simple for() loop because for() loop runs faster on ASPEED BMC.
> >> >
> >> > Signed-off-by: Tao Ren <[email protected]>
> >> > ---
> >> > drivers/usb/gadget/udc/aspeed-vhub/core.c | 10 +++-------
> >> > drivers/usb/gadget/udc/aspeed-vhub/vhub.h | 3 +++
> >> > 2 files changed, 6 insertions(+), 7 deletions(-)
> >> >
> >> > diff --git a/drivers/usb/gadget/udc/aspeed-vhub/core.c b/drivers/usb/gadget/udc/aspeed-vhub/core.c
> >> > index cdf96911e4b1..be7bb64e3594 100644
> >> > --- a/drivers/usb/gadget/udc/aspeed-vhub/core.c
> >> > +++ b/drivers/usb/gadget/udc/aspeed-vhub/core.c
> >> > @@ -135,13 +135,9 @@ static irqreturn_t ast_vhub_irq(int irq, void *data)
> >> >
> >> > /* Handle device interrupts */
> >> > if (istat & vhub->port_irq_mask) {
> >> > - unsigned long bitmap = istat;
> >> > - int offset = VHUB_IRQ_DEV1_BIT;
> >> > - int size = VHUB_IRQ_DEV1_BIT + vhub->max_ports;
> >> > -
> >> > - for_each_set_bit_from(offset, &bitmap, size) {
> >> > - i = offset - VHUB_IRQ_DEV1_BIT;
> >> > - ast_vhub_dev_irq(&vhub->ports[i].dev);
> >> > + for (i = 0; i < vhub->max_ports; i++) {
> >> > + if (istat & VHUB_DEV_IRQ(i))
> >> > + ast_vhub_dev_irq(&vhub->ports[i].dev);
> >>
> >> how have you measured your statement above? for_each_set_bit() does
> >> exactly what you did. Unless your architecture has an instruction which
> >> helps finds the next set bit (like cls on ARM), which, then, makes it
> >> much faster.
> >
> > I did some testing and result shows for() loop runs faster than
> > for_each_set_bit() loop. Please refer to details below (discussion with
> > Benjamin in the original patch) and kindly let me know your suggestions.
>
> no strong feelings, just surprised that you're already worried about
> 20~40 cycles of cpu time ;-)
>
> Patch applied for next merge window
Thanks Felipe. Ben had some concerns about interrupt handling cost on
AST2400 BMC (ARM9), hence we did the comparison and noticed the small
difference :)
Cheers,
Tao