2010-11-11 13:02:24

by Tim Blechmann

[permalink] [raw]
Subject: rt61pci issue

hi all,

i am using tn rt61pci driver and besides having not the best wireless
performance, it network transfer sometimes disturbs the audio playback on my
machine. sometimes it works fine, but the machine can get into a state, where i
will have to reboot the machine to get back to normal. i suppose, this is
because of the rt61pci driver.

in general if found, that the specific irq thread produces a lot of interrupts
and does quite a lot of work. the specific irq thread takes quite some cpu time
(currently i have an uptime of 5 hours and the irq thread used 4 minutes of cpu
time) and about 200 interupts are issued per second:

cat /proc/interrupts |grep 16\: && sleep 1 && cat /proc/interrupts | grep 16\:
16: 0 5140126 0 0 IO-APIC-fasteoi
uhci_hcd:usb3, ahci, 0000:08:00.0
16: 0 5140357 0 0 IO-APIC-fasteoi
uhci_hcd:usb3, ahci, 0000:08:00.0

using perf, i get the following data about the spent cpu time:
# Events: 106 cycles
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ......
#
61.17% irq/16-0000:08: [rt61pci] [k] rt61pci_interrupt_thread
12.77% irq/16-0000:08: [rt61pci] [k] rt61pci_set_device_state
3.72% irq/16-0000:08: [kernel.kallsyms] [k] __ticket_spin_lock
3.17% irq/16-0000:08: [kernel.kallsyms] [k] local_bh_disable
2.14% irq/16-0000:08: [kernel.kallsyms] [k] schedule
1.81% irq/16-0000:08: [rt61pci] [k] rt61pci_kick_tx_queue
1.50% irq/16-0000:08: [mac80211] [k] prepare_for_handlers
1.38% irq/16-0000:08: [rt2x00pci] [k] rt2x00pci_rxdone
1.03% irq/16-0000:08: [kernel.kallsyms] [k] __ticket_spin_unlock
0.87% irq/16-0000:08: [kernel.kallsyms] [k] __inet_lookup_established
0.87% irq/16-0000:08: [kernel.kallsyms] [k] tcp_event_data_recv
0.82% irq/16-0000:08: [kernel.kallsyms] [k] irq_thread
0.66% irq/16-0000:08: [kernel.kallsyms] [k] update_curr_rt
0.64% irq/16-0000:08: [rt2x00lib] [k] rt2x00lib_rxdone
0.64% irq/16-0000:08: [kernel.kallsyms] [k] mod_timer
0.51% irq/16-0000:08: [kernel.kallsyms] [k] pick_next_task_rt
0.51% irq/16-0000:08: [kernel.kallsyms] [k] cpupri_set
0.51% irq/16-0000:08: [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.50% irq/16-0000:08: [kernel.kallsyms] [k] hrtick_update
0.49% irq/16-0000:08: [kernel.kallsyms] [k] __phys_addr
0.49% irq/16-0000:08: [kernel.kallsyms] [k] update_shares
0.49% irq/16-0000:08: [kernel.kallsyms] [k] dequeue_task
0.49% irq/16-0000:08: [kernel.kallsyms] [k] dequeue_rt_stack
0.49% irq/16-0000:08: [kernel.kallsyms] [k] skb_release_data
0.44% irq/16-0000:08: [kernel.kallsyms] [k] load_balance
0.40% irq/16-0000:08: [kernel.kallsyms] [k] memcpy
0.36% irq/16-0000:08: [kernel.kallsyms] [k] map_single
0.36% irq/16-0000:08: [kernel.kallsyms] [k] ip_output
0.36% irq/16-0000:08: [kernel.kallsyms] [k] tcp_ack
0.22% irq/16-0000:08: [mac80211] [k] minstrel_ht_tx_status
0.22% irq/16-0000:08: [kernel.kallsyms] [k] tcp_v4_rcv
0.00% irq/16-0000:08: [kernel.kallsyms] [k] native_write_msr_safe

not sure, if this is really helpful, but this amount of interrupts doesn't look
good to me. if i can be on any further help to track down this issue, i would be
happy to assist.

thanks, tim

--
[email protected]
http://tim.klingt.org

Life is really simple, but we insist on making it complicated.
Confucius




2010-11-11 13:45:42

by Helmut Schaa

[permalink] [raw]
Subject: Re: rt61pci issue

Hi,

On Thu, Nov 11, 2010 at 2:02 PM, Tim Blechmann <[email protected]> wrote:
> hi all,
>
> i am using tn rt61pci driver and besides having not the best wireless
> performance, it network transfer sometimes disturbs the audio playback on my
> machine. sometimes it works fine, but the machine can get into a state, where i
> will have to reboot the machine to get back to normal. i suppose, this is
> because of the rt61pci driver.
>
> in general if found, that the specific irq thread produces a lot of interrupts
> and does quite a lot of work. the specific irq thread takes quite some cpu time
> (currently i have an uptime of 5 hours and the irq thread used 4 minutes of cpu
> time) and about 200 interupts are issued per second:

You get ~200 interrupts/second from rt61pci while the device is idle? Or
while transmitting/receiving?

> cat /proc/interrupts |grep 16\: && sleep 1 && cat /proc/interrupts | grep 16\:
> ?16: ? ? ? ? ?0 ? ?5140126 ? ? ? ? ?0 ? ? ? ? ?0 ? IO-APIC-fasteoi
> uhci_hcd:usb3, ahci, 0000:08:00.0
> ?16: ? ? ? ? ?0 ? ?5140357 ? ? ? ? ?0 ? ? ? ? ?0 ? IO-APIC-fasteoi
> uhci_hcd:usb3, ahci, 0000:08:00.0
>
> using perf, i get the following data about the spent cpu time:
> # Events: 106 ?cycles
> #
> # Overhead ? ? ? ? ?Command ? ? ?Shared Object ?Symbol
> # ........ ?............... ?................. ?......
> #
> ? ?61.17% ?irq/16-0000:08: ?[rt61pci] ? ? ? ? ?[k] rt61pci_interrupt_thread
> ? ?12.77% ?irq/16-0000:08: ?[rt61pci] ? ? ? ? ?[k] rt61pci_set_device_state
> ? ? 3.72% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] __ticket_spin_lock
> ? ? 3.17% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] local_bh_disable
> ? ? 2.14% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] schedule
> ? ? 1.81% ?irq/16-0000:08: ?[rt61pci] ? ? ? ? ?[k] rt61pci_kick_tx_queue
> ? ? 1.50% ?irq/16-0000:08: ?[mac80211] ? ? ? ? [k] prepare_for_handlers
> ? ? 1.38% ?irq/16-0000:08: ?[rt2x00pci] ? ? ? ?[k] rt2x00pci_rxdone
> ? ? 1.03% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] __ticket_spin_unlock
> ? ? 0.87% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] __inet_lookup_established
> ? ? 0.87% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] tcp_event_data_recv
> ? ? 0.82% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] irq_thread
> ? ? 0.66% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] update_curr_rt
> ? ? 0.64% ?irq/16-0000:08: ?[rt2x00lib] ? ? ? ?[k] rt2x00lib_rxdone
> ? ? 0.64% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] mod_timer
> ? ? 0.51% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] pick_next_task_rt
> ? ? 0.51% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] cpupri_set
> ? ? 0.51% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] _raw_spin_unlock_irqrestore
> ? ? 0.50% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] hrtick_update
> ? ? 0.49% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] __phys_addr
> ? ? 0.49% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] update_shares
> ? ? 0.49% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] dequeue_task
> ? ? 0.49% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] dequeue_rt_stack
> ? ? 0.49% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] skb_release_data
> ? ? 0.44% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] load_balance
> ? ? 0.40% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] memcpy
> ? ? 0.36% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] map_single
> ? ? 0.36% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] ip_output
> ? ? 0.36% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] tcp_ack
> ? ? 0.22% ?irq/16-0000:08: ?[mac80211] ? ? ? ? [k] minstrel_ht_tx_status
> ? ? 0.22% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] tcp_v4_rcv
> ? ? 0.00% ?irq/16-0000:08: ?[kernel.kallsyms] ?[k] native_write_msr_safe
>
> not sure, if this is really helpful, but this amount of interrupts doesn't look
> good to me. if i can be on any further help to track down this issue, i would be
> happy to assist.

Looks to me as if the interrupt thread gets scheduled but doesn't do
anything useful as
otherwise we should also see the rxdone/txdone functions etc in the perf output.

Mind to put a (maybe rate limited) printk into the interrupt thread
that prints out "reg"
and "reg_mcu" so that we can see which interrupts get triggered?

Helmut

2010-11-12 13:50:44

by Helmut Schaa

[permalink] [raw]
Subject: Re: rt61pci issue

Hi Tim,

Am Donnerstag 11 November 2010 schrieb Tim Blechmann:
> cat /proc/interrupts |grep 16\: && sleep 1 && cat /proc/interrupts | grep 16\:
> 16: 0 5140126 0 0 IO-APIC-fasteoi
> uhci_hcd:usb3, ahci, 0000:08:00.0
> 16: 0 5140357 0 0 IO-APIC-fasteoi
> uhci_hcd:usb3, ahci, 0000:08:00.0

Ehm, one more thing: are you sure irq 16 is used for rt61pci. At least
that part of /proc/interrupts suggests that irq 16 is somehow USB related.

Could you check which irq is assigned to rt61pci?
Can you reproduce the hangs without rt61pci as well?

Helmut

2010-11-11 15:13:31

by Helmut Schaa

[permalink] [raw]
Subject: Re: rt61pci issue

On Thu, Nov 11, 2010 at 3:56 PM, Tim Blechmann <[email protected]> wrote:
>> Mind to put a (maybe rate limited) printk into the interrupt thread
>> that prints out "reg"
>> and "reg_mcu" so that we can see which interrupts get triggered?
>
> log attached, generated with:

Thanks. Unfortunately nothing special in there. Mostly RX and TX interrupts. So
there must be something else ...

Helmut

2010-11-12 07:04:30

by Helmut Schaa

[permalink] [raw]
Subject: Re: rt61pci issue

Am Freitag 12 November 2010 schrieb Helmut Schaa:
> Am Donnerstag 11 November 2010 schrieb Helmut Schaa:
> > On Thu, Nov 11, 2010 at 3:56 PM, Tim Blechmann <[email protected]> wrote:
> > >> Mind to put a (maybe rate limited) printk into the interrupt thread
> > >> that prints out "reg"
> > >> and "reg_mcu" so that we can see which interrupts get triggered?
> > >
> > > log attached, generated with:
> >
> > Thanks. Unfortunately nothing special in there. Mostly RX and TX interrupts. So
> > there must be something else ...
>
> Tim, is this on x86 hw? Or something else?

I don't know if this will do any good or harm but it could be worth a try as
the spec for rt61pci says something like: "Don't enable interrupt mitigation
in the same write as releasing the other masks.". Since we always write a
mitigation period of 0xff == "No mitigation period" we can simply leave
interrupt mitigation disabled.

I really don't have any clue if this will fix anything but it might be worth
a try.

Thanks,
Helmut

diff --git a/drivers/net/wireless/rt2x00/rt61pci.c b/drivers/net/wireless/rt2x00/rt61pci.c
index af548c8..fde8220 100644
--- a/drivers/net/wireless/rt2x00/rt61pci.c
+++ b/drivers/net/wireless/rt2x00/rt61pci.c
@@ -1655,8 +1655,6 @@ static void rt61pci_toggle_irq(struct rt2x00_dev *rt2x00dev,
rt2x00_set_field32(&reg, INT_MASK_CSR_TXDONE, mask);
rt2x00_set_field32(&reg, INT_MASK_CSR_RXDONE, mask);
rt2x00_set_field32(&reg, INT_MASK_CSR_BEACON_DONE, mask);
- rt2x00_set_field32(&reg, INT_MASK_CSR_ENABLE_MITIGATION, mask);
- rt2x00_set_field32(&reg, INT_MASK_CSR_MITIGATION_PERIOD, 0xff);
rt2x00pci_register_write(rt2x00dev, INT_MASK_CSR, reg);

rt2x00pci_register_read(rt2x00dev, MCU_INT_MASK_CSR, &reg);


2010-11-28 19:15:21

by Helmut Schaa

[permalink] [raw]
Subject: Re: rt61pci issue

Am Donnerstag 25 November 2010 schrieb Tim Blechmann:
> > > > >> Mind to put a (maybe rate limited) printk into the interrupt thread
> > > > >> that prints out "reg"
> > > > >> and "reg_mcu" so that we can see which interrupts get triggered?
> > > > >
> > > > > log attached, generated with:
> > > > Thanks. Unfortunately nothing special in there. Mostly RX and TX
> > > > interrupts. So there must be something else ...
> > >
> > > Tim, is this on x86 hw? Or something else?
> >
> > I don't know if this will do any good or harm but it could be worth a try
> > as the spec for rt61pci says something like: "Don't enable interrupt
> > mitigation in the same write as releasing the other masks.". Since we
> > always write a mitigation period of 0xff == "No mitigation period" we can
> > simply leave interrupt mitigation disabled.
> >
> > I really don't have any clue if this will fix anything but it might be
> > worth a try.
>
> i have been running this patch for a few days and i haven't experienced the
> problem again.

Ok, I'll officially submit the patch soon ...

> according to `top', the kernel thread `irq/16-0000:08:' still is the kernel
> thread, that is using the most cpu time (after an uptime of 8 hours) it is about
> 50 seconds, while the second most expensive kernel thread (`kworker/2:1') takes
> below 5 seconds of cpu time.

No idea :(

Helmut

2010-11-25 17:19:17

by Tim Blechmann

[permalink] [raw]
Subject: Re: rt61pci issue

> > > >> Mind to put a (maybe rate limited) printk into the interrupt thread
> > > >> that prints out "reg"
> > > >> and "reg_mcu" so that we can see which interrupts get triggered?
> > > >
> > > > log attached, generated with:
> > > Thanks. Unfortunately nothing special in there. Mostly RX and TX
> > > interrupts. So there must be something else ...
> >
> > Tim, is this on x86 hw? Or something else?
>
> I don't know if this will do any good or harm but it could be worth a try
> as the spec for rt61pci says something like: "Don't enable interrupt
> mitigation in the same write as releasing the other masks.". Since we
> always write a mitigation period of 0xff == "No mitigation period" we can
> simply leave interrupt mitigation disabled.
>
> I really don't have any clue if this will fix anything but it might be
> worth a try.

i have been running this patch for a few days and i haven't experienced the
problem again.

according to `top', the kernel thread `irq/16-0000:08:' still is the kernel
thread, that is using the most cpu time (after an uptime of 8 hours) it is about
50 seconds, while the second most expensive kernel thread (`kworker/2:1') takes
below 5 seconds of cpu time.

cheers, tim

--
[email protected]
http://tim.klingt.org

Cheat your landlord if you can and must, but do not try to shortchange
the Muse. It cannot be done. You can't fake quality any more than you
can fake a good meal.
William S. Burroughs


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part.

2010-11-12 06:48:26

by Helmut Schaa

[permalink] [raw]
Subject: Re: rt61pci issue

Am Donnerstag 11 November 2010 schrieb Helmut Schaa:
> On Thu, Nov 11, 2010 at 3:56 PM, Tim Blechmann <[email protected]> wrote:
> >> Mind to put a (maybe rate limited) printk into the interrupt thread
> >> that prints out "reg"
> >> and "reg_mcu" so that we can see which interrupts get triggered?
> >
> > log attached, generated with:
>
> Thanks. Unfortunately nothing special in there. Mostly RX and TX interrupts. So
> there must be something else ...

Tim, is this on x86 hw? Or something else?

Helmut

2010-11-12 16:03:56

by Tim Blechmann

[permalink] [raw]
Subject: Re: rt61pci issue

> > cat /proc/interrupts |grep 16\: && sleep 1 && cat /proc/interrupts | grep
16\:
> > 16: 0 5140126 0 0 IO-APIC-fasteoi
> >
> > uhci_hcd:usb3, ahci, 0000:08:00.0
> >
> > 16: 0 5140357 0 0 IO-APIC-fasteoi
> >
> > uhci_hcd:usb3, ahci, 0000:08:00.0
>
> Ehm, one more thing: are you sure irq 16 is used for rt61pci. At least
> that part of /proc/interrupts suggests that irq 16 is somehow USB related.
>
> Could you check which irq is assigned to rt61pci?
> Can you reproduce the hangs without rt61pci as well?

it is irq 16: 0000:08:00.0
from lspci:
08:00.0 Network controller: RaLink RT2561/RT61 rev B 802.11g

the audio dropouts are somehow in sync with the network access. i actually
changed one thing today, increasing the timer frequency from 100 to 1000 hz, and
after 8 hours of uptime, the machine still works nicely. i will try to compile a
100hz kernel with your patch, but since from my experience, it takes some time
to show the behavior and i haven't found a way to trigger it consistently, it
may take some time ...

thanks, tim

--
[email protected]
http://tim.klingt.org

After one look at this planet any visitor from outer space would say
"I want to see the manager."
William S. Burroughs


Attachments:
signature.asc (198.00 B)
This is a digitally signed message part.

2011-02-17 11:53:15

by Tim Blechmann

[permalink] [raw]
Subject: Re: rt61pci issue

> > > > > >> Mind to put a (maybe rate limited) printk into the interrupt
> > > > > >> thread that prints out "reg"
> > > > > >> and "reg_mcu" so that we can see which interrupts get triggered?
> > > > > >
> > > > > > log attached, generated with:
> > > > > Thanks. Unfortunately nothing special in there. Mostly RX and TX
> > > > > interrupts. So there must be something else ...
> > > >
> > > > Tim, is this on x86 hw? Or something else?
> > >
> > > I don't know if this will do any good or harm but it could be worth a
> > > try as the spec for rt61pci says something like: "Don't enable
> > > interrupt mitigation in the same write as releasing the other masks.".
> > > Since we always write a mitigation period of 0xff == "No mitigation
> > > period" we can simply leave interrupt mitigation disabled.
> > >
> > > I really don't have any clue if this will fix anything but it might be
> > > worth a try.
> >
> > i have been running this patch for a few days and i haven't experienced
> > the problem again.
>
> Ok, I'll officially submit the patch soon ...

i've been running a simple patch like for some time:
@ -1743,8 +1743,8 @@ static void rt61pci_toggle_irq(struct rt2x00_dev *rt2x00dev,
rt2x00_set_field32(&reg, INT_MASK_CSR_TXDONE, mask);
rt2x00_set_field32(&reg, INT_MASK_CSR_RXDONE, mask);
rt2x00_set_field32(&reg, INT_MASK_CSR_BEACON_DONE, mask);
- rt2x00_set_field32(&reg, INT_MASK_CSR_ENABLE_MITIGATION, mask);
- rt2x00_set_field32(&reg, INT_MASK_CSR_MITIGATION_PERIOD, 0xff);
+ /* rt2x00_set_field32(&reg, INT_MASK_CSR_ENABLE_MITIGATION, mask); */
+ /* rt2x00_set_field32(&reg, INT_MASK_CSR_MITIGATION_PERIOD, 0xff); */
rt2x00pci_register_write(rt2x00dev, INT_MASK_CSR, reg);

rt2x00pci_register_read(rt2x00dev, MCU_INT_MASK_CSR, &reg);

trying out the unpatched 2.6.38-rc5, i hit the original issue again. but with
the patch, i sometimes have connection troubles using skype. i would really
appreciate a proper solution for this issue ...

thanks, tim

--
[email protected]
http://tim.klingt.org

You can play a shoestring if you're sincere
John Coltrane