2018-01-06 19:54:44

by Mauro Carvalho Chehab

Subject: Re: dvb usb issues since kernel 4.9

Hi Josef,

On Sat, 6 Jan 2018 16:04:16 +0100
"Josef Griebichler" <[email protected]> wrote:

> Hi,
>
> The commit causing it has been identified.
> After reverting commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> it's working again.

Just replying to me won't magically fix this. The people involved in
this patch should also be Cc'ed, plus the USB people. I just added them.

> Please have a look at the thread https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?pageNo=13
> where several users acknowledge that the revert solves their issues with USB DVB cards.

I read the entire (long) thread there. To make it easier for the
others: from what I understand, the problem happens on both x86 and ARM,
although almost all comments there mention tests with the Raspbian
kernel (which uses a different USB host driver than the upstream one).

It happens when watching digital TV DVB-C channels, which usually means
a sustained bit rate of 11 MBps to 54 MBps.

The reports mention the DVBSky, which uses USB URB bulk transfers.
Every several minutes (5 to 10 minutes), the stream suffers "glitches"
caused by frame losses.

The part of the thread that contains the bisect is at:
https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965

It indirectly mentions another comment on the thread which points
to:
https://github.com/raspberrypi/linux/issues/2134

There, it says that this patch fixes part of the issues:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34f41c0316ed52b0b44542491d89278efdaa70e4

but it affects URB packet losses only to a lesser extent.

The main issue is really the change to the core softirq logic.

Using Kernel 4.14.10 on a Raspberry Pi 3 with 4cd13c2 commit reverted
fixed the issue.

Josef, is the above right? Anything else to mention? Does the
same issue also affect x86 with a vanilla Kernel 4.14.10?

-

It seems that the original patch was designed to solve some IRQ issues
with network cards that caused data losses under high traffic. However,
it is also having bad effects on the sustained high-bandwidth demands
of DVB cards, at least with some USB host drivers.

Alan/Greg/Eric/David:

Any ideas about how to fix it without causing regressions to
network?

Regards,
Mauro

> Sent: Sunday, 17 December 2017 at 14:27
> From: "Mauro Carvalho Chehab" <[email protected]>
> To: "Sean Young" <[email protected]>
> Cc: "Josef Griebichler" <[email protected]>, [email protected], [email protected], [email protected], [email protected]
> Subject: Re: dvb usb issues since kernel 4.9
> On Sun, 17 Dec 2017 12:06:37 +0000
> Sean Young <[email protected]> wrote:
>
> > Hi Josef,
>
> On Sun, 17 Dec 2017 11:19:38 +0100
> "Josef Griebichler" <[email protected]> wrote:
>
> > > Hello Mr. Caumont,
> > >
> > > Since the switch to kernel 4.9, several users have had issues with their USB DVB cards.
> > > Some get artifacts when watching live TV; I'm getting discontinuity errors in tvheadend when recording.
> > > I'm using the latest test build of LibreELEC with kernel 4.14.6, but the issues are still there.
> > > There's a LibreELEC forum thread for this topic
> > > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/
> > > and also an open kernel bug
> > > https://bugzilla.kernel.org/show_bug.cgi?id=197835
> > >
> > > This is my dmesg http://sprunge.us/WRIE
> > > and tvh service log http://sprunge.us/bEiE
> > >
> > > I saw in the kernel changelog that you made an improvement/change for DVB and USB (commit 9a11204d2b26324636ff54f8d28095ed5dd17e91).
> > >
> > > Is there anything that can be done to improve our situation, or are we forced to stay with kernel 4.8?
> > >
> > > Thanks for the support!
> > >
> > > Josef
> >
> > Between kernel v4.8 and v4.9 there are many changes, and it is unlikely that
> > commit 9a11204d2b26324636ff54f8d28095ed5dd17e91 is responsible for this.
>
> Let me add [email protected] and [email protected] ML.
>
> Josef, please make sure that your e-mail client doesn't send e-mails
> with HTML tags in them; otherwise the ML server will automatically drop them.
>
> > What would be really helpful is if you could find out which commit
> > caused the regression. This can be done by bisecting the kernel. There are
> > various guides to this:
> >
> > https://wiki.ubuntu.com/Kernel/KernelBisection
> > or
> > https://wiki.archlinux.org/index.php/Bisecting_bugs
> >
> > Once the commit has been identified we can work together to narrow it down
> > to the exact change, and then work together on a fix.
>
> Yeah, we need more data in order to start tracking it. I suspect,
> however, that a simple git bisect may not work in this case, due to the
> USB change that forbids DMA on the stack, added in Kernel 4.9, if
> the card Josef is using was affected by that change.
>
> He'll probably need to disable CONFIG_VMAP_STACK in the middle
> of the bisect (e.g. once the patch that implements it is reached),
> or cherry-pick any needed DMA fixup patches on top of Kernel
> 4.8 before starting the bisect.
>
> It is also worth mentioning which USB host controller
> is used and which media driver, as the issue could be
> there.
>
> That said, from the bug report, it seems that this is
> happening on RPi3. Could you please also test it on a PC? That
> will help to identify whether the bug is in the RPi's host driver or
> in the media drivers.
>
> With regards to RPi3, there are actually two different drivers
> for it: one used on Raspbian Kernel, and another one upstream.
> They're completely different ones.
>
> What driver are you using?
>
> Thanks,
> Mauro



Thanks,
Mauro


2018-01-06 21:08:14

by Josef Griebichler

Subject: Aw: Re: dvb usb issues since kernel 4.9

Hi,

Thanks for adding the people involved!
Yes, both ARM and x86 are affected.
The bisection was not done by me; it was done on an x86_64 machine with the mainline kernel, not the Raspbian kernel (https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965). You can also find the bisect log in that post.

I'm using a DVB-S/S2 USB TV card (TechnoTrend S2-4600 with firmware dvb-fe-ds3103.fw, components M88DS3103, M88TS2022), so DVB-C is not the only standard affected.

Yes, kernel 4.14.10 with the mentioned commit reverted works on the RPi3 just as kernel 4.8 did.

I hope this is of some help.

Regards,
Josef


2018-01-06 21:44:24

by Alan Stern

Subject: Re: dvb usb issues since kernel 4.9

On Sat, 6 Jan 2018, Mauro Carvalho Chehab wrote:

> It seems that the original patch was designed to solve some IRQ issues
> with network cards that caused data losses under high traffic. However,
> it is also having bad effects on the sustained high-bandwidth demands
> of DVB cards, at least with some USB host drivers.
>
> Alan/Greg/Eric/David:
>
> Any ideas about how to fix it without causing regressions to
> network?

It would be good to know what hardware was involved on the x86 system
and to have some timing data. Can we see the output from lsusb and
usbmon, running on a vanilla kernel that gets plenty of video glitches?

Overall, this may be a very difficult problem to solve. The
4cd13c21b207 commit was intended to improve throughput at the cost of
increased latency. But then what do you do when the latency becomes
too high for the video subsystem to handle?

Alan Stern

2018-01-07 11:03:55

by Mauro Carvalho Chehab

Subject: Re: dvb usb issues since kernel 4.9

On Sat, 6 Jan 2018 16:44:20 -0500 (EST)
Alan Stern <[email protected]> wrote:

> On Sat, 6 Jan 2018, Mauro Carvalho Chehab wrote:
>
> > It seems that the original patch was designed to solve some IRQ issues
> > with network cards that caused data losses under high traffic. However,
> > it is also having bad effects on the sustained high-bandwidth demands
> > of DVB cards, at least with some USB host drivers.
> >
> > Alan/Greg/Eric/David:
> >
> > Any ideas about how to fix it without causing regressions to
> > network?
>
> It would be good to know what hardware was involved on the x86 system
> and to have some timing data. Can we see the output from lsusb and
> usbmon, running on a vanilla kernel that gets plenty of video glitches?

From Josef's report and from the BZ, the affected hardware seems
to be based on the Montage Technology M88DS3103/M88TS2022 chipset.
The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
which shares a USB implementation that is used by many more drivers.
The URB handling code is at:

drivers/media/usb/dvb-usb-v2/usb_urb.c

This particular driver allocates 8 buffers with 4096 bytes each
for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.

This has become popular USB hardware nowadays. I have an S960C
myself, so I can send you its lsusb output. You should note, however,
that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
rates. Here, with my DVB-S2 provider, a typical transponder produces 58 Mpps
of payload after removing URB headers. A 10-minute recording of the
entire data (which typically contains 5-10 channels) can easily go
above 4 GB, just to reproduce 1-2 glitches. So I'm not sure
a usbmon dump would be useful.

I'm enclosing the lsusb output from an S960C device, which is based on those
Montage chipsets:

Bus 002 Device 007: ID 0572:960c Conexant Systems (Rockwell), Inc. DVBSky S960C DVB-S2 tuner
Couldn't open device, some information will be missing
Device Descriptor:
  bLength 18
  bDescriptorType 1
  bcdUSB 2.00
  bDeviceClass 0 (Defined at Interface level)
  bDeviceSubClass 0
  bDeviceProtocol 0
  bMaxPacketSize0 64
  idVendor 0x0572 Conexant Systems (Rockwell), Inc.
  idProduct 0x960c DVBSky S960C DVB-S2 tuner
  bcdDevice 0.00
  iManufacturer 1
  iProduct 2
  iSerial 3
  bNumConfigurations 1
  Configuration Descriptor:
    bLength 9
    bDescriptorType 2
    wTotalLength 219
    bNumInterfaces 1
    bConfigurationValue 1
    iConfiguration 4
    bmAttributes 0x80
      (Bus Powered)
    MaxPower 500mA
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 0
      bNumEndpoints 3
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 1
      bInterfaceProtocol 1
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x01 EP 1 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 1
      bNumEndpoints 3
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 1
      bInterfaceProtocol 1
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0040 1x 64 bytes
        bInterval 3
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x01 EP 1 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x13f2 3x 1010 bytes
        bInterval 1
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 2
      bNumEndpoints 3
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 1
      bInterfaceProtocol 1
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0040 1x 64 bytes
        bInterval 3
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x01 EP 1 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x12d6 3x 726 bytes
        bInterval 1
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 3
      bNumEndpoints 3
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 1
      bInterfaceProtocol 1
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0040 1x 64 bytes
        bInterval 3
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x01 EP 1 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x12ae 3x 686 bytes
        bInterval 1
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 4
      bNumEndpoints 3
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 1
      bInterfaceProtocol 1
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0040 1x 64 bytes
        bInterval 3
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x01 EP 1 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x03ca 1x 970 bytes
        bInterval 1
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 5
      bNumEndpoints 3
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 1
      bInterfaceProtocol 1
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0040 1x 64 bytes
        bInterval 3
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x01 EP 1 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x02ac 1x 684 bytes
        bInterval 1
    Interface Descriptor:
      bLength 9
      bDescriptorType 4
      bInterfaceNumber 0
      bAlternateSetting 6
      bNumEndpoints 3
      bInterfaceClass 255 Vendor Specific Class
      bInterfaceSubClass 1
      bInterfaceProtocol 1
      iInterface 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x81 EP 1 IN
        bmAttributes 3
          Transfer Type Interrupt
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0040 1x 64 bytes
        bInterval 3
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x01 EP 1 OUT
        bmAttributes 2
          Transfer Type Bulk
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x0200 1x 512 bytes
        bInterval 0
      Endpoint Descriptor:
        bLength 7
        bDescriptorType 5
        bEndpointAddress 0x82 EP 2 IN
        bmAttributes 1
          Transfer Type Isochronous
          Synch Type None
          Usage Type Data
        wMaxPacketSize 0x03ac 1x 940 bytes
        bInterval 1

> Overall, this may be a very difficult problem to solve. The
> 4cd13c21b207 commit was intended to improve throughput at the cost of
> increased latency. But then what do you do when the latency becomes
> too high for the video subsystem to handle?

Latency can't be too high, otherwise frames will be dropped.
Even if the kernel itself doesn't drop them, once the delay goes above
a certain threshold, userspace will need to drop them, as it
should be presenting audio and video in real time. Yet, typically,
userspace delays playback by one or two seconds, which would mean
1500-3500 buffers, which I suspect is a lot more than the hardware
limits. So I suspect that the hardware runs out of free buffers well
before userspace does: media hardware doesn't have unlimited buffers
inside, as it assumes that the kernel/userspace will be fast
enough to sustain bit rates of up to 66 Mbps of payload.
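The 1500-3500 figure is consistent with one to two seconds of delay if the rates are read as megabits per second (an assumption; the thread writes the units as "MBps", "Mbps" and "Mpps"). A minimal C sketch, with a hypothetical helper name, converting a delay and a bit rate into a count of the driver's 4096-byte buffers:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of buf_size-byte buffers needed to hold
 * delay_ms milliseconds of a stream arriving at rate_bps bits per second.
 * The result is rounded up so a partially filled buffer still counts. */
static uint64_t bufs_for_delay(uint64_t rate_bps, uint64_t delay_ms,
			       uint32_t buf_size)
{
	uint64_t bytes;

	assert(buf_size > 0);
	/* bits/s * ms / 1000 = bits; bits / 8 = bytes */
	bytes = rate_bps * delay_ms / 8000u;
	return (bytes + buf_size - 1) / buf_size;
}
```

With rate_bps = 58000000, a 1000 ms delay needs 1771 buffers and a 2000 ms delay needs 3541, far more than the 8 URBs the driver keeps queued.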

Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP
in order to switch the kernel logic to prioritize latency instead of
throughput.

Thanks,
Mauro

2018-01-07 15:41:41

by Alan Stern

Subject: Re: dvb usb issues since kernel 4.9

On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:

> > > It seems that the original patch was designed to solve some IRQ issues
> > > with network cards that caused data losses under high traffic. However,
> > > it is also having bad effects on the sustained high-bandwidth demands
> > > of DVB cards, at least with some USB host drivers.
> > >
> > > Alan/Greg/Eric/David:
> > >
> > > Any ideas about how to fix it without causing regressions to
> > > network?
> >
> > It would be good to know what hardware was involved on the x86 system
> > and to have some timing data. Can we see the output from lsusb and
> > usbmon, running on a vanilla kernel that gets plenty of video glitches?
>
> From Josef's report, and from the BZ, the affected hardware seems
> to be based on Montage Technology M88DS3103/M88TS2022 chipset.

What type of USB host controller does the x86_64 system use? EHCI or
xHCI?

> The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> which shares a USB implementation that is used by many more drivers.
> The URB handling code is at:
>
> drivers/media/usb/dvb-usb-v2/usb_urb.c
>
> This particular driver allocates 8 buffers with 4096 bytes each
> for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
>
> This has become popular USB hardware nowadays. I have an S960C
> myself, so I can send you its lsusb output. You should note, however,
> that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> rates. Here, with my DVB-S2 provider, a typical transponder produces 58 Mpps
> of payload after removing URB headers.

You mentioned earlier that the driver uses bulk transfers. In USB-2.0,
the maximum possible payload data transfer rate using bulk transfers is
53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s). And
even this is possible only if almost nothing else is using the bus at
the same time.
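The 53248 bytes/ms ceiling follows from the USB 2.0 high-speed bulk limits: at most 13 packets of 512 bytes per 125 µs microframe, and 8 microframes per millisecond. A small C sketch of that arithmetic (the macro and function names are illustrative, not kernel identifiers):

```c
#include <assert.h>
#include <stdint.h>

/* USB 2.0 high-speed bulk limits used for the estimate (illustrative names). */
#define HS_BULK_PACKET_BYTES       512u /* max packet size, high-speed bulk */
#define HS_BULK_PACKETS_PER_UFRAME  13u /* max bulk packets per 125 us microframe */
#define HS_UFRAMES_PER_MS            8u /* 8 x 125 us = 1 ms */

/* Theoretical bulk payload ceiling in bytes per millisecond,
 * assuming nothing else is using the bus. */
static uint32_t usb2_hs_bulk_bytes_per_ms(void)
{
	return HS_BULK_PACKET_BYTES * HS_BULK_PACKETS_PER_UFRAME * HS_UFRAMES_PER_MS;
}

/* Nonzero if a stream of rate_bytes_per_s bytes per second fits under
 * that ceiling (other devices on the bus would lower it further). */
static int fits_in_usb2_bulk(uint64_t rate_bytes_per_s)
{
	uint64_t max_per_s = (uint64_t)usb2_hs_bulk_bytes_per_ms() * 1000u;

	assert(max_per_s > 0);
	return rate_bytes_per_s <= max_per_s;
}
```

For example, fits_in_usb2_bulk(58000000) is false (58 MB/s), while a 58 Mbit/s stream (7250000 bytes/s) fits comfortably, so which unit the "58" figure refers to matters.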

> A 10-minute recording of the
> entire data (which typically contains 5-10 channels) can easily go
> above 4 GB, just to reproduce 1-2 glitches. So I'm not sure
> a usbmon dump would be useful.

It might not be helpful at all. However, I'm not interested in the
payload data (which would be unintelligible to me anyway) but rather
the timing of URB submissions and completions. A usbmon trace which
didn't keep much of the payload data would only require on the order of
50 MB per minute -- and Josef said that glitches usually would show up
within a minute or so.
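What matters in such a trace is the spacing of completions, not the payload. A minimal C sketch, assuming the usbmon text format from the kernel's usbmon documentation (each line starts with the URB tag, a timestamp in microseconds, the event type S/C/E, and an address word such as "Bi:2:007:2" for bulk IN, bus 2, device 7, endpoint 2); the function name and the example address are hypothetical. A capture could come from, e.g., /sys/kernel/debug/usb/usbmon/2u with debugfs mounted and usbmon loaded.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: scan usbmon text lines and return the largest gap,
 * in microseconds, between consecutive completion ('C') events whose
 * address word equals addr. Returns 0 if fewer than two matching
 * completions are seen. Assumes lines shorter than 512 bytes and does
 * not handle timestamp wraparound.
 */
static unsigned long max_completion_gap_us(FILE *in, const char *addr)
{
	char line[512], tag[64], ev_addr[64];
	char ev_type;
	unsigned long ts, prev = 0, max_gap = 0;
	int have_prev = 0;

	assert(in != NULL && addr != NULL);
	while (fgets(line, sizeof line, in) != NULL) {
		/* Only the first four fields matter; payload words are ignored. */
		if (sscanf(line, "%63s %lu %c %63s", tag, &ts, &ev_type, ev_addr) != 4)
			continue;
		if (ev_type != 'C' || strcmp(ev_addr, addr) != 0)
			continue;
		if (have_prev && ts - prev > max_gap)
			max_gap = ts - prev;
		prev = ts;
		have_prev = 1;
	}
	return max_gap;
}
```

On a trace of the tuner's bulk IN endpoint, worst-case gaps approaching the total capacity of the queued URBs would suggest completion latency rather than bus bandwidth.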

> I'm enclosing the lsusb output from an S960C device, which is based on those
> Montage chipsets:

What I wanted to see was the output from "lsusb" on the affected
system, not the output from "lsusb -v -s B:D" on your system.

> > Overall, this may be a very difficult problem to solve. The
> > 4cd13c21b207 commit was intended to improve throughput at the cost of
> > increased latency. But then what do you do when the latency becomes
> > too high for the video subsystem to handle?
>
> Latency can't be too high, otherwise frames will be dropped.

Yes, that's the whole point.

> Even if the kernel itself doesn't drop them, once the delay goes above
> a certain threshold, userspace will need to drop them, as it
> should be presenting audio and video in real time. Yet, typically,
> userspace delays playback by one or two seconds, which would mean
> 1500-3500 buffers, which I suspect is a lot more than the hardware
> limits. So I suspect that the hardware runs out of free buffers well
> before userspace does: media hardware doesn't have unlimited buffers
> inside, as it assumes that the kernel/userspace will be fast
> enough to sustain bit rates of up to 66 Mbps of payload.

The timing information would tell us how large the latency is.

In any case, you might be able to attack the problem simply by using
more than 8 buffers. With just eight 4096-byte buffers, the total
pipeline capacity is only about 0.62 ms (at the maximum possible
transfer rate). Increasing the number of buffers to 65 would give a
capacity of 5 ms, which is probably a lot better suited for situations
where completions are handled by the ksoftirqd thread.
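The capacity figures above can be checked quickly: capacity is the total number of bytes in flight divided by the transfer rate. The sketch below assumes the 53,248 bytes/ms USB 2.0 bulk ceiling quoted earlier in the thread. `pipeline_capacity_ms` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>

/* Hypothetical helper: how many milliseconds of data a set of in-flight
 * URB buffers can absorb before the completion handler has to resubmit
 * one of them. The rate is in bytes per millisecond. */
static double pipeline_capacity_ms(int num_buffers, int buffer_bytes,
                                   double bytes_per_ms)
{
    assert(num_buffers > 0 && buffer_bytes > 0 && bytes_per_ms > 0.0);
    return (double)num_buffers * buffer_bytes / bytes_per_ms;
}
```

`pipeline_capacity_ms(8, 4096, 53248.0)` comes out at about 0.615 ms, and `pipeline_capacity_ms(65, 4096, 53248.0)` at exactly 5.0 ms, because 53248 is 13 * 4096.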

> Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP,
> in order to revert the kernel logic to prioritize latency instead of
> throughput.

It can't be done without pervasive changes to the USB subsystem, which
I would greatly prefer to avoid. Besides, this wouldn't really solve
the problem. Decreasing the latency for one device will cause it to be
increased for others.

Alan Stern

2018-01-07 17:02:30

by Josef Griebichler

Subject: Re: Re: dvb usb issues since kernel 4.9

Hi,

Here is the lsusb output from my affected hardware (TechnoTrend S2-4600):
http://ix.io/DLY

With this hardware I had errors when recording with tvheadend. Live TV was OK; only channel switching sometimes caused problems. Please see the attached tvheadend service logs.

I'm also providing the dmesg output (LibreELEC on an RPi 3 with kernel 4.14.10, with the mentioned commit reverted):
http://ix.io/DM2


Regards
Josef
 
 

Sent: Sunday, 07 January 2018 at 16:41
From: "Alan Stern" <[email protected]>
To: "Mauro Carvalho Chehab" <[email protected]>
Cc: "Josef Griebichler" <[email protected]>, "Greg Kroah-Hartman" <[email protected]>, [email protected], "Eric Dumazet" <[email protected]>, "Rik van Riel" <[email protected]>, "Paolo Abeni" <[email protected]>, "Hannes Frederic Sowa" <[email protected]>, "Jesper Dangaard Brouer" <[email protected]>, linux-kernel <[email protected]>, netdev <[email protected]>, "Jonathan Corbet" <[email protected]>, LMML <[email protected]>, "Peter Zijlstra" <[email protected]>, "David Miller" <[email protected]>, [email protected]
Subject: Re: dvb usb issues since kernel 4.9


Attachments:
service.log (42.83 kB)
service0.log (71.39 kB)

2018-01-07 21:23:43

by Linus Torvalds

Subject: Re: dvb usb issues since kernel 4.9

On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
<[email protected]> wrote:
>
> On Sat, 6 Jan 2018 16:04:16 +0100
> "Josef Griebichler" <[email protected]> wrote:
>>
>> the causing commit has been identified.
>> After reverting commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
>> its working again.
>
> Just replying to me won't magically fix this. The ones that were involved on
> this patch should also be c/c, plus USB people. Just added them.

Actually, you seem to have added an odd subset of the people involved.

For example, Ingo - who actually committed that patch - wasn't on the cc.

I do think we need to simply revert that patch. It's very simple: it
has been reported to lead to actual problems for people, and we don't
fix one problem and then say "well, it fixed something else" when
something breaks.

When something breaks, we either unbreak it, or we revert the change
that caused the breakage.

It's really that simple. That's what "no regressions" means. We don't
accept changes that cause regressions. This one did.

Linus

2018-01-08 09:43:43

by Mauro Carvalho Chehab

Subject: Re: dvb usb issues since kernel 4.9

On Sun, 7 Jan 2018 10:41:37 -0500 (EST)
Alan Stern <[email protected]> wrote:

> On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:
>
> > > > It seems that the original patch were designed to solve some IRQ issues
> > > > with network cards with causes data losses on high traffic. However,
> > > > it is also causing bad effects on sustained high bandwidth demands
> > > > required by DVB cards, at least on some USB host drivers.
> > > >
> > > > Alan/Greg/Eric/David:
> > > >
> > > > Any ideas about how to fix it without causing regressions to
> > > > network?
> > >
> > > It would be good to know what hardware was involved on the x86 system
> > > and to have some timing data. Can we see the output from lsusb and
> > > usbmon, running on a vanilla kernel that gets plenty of video glitches?
> >
> > From Josef's report, and from the BZ, the affected hardware seems
> > to be based on Montage Technology M88DS3103/M88TS2022 chipset.
>
> What type of USB host controller does the x86_64 system use? EHCI or
> xHCI?

I'll let Josef answer this.

>
> > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> > with shares a USB implementation that is used by a lot more drivers.
> > The URB handling code is at:
> >
> > drivers/media/usb/dvb-usb-v2/usb_urb.c
> >
> > This particular driver allocates 8 buffers with 4096 bytes each
> > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
> >
> > This become a popular USB hardware nowadays. I have one S960c
> > myself, so I can send you the lsusb from it. You should notice, however,
> > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> > rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> > of payload after removing URB headers.
>
> You mentioned earlier that the driver uses bulk transfers. In USB-2.0,
> the maximum possible payload data transfer rate using bulk transfers is
> 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s). And
> even this is possible only if almost nothing else is using the bus at
> the same time.

No, I said 58 Mbits/s (not bytes).

In the DVB-C and DVB-S2 specs, AFAICT, there's no hard limit on the maximum
payload data rate, although the industry seems to limit it to around
60 Mbits/s. In those standards, the maximum bit rate is defined by the
modulation type and by the channel symbol rate.

To give you a practical example, my DVB-S2 provider modulates each
transponder with 8PSK (3 bits/symbol) and defines channels with a
symbol rate of 30 Mbaud. So it could, theoretically, transport
an MPEG-TS stream of up to 90 Mbits/s (minus headers and guard intervals).
In practice, the streams there are transmitted at 58,026.5 Kbits/s.
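The arithmetic in this example can be sketched as follows, assuming the gross rate is simply symbol rate times bits per symbol (headers, guard intervals and other overhead still come off the top). The helper names are hypothetical. Converting to bytes per millisecond makes the stream directly comparable with the USB bulk figures used elsewhere in this thread.

```c
#include <assert.h>

/* Gross channel bit rate in bit/s: symbol rate times bits per symbol.
 * Headers, guard intervals and other overhead are not subtracted. */
static double gross_bit_rate(double symbols_per_s, int bits_per_symbol)
{
    assert(symbols_per_s > 0.0 && bits_per_symbol > 0);
    return symbols_per_s * bits_per_symbol;
}

/* Convert bit/s to bytes per millisecond (decimal units). */
static double bytes_per_ms(double bits_per_s)
{
    assert(bits_per_s >= 0.0);
    return bits_per_s / 8.0 / 1000.0;
}
```

`gross_bit_rate(30e6, 3)` gives 90,000,000 bit/s for 8PSK at 30 Mbaud, and `bytes_per_ms(58026.5e3)` gives roughly 7,253 bytes/ms, far below the 53,248 bytes/ms USB 2.0 bulk ceiling.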

> > A 10 minutes record with the
> > entire data (with typically contains 5-10 channels) can easily go
> > above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if
> > a usbmon dump would be useful.
>
> It might not be helpful at all. However, I'm not interested in the
> payload data (which would be unintelligible to me anyway) but rather
> the timing of URB submissions and completions. A usbmon trace which
> didn't keep much of the payload data would only require on the order of
> 50 MB per minute -- and Josef said that glitches usually would show up
> within a minute or so.

Yeah, this could help.

Josef,

You can get it with wireshark/tshark or tcpdump. See:
https://technolinchpin.wordpress.com/2015/10/23/usb-bus-sniffers-for-linux-system/
https://wiki.wireshark.org/CaptureSetup/USB
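Since only the timing of URB submissions and completions matters here, a trace can also be captured from the usbmon text interface (for example by reading `/sys/kernel/debug/usb/usbmon/1u` for bus 1, with debugfs mounted) and post-processed offline. The following is a minimal sketch, assuming the text format described in the kernel's usbmon documentation: URB tag, timestamp in microseconds, event type (`S`, `C` or `E`), then an address word such as `Bi:1:002:2`. The address used below and the helper names are hypothetical, and timestamp wraparound is ignored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_INFLIGHT 256

struct inflight {
    char tag[32];                 /* URB tag as printed by usbmon */
    unsigned long long submit_us; /* timestamp of the S event */
};

struct urb_stats {
    struct inflight pending[MAX_INFLIGHT];
    int npending;
    unsigned long completions;
    unsigned long long max_submit_to_complete_us;
    unsigned long long last_complete_us;
    unsigned long long max_complete_gap_us;
};

/* Feed one line of usbmon text output. Only S (submit) and C (callback)
 * events whose address word equals `addr` are used. Returns 1 if the line
 * was used, 0 otherwise. Timestamp wraparound is not handled. */
static int usbmon_feed(struct urb_stats *st, const char *line, const char *addr)
{
    char tag[32], address[32], type;
    unsigned long long ts;

    assert(st != NULL && line != NULL && addr != NULL);
    if (sscanf(line, "%31s %llu %c %31s", tag, &ts, &type, address) != 4)
        return 0;
    if (strcmp(address, addr) != 0)
        return 0;

    if (type == 'S') {
        if (st->npending >= MAX_INFLIGHT)
            return 0;
        strcpy(st->pending[st->npending].tag, tag);
        st->pending[st->npending].submit_us = ts;
        st->npending++;
        return 1;
    }
    if (type != 'C')
        return 0;

    /* Match the completion with its submission by URB tag. */
    for (int i = 0; i < st->npending; i++) {
        if (strcmp(st->pending[i].tag, tag) == 0) {
            unsigned long long d = ts - st->pending[i].submit_us;
            if (d > st->max_submit_to_complete_us)
                st->max_submit_to_complete_us = d;
            st->pending[i] = st->pending[--st->npending]; /* swap-remove */
            break;
        }
    }
    /* Gap between this completion and the previous one on the endpoint. */
    if (st->completions > 0) {
        unsigned long long gap = ts - st->last_complete_us;
        if (gap > st->max_complete_gap_us)
            st->max_complete_gap_us = gap;
    }
    st->last_complete_us = ts;
    st->completions++;
    return 1;
}
```

It reports the longest submit-to-completion time for any URB and the longest gap between consecutive completions on one endpoint. A gap approaching the buffering capacity would suggest that completion handling is being delayed.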

> > I'm enclosing the lsusb from a S960C device, with is based on those
> > Montage chipsets:
>
> What I wanted to see was the output from "lsusb" on the affected
> system, not the output from "lsusb -v -s B:D" on your system.
>
> > > Overall, this may be a very difficult problem to solve. The
> > > 4cd13c21b207 commit was intended to improve throughput at the cost of
> > > increased latency. But then what do you do when the latency becomes
> > > too high for the video subsystem to handle?
> >
> > Latency can't be too high, otherwise frames will be dropped.
>
> Yes, that's the whole point.
>
> > Even if the Kernel itself doesn't drop, if the delay goes higher
> > than a certain threshold, userspace will need to drop, as it
> > should be presenting audio and video on real time. Yet, typically,
> > userspace will delay it by one or two seconds, with would mean
> > 1500-3500 buffers, with I suspect it is a lot more than the hardware
> > limits. So I suspect that the hardware starves free buffers a way
> > before userspace, as media hardware don't have unlimited buffers
> > inside them, as they assume that the Kernel/userspace will be fast
> > enough to sustain bit rates up to 66 Mbps of payload.
>
> The timing information would tell us how large the latency is.
>
> In any case, you might be able to attack the problem simply by using
> more than 8 buffers. With just eight 4096-byte buffers, the total
> pipeline capacity is only about 0.62 ms (at the maximum possible
> transfer rate). Increasing the number of buffers to 65 would give a
> capacity of 5 ms, which is probably a lot better suited for situations
> where completions are handled by the ksoftirqd thread.

Increasing it to 65 shouldn't be hard. I'm not sure, however, whether the hardware
will actually fill the 65 buffers, but it is worth trying.

> > Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP,
> > in order to revert the kernel logic to prioritize latency instead of
> > throughput.
>
> It can't be done without pervasive changes to the USB subsystem, which
> I would greatly prefer to avoid. Besides, this wouldn't really solve
> the problem. Decreasing the latency for one device will cause it to be
> increased for others.

If there is TV streaming traffic on a USB bus, it means that the
user wants to watch and/or record a TV program. In such a use
case, low latency is highly desirable for the TV capture
(and display, if the GPU is on USB), even if it means higher latency for
other traffic.

Josef,

Could you please try the following patch on Kernel 4.14.10 (without
reverting any changesets), and see if it fixes the issue?


media: dvbsky: Increase the number of buffers

Right now, this driver expects a 0.62 ms delay with 8 buffers on a USB 2.0
high-speed bus. Increase it to 65 buffers, in order to give more time for
the top half of the USB transfer handler to complete its task.

Suggested-by: Alan Stern <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

diff --git a/drivers/media/usb/dvb-usb-v2/dvbsky.c b/drivers/media/usb/dvb-usb-v2/dvbsky.c
index 131b6c08e199..d3f5ffc54b25 100644
--- a/drivers/media/usb/dvb-usb-v2/dvbsky.c
+++ b/drivers/media/usb/dvb-usb-v2/dvbsky.c
@@ -740,7 +740,7 @@ static struct dvb_usb_device_properties dvbsky_s960_props = {
 	.num_adapters = 1,
 	.adapter = {
 		{
-			.stream = DVB_USB_STREAM_BULK(0x82, 8, 4096),
+			.stream = DVB_USB_STREAM_BULK(0x82, 65, 4096),
 		}
 	}
 };


>



Thanks,
Mauro

2018-01-08 10:02:18

by Mauro Carvalho Chehab

Subject: Re: dvb usb issues since kernel 4.9

Hi Linus,

On Sun, 7 Jan 2018 13:23:39 -0800
Linus Torvalds <[email protected]> wrote:

> On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
> <[email protected]> wrote:
> >
> > On Sat, 6 Jan 2018 16:04:16 +0100
> > "Josef Griebichler" <[email protected]> wrote:
> >>
> >> the causing commit has been identified.
> >> After reverting commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> >> its working again.
> >
> > Just replying to me won't magically fix this. The ones that were involved on
> > this patch should also be c/c, plus USB people. Just added them.
>
> Actually, you seem to have added an odd subset of the people involved.
>
> For example, Ingo - who actually committed that patch - wasn't on the cc.

Sorry, my fault. I forgot to add him to it.

> I do think we need to simply revert that patch. It's very simple: it
> has been reported to lead to actual problems for people, and we don't
> fix one problem and then say "well, it fixed something else" when
> something breaks.
>
> When something breaks, we either unbreak it, or we revert the change
> that caused the breakage.
>
> It's really that simple. That's what "no regressions" means. We don't
> accept changes that cause regressions. This one did.

Yeah, we should either unbreak or revert it. In the specific case of
media devices, Alan came up with a proposal to increase the number of
buffers. This is a one-line change, and increasing the capture delay from
0.63 ms to 5 ms in this specific case (digital TV) shouldn't do much
harm. So I guess it would be worth trying it before reverting the patch.

It is hard to foresee the consequences of the softirq changes for other
devices, though.

For example, we didn't have any reports about this issue affecting cameras.
Most cameras use ISOC nowadays, but some only provide bulk transfers.
We usually try to use the minimum possible number of buffers, as
increased latency on cameras can be very annoying, especially in
videoconference applications.

Thanks,
Mauro

2018-01-08 11:59:27

by Jesper Dangaard Brouer

Subject: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018 08:02:00 -0200
Mauro Carvalho Chehab <[email protected]> wrote:

> Hi Linus,
>
> On Sun, 7 Jan 2018 13:23:39 -0800
> Linus Torvalds <[email protected]> wrote:
>
> > On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
> > <[email protected]> wrote:
> > >
> > > On Sat, 6 Jan 2018 16:04:16 +0100
> > > "Josef Griebichler" <[email protected]> wrote:
> > >>
> > >> the causing commit has been identified.
> > >> After reverting commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> > >> its working again.
> > >
> > > Just replying to me won't magically fix this. The ones that were involved on
> > > this patch should also be c/c, plus USB people. Just added them.
> >
> > Actually, you seem to have added an odd subset of the people involved.
> >
> > For example, Ingo - who actually committed that patch - wasn't on the cc.
>
> Sorry, my fault. I forgot to add him to it.
>
> > I do think we need to simply revert that patch. It's very simple: it
> > has been reported to lead to actual problems for people, and we don't
> > fix one problem and then say "well, it fixed something else" when
> > something breaks.
> >
> > When something breaks, we either unbreak it, or we revert the change
> > that caused the breakage.
> >
> > It's really that simple. That's what "no regressions" means. We don't
> > accept changes that cause regressions. This one did.
>
> Yeah, we should either unbreak or revert it. In the specific case of
> media devices, Alan came with a proposal of increasing the number of
> buffers. This is an one line change, and increase a capture delay from
> 0.63 ms to 5 ms on this specific case (Digital TV) shouldn't make much
> harm. So, I guess it would worth trying it before reverting the patch.

Let's find the root cause of this before reverting, as reverting will hurt
the networking use case.

I want to see if increasing the number of buffers solves the issue (the
current buffering of 0.63 ms seems too small).

I would also like to see experiments with adjusting the sched priority
of the kthreads and/or the userspace program (e.g. using a command
like 'sudo chrt --fifo -p 10 $(pgrep udp_sink)').
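For reference, a program can make the same policy change that `chrt --fifo -p 10 <pid>` makes by calling `sched_setscheduler(2)` directly. A minimal sketch follows; `set_fifo_priority` is a hypothetical helper, and which PID to target (the application, or a `ksoftirqd/N` thread found with something like `pgrep ksoftirqd`) is left to the experiment.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: switch an existing task to SCHED_FIFO at `prio`,
 * the same policy change `chrt --fifo -p <prio> <pid>` performs.
 * pid 0 means the calling thread. Returns 0 on success, -1 with errno set. */
static int set_fifo_priority(pid_t pid, int prio)
{
    int min = sched_get_priority_min(SCHED_FIFO);
    int max = sched_get_priority_max(SCHED_FIFO);

    assert(pid >= 0);
    if (min == -1 || max == -1)
        return -1;
    if (prio < min || prio > max) {
        errno = EINVAL;   /* reject before asking the kernel */
        return -1;
    }
    struct sched_param sp = { .sched_priority = prio };
    return sched_setscheduler(pid, SCHED_FIFO, &sp);
}
```

Switching to `SCHED_FIFO` needs `CAP_SYS_NICE`, or, for an unprivileged process, an `RLIMIT_RTPRIO` limit at least as high as the requested priority. On Linux the valid `SCHED_FIFO` priorities are 1 to 99.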


Are we really sure that the regression is caused by 4cd13c21b207
("softirq: Let ksoftirqd do its job")? The forum thread also reports
that the problem is almost gone after commit 34f41c0316ed ("timers: Fix
overflow in get_next_timer_interrupt")
https://git.kernel.org/torvalds/c/34f41c0316ed

The fact that this fix changes things makes me suspicious...
After this fix, I suspect that changing the sched priorities will fix
the remaining glitches.


> It is hard to foresee the consequences of the softirq changes for other
> devices, though.

Yes, it is hard to foresee, I can only cover networking.

For networking, if we revert this, we will (again) open the kernel to
an easy DDoS vector with UDP packets. As mentioned in the commit description,
before this change you could easily cause softirq processing to take all
the CPU time from the application, resulting in very low "good-put" for
the UDP app. (That's why it was so easy to DDoS DNS servers before...)

With the ksoftirqd patch in place, ksoftirqd is scheduled fairly with
other applications running on the same CPU. But in some cases this is
not what you want, so, as the commit also mentions, the admin can now
more easily tune process scheduling parameters if needed, to adjust for
such use cases (this was not really an admin choice before).


> For example, we didn't have any reports about this issue affecting cameras,
> Most cameras use ISOC nowadays, but some only provide bulk transfers.
> We usually try to use the minimum number of buffers possible, as
> increasing latency on cameras can be very annoying, specially on
> videoconference applications.

--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

2018-01-08 12:54:08

by Mauro Carvalho Chehab

Subject: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018 12:59:10 +0100
Jesper Dangaard Brouer <[email protected]> wrote:

> On Mon, 8 Jan 2018 08:02:00 -0200
> Mauro Carvalho Chehab <[email protected]> wrote:
>
> > Hi Linus,
> >
> > On Sun, 7 Jan 2018 13:23:39 -0800
> > Linus Torvalds <[email protected]> wrote:
> >
> > > On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
> > > <[email protected]> wrote:
> > > >
> > > > On Sat, 6 Jan 2018 16:04:16 +0100
> > > > "Josef Griebichler" <[email protected]> wrote:
> > > >>
> > > >> the causing commit has been identified.
> > > >> After reverting commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> > > >> its working again.
> > > >
> > > > Just replying to me won't magically fix this. The ones that were involved on
> > > > this patch should also be c/c, plus USB people. Just added them.
> > >
> > > Actually, you seem to have added an odd subset of the people involved.
> > >
> > > For example, Ingo - who actually committed that patch - wasn't on the cc.
> >
> > Sorry, my fault. I forgot to add him to it.
> >
> > > I do think we need to simply revert that patch. It's very simple: it
> > > has been reported to lead to actual problems for people, and we don't
> > > fix one problem and then say "well, it fixed something else" when
> > > something breaks.
> > >
> > > When something breaks, we either unbreak it, or we revert the change
> > > that caused the breakage.
> > >
> > > It's really that simple. That's what "no regressions" means. We don't
> > > accept changes that cause regressions. This one did.
> >
> > Yeah, we should either unbreak or revert it. In the specific case of
> > media devices, Alan came with a proposal of increasing the number of
> > buffers. This is an one line change, and increase a capture delay from
> > 0.63 ms to 5 ms on this specific case (Digital TV) shouldn't make much
> > harm. So, I guess it would worth trying it before reverting the patch.
>
> Let find the root-cause of this before reverting, as this will hurt the
> networking use-case.
>
> I want to see if the increase buffer will solve the issue (the current
> buffer of 0.63 ms seem too small).

For TV, high latency has two main practical consequences:

1) it increases the time to switch channels. MPEG-TS based transmissions
usually take some time to start showing the channel contents, and adding
more buffers makes it worse;

2) especially when watching sports, higher latency means that you'll know
your favorite team scored when your neighbors start
celebrating... seeing the actual event only after them.

So, the lower, the merrier, but I think that 5 ms would be acceptable.

> I would also like to see experiments with adjusting adjust the sched
> priority of the kthread's and/or the userspace prog. (e.g use command
> like 'sudo chrt --fifo -p 10 $(pgrep udp_sink)' ).

If this fixes the issue, we'll need to do something inside the Kernel
to change the priority, as TV userspace apps should not run as root. I'm not
sure where such a change should be done (USB? media?).

> Are we really sure that the regression is cause by 4cd13c21b207
> ("softirq: Let ksoftirqd do its job"), the forum thread also report
> that the problem is almost gone after commit 34f41c0316ed ("timers: Fix
> overflow in get_next_timer_interrupt")
> https://git.kernel.org/torvalds/c/34f41c0316ed

I'll see if I can set up a test scenario here to try to reproduce
the reported bug. I suspect that I won't be able to reproduce it on my
"standard" i7core-based test machine, even with KPTI enabled.

> It makes me suspicious that this fix changes things...
> After this fix, I suspect that changing the sched priorities, will fix
> the remaining glitches.
>
>
> > It is hard to foresee the consequences of the softirq changes for other
> > devices, though.
>
> Yes, it is hard to foresee, I can only cover networking.
>
> For networking, if reverting this, we will (again) open the kernel for
> an easy DDoS vector with UDP packets. As mentioned in the commit desc,
> before you could easily cause softirq to take all the CPU time from the
> application, resulting in very low "good-put" in the UDP-app. (That's why
> it was so easy to DDoS DNS servers before...)
>
> With the softirqd patch in place, ksoftirqd is scheduled fairly between
> other applications running on the same CPU. But in some cases this is
> not what you want, so as the also commit mentions, the admin can now
> more easily tune process scheduling parameters if needed, to adjust for
> such use-cases (it was not really an admin choice before).

Can't the ksoftirqd patch be modified to apply only to networking
IRQ handling? That sounds less likely to affect unrelated subsystems[1].

[1] Actually, DVB drivers can also implement networking for satellite-based
Internet, but, in this case, the top half is implemented inside
the DVB core, as the IP traffic has to be filtered out of an MPEG-TS
stream. I'm not sure if the UDP DDoS attack you mention would affect
DVB net, but I guess not. AFAICT, there aren't many people using DVB net
nowadays, and I don't have any easy way to test DVB net here.

Thanks,
Mauro

2018-01-08 16:10:06

by Alan Stern

Subject: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018, Mauro Carvalho Chehab wrote:

> On Sun, 7 Jan 2018 10:41:37 -0500 (EST)
> Alan Stern <[email protected]> wrote:
>
> > On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:
> >
> > > > > It seems that the original patch were designed to solve some IRQ issues
> > > > > with network cards with causes data losses on high traffic. However,
> > > > > it is also causing bad effects on sustained high bandwidth demands
> > > > > required by DVB cards, at least on some USB host drivers.
> > > > >
> > > > > Alan/Greg/Eric/David:
> > > > >
> > > > > Any ideas about how to fix it without causing regressions to
> > > > > network?
> > > >
> > > > It would be good to know what hardware was involved on the x86 system
> > > > and to have some timing data. Can we see the output from lsusb and
> > > > usbmon, running on a vanilla kernel that gets plenty of video glitches?
> > >
> > > From Josef's report, and from the BZ, the affected hardware seems
> > > to be based on Montage Technology M88DS3103/M88TS2022 chipset.
> >
> > What type of USB host controller does the x86_64 system use? EHCI or
> > xHCI?
>
> I'll let Josef answer this.
>
> >
> > > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> > > with shares a USB implementation that is used by a lot more drivers.
> > > The URB handling code is at:
> > >
> > > drivers/media/usb/dvb-usb-v2/usb_urb.c
> > >
> > > This particular driver allocates 8 buffers with 4096 bytes each
> > > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
> > >
> > > This become a popular USB hardware nowadays. I have one S960c
> > > myself, so I can send you the lsusb from it. You should notice, however,
> > > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> > > rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> > > of payload after removing URB headers.
> >
> > You mentioned earlier that the driver uses bulk transfers. In USB-2.0,
> > the maximum possible payload data transfer rate using bulk transfers is
> > 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s). And
> > even this is possible only if almost nothing else is using the bus at
> > the same time.
>
> No, I said 58 Mbits/s (not bytes).

Well, what you actually _wrote_ was "58 Mpps of payload" (see above),
and I couldn't tell how to interpret that. :-)

58 Mb/s is obviously almost 8 times less than the full USB bus
bandwidth.

> On DVB-C and DVB-S2 specs, AFAIKT, there's no hard limit for the maximum
> payload data rate, although industry seems to limit it to be around
> 60 Mbits/s. On those standards, the maximal bit rate is defined by the
> modulation type and by the channel symbol rate.
>
> To give you a practical example, my DVB-S2 provider modulates each
> transponder with 8/PSK (3 bits/symbol), and define channels with a
> symbol rate of 30 Mbauds/s. So, it could, theoretically, transport
> a MPEG-TS stream up to 90 Mbits/s (minus headers and guard intervals).
> In practice, the streams there are transmitted with 58,026.5 Kbits/s.

Okay. This is 58 Kb/ms or 7.25 KB/ms. So your scheme of eight 4-KB
buffers gives a latency of 0.57 ms with a total capacity of 4.5 ms,
which is a lot better than what I was thinking.
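These corrected figures follow from two ratios: latency is one buffer's size divided by the data rate, and capacity is the total size of all in-flight buffers divided by the same rate. A minimal sketch with a hypothetical helper, taking 58,026.5 kbit/s as 7,253.3125 bytes/ms (decimal kilobits):

```c
#include <assert.h>

struct buffering {
    double latency_ms;  /* time to fill one buffer */
    double capacity_ms; /* time all in-flight buffers can absorb */
};

/* Hypothetical helper; the rate is in bytes per millisecond. */
static struct buffering buffering_at_rate(int num_buffers, int buffer_bytes,
                                          double bytes_per_ms)
{
    struct buffering b;

    assert(num_buffers > 0 && buffer_bytes > 0 && bytes_per_ms > 0.0);
    b.latency_ms = buffer_bytes / bytes_per_ms;
    b.capacity_ms = num_buffers * b.latency_ms;
    return b;
}
```

`buffering_at_rate(8, 4096, 7253.3125)` gives a latency of about 0.565 ms and a capacity of about 4.52 ms. With 16 buffers the latency is unchanged and the capacity roughly doubles to about 9.04 ms.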

> > In any case, you might be able to attack the problem simply by using
> > more than 8 buffers. With just eight 4096-byte buffers, the total
> > pipeline capacity is only about 0.62 ms (at the maximum possible
> > transfer rate). Increasing the number of buffers to 65 would give a
> > capacity of 5 ms, which is probably a lot better suited for situations
> > where completions are handled by the ksoftirqd thread.
>
> Increasing it to 65 shouldn't be hard. Not sure, however, if the hardware
> will actually fill the 65 buffers, but it is worth to try.

Given the new information, 65 would be overkill. But going from 8 to
16 might help.

> > > Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP,
> > > in order to revert the kernel logic to prioritize latency instead of
> > > throughput.
> >
> > It can't be done without pervasive changes to the USB subsystem, which
> > I would greatly prefer to avoid. Besides, this wouldn't really solve
> > the problem. Decreasing the latency for one device will cause it to be
> > increased for others.
>
> If there is a TV streaming traffic at a USB bus, it means that the
> user wants to either watch and/or record a TV program. On such
> usecase scenario, a low latency is highly desired for the TV capture
> (and display, if the GPU is USB), even it means a higher latency for
> other traffic.

Not if the other traffic is also a TV capture. :-)

It might make sense to classify softirq sources as "high priority" or
"low priority", and only defer the "low priority" work to ksoftirqd.

Alan Stern
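One way to picture the "high priority"/"low priority" idea from the message above is as a split of the pending-softirq bitmask. The sketch below is a userspace model only: the enum mirrors the order of the kernel's softirq vectors, but `split_pending` and the choice of which vectors count as high priority are hypothetical, not an existing kernel interface.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model only: bit numbers follow the order of the kernel's
 * softirq vectors, but none of this is kernel code. */
enum softirq_vec {
    HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, NET_RX_SOFTIRQ, BLOCK_SOFTIRQ,
    IRQ_POLL_SOFTIRQ, TASKLET_SOFTIRQ, SCHED_SOFTIRQ, HRTIMER_SOFTIRQ,
    RCU_SOFTIRQ, NR_SOFTIRQS
};

#define VEC_BIT(v) (UINT32_C(1) << (v))

struct softirq_split {
    uint32_t run_inline; /* handled right away on interrupt exit */
    uint32_t defer;      /* left for ksoftirqd */
};

/* Hypothetical policy: while ksoftirqd is running, only vectors in
 * high_prio are still handled inline; when it is idle, everything is. */
static struct softirq_split split_pending(uint32_t pending, uint32_t high_prio,
                                          int ksoftirqd_running)
{
    struct softirq_split s;

    if (!ksoftirqd_running) {
        s.run_inline = pending;
        s.defer = 0;
    } else {
        s.run_inline = pending & high_prio;
        s.defer = pending & ~high_prio;
    }
    return s;
}
```

For example, a mask of `VEC_BIT(HI_SOFTIRQ) | VEC_BIT(TASKLET_SOFTIRQ)` (the two vectors that run tasklets) would keep tasklet work inline while `NET_RX_SOFTIRQ` work is still left to ksoftirqd. Whether that is the right split is exactly the open question.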

2018-01-08 16:25:48

by Alan Stern

Subject: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018, Mauro Carvalho Chehab wrote:

> > Let find the root-cause of this before reverting, as this will hurt the
> > networking use-case.
> >
> > I want to see if the increase buffer will solve the issue (the current
> > buffer of 0.63 ms seem too small).
>
> For TV, high latency has mainly two practical consequences:
>
> 1) it increases the time to switch channels. MPEG-TS based transmissions
> usually takes some time to start showing the channel contents. Adding
> more buffers make it worse;
>
> 2) specially when watching sports, a higher latency means that you'll know
> that your favorite team made a score when your neighbors start
> celebrating... seeing the actual event only after them.
>
> So, the lower, the merrier, but I think that 5 ms would be acceptable.

That value 65 for the number of buffers was calculated based on a
misunderstanding of the actual bandwidth requirement. Still, increasing
the number of buffers shouldn't hurt, and it's worth trying.

But there is another misunderstanding here which needs to be cleared
up. Adding more buffers does _not_ increase latency; it increases
capacity. Making each buffer larger _would_ increase latency, but
that's not what I proposed.

Going through this more explicitly... Suppose you receive 8 KB of data
every ms, and suppose you have four 8-KB buffers. Then the latency is
1 ms, because that's how long you have to wait for the first buffer to
be filled up after you submit an I/O request. (The driver does _not_
need to wait for all four buffers to be filled before it can start
displaying the data in the first buffer.) The capacity would be 4 ms,
because that's how much data your buffers can store. If you end up
waiting longer than 4 ms before ksoftirqd gets around to processing any
of the data, then some data will inevitably get lost.

That's why the way to deal with the delays caused by deferring softirqs
to ksoftirqd is to add more buffers (and not make the buffers larger
than they already are).
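The latency/capacity distinction above can be written down as a small C sketch. It assumes a constant arrival rate and equal-sized buffers that the host controller fills one after another; the function names are illustrative only and are not taken from the USB or DVB code.

```c
#include <assert.h>

/* Latency: time (ms) until the first submitted buffer is full. */
static double buffer_latency_ms(double buf_bytes, double rate_bytes_per_ms)
{
	assert(buf_bytes > 0 && rate_bytes_per_ms > 0);
	return buf_bytes / rate_bytes_per_ms;
}

/*
 * Capacity: how long (ms) completion handling may be delayed before every
 * submitted buffer is full and new data has nowhere to go.
 */
static double pipeline_capacity_ms(int nbufs, double buf_bytes,
				   double rate_bytes_per_ms)
{
	assert(nbufs > 0 && buf_bytes > 0 && rate_bytes_per_ms > 0);
	return nbufs * buf_bytes / rate_bytes_per_ms;
}

/* Bytes lost if no buffer is handed back for delay_ms milliseconds. */
static double bytes_lost(int nbufs, double buf_bytes,
			 double rate_bytes_per_ms, double delay_ms)
{
	double excess = delay_ms -
		pipeline_capacity_ms(nbufs, buf_bytes, rate_bytes_per_ms);

	return excess > 0 ? excess * rate_bytes_per_ms : 0.0;
}
```

With Alan's example numbers, buffer_latency_ms(8192, 8192) is 1 ms and pipeline_capacity_ms(4, 8192, 8192) is 4 ms; doubling the number of buffers doubles the capacity and leaves the latency alone. With the dvbsky values (4096-byte buffers, 53248 bytes/ms at the USB 2.0 bulk maximum), 8 buffers give about 0.62 ms and 65 give 5 ms.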

> > I would also like to see experiments with adjusting the sched
> > priority of the kthreads and/or the userspace prog (e.g. use a command
> > like 'sudo chrt --fifo -p 10 $(pgrep udp_sink)').
>
> If this fixes the issue, we'll need to do something inside the Kernel
> to change the priority, as TV userspace apps should not run as root. Not
> sure where such change should be done (USB? media?).

It would be interesting to try this, but I agree that it's not likely
to be a practical solution. Anyway, shouldn't ksoftirqd already be
running with very high priority?

> > Are we really sure that the regression is caused by 4cd13c21b207
> > ("softirq: Let ksoftirqd do its job")? The forum thread also reports
> > that the problem is almost gone after commit 34f41c0316ed ("timers: Fix
> > overflow in get_next_timer_interrupt")
> > https://git.kernel.org/torvalds/c/34f41c0316ed

That is a good point. It's hard to see how the issues in the two
commits could be related, but who knows?

> I'll see if I can mount a test scenario here in order to try reproduce
> the reported bug. I suspect that I won't be able to reproduce it on my
> "standard" i7core-based test machine, even with KPTI enabled.

If you're using the same sort of hardware as Josef, under similar
circumstances, the buggy behavior should be the same. If not, there
must be something else going on that we're not aware of.

> > It makes me suspicious that this fix changes things...
> > After this fix, I suspect that changing the sched priorities, will fix
> > the remaining glitches.
> >
> >
> > > It is hard to foresee the consequences of the softirq changes for other
> > > devices, though.
> >
> > Yes, it is hard to foresee, I can only cover networking.
> >
> > For networking, if reverting this, we will (again) open the kernel for
> > an easy DDoS vector with UDP packets. As mentioned in the commit desc,
> > before you could easily cause softirq to take all the CPU time from the
> > application, resulting in very low "good-put" in the UDP-app. (That's why
> > it was so easy to DDoS DNS servers before...)
> >
> > With the softirqd patch in place, ksoftirqd is scheduled fairly between
> > other applications running on the same CPU. But in some cases this is
> > not what you want, so, as the commit also mentions, the admin can now
> > more easily tune process scheduling parameters if needed, to adjust for
> > such use-cases (it was not really an admin choice before).
>
> Can't the ksoftirq patch be modified to only apply to the networking
> IRQ handling? That sounds less risky of affecting unrelated subsystems[1].

That might work. Or more generally, allow drivers to specify which
softirq sources should be deferred to ksoftirqd and which should not.

Alan Stern

> [1] Actually, DVB drivers can also implement networking for satellite
> based Internet, but, in this case, the top half is implemented inside
> the DVB core, as the IP traffic should be filtered out of an MPEG-TS
> stream. Not sure if the UDP DDoS attack you're mentioning would affect
> DVB net, but I guess not. AFAICT, there aren't many users using DVB net
> nowadays. I don't have any easy way to test DVB net here.
>
> Thanks,
> Mauro


2018-01-08 16:26:50

by Josef Griebichler

Subject: Aw: Re: dvb usb issues since kernel 4.9

Hi Mauro,

I tried the patch you mentioned, but unfortunately it gives no real improvement for me.
dmesg http://ix.io/DOg
tvheadend service log http://ix.io/DOi
Errors during recording are still there.
Errors increase if there is additional TCP load on the Raspberry Pi.

Unfortunately, there's no usbmon or tshark on LibreELEC, so I can't provide further logs.

Regards,
Josef



> On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote:
>
> > > > It seems that the original patch was designed to solve some IRQ issues
> > > > with network cards which cause data losses on high traffic. However,
> > > > it is also causing bad effects on the sustained high bandwidth demands
> > > > of DVB cards, at least on some USB host drivers.
> > > >
> > > > Alan/Greg/Eric/David:
> > > >
> > > > Any ideas about how to fix it without causing regressions to
> > > > network?
> > >
> > > It would be good to know what hardware was involved on the x86 system
> > > and to have some timing data. Can we see the output from lsusb and
> > > usbmon, running on a vanilla kernel that gets plenty of video glitches?
> >
> > From Josef's report, and from the BZ, the affected hardware seems
> > to be based on Montage Technology M88DS3103/M88TS2022 chipset.
>
> What type of USB host controller does the x86_64 system use? EHCI or
> xHCI?

I'll let Josef answer this.

>
> > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c,
> > which shares a USB implementation that is used by many more drivers.
> > The URB handling code is at:
> >
> > drivers/media/usb/dvb-usb-v2/usb_urb.c
> >
> > This particular driver allocates 8 buffers with 4096 bytes each
> > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP.
> >
> > This has become popular USB hardware nowadays. I have one S960c
> > myself, so I can send you the lsusb from it. You should notice, however,
> > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit
> > rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mpps
> > of payload after removing URB headers.
>
> You mentioned earlier that the driver uses bulk transfers. In USB-2.0,
> the maximum possible payload data transfer rate using bulk transfers is
> 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s). And
> even this is possible only if almost nothing else is using the bus at
> the same time.

No, I said 58 Mbits/s (not bytes).

In the DVB-C and DVB-S2 specs, AFAICT, there's no hard limit for the maximum
payload data rate, although industry seems to limit it to be around
60 Mbits/s. On those standards, the maximal bit rate is defined by the
modulation type and by the channel symbol rate.

To give you a practical example, my DVB-S2 provider modulates each
transponder with 8PSK (3 bits/symbol) and defines channels with a
symbol rate of 30 Mbaud. So it could, theoretically, transport
an MPEG-TS stream of up to 90 Mbits/s (minus headers and guard intervals).
In practice, the streams there are transmitted at 58,026.5 Kbits/s.
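The rates above can be checked with a few lines of C. This is a sketch under stated assumptions: the gross rate ignores FEC, headers and guard intervals, and the USB 2.0 ceiling uses the usual high-speed bulk limit of 13 packets of 512 bytes per 125 us microframe (the 53248 bytes/ms quoted above). The function names are hypothetical.

```c
#include <assert.h>

/* USB 2.0 high-speed bulk ceiling: 13 x 512-byte packets per microframe,
 * 8 microframes per ms. */
#define USB2_BULK_MAX_BYTES_PER_MS (13 * 512 * 8)	/* 53248 */

/* Gross bit rate of a carrier, before FEC, framing and other overhead. */
static double gross_bitrate_mbit(double symbol_rate_mbaud, int bits_per_symbol)
{
	assert(symbol_rate_mbaud > 0 && bits_per_symbol > 0);
	return symbol_rate_mbaud * bits_per_symbol;
}

/*
 * Fraction of the USB 2.0 bulk ceiling needed for a payload given in
 * Mbit/s. 1 Mbit/s = 1000 bit/ms = 125 bytes/ms.
 */
static double usb2_bulk_share(double payload_mbit_per_s)
{
	assert(payload_mbit_per_s >= 0);
	return payload_mbit_per_s * 125.0 / USB2_BULK_MAX_BYTES_PER_MS;
}
```

Under these assumptions, 8PSK at 30 Mbaud gives 90 Mbit/s gross, and a 58,026.5 kbit/s stream needs only about 14% of the theoretical bulk bandwidth, whereas 58 MB/s (464 Mbit/s) would exceed it.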

> > A 10-minute recording of the
> > entire data (which typically contains 5-10 channels) can easily go
> > above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if
> > a usbmon dump would be useful.
>
> It might not be helpful at all. However, I'm not interested in the
> payload data (which would be unintelligible to me anyway) but rather
> the timing of URB submissions and completions. A usbmon trace which
> didn't keep much of the payload data would only require on the order of
> 50 MB per minute -- and Josef said that glitches usually would show up
> within a minute or so.

Yeah, this could help.

Josef,

You can get it with wireshark/tshark or tcpdump. See:
https://technolinchpin.wordpress.com/2015/10/23/usb-bus-sniffers-for-linux-system/
https://wiki.wireshark.org/CaptureSetup/USB

> > I'm enclosing the lsusb from an S960C device, which is based on those
> > Montage chipsets:
>
> What I wanted to see was the output from "lsusb" on the affected
> system, not the output from "lsusb -v -s B:D" on your system.
>
> > > Overall, this may be a very difficult problem to solve. The
> > > 4cd13c21b207 commit was intended to improve throughput at the cost of
> > > increased latency. But then what do you do when the latency becomes
> > > too high for the video subsystem to handle?
> >
> > Latency can't be too high, otherwise frames will be dropped.
>
> Yes, that's the whole point.
>
> > Even if the Kernel itself doesn't drop, if the delay goes higher
> > than a certain threshold, userspace will need to drop, as it
> > should be presenting audio and video in real time. Yet, typically,
> > userspace will delay it by one or two seconds, which would mean
> > 1500-3500 buffers, which I suspect is a lot more than the hardware
> > limits. So I suspect that the hardware runs out of free buffers well
> > before userspace does, as media hardware doesn't have unlimited buffers
> > inside it, since it assumes that the Kernel/userspace will be fast
> > enough to sustain bit rates up to 66 Mbps of payload.
>
> The timing information would tell us how large the latency is.
>
> In any case, you might be able to attack the problem simply by using
> more than 8 buffers. With just eight 4096-byte buffers, the total
> pipeline capacity is only about 0.62 ms (at the maximum possible
> transfer rate). Increasing the number of buffers to 65 would give a
> capacity of 5 ms, which is probably a lot better suited for situations
> where completions are handled by the ksoftirqd thread.

Increasing it to 65 shouldn't be hard. Not sure, however, if the hardware
will actually fill the 65 buffers, but it is worth trying.

> > Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP,
> > in order to revert the kernel logic to prioritize latency instead of
> > throughput.
>
> It can't be done without pervasive changes to the USB subsystem, which
> I would greatly prefer to avoid. Besides, this wouldn't really solve
> the problem. Decreasing the latency for one device will cause it to be
> increased for others.

If there is TV streaming traffic on a USB bus, it means that the
user wants to watch and/or record a TV program. In such a
use case, low latency is highly desirable for the TV capture
(and display, if the GPU is USB), even if it means higher latency for
other traffic.

Josef,

Could you please try the following patch on Kernel 4.14.10 (without
reverting any changesets), and see if it fixes the issue?


media: dvbsky: Increase the number of buffers

Right now, with 8 buffers, this driver can only absorb about 0.62 ms of
delay on a USB 2.0 high-speed bus (at the maximum bulk transfer rate).
Increase it to 65 buffers, in order to give more time for
the top half of the USB transfer handler to complete its task.

Suggested-by: Alan Stern <[email protected]>
Signed-off-by: Mauro Carvalho Chehab <[email protected]>

diff --git a/drivers/media/usb/dvb-usb-v2/dvbsky.c b/drivers/media/usb/dvb-usb-v2/dvbsky.c
index 131b6c08e199..d3f5ffc54b25 100644
--- a/drivers/media/usb/dvb-usb-v2/dvbsky.c
+++ b/drivers/media/usb/dvb-usb-v2/dvbsky.c
@@ -740,7 +740,7 @@ static struct dvb_usb_device_properties dvbsky_s960_props = {
 	.num_adapters = 1,
 	.adapter = {
 		{
-			.stream = DVB_USB_STREAM_BULK(0x82, 8, 4096),
+			.stream = DVB_USB_STREAM_BULK(0x82, 65, 4096),
 		}
 	}
 };


>



Thanks,
Mauro

2018-01-08 16:31:14

by Alan Stern

Subject: Re: Aw: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018, Josef Griebichler wrote:

> Hi Mauro,
>
> I tried the patch you mentioned, but unfortunately it gives no real improvement for me.
> dmesg http://ix.io/DOg
> tvheadend service log http://ix.io/DOi
> Errors during recording are still there.
> Errors increase if there is additional TCP load on the Raspberry Pi.
>
> Unfortunately, there's no usbmon or tshark on LibreELEC, so I can't provide further logs.

Can you try running the same test on an x86_64 system?

Alan Stern

2018-01-08 17:15:42

by Josef Griebichler

Subject: Aw: Re: Re: dvb usb issues since kernel 4.9

No, I can't, sorry. There's no sat connection near my workstation.

2018-01-08 17:35:12

by Alan Stern

Subject: Re: Aw: Re: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018, Josef Griebichler wrote:

> No, I can't, sorry. There's no sat connection near my workstation.

Can we ask the person who made this post:

https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965

to run the test? The post says that the testing was done on an x86_64
machine.

It appears that you are using a non-standard kernel. The vanilla
kernel does not include any "dwc_otg_hcd" driver.

Alan Stern

2018-01-08 17:55:59

by Ingo Molnar

Subject: Re: dvb usb issues since kernel 4.9


* Linus Torvalds <[email protected]> wrote:

> On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab
> <[email protected]> wrote:
> >
> > Em Sat, 6 Jan 2018 16:04:16 +0100
> > "Josef Griebichler" <[email protected]> escreveu:
> >>
> >> the causing commit has been identified.
> >> After reverting commit https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882
> >> its working again.
> >
> > Just replying to me won't magically fix this. The ones that were involved on
> > this patch should also be c/c, plus USB people. Just added them.
>
> Actually, you seem to have added an odd subset of the people involved.
>
> For example, Ingo - who actually committed that patch - wasn't on the cc.
>
> I do think we need to simply revert that patch. It's very simple: it
> has been reported to lead to actual problems for people, and we don't
> fix one problem and then say "well, it fixed something else" when
> something breaks.
>
> When something breaks, we either unbreak it, or we revert the change
> that caused the breakage.
>
> It's really that simple. That's what "no regressions" means. We don't
> accept changes that cause regressions. This one did.

Yeah, absolutely - for the revert:

Acked-by: Ingo Molnar <[email protected]>

as I doubt we have enough time to root-cause this properly.

Thanks,

Ingo

2018-01-08 18:33:03

by Linus Torvalds

Subject: Re: dvb usb issues since kernel 4.9

On Mon, Jan 8, 2018 at 9:55 AM, Ingo Molnar <[email protected]> wrote:
>
> as I doubt we have enough time to root-cause this properly.

Well, it's not like this is a new issue, and we don't have to get it
fixed for 4.15. It's been around since 4.9, it's not a "have to
suddenly fix it this week" issue.

I just think that people should plan on having to maybe revert it and
mark the revert for stable.

But if the USB or DVB layers can instead just make the packet queue a
bit deeper and not react so badly to the latency of a single softirq,
that would obviously be a good thing in general, and maybe fix this
issue. So I'm not saying that the revert is inevitable either.

But I have to say that that commit 4cd13c21b207 ("softirq: Let
ksoftirqd do its job") was a pretty damn big hammer, and entirely
ignored the "softirqs can have latency concerns" issue.

So I do feel like the UDP packet storm thing might want a somewhat
more directed fix than that huge hammer of trying to move softirqs
aggressively into the softirq thread.

This is not that different from threaded irqs. And while you can set
the "thread every irq" flag, that would be largely insane to do by
default and in general. So instead, people do it either for specific
irqs (ie "request_threaded_irq()") or they have a way to opt out of it
(IRQF_NO_THREAD).

I _suspect_ that the softirq thing really just wants the same thing.
Have the networking case maybe set the "prefer threaded" flag just for
networking, if it's less latency-sensitive for softirq handling than

In fact, even for networking, there are separate TX/RX softirqs, maybe
networking would only set it for the RX case? Or maybe even trigger it
only for cases where things queue up and it goes into a "polling mode"
(like NAPI already does).

Of course, I don't even know _which_ softirq it is that the DVB case
has issues with. Maybe it's the same NET_RX case?

But looking at that offending commit, I do note (for example), that we
literally have things like tasklet[_hi]_schedule() that might have
been explicitly expected to just run the tasklet at a fairly low
latency (maybe instead of a workqueue exactly because it doesn't need
to sleep and wants lower latency).

So saying "just because softirqd is possibly already woken up, let's
delay all those tasklets etc" does really seem very wrong to me.

Can somebody tell which softirq it is that dvb/usb cares about?

Linus

2018-01-08 19:19:54

by Alan Stern

Subject: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018, Linus Torvalds wrote:

> Can somebody tell which softirq it is that dvb/usb cares about?

I don't know about the DVB part. The USB part is a little difficult to
analyze, mainly because the bug reports I've seen are mostly from
people running non-vanilla kernels. For example, Josef is using a
Raspberry Pi 3B with a non-standard USB host controller driver:
dwc_otg_hcd is built into raspbian in place of the normal dwc2_hsotg
driver.

Both dwc2_hsotg and ehci-hcd use the tasklets embedded in the
giveback_urb_bh member of struct usb_hcd. See usb_hcd_giveback_urb()
in drivers/usb/core/hcd.c; the calls are

	else if (high_prio_bh)
		tasklet_hi_schedule(&bh->bh);
	else
		tasklet_schedule(&bh->bh);

As it turns out, high_prio_bh gets set for interrupt and isochronous
URBs but not for bulk and control URBs. The DVB driver in question
uses bulk transfers.
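A simplified user-space model of that choice, for illustration only: the enum and function names are hypothetical, and the real logic lives in usb_hcd_giveback_urb(). The detail that matters for the softirq discussion is that tasklet_hi_schedule() raises HI_SOFTIRQ while tasklet_schedule() raises TASKLET_SOFTIRQ, so the bulk URBs used by dvbsky end up on the normal tasklet path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names mirroring the four USB transfer types. */
enum xfer_type { XFER_CONTROL, XFER_ISOC, XFER_BULK, XFER_INT };

/* Which softirq the completion tasklet is queued on. */
enum bh_queue { BH_HI_SOFTIRQ, BH_TASKLET_SOFTIRQ };

static bool urb_is_high_prio(enum xfer_type t)
{
	assert(t <= XFER_INT);
	/* Only interrupt and isochronous URBs get the high-priority path. */
	return t == XFER_INT || t == XFER_ISOC;
}

static enum bh_queue giveback_queue(enum xfer_type t)
{
	/* models: high_prio_bh ? tasklet_hi_schedule() : tasklet_schedule() */
	return urb_is_high_prio(t) ? BH_HI_SOFTIRQ : BH_TASKLET_SOFTIRQ;
}
```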

xhci-hcd, on the other hand, does not use these tasklets (it doesn't
set the HCD_BH bit in the hc_driver's .flags member).

Alan Stern

2018-01-08 19:51:08

by Linus Torvalds

Subject: Re: dvb usb issues since kernel 4.9

On Mon, Jan 8, 2018 at 11:15 AM, Alan Stern <[email protected]> wrote:
>
> Both dwc2_hsotg and ehci-hcd use the tasklets embedded in the
> giveback_urb_bh member of struct usb_hcd. See usb_hcd_giveback_urb()
> in drivers/usb/core/hcd.c; the calls are
>
> 	else if (high_prio_bh)
> 		tasklet_hi_schedule(&bh->bh);
> 	else
> 		tasklet_schedule(&bh->bh);
>
> As it turns out, high_prio_bh gets set for interrupt and isochronous
> URBs but not for bulk and control URBs. The DVB driver in question
> uses bulk transfers.

Ok, so we could try out something like the appended?

NOTE! I have not tested this at all. It LooksObvious(tm), but...

Linus


Attachments:
patch.diff (1.31 kB)

2018-01-08 20:41:11

by Jesper Dangaard Brouer

Subject: Re: Re: dvb usb issues since kernel 4.9



On Mon, 8 Jan 2018 12:35:08 -0500 (EST) Alan Stern <[email protected]> wrote:

> On Mon, 8 Jan 2018, Josef Griebichler wrote:
>
> > No I can't sorry. There's no sat connection near to my workstation.
>
> Can we ask the person who made this post:
> https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965
>
> to run the test? The post says that the testing was done on an x86_64
> machine.

More than 5 years ago, I used to play a lot with IPTV multicast MPEG2-TS
streams (I implemented the Wireshark mp2ts drop detection, and an
out-of-tree netfilter kernel module to detect drops[1]). The web site
is dead, but archive.org has a copy[2].
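For reference, the core of MPEG-TS drop detection is checking the 4-bit continuity counter (CC) that each PID advances on every packet carrying payload. A minimal C sketch, assuming back-to-back 188-byte packets and ignoring the adaptation-field discontinuity_indicator, so it is simpler than what Wireshark or the IPTV-Analyzer module actually do; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TS_PACKET_SIZE 188
#define TS_SYNC_BYTE   0x47
#define TS_NULL_PID    0x1FFF

/*
 * Count CC gaps in a buffer of 188-byte TS packets. A repeated CC is
 * accepted as a duplicate packet. Returns -1 if a packet lacks the
 * sync byte.
 */
static long ts_count_cc_errors(const uint8_t *buf, size_t len)
{
	int16_t last_cc[8192];	/* per 13-bit PID; -1 means not seen yet */
	long errors = 0;

	assert(buf != NULL || len == 0);
	memset(last_cc, 0xff, sizeof(last_cc));

	for (size_t off = 0; off + TS_PACKET_SIZE <= len; off += TS_PACKET_SIZE) {
		const uint8_t *p = buf + off;
		unsigned pid, afc, cc;

		if (p[0] != TS_SYNC_BYTE)
			return -1;
		pid = ((p[1] & 0x1F) << 8) | p[2];
		afc = (p[3] >> 4) & 0x3;	/* adaptation_field_control */
		cc = p[3] & 0x0F;

		/* Null packets and packets without payload don't advance CC. */
		if (pid == TS_NULL_PID || !(afc & 0x1))
			continue;

		if (last_cc[pid] >= 0 && cc != (unsigned)last_cc[pid] &&
		    cc != (((unsigned)last_cc[pid] + 1) & 0x0F))
			errors++;
		last_cc[pid] = (int16_t)cc;
	}
	return errors;
}
```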

Let me quote my own Lab-setup documentation[3].

You don't need a live IPTV MPEG2TS signal, you can simply generate your
own using VLC:

$ vlc ~/Videos/test_video.mkv -I rc --sout '#duplicate{dst=std{access=udp,mux=ts,dst=239.254.1.1:5500}}'

Viewing your own signal: You can view your own generated signal, again,
by using VLC.

$ vlc udp/ts://@239.254.1.1:5500

I hope the vlc syntax is still valid. And remember to join the
multicast channels if you don't have an application requesting the
stream, as described in [4].


[1] https://github.com/netoptimizer/IPTV-Analyzer
[2] http://web.archive.org/web/20150328200122/http://www.iptv-analyzer.org:80/wiki/index.php/Main_Page
[3] http://web.archive.org/web/20150329095538/http://www.iptv-analyzer.org:80/wiki/index.php/Lab_Setup
[4] http://web.archive.org/web/20150328234459/http://www.iptv-analyzer.org:80/wiki/index.php/Multicast_Signal_on_Linux
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

2018-01-08 21:31:52

by Jesper Dangaard Brouer

Subject: Re: dvb usb issues since kernel 4.9


On Mon, 8 Jan 2018 17:26:10 +0100
"Josef Griebichler" <[email protected]> wrote:

> I tried the patch you mentioned, but unfortunately it gives no real improvement for me.
> dmesg http://ix.io/DOg
> tvheadend service log http://ix.io/DOi
>
> Errors during recording are still there.

Are you _also_ recording the stream on the Raspberry Pi?

It seems to me, that you are expecting too much from this small device.

> Errors increase if there is additional TCP load on the Raspberry Pi.

I did expect the issue to get worse when you load the Pi with
network traffic, as the softirq time budget then has to be shared
between networking and USB/DVB. Thus, I guess you are running TCP and
USB/mpeg2ts on the same CPU (why, when you have 4 CPUs?...)

If you expect/want to get stable performance out of such a small box,
then you (or LibreELEC) need to tune the box for this usage. And it
does not have to be that complicated. The first step is to move IRQ
handling for the NIC to a different CPU than the USB port handling the
DVB signal (/proc/irq/*/smp_affinity_list). Then pin the
userspace process (taskset) to a CPU other than the one handling the
USB softirq.
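As a sketch of the pinning step, this is roughly what taskset does for the current process, using the Linux sched_setaffinity(2) call. The helper names are hypothetical; choosing which CPU to use (one not handling the USB interrupt) is left to the caller, and changing /proc/irq/*/smp_affinity_list itself requires root and is not shown.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Lowest-numbered CPU the calling thread may currently run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}

/*
 * Restrict the calling thread to a single CPU, similar to running
 * "taskset -pc <cpu> <pid>" on ourselves. Returns 0 on success, or -1
 * with errno set by sched_setaffinity().
 */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}
```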

> Unfortunately, there's no usbmon or tshark on LibreELEC, so I can't
> provide further logs.

Do you have perf or trace-cmd on the box? Maybe we could come up with
some kernel functions to trace, to measure/show the latency spikes?

--
Best regards,
Jesper Dangaard Brouer

2018-01-08 21:44:44

by Peter Zijlstra

Subject: Re: dvb usb issues since kernel 4.9

On Mon, Jan 08, 2018 at 10:31:09PM +0100, Jesper Dangaard Brouer wrote:
> I did expect the issue to get worse when you load the Pi with
> network traffic, as the softirq time budget then has to be shared
> between networking and USB/DVB. Thus, I guess you are running TCP and
> USB/mpeg2ts on the same CPU (why, when you have 4 CPUs?...)

Isn't networking also over USB on the Pi ?

2018-01-08 22:17:16

by Jesper Dangaard Brouer

Subject: Re: dvb usb issues since kernel 4.9

On Mon, 8 Jan 2018 22:44:27 +0100
Peter Zijlstra <[email protected]> wrote:

> On Mon, Jan 08, 2018 at 10:31:09PM +0100, Jesper Dangaard Brouer wrote:
> > I did expect the issue to get worse when you load the Pi with
> > network traffic, as the softirq time budget then has to be shared
> > between networking and USB/DVB. Thus, I guess you are running TCP and
> > USB/mpeg2ts on the same CPU (why, when you have 4 CPUs?...)
>
> Isn't networking also over USB on the Pi ?

Darn, that is true. Looking at the dmesg output in http://ix.io/DOg:

[ 0.405942] usbcore: registered new interface driver smsc95xx
[ 5.821104] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0x45E1

I don't know enough about USB... is it possible to control which CPU
handles the individual USB ports, or on some other level (than ports)?

--
Best regards,
Jesper Dangaard Brouer

2018-01-09 16:51:58

by Josef Griebichler

Subject: Aw: Re: dvb usb issues since kernel 4.9

Hi Linus,

your patch works very well for me and others (please see https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=77006#post77006). No more errors in recordings.
The patch was also tested on x86_64 (Revo 3700) with positive results.
I agree with the forum poster that there's still an issue when recording and watching live TV at the same time. I also get audio dropouts, and the audio is out of sync.
According to user smp, kernel 4.9.73 with your patch on the RPi has no such issues, and according to user jahutchi, neither does kernel 4.11.12 on x86_64.
I don't know if these dropouts are related to this topic.

If it's of any help, I could provide perf output from the Raspberry Pi with LibreELEC and tvheadend.

Regards,
Josef

2018-01-09 17:27:48

by Eric Dumazet

Subject: Re: Re: dvb usb issues since kernel 4.9

On Tue, Jan 9, 2018 at 8:51 AM, Josef Griebichler
<[email protected]> wrote:
> Hi Linus,
>
> your patch works very well for me and others (please see https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=77006#post77006). No more errors in recordings.
> The patch was also tested on x86_64 (Revo 3700) with positive results.
> I agree with the forum poster that there's still an issue when recording and watching live TV at the same time. I also get audio dropouts, and the audio is out of sync.
> According to user smp, kernel 4.9.73 with your patch on the RPi has no such issues, and according to user jahutchi, neither does kernel 4.11.12 on x86_64.
> I don't know if these dropouts are related to this topic.
>
> If it's of any help, I could provide perf output from the Raspberry Pi with LibreELEC and tvheadend.
>

Sorry to come late to the party.

It seems problem comes from some piece of hardware/driver having some
precise timing prereq, and opportunistic use of softirq/tasklet
(instead maybe of hard irq handlers )

While it is true that softirqs might do the job in most cases, we
already have cases where this can easily be defeated,
say if one CPU suddenly has to handle multiple sources of interrupts
for various devices.
NET_RX can easily lock the CPU for 10 ms (on HZ=100 builds).
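To put that 10 ms in perspective for the dvbsky case, here is a sketch of how many 4096-byte buffers would be needed to ride out a given stall without losing data. It assumes a constant payload rate and that every buffer is already queued when the stall begins; the function name is hypothetical.

```c
#include <assert.h>

/*
 * Number of buf_bytes-sized buffers needed so that completion handling
 * can stall for stall_ms without dropping data, at a constant payload
 * rate in kbit/s (which is the same as bit/ms). Rounds up.
 */
static unsigned buffers_needed(double stall_ms, double rate_kbit_per_s,
			       unsigned buf_bytes)
{
	double bytes;
	unsigned n;

	assert(stall_ms >= 0 && rate_kbit_per_s > 0 && buf_bytes > 0);
	bytes = stall_ms * rate_kbit_per_s / 8.0;
	n = (unsigned)(bytes / buf_bytes);	/* truncation == floor here */
	if ((double)n * buf_bytes < bytes)
		n++;
	return n;
}
```

Under those assumptions, a 10 ms stall at 58,026.5 kbit/s needs 18 buffers, while a single ~11 Mbit/s HD channel would need 4; a full second at the higher rate would need 1771, roughly in line with the earlier 1500-3500 estimate for one to two seconds.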

So yes, commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") has
shown up multiple times in various 'regressions'
simply because it could surface the problem more often.
But even if you revert it, you can still make the faulty
driver/subsystem misbehave by adding more stress to the cpu handling
the IRQ.

Note that networking lacks fine control of its softirq processing.
Some people found/complained that relying more on ksoftirqd was
potentially adding tail latencies.

Maybe the answer is to tune the kernel for small latencies at the
price of lower throughput (the situation before the patch):

1) Revert the patch.
2) Get rid of ksoftirqd, since it adds unexpected latencies.
3) Let applications that expect high throughput make sure to
pin their threads on CPUs that are not processing IRQs
(and make sure not to use irqbalance, and set up IRQ CPU affinities).

2018-01-09 17:42:46

by Mauro Carvalho Chehab

Subject: Re: dvb usb issues since kernel 4.9

Em Mon, 8 Jan 2018 11:51:04 -0800
Linus Torvalds <[email protected]> escreveu:

> On Mon, Jan 8, 2018 at 11:15 AM, Alan Stern <[email protected]> wrote:
> >
> > Both dwc2_hsotg and ehci-hcd use the tasklets embedded in the
> > giveback_urb_bh member of struct usb_hcd. See usb_hcd_giveback_urb()
> > in drivers/usb/core/hcd.c; the calls are
> >
> > 	else if (high_prio_bh)
> > 		tasklet_hi_schedule(&bh->bh);
> > 	else
> > 		tasklet_schedule(&bh->bh);
> >
> > As it turns out, high_prio_bh gets set for interrupt and isochronous
> > URBs but not for bulk and control URBs. The DVB driver in question
> > uses bulk transfers.
>
> Ok, so we could try out something like the appended?
>
> NOTE! I have not tested this at all. It LooksObvious(tm), but...
>
> Linus



> kernel/softirq.c | 12 ++++++++----
> 1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 2f5e87f1bae2..97b080956fea 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -79,12 +79,16 @@ static void wakeup_softirqd(void)
>
>  /*
>   * If ksoftirqd is scheduled, we do not want to process pending softirqs
> - * right now. Let ksoftirqd handle this at its own rate, to get fairness.
> + * right now. Let ksoftirqd handle this at its own rate, to get fairness,
> + * unless we're doing some of the synchronous softirqs.
>   */
> -static bool ksoftirqd_running(void)
> +#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
> +static bool ksoftirqd_running(unsigned long pending)
>  {
>  	struct task_struct *tsk = __this_cpu_read(ksoftirqd);
>
> +	if (pending & SOFTIRQ_NOW_MASK)
> +		return false;
>  	return tsk && (tsk->state == TASK_RUNNING);
>  }
>
> @@ -325,7 +329,7 @@ asmlinkage __visible void do_softirq(void)
>
>  	pending = local_softirq_pending();
>
> -	if (pending && !ksoftirqd_running())
> +	if (pending && !ksoftirqd_running(pending))
>  		do_softirq_own_stack();
>
>  	local_irq_restore(flags);
> @@ -352,7 +356,7 @@ void irq_enter(void)
>
>  static inline void invoke_softirq(void)
>  {
> -	if (ksoftirqd_running())
> +	if (ksoftirqd_running(local_softirq_pending()))
>  		return;
>
>  	if (!force_irqthreads) {
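To make the effect of the test patch concrete, here is a small userspace model combining Alan's description of usb_hcd_giveback_urb() with the ksoftirqd_running() change above. It is a sketch, not kernel code: the softirq numbering is assumed to match the enum in include/linux/interrupt.h of this era, and the URB-kind enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Softirq vector numbers, assumed to match include/linux/interrupt.h of this era. */
enum {
	HI_SOFTIRQ = 0,
	TIMER_SOFTIRQ,
	NET_TX_SOFTIRQ,
	NET_RX_SOFTIRQ,
	BLOCK_SOFTIRQ,
	IRQ_POLL_SOFTIRQ,
	TASKLET_SOFTIRQ,
	SCHED_SOFTIRQ,
	HRTIMER_SOFTIRQ,
	RCU_SOFTIRQ,
};

/* Same mask as in the test patch. */
#define SOFTIRQ_NOW_MASK ((1UL << HI_SOFTIRQ) | (1UL << TASKLET_SOFTIRQ))

/* Hypothetical URB classification, for illustration only. */
enum urb_kind { URB_CONTROL, URB_BULK, URB_INTERRUPT, URB_ISOC };

/*
 * Softirq bit raised by the HCD giveback tasklet: per Alan's description,
 * high_prio_bh is set only for interrupt and isochronous URBs.
 */
static unsigned long giveback_softirq_bit(enum urb_kind kind)
{
	bool high_prio_bh = (kind == URB_INTERRUPT || kind == URB_ISOC);

	return 1UL << (high_prio_bh ? HI_SOFTIRQ : TASKLET_SOFTIRQ);
}

/* With commit 4cd13c21b207: defer everything while ksoftirqd is runnable. */
static bool defer_4cd13c21(unsigned long pending, bool ksoftirqd_runnable)
{
	(void)pending;	/* the pending mask is not consulted */
	return ksoftirqd_runnable;
}

/* With the test patch: pending HI or TASKLET softirqs are never deferred. */
static bool defer_patched(unsigned long pending, bool ksoftirqd_runnable)
{
	if (pending & SOFTIRQ_NOW_MASK)
		return false;
	return ksoftirqd_runnable;
}
```

Since the check is made against the whole pending mask, a pending HI or TASKLET softirq also lets every other pending vector (NET_RX included) run in the same synchronous pass.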


Hi Linus,

Patch makes sense to me, although I was not able to test it myself.

I set up an RPi3 machine here with vanilla kernel 4.14.11 running a standard
Raspbian distribution (with elevator=deadline). Right now, I'm trying to
reproduce the bug with dvbv5-zap. I may eventually do more tests on
some other slow machines.

Usually, applications like tvheadend record just one channel. So, instead
of a ~58 Mbit/s payload, they typically use ~11 Mbit/s for an HD channel.
This is usually filtered by hardware. Here, I'm forcing the recording of the
entire TS, in order to make it easier to reproduce the issue. So I'm forcing
a condition that is usually worse than real use cases (at least for HD; I
don't have any DVB stream here with a 4K channel).

From what I checked so far, with a vanilla upstream kernel on the RPi3, just
receiving a DVB stream (or receiving it and writing it to /dev/null) works
with or without your patch.

The problem starts to happen when there are concurrent writes.

In my preliminary tests, writing to a file on an ext4 partition on a
USB stick loses data to the point of making it useless (1/4 of the data
is lost!). However, writing to a class 10 microSD card is doable.

If you're curious, this is what I'm doing (these are the results
while using a class 10 microSD card):

$ FILE=/tmp/out.ts; for i in $(seq 1 6); do echo "step $i"; rm $FILE 2>/dev/null; dvbv5-zap -l universal -c ~/vivo-channels.conf NBR -o $FILE -P -t60 2>&1|grep -E "(buffer|received)"; du $FILE 2>/dev/null; done
step 1
Setting buffer length to 7250000
buffer overrun
buffer overrun
buffer overrun
buffer overrun
buffer overrun
buffer overrun
buffer overrun
received 347504652 bytes (5656 Kbytes/sec)
339368 /tmp/out.ts
step 2
Setting buffer length to 7250000
buffer overrun
received 408995880 bytes (6656 Kbytes/sec)
399416 /tmp/out.ts
step 3
Setting buffer length to 7250000
received 412999716 bytes (6722 Kbytes/sec)
403328 /tmp/out.ts
step 4
Setting buffer length to 7250000
buffer overrun
received 415564788 bytes (6763 Kbytes/sec)
405832 /tmp/out.ts
step 5
Setting buffer length to 7250000
received 412999716 bytes (6722 Kbytes/sec)
403324 /tmp/out.ts
step 6
Setting buffer length to 7250000
received 408366080 bytes (6646 Kbytes/sec)
398796 /tmp/out.ts
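The rates above can be cross-checked with simple arithmetic. Assuming the "Kbytes/sec" figure is bytes / 60 s / 1024, truncated (inferred from this output, which it reproduces for all six steps, not from dvbv5-zap's source), the sketch below also converts to Mbit/s and expresses a run's shortfall relative to the best one. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed formula: bytes / seconds / 1024, truncated. */
static uint64_t kib_per_sec(uint64_t bytes, unsigned int secs)
{
	return bytes / secs / 1024;
}

/* The same rate in Mbit/s (10^6 bits per second). */
static double mbit_per_sec(uint64_t bytes, unsigned int secs)
{
	return (double)bytes * 8.0 / secs / 1e6;
}

/* Fraction of data a run is missing compared with the best run. */
static double shortfall_vs_best(uint64_t bytes, uint64_t best_bytes)
{
	return 1.0 - (double)bytes / (double)best_bytes;
}
```

For these runs, step 4 corresponds to about 55.4 Mbit/s, and step 1 delivered roughly 16% less data than step 4.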

My plan is to do more tests along this week, and try to tweak a little
bit both userspace and kernelspace, in order to see if I can get better
results.

Thanks,
Mauro

2018-01-09 17:48:52

by Linus Torvalds

[permalink] [raw]
Subject: Re: Re: dvb usb issues since kernel 4.9

On Tue, Jan 9, 2018 at 9:27 AM, Eric Dumazet <[email protected]> wrote:
>
> So yes, commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") has
> shown up multiple times in various 'regressions'
> simply because it could surface the problem more often.
> But even if you revert it, you can still make the faulty
> driver/subsystem misbehave by adding more stress to the cpu handling
> the IRQ.

...but that's always true. People sometimes live on the edge, often by
design (i.e. hardware has been designed/selected to be the crappiest
possible that still works).

That doesn't change anything. A patch that takes "bad things can
happen" to "bad things DO happen" is a bad patch.

> Maybe the answer is to tune the kernel for small latencies at the
> price of small throughput (situation before the patch)

Generally we always want to tune for latency. Throughput is "easy",
but almost never interesting.

Sure, people do batch jobs. And yes, people often _benchmark_
throughput, because it's easy to benchmark. It's much harder to
benchmark latency, even when it's often much more important.

A prime example is the SSD benchmarks in the last few years - they
improved _dramatically_ when people noticed that the real problem was
latency, not the idiotic maximum big-block bandwidth numbers that have
almost zero impact on most people.

Put another way: we already have a very strong implicit bias towards
bandwidth just because it's easier to see and measure.

That means that we generally should strive to have an explicit bias
towards optimizing for latency when that choice comes up. Just to
balance things out (and just to not take the easy way out: bandwidth
can often be improved by adding more layers of buffering and bigger
buffers, and that often ends up really hurting latency).

> 1) Revert the patch

Well, we can revert it only partially - limiting it to just networking
for example.

Just saying "act the way you used to for tasklets" already seems to
have fixed the issue in DVB.

> 2) get rid of ksoftirqd since it adds unexpected latencies.

We can't get rid of it entirely, since the synchronous softirq code
can cause problems too. It's why we have that "maximum of ten
synchronous events" in __do_softirq().

And we don't *want* to get rid of it.

We've _always_ had that small-scale "at some point we can't do it
synchronously any more".

That is a small-scale "don't have horrible latency for _other_ things"
protection. So it's about latency too, it's just about protecting
latency of the rest of the system.
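The "maximum of ten synchronous events" rule can be modelled in a few lines. As far as I know, __do_softirq() restarts its loop while softirqs are still pending only as long as the time budget has not expired, no reschedule is needed, and a restart counter starting at 10 (MAX_SOFTIRQ_RESTART) has not run out; otherwise it calls wakeup_softirqd(). This userspace sketch ignores need_resched() and abstracts time as "passes that fit in the budget"; the function name and parameters are hypothetical.

```c
#include <assert.h>

#define MAX_SOFTIRQ_RESTART 10	/* value assumed from kernel/softirq.c of this era */

/*
 * passes_needed:        passes over the pending vectors needed to drain them
 *                       (new softirqs keep getting raised meanwhile); >= 1.
 * passes_within_budget: passes that complete before the time budget
 *                       expires; >= 1.
 * Returns the number of passes run synchronously; if it is less than
 * passes_needed, the remaining work is handed to ksoftirqd.
 */
static int synchronous_passes(int passes_needed, int passes_within_budget)
{
	int max_restart = MAX_SOFTIRQ_RESTART;
	int done = 0;

	for (;;) {
		done++;				/* one pass over pending softirqs */
		if (done >= passes_needed)
			return done;		/* nothing left pending */
		if (done < passes_within_budget && --max_restart)
			continue;		/* "goto restart" */
		return done;			/* wakeup_softirqd() */
	}
}
```

So at most ten passes run synchronously; whatever is still pending after that, or after the budget expires, goes to ksoftirqd.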

The problem with commit 4cd13c21b207 is that it turns the small-scale
latency issues in softirq handling (they get larger latencies for lots
of hardware interrupts or even from non-preemptible kernel code) into
the _huge_ scale latency of scheduling, and does so in a racy way too.

> 3) Let applications that expect to have high throughput make sure to
> pin their threads on CPUs that are not processing IRQs.
> (And make sure not to use irqbalance, and set up IRQ CPU affinities.)

The only people that really deal in "throughput only" tend to be the
HPC people, and they already do things like this.

(The other end of the spectrum is the realtime people that have
extreme latency requirements, who do things like that for the reverse
reason: keeping one or more CPUs reserved for the particular
low-latency realtime job).

Linus

2018-01-09 17:55:22

by Linus Torvalds

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

On Tue, Jan 9, 2018 at 9:42 AM, Mauro Carvalho Chehab
<[email protected]> wrote:
>
> In my preliminary tests, writing to a file on an ext4 partition on a
> USB stick loses data to the point of making it useless (1/4 of the data
> is lost!). However, writing to a class 10 microSD card is doable.

Note that most USB sticks are horrible crap. They can have write
latencies counted in _seconds_.

You can cause VM issues and various other non-hardware stalls with
them, simply because something gets stuck waiting for a page writeout
that should take a few ms on any reasonable hardware, but ends up
taking half a second or more.

For example, even really well-written software that tries to do things
like threaded write-behind to smooth out the IO will be _totally_
screwed by the USB stick behavior (where you might write a few MB at
high speeds, and then the next write, however small, takes a second
because the stupid USB stick does a synchronous garbage collection).
Suddenly all that clever software that tried to keep things moving
along smoothly without any hiccups, and tried hard to make the USB bus
have a nice constant load, can't do anything at all about the crap
hardware.

So when testing writes to USB sticks, I'm not convinced you're
actually testing any USB bus limitations or even really any other
hardware limitations than the USB stick itself.

Linus

2018-01-09 17:57:30

by Eric Dumazet

[permalink] [raw]
Subject: Re: Re: dvb usb issues since kernel 4.9

On Tue, Jan 9, 2018 at 9:48 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Jan 9, 2018 at 9:27 AM, Eric Dumazet <[email protected]> wrote:
>>
>> So yes, commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") has
>> shown up multiple times in various 'regressions'
>> simply because it could surface the problem more often.
>> But even if you revert it, you can still make the faulty
>> driver/subsystem misbehave by adding more stress to the cpu handling
>> the IRQ.
>
> ...but that's always true. People sometimes live on the edge, often by
> design (i.e. hardware has been designed/selected to be the crappiest
> possible that still works).
>
> That doesn't change anything. A patch that takes "bad things can
> happen" to "bad things DO happen" is a bad patch.

I was expecting that people could get a chance to fix the root cause,
instead of trying to keep status quo.

Strangely, it took 18 months for someone to complain enough and
'bisect to this commit'.

Your patch considers TASKLET_SOFTIRQ a candidate for 'immediate
handling', but TCP Small Queues heavily uses tasklets,
so as far as I am concerned a revert would have the same effect.

2018-01-09 18:58:37

by Linus Torvalds

[permalink] [raw]
Subject: Re: Re: dvb usb issues since kernel 4.9

On Tue, Jan 9, 2018 at 9:57 AM, Eric Dumazet <[email protected]> wrote:
>
> Your patch considers TASKLET_SOFTIRQ a candidate for 'immediate
> handling', but TCP Small Queues heavily uses tasklets,
> so as far as I am concerned a revert would have the same effect.

Does it actually?

TCP ends up dropping packets outside of the window etc, so flooding a
machine with TCP packets and causing some further processing up the
stack sounds very different from the basic packet flooding thing that
happens with NET_RX_SOFTIRQ.

Also, honestly, the kinds of people who really worry about flooding
tend to have packet filtering in the receive path etc.

So I really think "you can use up 90% of CPU time with a UDP packet
flood from the same network" is very very very different - and
honestly not at all as important - as "you want to be able to use a
USB DVB receiver and watch/record TV".

Because that whole "UDP packet flood from the same network" really is
something you _fundamentally_ have other mitigations for.

I bet that whole commit was introduced because of a benchmark test,
rather than real life. No?

In contrast, now people are complaining about real loads not working.

Linus

2018-01-09 21:26:24

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9


On Tue, 9 Jan 2018 15:42:35 -0200 Mauro Carvalho Chehab <[email protected]> wrote:
> On Mon, 8 Jan 2018 11:51:04 -0800, Linus Torvalds <[email protected]> wrote:
>
[...]
> Patch makes sense to me, although I was not able to test it myself.

The patch also makes sense to me. I've done some basic testing with it
on my high-end Broadwell system (that I use for 100Gbit/s testing). As
expected the network overload case still works, as NET_RX_SOFTIRQ is
not matched.

> I set up an RPi3 machine here with vanilla kernel 4.14.11 running a
> standard Raspbian distribution (with elevator=deadline).

I found a Raspberry Pi Model B+ (BCM2835, I think) and loaded the
LibreELEC distro on it. One of the guys even created an image for me with
a specific kernel[1] (which I just upgraded the system with).

[1] https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=77031#post77031

> My plan is to do more tests along this week, and try to tweak a little
> bit both userspace and kernelspace, in order to see if I can get
> better results.

I've previously experienced that you can be affected by the scheduler
granularity, which is adjustable (with CONFIG_SCHED_DEBUG=y):

$ grep -H . /proc/sys/kernel/sched_*_granularity_ns
/proc/sys/kernel/sched_min_granularity_ns:2250000
/proc/sys/kernel/sched_wakeup_granularity_ns:3000000

The above numbers were confirmed on the RPi2 (see[2]). With commit
4cd13c21b207 ("softirq: Let ksoftirqd do its job"), I expect/assume that
softirq processing latency is bounded by the sched_wakeup_granularity_ns,
which with 3 ms is not good enough for their use-case.

Thus, if you manage to reproduce the case, try to see if adjusting this
can mitigate the issue...
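For reference, the two values above are consistent with the CFS defaults scaled for a 4-CPU machine such as the RPi2. The sketch below assumes the defaults of this era (6 ms sched_latency_ns, 0.75 ms sched_min_granularity_ns, 1 ms sched_wakeup_granularity_ns) multiplied by 1 + ilog2(min(ncpus, 8)), the default logarithmic tunable scaling; the function names are hypothetical.

```c
#include <assert.h>

/* Assumed unscaled CFS defaults, in ns. */
#define BASE_LATENCY_NS		6000000UL
#define BASE_MIN_GRAN_NS	750000UL
#define BASE_WAKEUP_GRAN_NS	1000000UL

/* 1 + ilog2(min(ncpus, 8)): assumed default "log" scaling of CFS tunables. */
static unsigned int cfs_scale_factor(unsigned int ncpus)
{
	unsigned int n = ncpus < 8 ? ncpus : 8;
	unsigned int lg = 0;

	while (n >>= 1)		/* integer log2 */
		lg++;
	return 1 + lg;
}

static unsigned long sched_latency_ns(unsigned int ncpus)
{
	return BASE_LATENCY_NS * cfs_scale_factor(ncpus);
}

static unsigned long sched_min_granularity_ns(unsigned int ncpus)
{
	return BASE_MIN_GRAN_NS * cfs_scale_factor(ncpus);
}

static unsigned long sched_wakeup_granularity_ns(unsigned int ncpus)
{
	return BASE_WAKEUP_GRAN_NS * cfs_scale_factor(ncpus);
}
```

With ncpus = 4 the factor is 3, giving 2250000 and 3000000 ns as shown above, and a sched_latency_ns of 18000000.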


Their system has a non-preemptible kernel; should they use PREEMPT?

LibreELEC:~ # uname -a
Linux LibreELEC 4.14.10 #1 SMP Tue Jan 9 17:35:03 GMT 2018 armv7l GNU/Linux

[2] https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=76999#post76999
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

2018-01-09 21:48:54

by Eric Dumazet

[permalink] [raw]
Subject: Re: Re: dvb usb issues since kernel 4.9

On Tue, Jan 9, 2018 at 10:58 AM, Linus Torvalds
<[email protected]> wrote:
> On Tue, Jan 9, 2018 at 9:57 AM, Eric Dumazet <[email protected]> wrote:
>>
>> Your patch considers TASKLET_SOFTIRQ a candidate for 'immediate
>> handling', but TCP Small Queues heavily uses tasklets,
>> so as far as I am concerned a revert would have the same effect.
>
> Does it actually?
>
> TCP ends up dropping packets outside of the window etc, so flooding a
> machine with TCP packets and causing some further processing up the
> stack sounds very different from the basic packet flooding thing that
> happens with NET_RX_SOFTIRQ.
>
> Also, honestly, the kinds of people who really worry about flooding
> tend to have packet filtering in the receive path etc.
>
> So I really think "you can use up 90% of CPU time with a UDP packet
> flood from the same network" is very very very different - and
> honestly not at all as important - as "you want to be able to use a
> USB DVB receiver and watch/record TV".
>
> Because that whole "UDP packet flood from the same network" really is
> something you _fundamentally_ have other mitigations for.
>
> I bet that whole commit was introduced because of a benchmark test,
> rather than real life. No?
>
> In contrast, now people are complaining about real loads not working.
>
> Linus

I said that a revert was fine; maybe I was not clear.
Clearly we cannot touch anything scheduler-related without breaking
someone's workload/assumptions about how the system behaved at some point.

Your patch won't solve other workloads that might have been impacted by my patch,
so in one year (or next week), we will have to cope with another device driver
not using tasklets but still relying on immediate softirq processing.
Apparently, we have to live with the softirq model forever, or switch to RT kernels.

Note that we have no mitigation for something that involves a flood of
valid packets that no firewall can drop
(without dropping legitimate packets).
The 'benchmark' here is not really the trigger, only a tool validating
an idea/patch.

2018-01-10 03:03:34

by Mike Galbraith

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

On Tue, 2018-01-09 at 22:26 +0100, Jesper Dangaard Brouer wrote:
>
> I've previously experienced that you can be affected by the scheduler
> granularity, which is adjustable (with CONFIG_SCHED_DEBUG=y):
>
> $ grep -H . /proc/sys/kernel/sched_*_granularity_ns
> /proc/sys/kernel/sched_min_granularity_ns:2250000
> /proc/sys/kernel/sched_wakeup_granularity_ns:3000000
>
> The above numbers were confirmed on the RPi2 (see[2]). With commit
> 4cd13c21b207 ("softirq: Let ksoftirqd do its job"), I expect/assume that
> softirq processing latency is bounded by the sched_wakeup_granularity_ns,
> which with 3 ms is not good enough for their use-case.

A note of caution wrt twiddling sched_wakeup_granularity_ns: it must
remain < sched_latency_ns/2, else you effectively disable wakeup
preemption completely, turning CFS into a tick-granularity scheduler.
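Mike's constraint is easy to check before changing anything. A minimal sketch with hypothetical helper names: read the two tunables from /proc/sys/kernel (which, per Jesper above, needs CONFIG_SCHED_DEBUG=y; the reader returns -1 if a file is missing or unparsable) and compare the wakeup granularity with half the latency.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Mike's rule: wakeup preemption stays effective only below half the latency target. */
static bool wakeup_gran_ok(long wakeup_gran_ns, long latency_ns)
{
	return wakeup_gran_ns < latency_ns / 2;
}

/*
 * Read one integer tunable such as "/proc/sys/kernel/sched_latency_ns".
 * Returns -1 if the file cannot be opened or parsed.
 */
static long read_sched_tunable(const char *path)
{
	FILE *f = fopen(path, "r");
	long val = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```

For example, with the 4-CPU defaults (3000000 and 18000000 ns) the check passes; raising sched_wakeup_granularity_ns to 9000000 or more would fail it.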

-Mike

2018-01-10 09:45:35

by Jesper Dangaard Brouer

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9


On Tue, 9 Jan 2018 10:58:30 -0800 Linus Torvalds <[email protected]> wrote:

> So I really think "you can use up 90% of CPU time with a UDP packet
> flood from the same network" is very very very different - and
> honestly not at all as important - as "you want to be able to use a
> USB DVB receiver and watch/record TV".
>
> Because that whole "UDP packet flood from the same network" really is
> something you _fundamentally_ have other mitigations for.
>
> I bet that whole commit was introduced because of a benchmark test,
> rather than real life. No?

I believe this has happened in real life, in the form of DNS servers
not being able to recover after a long outage, where DNS TTLs had timed out,
causing legitimate traffic to overload their DNS servers. The goodput
(answers/sec) of their DNS servers was too low when bringing them
online again. (Based on a talk over beer at NetDevConf with a guy
claiming they ran DNS for AWS.)


Commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") tries to
address a fundamental problem that the network stack has when
interacting with softirq in overload situations.
(Maybe we can come up with a better solution?)

Before this commit, when an application runs on the same CPU as the softirq,
the kernel has a bad "drop-off cliff" behavior when load goes above the
saturation point.

This is confirmed in a CloudFlare blog post[1], which used a kernel that
predates this commit. Quoting [1], section "A note on NUMA performance":

"1. Run receiver on another CPU, but on the same NUMA node as the RX
queue. The performance as we saw above is around 360kpps.

2. With receiver on exactly same CPU as the RX queue we can get up to
~430kpps. But it creates high variability. The performance drops down
to zero if the NIC is overwhelmed with packets."

The behavior problem here is "performance drops down to zero if the NIC
is overwhelmed with packets". That is a bad way to handle overload.
Not only when attacked, but also when bringing a service online after
an outage.

What essentially happens is:
1. softirq NAPI enqueues 64 packets into the socket.
2. The application dequeues 1 packet and invokes local_bh_enable(),
3. causing softirq to run in the app's timeslice, again enqueuing 64 packets.
4. The app only sees a goodput of 1/128 of the packets.

That is essentially what Eric solved with his commit, by preventing
local_bh_enable() in step (3) from invoking softirq if ksoftirqd is already running.

Maybe we can come up with a better solution?
(as I do agree this was too big a hammer, affecting other use cases)


[1] https://blog.cloudflare.com/how-to-receive-a-million-packets/

p.s. Regarding point "1." of quote [1]: after Paolo Abeni optimized the UDP
code, that statement is no longer true. It is now (significantly) faster to
run/pin your UDP application on another CPU than the RX-CPU.
--
Best regards,
Jesper Dangaard Brouer
MSc.CS, Principal Kernel Engineer at Red Hat
LinkedIn: http://www.linkedin.com/in/brouer

2018-01-12 21:14:00

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

On Tue, 9 Jan 2018 09:48:47 -0800
Linus Torvalds <[email protected]> wrote:

> On Tue, Jan 9, 2018 at 9:27 AM, Eric Dumazet <[email protected]> wrote:
> >
> > So yes, commit 4cd13c21b207 ("softirq: Let ksoftirqd do its job") has
> > shown up multiple times in various 'regressions'
> > simply because it could surface the problem more often.
> > But even if you revert it, you can still make the faulty
> > driver/subsystem misbehave by adding more stress to the cpu handling
> > the IRQ.
>
> ...but that's always true. People sometimes live on the edge, often by
> design (i.e. hardware has been designed/selected to be the crappiest
> possible that still works).
>
> That doesn't change anything. A patch that takes "bad things can
> happen" to "bad things DO happen" is a bad patch.
>
> > Maybe the answer is to tune the kernel for small latencies at the
> > price of small throughput (situation before the patch)
>
> Generally we always want to tune for latency. Throughput is "easy",
> but almost never interesting.
>
> Sure, people do batch jobs. And yes, people often _benchmark_
> throughput, because it's easy to benchmark. It's much harder to
> benchmark latency, even when it's often much more important.
>
> A prime example is the SSD benchmarks in the last few years - they
> improved _dramatically_ when people noticed that the real problem was
> latency, not the idiotic maximum big-block bandwidth numbers that have
> almost zero impact on most people.
>
> Put another way: we already have a very strong implicit bias towards
> bandwidth just because it's easier to see and measure.
>
> That means that we generally should strive to have an explicit bias
> towards optimizing for latency when that choice comes up. Just to
> balance things out (and just to not take the easy way out: bandwidth
> can often be improved by adding more layers of buffering and bigger
> buffers, and that often ends up really hurting latency).
>
> > 1) Revert the patch
>
> Well, we can revert it only partially - limiting it to just networking
> for example.
>
> Just saying "act the way you used to for tasklets" already seems to
> have fixed the issue in DVB.
>
> > 2) get rid of ksoftirqd since it adds unexpected latencies.
>
> We can't get rid of it entirely, since the synchronous softirq code
> can cause problems too. It's why we have that "maximum of ten
> synchronous events" in __do_softirq().
>
> And we don't *want* to get rid of it.
>
> We've _always_ had that small-scale "at some point we can't do it
> synchronously any more".
>
> That is a small-scale "don't have horrible latency for _other_ things"
> protection. So it's about latency too, it's just about protecting
> latency of the rest of the system.
>
> The problem with commit 4cd13c21b207 is that it turns the small-scale
> latency issues in softirq handling (they get larger latencies for lots
> of hardware interrupts or even from non-preemptible kernel code) into
> the _huge_ scale latency of scheduling, and does so in a racy way too.
>
> > 3) Let applications that expect to have high throughput make sure to
> > pin their threads on CPUs that are not processing IRQs.
> > (And make sure not to use irqbalance, and set up IRQ CPU affinities.)
>
> The only people that really deal in "throughput only" tend to be the
> HPC people, and they already do things like this.
>
> (The other end of the spectrum is the realtime people that have
> extreme latency requirements, who do things like that for the reverse
> reason: keeping one or more CPUs reserved for the particular
> low-latency realtime job).

OK, it took me some time (and a faster microSD card) to be sure that
the data loss wasn't due to bad storage performance, but I now have some
test results.

In summary, the ksoftirqd commit 4cd13c21b207 ("softirq: Let ksoftirqd
do its job") is indeed causing data loss. In my tests, it generates at least one
continuity error every 1-5 minutes.

Either reverting it or applying Linus' proposal of partially reverting
it fixes the issues. Increasing the number of URBs doesn't seem to
help.

I'm enclosing the dirty details below.

Linus/Eric,

Now that I have an environment set up, I can test any other alternative
that would fix the UDP packet flood attack without breaking the softirq
handling code.

Regards,
Mauro

---

All tests below were done on a Raspberry Pi3 with a 32GB SanDisk Extreme U3 microSD
card and a DVBSky S960C DVB-S2 tuner with an external power supply,
connected to a TCP/IP network via Ethernet (which uses USB on the RPi). It also
has a serial cable connected to it.

It was installed with LibreELEC 8.2.2, using the tvheadend backend.

I'm recording one MPEG-TS service/"channel" composed of one audio and
one video stream. The total traffic collected by tvheadend was about
4 Mbit/s (audio+video+EPG tables). It is part of a 58 Mbit/s MPEG
transport stream, with 23 TV services/"channels" on it.

While handling this issue, I found one unrelated bug, fixed by this patch:
https://git.linuxtv.org/mchehab/experimental.git/commit/?h=softirq_fixup&id=afb6c749c9da6e661335bc059f2b117421c09f77

This bug has no effect on DVB streaming. It only causes the signal
strength to be reported wrongly on 32-bit kernels. I had this patch
applied on all tests below.

Test 1
======

With the Raspbian kernel, recording and watching the video at the same
time on Kodi, plus one VLC client, over an interval of 5 minutes,
there were 4 MPEG continuity errors on video (and one on audio):

Jan 12 15:05:39 rpi3 tvheadend[285]: TS: DVB-S Network/12090H/TV Senado: H264 @ #1601 Continuity counter error (total 1)
Jan 12 15:06:20 rpi3 tvheadend[285]: TS: DVB-S Network/12090H/TV Senado: H264 @ #1601 Continuity counter error (total 2)
Jan 12 15:07:36 rpi3 tvheadend[285]: TS: DVB-S Network/12090H/TV Senado: H264 @ #1601 Continuity counter error (total 3)
Jan 12 15:07:36 rpi3 tvheadend[285]: TS: DVB-S Network/12090H/TV Senado: MPEG2AUDIO @ #1602 Continuity counter error (total 1)
Jan 12 15:10:28 rpi3 tvheadend[285]: TS: DVB-S Network/12090H/TV Senado: H264 @ #1601 Continuity counter error (total 4)
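The "Continuity counter error" lines refer to the MPEG-TS continuity_counter, a 4-bit per-PID field in each 188-byte transport packet header (ISO/IEC 13818-1) that increments modulo 16 on every packet carrying payload; a jump means packets for that PID were lost. The "#1601" in the log is presumably the PID. Below is a minimal checker sketch, not tvheadend's implementation: it tolerates a repeated (duplicate) counter, skips null packets and packets without payload, and ignores the adaptation field's discontinuity_indicator. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TS_PACKET_SIZE	188
#define TS_SYNC_BYTE	0x47
#define TS_NUM_PIDS	8192	/* PIDs are 13 bits wide */
#define TS_NULL_PID	0x1fff

struct cc_tracker {
	int last_cc[TS_NUM_PIDS];		/* -1 until the PID is first seen */
	unsigned long errors[TS_NUM_PIDS];	/* continuity errors per PID */
};

static void cc_init(struct cc_tracker *t)
{
	for (int i = 0; i < TS_NUM_PIDS; i++)
		t->last_cc[i] = -1;
	memset(t->errors, 0, sizeof(t->errors));
}

/* Check one packet; returns false if it does not start with the sync byte. */
static bool cc_check(struct cc_tracker *t, const uint8_t *pkt)
{
	if (pkt[0] != TS_SYNC_BYTE)
		return false;

	unsigned int pid = ((unsigned int)(pkt[1] & 0x1f) << 8) | pkt[2];
	unsigned int afc = (pkt[3] >> 4) & 0x3;	/* adaptation_field_control */
	int cc = pkt[3] & 0x0f;			/* continuity_counter */

	/* Null packets and packets without payload do not advance the counter. */
	if (pid == TS_NULL_PID || !(afc & 0x1))
		return true;

	int last = t->last_cc[pid];

	/* Expect last + 1 (mod 16); a repeated counter is treated as a duplicate. */
	if (last >= 0 && cc != ((last + 1) & 0x0f) && cc != last)
		t->errors[pid]++;
	t->last_cc[pid] = cc;
	return true;
}
```

Feeding it packets for PID 1601 with counters 0, 1, 3, 4 records one error, for the missing packet with counter 2.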

With upstream kernels, Kodi stops working (it depends on the Raspbian video
driver). So I opened two VLC players on separate machines, in order
to also have 2 clients watching, plus the recording task. That increased
the network and USB traffic as well. All the following tests used this
scenario.

Test 2
======

With vanilla kernel 4.14.12 plus just one extra patch increasing the
number of URB buffers from 8 to 16, it got 2 video errors over a 6-minute
interval:

Jan 12 15:56:09 rpi3 tvheadend[222]: TS: DVB-S Network/12090H/TV Senado: H264 @ #1601 Continuity counter error (total 2)
Jan 12 15:56:09 rpi3 tvheadend[222]: TS: DVB-S Network/12090H/TV Senado: MPEG2AUDIO @ #1602 Continuity counter error (total 1)
Jan 12 16:03:05 rpi3 tvheadend[222]: TS: DVB-S Network/12090H/TV Senado: H264 @ #1601 Continuity counter error (total 3)
Jan 12 16:03:05 rpi3 tvheadend[222]: TS: DVB-S Network/12090H/TV Senado: MPEG2AUDIO @ #1602 Continuity counter error (total 2)


Test 3
======

With upstream Kernel 4.14.12 + 16 buffers patch + commit 4cd13c21b207 reverted,
I kept it running for about 15-30 mins. No continuity errors.

Test 4
======

With upstream kernel 4.14.12 plus the partial softirq revert made by
Linus' test patch[1], running for about 20 minutes, it got just one
continuity error:

Jan 12 16:51:31 rpi3 tvheadend[237]: TS: DVB-S Network/12090H/TV Senado: H264 @ #1601 Continuity counter error (total 1)
Jan 12 16:51:31 rpi3 tvheadend[237]: TS: DVB-S Network/12090H/TV Senado: MPEG2AUDIO @ #1602 Continuity counter error (total 1)

[1] https://git.linuxtv.org/mchehab/experimental.git/commit/?h=softirq_fixup&id=7996c39af87d329f64e6b1b2af120d6ce11ede29

Test 5
======

I then moved to Kernel 4.15-rc7. In this case, I had to add an extra patch,
as the USB controller is currently broken upstream:
https://git.linuxtv.org/mchehab/experimental.git/commit/?h=softirq_fixup&id=6bcc57ea8a84e9d5fed9f5ebf13d63fd28ef181c

The .config file used to build the Kernel is at:
https://pastebin.com/wpZghann


With upstream kernel 4.15-rc7 with Linus' patch applied[2], I kept the
recording + 2 VLC clients running for about one hour. It got just one
continuity error:

Jan 12 20:06:26 rpi3 tvheadend[226]: TS: DVB-S Network/12090H/TV Senado: H264 @ #1601 Continuity counter error (total 1)
Jan 12 20:06:26 rpi3 tvheadend[226]: TS: DVB-S Network/12090H/TV Senado: MPEG2AUDIO @ #1602 Continuity counter error (total 1)

[2] The test Kernel is this one:
https://git.linuxtv.org/mchehab/experimental.git/log/?h=softirq_fixup

It is hard to tell if this one continuity error is due to some kernel issue,
or if it is simply due to some PES packet with a bad CRC that got discarded,
but it seems like a normal condition to me.


Thanks,
Mauro

2018-01-12 21:48:52

by Eric Dumazet

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

On Fri, 2018-01-12 at 19:13 -0200, Mauro Carvalho Chehab wrote:
>
>
> The .config file used to build the Kernel is at:
> https://pastebin.com/wpZghann
>

Hi Mauro

Any chance you can try CONFIG_HZ_1000=y, CONFIG_HZ=1000 ?

Thanks.

2018-01-13 10:46:35

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

On Sat, 13 Jan 2018 07:09:20 -0200
Mauro Carvalho Chehab <[email protected]> wrote:

> On Fri, 12 Jan 2018 13:48:46 -0800
> Eric Dumazet <[email protected]> wrote:
>
> > On Fri, 2018-01-12 at 19:13 -0200, Mauro Carvalho Chehab wrote:
> > >
> > >
> > > The .config file used to build the Kernel is at:
> > > https://pastebin.com/wpZghann
> > >
> >
> > Hi Mauro
> >
> > Any chance you can try CONFIG_HZ_1000=y, CONFIG_HZ=1000 ?

It actually made it a lot worse! Without Linus' patch (or reverting the
softirq patch), in 4 minutes of capture, it got all these errors:

Jan 13 10:41:41 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 1)
Jan 13 10:41:42 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: MPEG2AUDIO @ #1912 Continuity counter error (total 1)
Jan 13 10:42:14 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 3)
Jan 13 10:42:47 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 4)
Jan 13 10:42:58 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 5)
Jan 13 10:42:58 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: MPEG2AUDIO @ #1912 Continuity counter error (total 2)
Jan 13 10:43:34 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 9)
Jan 13 10:43:37 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: MPEG2AUDIO @ #1912 Continuity counter error (total 5)
Jan 13 10:44:00 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 12)
Jan 13 10:44:29 rpi3 tvheadend[226]: TS: DVB-S Network/12130H/NBR: H264 @ #1911 Continuity counter error (total 13)

Thanks,
Mauro

2018-01-13 09:09:37

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

On Fri, 12 Jan 2018 13:48:46 -0800
Eric Dumazet <[email protected]> wrote:

> On Fri, 2018-01-12 at 19:13 -0200, Mauro Carvalho Chehab wrote:
> >
> >
> > The .config file used to build the Kernel is at:
> > https://pastebin.com/wpZghann
> >
>
> Hi Mauro
>
> Any chance you can try CONFIG_HZ_1000=y, CONFIG_HZ=1000 ?

I can do such a test to satisfy your curiosity, but that doesn't sound like
the right fix.

See, almost all TVs and set-top boxes (STBs) run Linux nowadays and usually come
with ARM CPUs designed to "just do their job" (i.e. CPUs with low clocks).
There, low power consumption is a must. This bug very likely affects those
devices once they migrate to kernel 4.9+. Changing from NO_HZ to HZ=1000 on
TVs/STBs will surely have bad side effects on those types of devices,
increasing power consumption.

Not to mention that this would be environmentally very bad, as TV unit
sales alone are on the order of 230 million units per year[1].

[1] https://www.statista.com/statistics/461316/global-tv-unit-sales/


Thanks,
Mauro

2018-01-26 14:19:11

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

Hi Alan,

Em Mon, 8 Jan 2018 14:15:35 -0500 (EST)
Alan Stern <[email protected]> escreveu:

> On Mon, 8 Jan 2018, Linus Torvalds wrote:
>
> > Can somebody tell which softirq it is that dvb/usb cares about?
>
> I don't know about the DVB part. The USB part is a little difficult to
> analyze, mostly because the bug reports I've seen are mostly from
> people running non-vanilla kernels.

I suspect that the main reason people run non-vanilla Kernels is that,
among other bugs, the upstream dwc2 driver has serious trouble handling
ISOCH traffic.

Using Kernel 4.15-rc7 from this git tree:
https://git.linuxtv.org/mchehab/experimental.git/log/?h=softirq_fixup

(i.e. with the softirq bug partially reverted by Linus' patch, and
the DWC2 deferred probe fixed)

With a PCTV 461e device, which uses the em28xx driver + a Montage frontend
(the same frontend used on dvbsky hardware, except for the em28xx).

This device doesn't support bulk transfers for DVB, only ISOCH. The drivers
work fine on x86.

Using a test signal at a bit rate of 56698.4 Kbits/s, this is what
happens when capturing less than one second of data:

$ dvbv5-zap -c ~/dvb_channel.conf "tv brasil" -l universal -X 100 -m -t2
Using LNBf UNIVERSAL
Universal, Europe
Freqs : 10800 to 11800 MHz, LO: 9750 MHz
Freqs : 11600 to 12700 MHz, LO: 10600 MHz
using demux 'dvb0.demux0'
reading channels from file '/home/mchehab/dvb_channel.conf'
tuning to 11468000 Hz
(0x00) Signal= -33.90dBm
Lock (0x1f) Signal= -33.90dBm C/N= 30.28dB postBER= 2.33x10^-6
dvb_dev_set_bufsize: buffer set to 6160384
dvb_set_pesfilter to 0x2000
354.08s: Starting capture
354.73s: only read 59220 bytes
354.73s: Stopping capture

[ 354.000827] dwc2 3f980000.usb: DWC OTG HCD EP DISABLE: bEndpointAddress=0x84, ep->hcpriv=116f41b2
[ 354.000859] dwc2 3f980000.usb: DWC OTG HCD EP RESET: bEndpointAddress=0x84
[ 354.010744] dwc2 3f980000.usb: --Host Channel 5 Interrupt: Frame Overrun--
... (hundreds of thousands of Frame Overrun messages)
[ 354.660857] dwc2 3f980000.usb: --Host Channel 5 Interrupt: Frame Overrun--
[ 354.660935] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
[ 354.660959] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
[ 354.660966] dwc2 3f980000.usb: urb->status = 0
[ 354.660992] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
[ 354.661001] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
[ 354.661008] dwc2 3f980000.usb: urb->status = 0
[ 354.661054] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
[ 354.661065] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
[ 354.661072] dwc2 3f980000.usb: urb->status = 0
[ 354.661107] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
[ 354.661120] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
[ 354.661127] dwc2 3f980000.usb: urb->status = 0
[ 354.661146] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
[ 354.661158] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
[ 354.661165] dwc2 3f980000.usb: urb->status = 0

The Kernel was compiled with:

CONFIG_USB_DWC2=y
CONFIG_USB_DWC2_HOST=y
# CONFIG_USB_DWC2_PERIPHERAL is not set
# CONFIG_USB_DWC2_DUAL_ROLE is not set
# CONFIG_USB_DWC2_PCI is not set
CONFIG_USB_DWC2_DEBUG=y
# CONFIG_USB_DWC2_VERBOSE is not set
# CONFIG_USB_DWC2_TRACK_MISSED_SOFS is not set
CONFIG_USB_DWC2_DEBUG_PERIODIC=y

For reference, this is the lsusb output for the PCTV USB hardware:

$ lsusb -v -d 2013:0258

Bus 001 Device 005: ID 2013:0258 PCTV Systems
Couldn't open device, some information will be missing
Device Descriptor:
bLength 18
bDescriptorType 1
bcdUSB 2.00
bDeviceClass 0 (Defined at Interface level)
bDeviceSubClass 0
bDeviceProtocol 0
bMaxPacketSize0 64
idVendor 0x2013 PCTV Systems
idProduct 0x0258
bcdDevice 1.00
iManufacturer 3
iProduct 1
iSerial 2
bNumConfigurations 1
Configuration Descriptor:
bLength 9
bDescriptorType 2
wTotalLength 41
bNumInterfaces 1
bConfigurationValue 1
iConfiguration 0
bmAttributes 0x80
(Bus Powered)
MaxPower 500mA
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 0
bNumEndpoints 1
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x84 EP 4 IN
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x0000 1x 0 bytes
bInterval 1
Interface Descriptor:
bLength 9
bDescriptorType 4
bInterfaceNumber 0
bAlternateSetting 1
bNumEndpoints 1
bInterfaceClass 255 Vendor Specific Class
bInterfaceSubClass 0
bInterfaceProtocol 0
iInterface 0
Endpoint Descriptor:
bLength 7
bDescriptorType 5
bEndpointAddress 0x84 EP 4 IN
bmAttributes 1
Transfer Type Isochronous
Synch Type None
Usage Type Data
wMaxPacketSize 0x03ac 1x 940 bytes
bInterval 1

Cheers,
Mauro

2018-01-26 19:38:55

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

Em Fri, 26 Jan 2018 12:17:37 -0200
Mauro Carvalho Chehab <[email protected]> escreveu:

> Hi Alan,
>
> Em Mon, 8 Jan 2018 14:15:35 -0500 (EST)
> Alan Stern <[email protected]> escreveu:
>
> > On Mon, 8 Jan 2018, Linus Torvalds wrote:
> >
> > > Can somebody tell which softirq it is that dvb/usb cares about?
> >
> > I don't know about the DVB part. The USB part is a little difficult to
> > analyze, mostly because the bug reports I've seen are mostly from
> > people running non-vanilla kernels.
>
> I suspect that the main reason for people not using non-vanilla Kernels
> is that, among other bugs, the dwc2 upstream driver has serious troubles
> handling ISOCH traffic.
>
> Using Kernel 4.15-rc7 from this git tree:
> https://git.linuxtv.org/mchehab/experimental.git/log/?h=softirq_fixup
>
> (e. g. with the softirq bug partially reverted with Linux patch, and
> the DWC2 deferred probe fixed)
>
> With a PCTV 461e device, with uses em28xx driver + Montage frontend
> (with is the same used on dvbsky hardware - except for em28xx).
>
> This device doesn't support bulk for DVB, just ISOCH. The drivers work
> fine on x86.
>
> Using a test signal at the bit rate of 56698,4 Kbits/s, that's what
> happens, when capturing less than one second of data:
>
> $ dvbv5-zap -c ~/dvb_channel.conf "tv brasil" -l universal -X 100 -m -t2dvbv5-zap -c ~/dvb_channel.conf "tv brasil" -l universal -X 100 -m -t2
> Using LNBf UNIVERSAL
> Universal, Europe
> Freqs : 10800 to 11800 MHz, LO: 9750 MHz
> Freqs : 11600 to 12700 MHz, LO: 10600 MHz
> using demux 'dvb0.demux0'
> reading channels from file '/home/mchehab/dvb_channel.conf'
> tuning to 11468000 Hz
> (0x00) Signal= -33.90dBm
> Lock (0x1f) Signal= -33.90dBm C/N= 30.28dB postBER= 2.33x10^-6
> dvb_dev_set_bufsize: buffer set to 6160384
> dvb_set_pesfilter to 0x2000
> 354.08s: Starting capture
> 354.73s: only read 59220 bytes
> 354.73s: Stopping capture
>
> [ 354.000827] dwc2 3f980000.usb: DWC OTG HCD EP DISABLE: bEndpointAddress=0x84, ep->hcpriv=116f41b2
> [ 354.000859] dwc2 3f980000.usb: DWC OTG HCD EP RESET: bEndpointAddress=0x84
> [ 354.010744] dwc2 3f980000.usb: --Host Channel 5 Interrupt: Frame Overrun--
> ... (hundreds of thousands of Frame Overrun messages)
> [ 354.660857] dwc2 3f980000.usb: --Host Channel 5 Interrupt: Frame Overrun--
> [ 354.660935] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> [ 354.660959] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> [ 354.660966] dwc2 3f980000.usb: urb->status = 0
> [ 354.660992] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> [ 354.661001] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> [ 354.661008] dwc2 3f980000.usb: urb->status = 0
> [ 354.661054] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> [ 354.661065] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> [ 354.661072] dwc2 3f980000.usb: urb->status = 0
> [ 354.661107] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> [ 354.661120] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> [ 354.661127] dwc2 3f980000.usb: urb->status = 0
> [ 354.661146] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> [ 354.661158] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> [ 354.661165] dwc2 3f980000.usb: urb->status = 0

Btw,

Just in case, I also applied all the recent pending dwc2 patches I found on
linux-usb (even trivial, unrelated ones) at:

https://git.linuxtv.org/mchehab/experimental.git/log/?h=dwc2_patches

No difference. ISOCH is still broken.

If anyone wants to see the full logs, they are here:
https://pastebin.com/XJYyTwPv


Cheers,
Mauro

2018-01-29 13:53:01

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

Em Fri, 26 Jan 2018 17:37:39 -0200
Mauro Carvalho Chehab <[email protected]> escreveu:

> Em Fri, 26 Jan 2018 12:17:37 -0200
> Mauro Carvalho Chehab <[email protected]> escreveu:
>
> > Hi Alan,
> >
> > Em Mon, 8 Jan 2018 14:15:35 -0500 (EST)
> > Alan Stern <[email protected]> escreveu:
> >
> > > On Mon, 8 Jan 2018, Linus Torvalds wrote:
> > >
> > > > Can somebody tell which softirq it is that dvb/usb cares about?
> > >
> > > I don't know about the DVB part. The USB part is a little difficult to
> > > analyze, mostly because the bug reports I've seen are mostly from
> > > people running non-vanilla kernels.
> >
> > I suspect that the main reason for people not using non-vanilla Kernels
> > is that, among other bugs, the dwc2 upstream driver has serious troubles
> > handling ISOCH traffic.
> >
> > Using Kernel 4.15-rc7 from this git tree:
> > https://git.linuxtv.org/mchehab/experimental.git/log/?h=softirq_fixup
> >
> > (e. g. with the softirq bug partially reverted with Linux patch, and
> > the DWC2 deferred probe fixed)
> >
> > With a PCTV 461e device, with uses em28xx driver + Montage frontend
> > (with is the same used on dvbsky hardware - except for em28xx).
> >
> > This device doesn't support bulk for DVB, just ISOCH. The drivers work
> > fine on x86.
> >
> > Using a test signal at the bit rate of 56698,4 Kbits/s, that's what
> > happens, when capturing less than one second of data:
> >
> > $ dvbv5-zap -c ~/dvb_channel.conf "tv brasil" -l universal -X 100 -m -t2dvbv5-zap -c ~/dvb_channel.conf "tv brasil" -l universal -X 100 -m -t2
> > Using LNBf UNIVERSAL
> > Universal, Europe
> > Freqs : 10800 to 11800 MHz, LO: 9750 MHz
> > Freqs : 11600 to 12700 MHz, LO: 10600 MHz
> > using demux 'dvb0.demux0'
> > reading channels from file '/home/mchehab/dvb_channel.conf'
> > tuning to 11468000 Hz
> > (0x00) Signal= -33.90dBm
> > Lock (0x1f) Signal= -33.90dBm C/N= 30.28dB postBER= 2.33x10^-6
> > dvb_dev_set_bufsize: buffer set to 6160384
> > dvb_set_pesfilter to 0x2000
> > 354.08s: Starting capture
> > 354.73s: only read 59220 bytes
> > 354.73s: Stopping capture
> >
> > [ 354.000827] dwc2 3f980000.usb: DWC OTG HCD EP DISABLE: bEndpointAddress=0x84, ep->hcpriv=116f41b2
> > [ 354.000859] dwc2 3f980000.usb: DWC OTG HCD EP RESET: bEndpointAddress=0x84
> > [ 354.010744] dwc2 3f980000.usb: --Host Channel 5 Interrupt: Frame Overrun--
> > ... (hundreds of thousands of Frame Overrun messages)
> > [ 354.660857] dwc2 3f980000.usb: --Host Channel 5 Interrupt: Frame Overrun--
> > [ 354.660935] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> > [ 354.660959] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> > [ 354.660966] dwc2 3f980000.usb: urb->status = 0
> > [ 354.660992] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> > [ 354.661001] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> > [ 354.661008] dwc2 3f980000.usb: urb->status = 0
> > [ 354.661054] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> > [ 354.661065] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> > [ 354.661072] dwc2 3f980000.usb: urb->status = 0
> > [ 354.661107] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> > [ 354.661120] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> > [ 354.661127] dwc2 3f980000.usb: urb->status = 0
> > [ 354.661146] dwc2 3f980000.usb: DWC OTG HCD URB Dequeue
> > [ 354.661158] dwc2 3f980000.usb: Called usb_hcd_giveback_urb()
> > [ 354.661165] dwc2 3f980000.usb: urb->status = 0
>
> Btw,
>
> Just in case, I also applied all recent pending dwc2 patches I found at
> linux-usb (even trivial unrelated ones) at:
>
> https://git.linuxtv.org/mchehab/experimental.git/log/?h=dwc2_patches
>
> No differences. ISOCH is still broken.
>
> If anyone wants to see the full logs, it is there:
> https://pastebin.com/XJYyTwPv

Someone pointed out to me in private that changing the DWC2 BRCM profile
to enable uframe_sched might help.

So, I wrote this patch:
https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=19abf0026b7bf1bd44aa9d2add9f958935760ded

And applied it on top of this branch:
https://git.linuxtv.org/mchehab/experimental.git/log/?h=v4.15%2bmedia%2bdwc2

It is based on vanilla Kernel 4.15. I applied:
- all media -next patches that will be sent to Kernel 4.16-rc1;
- DWC2 patches submitted by Gregor on the linux-usb ML;
- Linus' softirq test patch:
https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=ccf833fd4a5b99c3d3cf2c09c065670f74a230a7
- A DT patch that enables VCIQ (needed by some GPU drivers):
https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=fd4e9ca6f41d35b6234c30fa29937141e0c09570
- a few debug patches like this one:
https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=f50669c18394f5b5674630e2ebf78a06b023626f

I didn't notice any difference. The dwc2 driver is still broken for
ISOCH transfers:
https://pastebin.com/nL1Fe9X5

Cheers,
Mauro

2018-07-17 11:59:19

by Hanna Hawa

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

Hi,

I'm a software developer on the Marvell SoC team.
I'm facing a kernel panic while running RAID 5 on SATA disks
connected to a Macchiatobin (Marvell community board with an Armada-8040
SoC with four ARMv8 Cortex-A72 cores).
The RAID 5 array is built with the Marvell DMA engine and the async_tx
mechanism (ASYNC_TX_DMA [=y]); the DMA driver (mv_xor_v2) uses a tasklet
to clean completed descriptors from the queue.

The panic (see below) occurs while building the RAID 5 array (mdadm) or
while writing to/reading from the RAID partition.

After some debugging/bisecting/diffing, I found that the patch "softirq: Let
ksoftirqd do its job" is the problematic one.

- Using v4.14.0 with the problematic patch reverted: no timeout issue.
- Using v4.14.0 (including the softirq patch) with the additional fix
proposed by Linus: no timeout issue.

As others have reported in this thread, the softirq change is causing
regressions.
Would it be possible to either revert the patch or apply a fix such as
the one proposed by Linus?

The panic message is below:
[ 25.371495] mv_xor_v2 f0400000.xor: dma_sync_wait: timeout!
[ 25.377101] Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
[ 25.377101]
[ 25.386973] CPU: 0 PID: 1417 Comm: md0_raid5 Not tainted 4.14.0 #16
[ 25.393264] Hardware name: Marvell Armada 8040 DB board (DT)
[ 25.398946] Call trace:
[ 25.401410] [<ffff000008089310>] dump_backtrace+0x0/0x380
[ 25.406831] [<ffff0000080896a4>] show_stack+0x14/0x20
[ 25.411904] [<ffff00000890fa78>] dump_stack+0x98/0xb8
[ 25.416976] [<ffff0000080c8ef0>] panic+0x118/0x280
[ 25.421788] [<ffff000008386a44>] async_tx_quiesce+0x74/0x78
[ 25.427382] [<ffff000008386ca4>] async_memcpy+0x1a4/0x2a0
[ 25.432806] [<ffff000008747f9c>] async_copy_data.isra.16+0x1b4/0x280
[ 25.439186] [<ffff00000874b6fc>] raid_run_ops+0x514/0x1320
[ 25.444694] [<ffff000008751550>] handle_stripe+0x1040/0x2848
[ 25.450377] [<ffff000008752f98>] handle_active_stripes.isra.28+0x240/0x460
[ 25.457279] [<ffff000008753468>] raid5d+0x2b0/0x450
[ 25.462177] [<ffff00000875ead4>] md_thread+0x104/0x160
[ 25.467338] [<ffff0000080e638c>] kthread+0xfc/0x128
[ 25.472234] [<ffff000008085354>] ret_from_fork+0x10/0x1c
[ 25.477571] Kernel Offset: disabled
[ 25.481073] CPU features: 0x002000
[ 25.484487] Memory Limit: none
[ 25.487556] ---[ end Kernel panic - not syncing: async_tx_quiesce: DMA error waiting for transaction
[ 25.487556]

Thanks,
Hanna

2018-07-17 17:10:41

by Linus Torvalds

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

On Tue, Jul 17, 2018 at 4:58 AM Hanna Hawa <[email protected]> wrote:
>
> After some debug/bisect/diff, found that patch "softirq: Let ksoftirqd
> do its job" is problematic patch.

Ok, this thread died down without any resolution.

>- Using v4.14.0 (including softirq patch) and the additional fix
> proposed by Linus - no timeout issue.

Are you talking about the patch that made HI_SOFTIRQ and
TASKLET_SOFTIRQ special, and had this:

#define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))

in it?

I think I'll just commit the damn thing. It's hacky, but it's simple,
and it never got applied because we had smarter suggestions. But the
smarter suggestions never ended up being applied either, so..

Linus

2018-07-17 18:12:46

by Hanna Hawa

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

Hi Linus,

On 07/17/2018 08:09 PM, Linus Torvalds wrote:
> On Tue, Jul 17, 2018 at 4:58 AM Hanna Hawa <[email protected]> wrote:
>>
>> After some debug/bisect/diff, found that patch "softirq: Let ksoftirqd
>> do its job" is problematic patch.
>
> Ok, this thread died down without any resolution.
>
>> - Using v4.14.0 (including softirq patch) and the additional fix
>> proposed by Linus - no timeout issue.
>
> Are you talking about the patch that made HI_SOFTIRQ and
> TASKLET_SOFTIRQ special, and had this:
>
> #define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
>
> in it?
Yes, exactly.

Link to the patch:
https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=ccf833fd4a5b99c3d3cf2c09c065670f74a230a7

Thanks,
Hanna

>
> I think I'll just commit the damn thing. It's hacky, but it's simple,
> and it never got applied because we had smarter suggestions. But the
> smarter suggestions never ended up being applied either, so..
>
> Linus
>

2018-07-17 22:23:17

by Mauro Carvalho Chehab

[permalink] [raw]
Subject: Re: dvb usb issues since kernel 4.9

Hi Linus,

Em Tue, 17 Jul 2018 10:09:28 -0700
Linus Torvalds <[email protected]> escreveu:

> On Tue, Jul 17, 2018 at 4:58 AM Hanna Hawa <[email protected]> wrote:
> >
> > After some debug/bisect/diff, found that patch "softirq: Let ksoftirqd
> > do its job" is problematic patch.
>
> Ok, this thread died down without any resolution.
>
> >- Using v4.14.0 (including softirq patch) and the additional fix
> > proposed by Linus - no timeout issue.
>
> Are you talking about the patch that made HI_SOFTIRQ and
> TASKLET_SOFTIRQ special, and had this:
>
> #define SOFTIRQ_NOW_MASK ((1 << HI_SOFTIRQ) | (1 << TASKLET_SOFTIRQ))
>
> in it?
>
> I think I'll just commit the damn thing. It's hacky, but it's simple,
> and it never got applied because we had smarter suggestions. But the
> smarter suggestions never ended up being applied either, so..

Yeah, IMHO the best would be to apply your patch[1] and Cc stable back to
4.9. Nothing prevents applying a better/smarter solution once we
have it. On my side, I can keep testing whatever smart suggestions
people propose. Still, better to have one fix in our hands than two
fixes flying around.

[1] i.e.:
https://git.linuxtv.org/mchehab/experimental.git/commit/?h=v4.15%2bmedia%2bdwc2&id=ccf833fd4a5b99c3d3cf2c09c065670f74a230a7

Regards,
Mauro