2019-01-10 16:31:50

by Greg Kroah-Hartman

Subject: Re: [PATCH v1 3/3] drivers/tty: increase priority for tty_buffer_worker

On Thu, Jan 10, 2019 at 04:19:53PM +0100, Oleksij Rempel wrote:
> > My gut feel is that if somebody still cares deeply about serial line
> > latency, they should look at trying to see if they can do some of the
> > work directly without the bounce to the workqueue. We use workqueues
> > for a reason, but it's possible that some of it could be avoided at
> > least in special cases... And yours sounds like a special case.
>
> It is for an industrial low-latency RS-422 based application. The loopback
> test is just an easy way to test/reproduce it without additional hardware.
>
> What is a good, mainlineable way to implement it?

What is the real problem your systems are having? Are they serial-port
limited? Is latency a big issue? Trying to tune for a fake workload
isn't the best way to solve anything :)

thanks,
greg k-h


2019-01-28 08:24:12

by Greg Kroah-Hartman

Subject: Re: [PATCH v1 3/3] drivers/tty: increase priority for tty_buffer_worker

On Mon, Jan 28, 2019 at 09:05:30AM +0100, Oleksij Rempel wrote:
>
>
> On 10.01.19 17:30, Greg Kroah-Hartman wrote:
> > On Thu, Jan 10, 2019 at 04:19:53PM +0100, Oleksij Rempel wrote:
> > > > My gut feel is that if somebody still cares deeply about serial line
> > > > latency, they should look at trying to see if they can do some of the
> > > > work directly without the bounce to the workqueue. We use workqueues
> > > > for a reason, but it's possible that some of it could be avoided at
> > > > least in special cases... And yours sounds like a special case.
> > >
> > > It is for an industrial low-latency RS-422 based application. The loopback
> > > test is just an easy way to test/reproduce it without additional hardware.
> > >
> > > What is a good, mainlineable way to implement it?
> >
> > What is the real problem your systems are having? Are they serial-port
> > limited? Is latency a big issue? Trying to tune for a fake workload
> > isn't the best way to solve anything :)
>
> The system in question is a high power laser cutter with live image-based inspection
> and adjustment of the cutting process. In this setup the RS422 interface is used to
> control parameters of the laser cutting unit in a tight control loop with the camera.
> This loop needs to operate at 1000 Hz.
>
> The xy-stage moves with a speed of approx. 60m/min, i.e. within 1ms it
> moves about 1mm. For a high precision control process a jitter of ± 500 us (+/- 0.5mm)
> is unacceptable.

Are you using the rt kernel patch for this type of thing? That should
bound your jitter at a much more deterministic level.

thanks,

greg k-h

2019-01-28 20:04:50

by Linus Torvalds

Subject: Re: [PATCH v1 3/3] drivers/tty: increase priority for tty_buffer_worker

On Mon, Jan 28, 2019 at 1:22 AM Oleksij Rempel <[email protected]> wrote:
>
> Yes, I tested it with different linux-rt versions with mostly similar results:

Hmm. It strikes me that you use very carefully timed serial *writes*
to control the laser cutter, but the flip buffer handling is mostly a
latency issue on the *read* side, isn't it?

Are you sure you are testing the right thing? Because your loopback
test is testing the latency not of writes, but of writes _and_ reads.

I'm wondering if we could/should try to simply avoid the workqueue
entirely if we could do the work in process context.

That's harder to do for reads - because incoming characters happen in
interrupt context, but shouldn't be all that hard to do for writes.

In fact, I thought we already did writes without any tty buffer
flipping at all, and that your patch series shouldn't actually affect
any write latency, but I've happily not had to work much with the
tty/serial layer in years..

Do you actually have read latency issues? Or is this whole series
perhaps an artificial effect of the benchmark you use?

Linus

2019-01-28 20:16:26

by Linus Torvalds

Subject: Re: [PATCH v1 3/3] drivers/tty: increase priority for tty_buffer_worker

On Mon, Jan 28, 2019 at 12:03 PM Linus Torvalds
<[email protected]> wrote:
>
> That's harder to do for reads - because incoming characters happen in
> interrupt context, but shouldn't be all that hard to do for writes.

Side note: the reason I mention this part is that "harder" may not
mean "impossible".

In particular, I wonder if we could do the tty buffer flipping in the
reader context too. Currently, what happens is that when we receive
characters, we schedule things for flipping with the workqueues. *BUT*
we could also just wake up any pending readers directly, and maybe
have the *readers* do the flip if they wake up before the workqueue.

And that would allow you to do real-time serial work simply by marking
the process *you* care about as RT, and not worry so much about the
workqueue threads at all. The workqueue threads would be fallbacks for
when there isn't an active reader at all.

I dunno. A bit handwavy, I know, but it sounds like if you care about
the read latency, that would be a better model entirely (skipping the
technically unnecessary kernel workqueue entirely).

Linus
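
The idea of marking only the reader process as RT can be illustrated from
user space. This is a minimal sketch, assuming a Linux system and Python's
os.sched_setscheduler(); the function name try_make_realtime and the default
priority of 50 are hypothetical choices, not something from this thread. It
only changes the scheduling class of the calling process. It does not change
how the kernel flips tty buffers, so on its own it would not remove the
workqueue from the read path.

```python
import os


def try_make_realtime(priority: int = 50, pid: int = 0) -> bool:
    """Try to move a process (0 = the caller) to SCHED_FIFO.

    Returns True on success and False if the kernel refuses the request.
    """
    # Clamp the requested priority to the range valid for SCHED_FIFO.
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    priority = max(lowest, min(highest, priority))
    try:
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:
        # PermissionError is the usual failure without CAP_SYS_NICE or a
        # suitable RLIMIT_RTPRIO; other OSErrors are treated the same here.
        return False
    return True


if __name__ == "__main__":
    if try_make_realtime():
        print("reader now runs as SCHED_FIFO")
    else:
        print("could not switch to SCHED_FIFO; staying with the default policy")
```

A process that reads from the serial port would call this once at startup,
before entering its read loop.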

2019-01-28 08:06:06

by Oleksij Rempel

Subject: Re: [PATCH v1 3/3] drivers/tty: increase priority for tty_buffer_worker



On 10.01.19 17:30, Greg Kroah-Hartman wrote:
> On Thu, Jan 10, 2019 at 04:19:53PM +0100, Oleksij Rempel wrote:
>>> My gut feel is that if somebody still cares deeply about serial line
>>> latency, they should look at trying to see if they can do some of the
>>> work directly without the bounce to the workqueue. We use workqueues
>>> for a reason, but it's possible that some of it could be avoided at
>>> least in special cases... And yours sounds like a special case.
>>
>> It is for an industrial low-latency RS-422 based application. The loopback
>> test is just an easy way to test/reproduce it without additional hardware.
>>
>> What is a good, mainlineable way to implement it?
>
> What is the real problem your systems are having? Are they serial-port
> limited? Is latency a big issue? Trying to tune for a fake workload
> isn't the best way to solve anything :)

The system in question is a high power laser cutter with live image-based inspection
and adjustment of the cutting process. In this setup the RS422 interface is used to
control parameters of the laser cutting unit in a tight control loop with the camera.
This loop needs to operate at 1000 Hz.

The xy-stage moves with a speed of approx. 60m/min, i.e. within 1ms it
moves about 1mm. For a high precision control process a jitter of ± 500 us (+/- 0.5mm)
is unacceptable.

Kind regards,
Oleksij Rempel

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
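
The numbers in the message above can be checked with a short calculation.
This is a minimal sketch; the function names displacement_mm and
loop_period_us are hypothetical and only illustrate the unit conversion
(m/min to mm per microsecond) behind the "1 ms is about 1 mm" statement.

```python
def displacement_mm(speed_m_per_min: float, interval_us: float) -> float:
    """Distance in mm travelled during interval_us at the given stage speed."""
    # 1 m = 1000 mm and 1 min = 60e6 microseconds.
    return speed_m_per_min * 1000.0 * interval_us / 60e6


def loop_period_us(rate_hz: float) -> float:
    """Period of one control-loop iteration in microseconds."""
    return 1e6 / rate_hz


if __name__ == "__main__":
    speed = 60.0  # m/min, the approximate xy-stage speed from the message
    print("loop period at 1000 Hz:", loop_period_us(1000.0), "us")
    print("travel in 1000 us:", displacement_mm(speed, 1000.0), "mm")
    print("travel in 500 us:", displacement_mm(speed, 500.0), "mm")
```

At 60 m/min, a timing error of 500 us therefore corresponds to a position
error of 0.5 mm, which is half of the distance covered in one 1000 Hz cycle.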

2019-01-28 09:23:56

by Oleksij Rempel

Subject: Re: [PATCH v1 3/3] drivers/tty: increase priority for tty_buffer_worker



On 28.01.19 09:23, Greg Kroah-Hartman wrote:
> On Mon, Jan 28, 2019 at 09:05:30AM +0100, Oleksij Rempel wrote:
>>
>>
>> On 10.01.19 17:30, Greg Kroah-Hartman wrote:
>>> On Thu, Jan 10, 2019 at 04:19:53PM +0100, Oleksij Rempel wrote:
>>>>> My gut feel is that if somebody still cares deeply about serial line
>>>>> latency, they should look at trying to see if they can do some of the
>>>>> work directly without the bounce to the workqueue. We use workqueues
>>>>> for a reason, but it's possible that some of it could be avoided at
>>>>> least in special cases... And yours sounds like a special case.
>>>>
>>>> It is for an industrial low-latency RS-422 based application. The loopback
>>>> test is just an easy way to test/reproduce it without additional hardware.
>>>>
>>>> What is a good, mainlineable way to implement it?
>>>
>>> What is the real problem your systems are having? Are they serial-port
>>> limited? Is latency a big issue? Trying to tune for a fake workload
>>> isn't the best way to solve anything :)
>>
>> The system in question is a high power laser cutter with live image-based inspection
>> and adjustment of the cutting process. In this setup the RS422 interface is used to
>> control parameters of the laser cutting unit in a tight control loop with the camera.
>> This loop needs to operate at 1000 Hz.
>>
>> The xy-stage moves with a speed of approx. 60m/min, i.e. within 1ms it
>> moves about 1mm. For a high precision control process a jitter of ± 500 us (+/- 0.5mm)
>> is unacceptable.
>
> Are you using the rt kernel patch for this type of thing? That should
> bound your jitter at a much more deterministic level.

Yes, I tested it with different linux-rt versions with mostly similar results:
kernel 4.8.15-rt10+
latency histogram:
0 ... < 250 usec : 1933104 transmissions
250 ... < 500 usec : 21339 transmissions
500 ... < 750 usec : 8952 transmissions
750 ... < 1000 usec : 6226 transmissions
1000 ... < 1500 usec : 7688 transmissions
1500 ... < 2000 usec : 5236 transmissions
2000 ... < 5000 usec : 11724 transmissions
5000 ... < 10000 usec : 3588 transmissions
10000 ... < 50000 usec : 2123 transmissions
50000 ... < 1000000 usec : 20 transmissions
>= 1000000 usec : 0 transmissions

kernel 4.9.0-rt1+
latency histogram:
0 ... < 250 usec : 1950222 transmissions
250 ... < 500 usec : 15041 transmissions
500 ... < 750 usec : 5968 transmissions
750 ... < 1000 usec : 4437 transmissions
1000 ... < 1500 usec : 6022 transmissions
1500 ... < 2000 usec : 4185 transmissions
2000 ... < 5000 usec : 9864 transmissions
5000 ... < 10000 usec : 2773 transmissions
10000 ... < 50000 usec : 1462 transmissions
50000 ... < 1000000 usec : 26 transmissions
>= 1000000 usec : 0 transmissions

kernel 4.19.10-rt8
latency histogram:
0 ... < 250 usec : 1906861 transmissions
250 ... < 500 usec : 35271 transmissions
500 ... < 750 usec : 13103 transmissions
750 ... < 1000 usec : 9084 transmissions
1000 ... < 1500 usec : 9434 transmissions
1500 ... < 2000 usec : 5644 transmissions
2000 ... < 5000 usec : 12737 transmissions
5000 ... < 10000 usec : 4511 transmissions
10000 ... < 50000 usec : 3201 transmissions
50000 ... < 1000000 usec : 154 transmissions
>= 1000000 usec : 0 transmissions


Without extra CPU load, the result on kernel 4.19.10-rt8 is:
latency histogram:
0 ... < 250 usec : 1999992 transmissions
250 ... < 500 usec : 8 transmissions
500 ... < 750 usec : 0 transmissions
750 ... < 1000 usec : 0 transmissions
1000 ... < 1500 usec : 0 transmissions
1500 ... < 2000 usec : 0 transmissions
2000 ... < 5000 usec : 0 transmissions
5000 ... < 10000 usec : 0 transmissions
10000 ... < 50000 usec : 0 transmissions
50000 ... < 1000000 usec : 0 transmissions
>= 1000000 usec : 0 transmissions
=============================================================


Test results with the same load, with the kworker replaced by a kthread that is assigned an RT priority:

min latency: 0 sec : 75 usec
max latency: 0 sec : 125 usec
average latency: 81 usec
latency measure cycles overall: 79000000
latency histogram:
0 ... < 250 usec : 79000000 transmissions
250 ... < 500 usec : 0 transmissions
500 ... < 750 usec : 0 transmissions
750 ... < 1000 usec : 0 transmissions
1000 ... < 1500 usec : 0 transmissions
1500 ... < 2000 usec : 0 transmissions
2000 ... < 5000 usec : 0 transmissions
5000 ... < 10000 usec : 0 transmissions
10000 ... < 50000 usec : 0 transmissions
50000 ... < 1000000 usec : 0 transmissions
>= 1000000 usec : 0 transmissions



Kind regards,
Oleksij Rempel

--
Pengutronix e.K. | |
Industrial Linux Solutions | http://www.pengutronix.de/ |
Peiner Str. 6-8, 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
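
For readers who want to reproduce histograms in the format shown above, the
bucketing can be sketched as follows. This is a minimal sketch, not the test
tool used in this thread (which is not shown here); the bucket edges are
copied from the histograms above, while the names EDGES_USEC,
latency_histogram and fraction_at_or_above are hypothetical.

```python
from bisect import bisect_right

# Bucket boundaries in microseconds, copied from the histograms above.
EDGES_USEC = [250, 500, 750, 1000, 1500, 2000, 5000, 10000, 50000, 1000000]


def latency_histogram(samples_usec):
    """Count samples per bucket; the last bucket holds values >= 1000000 usec."""
    counts = [0] * (len(EDGES_USEC) + 1)
    for sample in samples_usec:
        # bisect_right puts a sample equal to an edge into the next bucket,
        # matching the "250 ... < 500" style of the ranges above.
        counts[bisect_right(EDGES_USEC, sample)] += 1
    return counts


def fraction_at_or_above(counts, threshold_usec):
    """Fraction of transmissions at or above a threshold that is a bucket edge."""
    first_bucket = EDGES_USEC.index(threshold_usec) + 1
    total = sum(counts)
    return sum(counts[first_bucket:]) / total if total else 0.0


if __name__ == "__main__":
    # Counts from the 4.19.10-rt8 run without extra CPU load, as posted above.
    unloaded = [1999992, 8, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    print("fraction >= 250 usec:", fraction_at_or_above(unloaded, 250))
```

Comparing the fraction at or above 500 usec between runs gives a single
number per kernel for the jitter bound discussed earlier in the thread.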