2002-08-27 09:54:05

by Wessler, Siegfried

Subject: interrupt latency

Hello,

I am running kernel 2.4.18 on an embedded system and will continue to do
so in the near future.

I have to reduce interrupt latency and need to understand how, and with
what timing, tasklets are called and arbitrated.

I have to dig deep, but the kernel tree is quite huge. As a non-kernel
programmer, I ask whether anyone could give me a hint about where to start
reading and which kernel source to pick first.

Any help is highly appreciated.
(BTW: I will not bother you personally with further questions unless you give
permission.)


What's behind it: We patched the NMI and do some work in there that has to
happen very regularly. After the NMI we have to start a kernel or even a
user-space function quite quickly, with low latency. Also, I measured 8
milliseconds after a hardware interrupt before the corresponding interrupt
function is called. At RTI time it is even longer (around 12 microseconds).
I need to find a way to understand exactly why, and maybe speed it up a bit.

Thank You.
Siegfried.


-------------
HBM - Hottinger Baldwin Messtechnik GmbH
Siegfried Wessler, Dipl.-Ing.
Entwicklung Messverstärker T-V
Im Tiefen See 45, D-64293 Darmstadt
Fon: 06151/803-884, Fax: -524
eMail: [email protected]


2002-08-27 14:07:49

by Stephen Samuel

Subject: Re: interrupt latency

http://www.linux.org/dist/index.html contains an index to a number of
Linux distributions. Check out the embedded kernels. They include some
realtime mods. As an example, RTLinux claims to do hard realtime by
running the Linux kernel as its lowest-priority task. This is supposed
to allow serious realtime work without having to mess too much with
the kernel.

Wessler, Siegfried wrote:
> Hello,
>
> I am running and will in near future kernel 2.4.18 on an embedded system.
>
> I have to speed up interrupt latency and need to understand how in what
> timing tasklets are called and arbitraded.
.....
> What's behind it: We patched NMI and do some stuff we have to do very
> regularly in there. After NMI we have to quiet fast start a kernel or even a
> user space function with low latency. Also I measured 8 milliseconds after a
> hardware interrupt before the corresponding interrupt function is called. At
> RTI time it is even longer (around 12 microseconds). Need to find a way to
> exactly understand why, and maybe speed up a bit.
--
Stephen Samuel +1(604)876-0426 [email protected]
http://www.bcgreen.com/~samuel/
Powerful committed communication, reaching through fear, uncertainty and
doubt to touch the jewel within each person and bring it to life.

2002-08-27 17:12:12

by Robert Schwebel

Subject: Re: interrupt latency

On Tue, Aug 27, 2002 at 07:11:58AM -0700, Stephen Samuel wrote:
> http://www.linux.org/dist/index.html contains an index to a number of
> Linux distributions. Check out the embeded kernels. They include some
> Realtime mods. As an example, RTLinux claims to do hard realtime by
> running the Linux kernel as it's lowest priority task. This is supposed
> to allow serious realtime work without having to mess too much with
> the kernel.

Take also into account that RT-Linux is patented technology (for details
see http://www.aero.polimi.it/~rtai/documentation/articles/moglen.html).

There is also RTAI as an alternative, which has a very supportive user
community.

Robert
--
Dipl.-Ing. Robert Schwebel | http://www.pengutronix.de
Pengutronix - Linux Solutions for Science and Industry
Braunschweiger Str. 79, 31134 Hildesheim, Germany
Phone: +49-5121-28619-0 | Fax: +49-5121-28619-4

2002-08-27 17:34:53

by Mark Hounschell

Subject: Re: interrupt latency

"Wessler, Siegfried" wrote:
>
> Hello,
>
> I am running and will in near future kernel 2.4.18 on an embedded system.
>
> I have to speed up interrupt latency and need to understand how in what
> timing tasklets are called and arbitraded.
>
> I have to dig deep, but the kernel tree is quiet huge. As a non kernel
> programmer I ask you, if anyone could give me a hint, where to start reading
> from and which kernel source to pick first.
>
> Any help highly appreaciated.
> (BTW: I will not bother you personaly with further questions unless you give
> permission.)
>
> What's behind it: We patched NMI and do some stuff we have to do very
> regularly in there. After NMI we have to quiet fast start a kernel or even a
> user space function with low latency. Also I measured 8 milliseconds after a
> hardware interrupt before the corresponding interrupt function is called. At
> RTI time it is even longer (around 12 microseconds). Need to find a way to
> exactly understand why, and maybe speed up a bit.
>
> Thank You.
> Siegfried.

I've found that with the combination of process affinity and IRQ affinity you
can get very good interrupt latency/determinism. We use a PCI card that has
some external interrupts and some 250 ns resolution timers, and have found the
interrupt latency/determinism of the external interrupts to be more than
acceptable as long as the process and IRQ of that PCI card are forced
to one CPU and ALL other processes/IRQs are forced to another CPU. Of course
you need an SMP box for best results. We found that with a UP (uniprocessor)
box you can get the latency, but there is no determinism to that latency.

Mark

2002-08-27 17:54:53

by Richard B. Johnson

Subject: Re: interrupt latency

On Tue, 27 Aug 2002, Mark Hounschell wrote:

> "Wessler, Siegfried" wrote:
> >
> > Hello,
> >
> > I am running and will in near future kernel 2.4.18 on an embedded system.
> >
> > I have to speed up interrupt latency and need to understand how in what
> > timing tasklets are called and arbitraded.
> >
> > I have to dig deep, but the kernel tree is quiet huge. As a non kernel
> > programmer I ask you, if anyone could give me a hint, where to start reading
> > from and which kernel source to pick first.
> >
> > Any help highly appreaciated.
> > (BTW: I will not bother you personaly with further questions unless you give
> > permission.)
> >
> > What's behind it: We patched NMI and do some stuff we have to do very
> > regularly in there. After NMI we have to quiet fast start a kernel or even a
> > user space function with low latency. Also I measured 8 milliseconds after a
> > hardware interrupt before the corresponding interrupt function is called. At
> > RTI time it is even longer (around 12 microseconds). Need to find a way to
> > exactly understand why, and maybe speed up a bit.
> >
> > Thank You.
> > Siegfried.
>
> I've found that with the combination of process affinity and irq affinity you
> can get very good interrupt latency/determinism. We use a pci card that has
> some external interrupts and some 250ns resolution timers and have found the
> interrupt latency/determinism of the external interrupts to be more than
> exceptable as long as the process and irq of that pci card are forced
> to one cpu and ALL other processes/irq's are forced to another cpu. Of coarse
> you need an SMP box for best results. We found that with a UMP box you can
> get the latency but there is no determinism to that latency.
>
> Mark

> > user space function with low latency. Also I measured 8 milliseconds after a
^^^^^^^^^^^^^^^^^^^^^^^^

This cannot be. A stock kernel-2.4.18, running a 133 MHz AMD-SC520,
(like a i586) with a 33 MHz bus, handles interrupts off IRQ7 (the lowest
priority), from the 'printer port' at well over 75,000 per second without
skipping a beat or missing an edge. This means that latency is at least
as good as 1/57,000 sec = 0.013 microseconds.

Getting data into user space is not 'interrupt latency'. If that's what
is meant, you could lose a whole HZ tick (worst case) on a single-CPU
machine before any user process would even know that data was available
or that an interrupt had occurred.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-08-27 18:00:54

by Dag Nygren

Subject: Re: interrupt latency

> Hello,
>
> I am running and will in near future kernel 2.4.18 on an embedded system.
>
> I have to speed up interrupt latency and need to understand how in what
> timing tasklets are called and arbitraded.

Have you already checked whether the low-latency patch and/or the preempt
patch that are out there could do the trick for you?

Good luck


--
Dag Nygren email: [email protected]
Oy Espoon NewTech Ab phone: +358 9 8024910
Träsktorpet 3 fax: +358 9 8024916
02360 ESBO Mobile: +358 400 426312
FINLAND


2002-08-27 19:51:53

by Victor Yodaiken

Subject: Re: interrupt latency

On Tue, Aug 27, 2002 at 02:01:43PM -0400, Richard B. Johnson wrote:
> This cannot be. A stock kernel-2.4.18, running a 133 MHz AMD-SC520,
> (like a i586) with a 33 MHz bus, handles interrupts off IRQ7 (the lowest
> priority), from the 'printer port' at well over 75,000 per second without
> skipping a beat or missing an edge. This means that latency is at least
> as good as 1/57,000 sec = 0.013 microseconds.

Assuming you mean 75,000 then ...
That's 0.013 MILLISECONDS, which is 13 microseconds, and it's not likely.
I bet that your data source drops data or looks at some handshake
pins on the parallel connector.

--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com

2002-08-27 20:37:18

by Richard B. Johnson

Subject: Re: interrupt latency

On Tue, 27 Aug 2002 [email protected] wrote:

> On Tue, Aug 27, 2002 at 02:01:43PM -0400, Richard B. Johnson wrote:
> > This cannot be. A stock kernel-2.4.18, running a 133 MHz AMD-SC520,
> > (like a i586) with a 33 MHz bus, handles interrupts off IRQ7 (the lowest
> > priority), from the 'printer port' at well over 75,000 per second without
> > skipping a beat or missing an edge. This means that latency is at least
> > as good as 1/57,000 sec = 0.013 microseconds.
>
> Assuming you mean 75,000 then ...
> Thats 0.013 MILLISECONDS which is 13 microseconds and its not likely.

Yes 13 microseconds.

> I bet that your data source drops data or looks at some handshake
> pins on the parallel connect.
>

No. You can easily read into memory 75,000 bytes per second from the
parallel port, hell RS-232C will do 22,400++ bytes per second (224,000
baud) on a Windows machine, done all the while to feed a PROM burner. I
never measured Linux RS-232C, but it's got to be at least as good.

On a faster machine, i.e., an ordinary 400 MHz Pentium, we have a
complete Tomographic Imaging machine that gets triggers from the
parallel port.

Off the parallel port, hardware writes a byte and sets the interrupt
line. There is no hand-shake with incoming data. It comes off a
trigger board that will generate between 50 and 80 thousand
triggers per second, depending upon some wheel speed. FYI, these
triggers mark the position at which an X-Ray beam generated image
data. If we missed a trigger, we end up losing a whole ray of
image data, which would be a mess.

Software reads then writes the byte into memory and executes
wake_up_interruptible() for somebody sleeping in poll(). There is a
fixed-length circular buffer with no dynamic allocation. This is
the only thing that could possibly make it fast.

At the same time, a high-speed data-link, interfacing to the PCI/Bus
gets 2k of data per trigger so the machine is not exactly idle
when the triggers are coming into the parallel port. Software
correlates the trigger data with the image data as part of a
tomographic reconstruction.

That's 2048 * 80,000 = ~163 MB/s over the PCI bus, with 80,000 bytes/s over
the parallel port at the same time. We originally had a hardware
"funnel" to combine 4 bytes before generating an interrupt. This
turned out to be a synchronization nightmare and was scrapped once
it was found that Linux interrupted fast enough.

Make a simple module, create an ISR off the printer port, enable
the printer port (hardware) interrupt line, then use a function
generator to toggle the printer port interrupt line. You can then
do all kinds of diagnostics to find out the max rate you can
interrupt --and the maximum amount of code you can use in that
ISR before you get overruns. This is what I did before I ever
signed up to use a ^$@^$!(-@ printer-port for something important.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-08-27 20:54:16

by Victor Yodaiken

Subject: Re: interrupt latency

On Tue, Aug 27, 2002 at 04:44:34PM -0400, Richard B. Johnson wrote:
> On Tue, 27 Aug 2002 [email protected] wrote:
>
> > On Tue, Aug 27, 2002 at 02:01:43PM -0400, Richard B. Johnson wrote:
> > > This cannot be. A stock kernel-2.4.18, running a 133 MHz AMD-SC520,
> > > (like a i586) with a 33 MHz bus, handles interrupts off IRQ7 (the lowest
> > > priority), from the 'printer port' at well over 75,000 per second without
> > > skipping a beat or missing an edge. This means that latency is at least
> > > as good as 1/57,000 sec = 0.013 microseconds.
> >
> > Assuming you mean 75,000 then ...
> > Thats 0.013 MILLISECONDS which is 13 microseconds and its not likely.
>
> Yes 13 microseconds.
>
> > I bet that your data source drops data or looks at some handshake
> > pins on the parallel connect.
> >
>
> No. You can easily read into memory 75,000 bytes per second from the
> parallel port, hell RS-232C will do 22,400++ bytes per second (224,000
> baud) on a Windows machine, done all the while to feed a PROM burner. I

You can do it in a tight loop. But you cannot do it otherwise. RS-232 works
because most UARTs have FIFO buffers. Old Windows did pretty well, because
you could grab the machine and let nothing else happen.

What makes me dubious about your claim is that it is easy to test
and see that a single ISA operation can take 18 microseconds
on most PC hardware.

try:
    cli
    loop:
        read tsc
        inb
        read tsc
        compute difference
        print worst case every 1000000 times.

    sti

run for an hour on a busy machine.



> On a faster machine, i.e., an ordinary 400 MHz Pentium, we have a
> complete Tomographic Imaging machine that gets triggers from the
> parallel port.
>
> Off the parallel port, hardware writes a byte and sets the interrupt
> line. There is no hand-shake with incoming data. It comes off a
> trigger board that will generate between 50 and 80 thousand
> triggers per second, depending upon some wheel speed. FYI, these
> triggers mark the position at which an X-Ray beam generated image
> data. If we missed a trigger, we end up losing a whole ray of
> image data, which would be a mess.
>
> Software reads then writes the byte into memory and executes
> wake_up_interruptible() for somebody sleeping in poll(). There is a
> fixed-length circular buffer with no dynamic allocation. This is
> the only thing that could possibly make it fast.
>
> At the same time, a high-speed data-link, interfacing to the PCI/Bus
> gets 2k of data per trigger so the machine is not exactly idle
> when the triggers are coming into the parallel port. Software
> correlates the trigger data with the image data as part of a
> tomographic reconstruction.
>
> That's 2048 * 80,000 = ~163 MB/s over the PCI with 80,000 b/s over
> the parallel port at the same time. We originally had a hardware
> "funnel" to combine 4 bytes before generating an interrupt. This
> turned out to be a synchronization nightmare and was scrapped once
> it was found that Linux interrupted fast enough.
>
> Make a simple module, create an ISR off the printer port, enable
> the printer port (hardware) interrupt line, then use a function
> generator to toggle the printer port interrupt line. You can then
> do all kinds of diagnostics to find out the max rate you can
> interrupt --and the maximum amount of code you can use in that
> ISR before you get overruns. This is what I did before I ever
> signed up to use a ^$@^$!(-@ printer-port for something important.
>
>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
> The US military has given us many words, FUBAR, SNAFU, now ENRON.
> Yes, top management were graduates of West Point and Annapolis.

--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com

2002-08-27 21:40:41

by Stephen Samuel

Subject: Re: interrupt latency

From the message that you linked to, it would appear that the
RTLinux patent isn't an issue. The letter seems to state that
RTLinux has licensed the patent free to any GPL distribution.
In other words, their patent would only be a costly problem for
a non-GPL system (e.g. Windows or BSD) running under the RT kernel.

Robert Schwebel wrote:

> Take also into account that RT-Linux is patented technology (for details
> see http://www.aero.polimi.it/~rtai/documentation/articles/moglen.html).

Did I miss something?

--
Stephen Samuel +1(604)876-0426 [email protected]
http://www.bcgreen.com/~samuel/
Powerful committed communication, reaching through fear, uncertainty and
doubt to touch the jewel within each person and bring it to life.

2002-08-27 22:43:22

by Rogier Wolff

Subject: Re: interrupt latency

On Tue, Aug 27, 2002 at 02:56:31PM -0600, [email protected] wrote:
> > No. You can easily read into memory 75,000 bytes per second from the
> > parallel port, hell RS-232C will do 22,400++ bytes per second (224,000
> > baud) on a Windows machine, done all the while to feed a PROM burner. I
>
> You can do it in a tight loop. But you cannot do it otherwise. RS232 works
> because most UARTs have fifo buffers. Old Windows did pretty well, because
> you could grab the machine and let nothing else happen.
>
> What makes me dubious about your claim is that it is easy to test
> and see that a single ISA operation can take 18 microseconds
> on most PC hardware.
>
> try:
> cli
> loop:
> read tsc
> inb
> read tsc
> compute difference
> print worst case every 1000000 times.
>
> sti
>
> run for an hour on a busy machine.

That machine won't be busy if you disable interrupts for an hour... :-)

I have benchmarked a Linux system (probably 2.0 era!) to handle
about 140k interrupts per second. I was NOT worried about missing
one interrupt. We would see userspace significantly slowing down
around 120k interrupts per second, and at around 140k interrupts
per second, the machine would grind to a halt until you turned
the interrupt generator back down below the limit.

This was with a 120MHz Pentium.

I wouldn't be surprised if you could handle around 75k interrupts
per second without missing one if all other interrupts are behaving
(i.e. don't disable interrupts for more than 7 us).

(Of course pulling in 163 MB per second over an ordinary 33MHz 32bit
PCI bus is impossible, and quite difficult on 33MHz/64bit or 66MHz/32bit
and doable on 66MHz/64bit).

Roger.

--
** [email protected] ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
* There are old pilots, and there are bold pilots.
* There are also old, bald pilots.

2002-08-28 12:12:36

by Richard B. Johnson

Subject: Re: interrupt latency

On Tue, 27 Aug 2002 [email protected] wrote:

> On Tue, Aug 27, 2002 at 04:44:34PM -0400, Richard B. Johnson wrote:
> > On Tue, 27 Aug 2002 [email protected] wrote:
> >

[SNIPPED...]
>
> You can do it in a tight loop. But you cannot do it otherwise. RS232 works
> because most UARTs have fifo buffers. Old Windows did pretty well, because
> you could grab the machine and let nothing else happen.
>
> What makes me dubious about your claim is that it is easy to test
> and see that a single ISA operation can take 18 microseconds
> on most PC hardware.
>
> try:
> cli
> loop:
> read tsc
> inb
> read tsc
> compute difference
> print worst case every 1000000 times.
>
> sti
>
> run for an hour on a busy machine.
>
>

No, no, no. There is no such port read that takes 18 microseconds, even
on old '386 machines with real ISA slots. A port read on those took
almost exactly 300 nanoseconds and, in fact, was the limiting factor
for the programmed I/O devices on the ISA bus.

Modern machines, if they have an ISA bus, keep it isolated off the
end of a bridge. I/O to the printer port and the IR/RS-232 device(s)
runs through another bus, variously called the GP (General Purpose)
bus. That's where the "Super I/O chips" that are used for floppy,
keyboard, printer, and RS-232C ports are connected.

Attached is a directory that contains the driver code used to
qualify a proposed product design several years ago. It was used
to measure latency and the number of interrupts that could be
handled, etc. You might find it useful.

Also, there is this hack I just wrote to show how many byte reads
you can do in user-mode from the printer port. You need to run
this as root because it executes iopl().

Script started on Wed Aug 28 08:01:44 2002
# cat usermode.c
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <asm/io.h>

extern int iopl(int);

volatile int run=0;
void timer(int unused)
{
    run = 0;
}


int main()
{
    unsigned long count = 0;
    char foo[1];
    (void)iopl(3);
    fprintf(stdout, "Wait.....");
    fflush(stdout);
    (void)signal(SIGALRM, timer);
    (void)alarm(1);
    run++;
    while(run)
    {
        foo[0] = inb(0x378); /* Actually put into memory */
        count++; /* This takes as long as bumping a pointer */
    }
    printf("\nPort reads in a second = %lu\n", count);
    return 0;
}

# ./usermode
Wait.....
Port reads in a second = 666072
# exit
exit

Script done on Wed Aug 28 08:02:07 2002


So, you can see that 660,000++ bytes/second can be read and put into
memory from a printer port. If you mess around with the driver code,
you will find that 80,000 interrupts/second and 80,000 bytes/second
read and put into memory is conservative. The modern Intel machines
are very good.



Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.


Attachments:
device.tar.gz (7.87 kB)

2002-08-28 13:39:29

by Victor Yodaiken

Subject: Re: interrupt latency

On Wed, Aug 28, 2002 at 08:18:25AM -0400, Richard B. Johnson wrote:
> On Tue, 27 Aug 2002 [email protected] wrote:
>
> > On Tue, Aug 27, 2002 at 04:44:34PM -0400, Richard B. Johnson wrote:
> > > On Tue, 27 Aug 2002 [email protected] wrote:
> > >
>
> [SNIPPED...]B
> >
> > You can do it in a tight loop. But you cannot do it otherwise. RS232 works
> > because most UARTs have fifo buffers. Old Windows did pretty well, because
> > you could grab the machine and let nothing else happen.
> >
> > What makes me dubious about your claim is that it is easy to test
> > and see that a single ISA operation can take 18 microseconds
> > on most PC hardware.
> >
> > try:
> > cli
> > loop:
> > read tsc
> > inb
> > read tsc
> > compute difference
> > print worst case every 1000000 times.
> >
> > sti
> >
> > run for an hour on a busy machine.
> >
> >
>
> No, no, no. There is no such port read that takes 18 microseconds, even
> on old '386 machines with real ISR slots. A port read on those took

Sorry, but the numbers don't lie. It's an easy test to make.
The test you have below tests something else entirely. It tests
average time over a period of something around 1 second.


> (void)alarm(1);
> run++;
> while(run)
> {
> foo[0] = inb(0x378); /* Actually put into memory */
> count++; /* This takes as long as bumping a pointer */
> }
> printf("\nPort reads in a second = %lu\n", count);
> return 0;
> }

Average and worst case are different.

2002-08-28 13:50:54

by Victor Yodaiken

Subject: Re: interrupt latency

On Wed, Aug 28, 2002 at 08:18:25AM -0400, Richard B. Johnson wrote:
> No, no, no. There is no such port read that takes 18 microseconds, even
> on old '386 machines with real ISR slots. A port read on those took
> almost exactly 300 nanoseconds and, in fact, was the limiting factor
> for the programmed I/O devices on the ISA bus.

Amazing how they can do that with a bus clock that is much slower :-)



--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com

2002-08-28 14:21:16

by Jonathan Lundell

Subject: Re: interrupt latency

At 7:41 am -0600 8/28/02, [email protected] wrote:
>Average and worst case are different.

I can believe that. The explanation that jumps to mind is PCI bus
contention because of DMA, causing the inb to stall or to get
repeatedly retried. 13us seems a little extreme, but not completely
impossible.
--
/Jonathan Lundell.

2002-08-28 14:56:40

by Richard B. Johnson

Subject: Re: interrupt latency

On Wed, 28 Aug 2002 [email protected] wrote:

> On Wed, Aug 28, 2002 at 08:18:25AM -0400, Richard B. Johnson wrote:
> > No, no, no. There is no such port read that takes 18 microseconds, even
> > on old '386 machines with real ISR slots. A port read on those took
> > almost exactly 300 nanoseconds and, in fact, was the limiting factor
> > for the programmed I/O devices on the ISA bus.
>
> Amazing how they can do that with a bus clock that is much slower -)

The ISA bus is not a clocked bus. It is entirely asynchronous. The
fact that there is a clock on the bus is irrelevant. It is not used
for any bus-related operations and, in fact, its phase is not guaranteed.

Now, I read your previous post which claimed that what I stated was
impossible and that I didn't measure anything useful. Now, I note
with trepidation that you are in some kind of "real-time-Linux" business
for which I am supposed to be shaking in my shoes. However, you are
spewing much hype for which I have no countenance.

Here is a re-write that takes your "min and max" rdtsc readings. You will
note that even the time to write to memory is included in the numbers.

Script started on Wed Aug 28 10:34:10 2002
# cat usermode.c
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <asm/io.h>

extern int iopl(int);
extern long long tim(void);

volatile int run=0;
void timer(int unused)
{
    run = 0;
}

int main()
{
    unsigned long long ticks_sec;
    unsigned long long calibrate;
    unsigned long long ticks_port;
    unsigned long long worst = 0;
    unsigned long long best = ~0;
    unsigned int count = 0;
    double ns;
    char foo[1];
    (void)iopl(3);
    (void)tim();
    calibrate = tim(); /* Time to make the function call */
    fprintf(stdout, "Wait.....");
    fflush(stdout);
    (void)signal(SIGALRM, timer);
    (void)alarm(1);
    run++;
    (void)tim();
    while(run)
        ;
    ticks_sec = tim() - calibrate;
    for(;;)
    {
        __asm("cli");
        (void)tim();
        foo[0] = inb(0x378); /* Actually put into memory */
        ticks_port = tim() - calibrate;
        __asm("sti");
        if(ticks_port > worst)
            worst = ticks_port;
        if(ticks_port < best)
            best = ticks_port;
        if(!(count++ % 1000000))
        {
            ns = (double)worst / (double)ticks_sec;
            ns *= 1e9; /* Nanoseconds */
            printf("Worse case ticks = %llu\n", worst);
            printf("Worst port read took %f nanoseconds\n", ns);
            printf("CPU ticks/second = %llu\n", ticks_sec);
            printf("Best case ticks = %llu\n", best);
            ns = (double)best / (double)ticks_sec;
            ns *= 1e9; /* Nanoseconds */
            printf("Best port read took %f nanoseconds\n", ns);
            fflush(stdout);
        }
    }
    return 0;
}

# cat rdtsc.S
#
#
#

.data
lastl:  .long   0
lasth:  .long   0
.text
.align  8
.globl  tim
.type   tim@function

#
# Return the CPU clock difference between successive calls.
#
tim:    pushl   %ebx
        rdtsc
        movl    lastl, %ebx     # Get last low longword
        movl    lasth, %ecx     # Get last high longword
        movl    %eax, lastl     # Save current low longword
        movl    %edx, lasth     # Save current high longword
        subl    %ebx, %eax      # Current - last
        sbbl    %ecx, %edx      # Same with borrow
        popl    %ebx
        ret
.end

# gcc -O2 -o usermode usermode.c rdtsc.S
# ./usermode
Wait.....Worse case ticks = 624
Worst port read took 1572.445620 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 624
Best port read took 1572.445620 nanoseconds
Worse case ticks = 1121
Worst port read took 2824.858237 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 529
Best port read took 1333.050854 nanoseconds
Worse case ticks = 1209
Worst port read took 3046.613389 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 529
Best port read took 1333.050854 nanoseconds
Worse case ticks = 1209
Worst port read took 3046.613389 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 529
Best port read took 1333.050854 nanoseconds
Worse case ticks = 1349
Worst port read took 3399.405676 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 529
Best port read took 1333.050854 nanoseconds
Worse case ticks = 1417
Worst port read took 3570.761929 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 529
Best port read took 1333.050854 nanoseconds
Worse case ticks = 1549
Worst port read took 3903.394656 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 529
Best port read took 1333.050854 nanoseconds
Worse case ticks = 1549
Worst port read took 3903.394656 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 529
Best port read took 1333.050854 nanoseconds
Worse case ticks = 1549
Worst port read took 3903.394656 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 529
Best port read took 1333.050854 nanoseconds
Worse case ticks = 3165
Worst port read took 7975.625621 nanoseconds
CPU ticks/second = 396834073
Best case ticks = 529
Best port read took 1333.050854 nanoseconds

# exit
exit

Script done on Wed Aug 28 10:35:35 2002


The best case of 529 ticks seems stable, so it is likely what
an inactive machine will produce: about 1,300 nanoseconds to
get data from a port into memory.

The worst case may not have happened yet even though the numbers
are stable, but I show 8 microseconds, nowhere near 18 microseconds.
And if I disconnect my network card so the PCI bus isn't
continually being grabbed via bus mastering, the best-case
and worst-case ticks are within 400 ticks of each other.

FYI, if you make a real-time system, you must control the Bus Masters
on the bus, otherwise you can't guarantee anything, and again, that
is not "latency". It's something else.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-08-28 15:16:58

by Victor Yodaiken

Subject: Re: interrupt latency


So you ran the test a bit and got 8 microseconds! Shake in your
shoes or not, your original numbers are way off.
Your calibrate number is most likely way too
high. Again, the problem is worst case versus average. The first
call to "tim" causes instruction cache misses ... The later calls
will be faster. There may be other problems, but I sincerely suggest
you do the whole thing using some macros. It's not hard and avoids
experimenter bias. Also run the tests for at least an hour, and
for a week if you are serious.

Anyways, I simply report our measurements. There are other sources of
long delays in PC systems. For example, the measurement test

cli
rdtsc
rdtsc
sti

can show results of 200 microseconds on some P4s. Numbers vary depending
on Ethernet cards, IDE interface, chipset, ....
When we test, we try to find the worst case. A ping flood is generally a
good source of worst-case delays.





On Wed, Aug 28, 2002 at 11:02:57AM -0400, Richard B. Johnson wrote:
> On Wed, 28 Aug 2002 [email protected] wrote:
>
> > On Wed, Aug 28, 2002 at 08:18:25AM -0400, Richard B. Johnson wrote:
> > > No, no, no. There is no such port read that takes 18 microseconds, even
> > > on old '386 machines with real ISR slots. A port read on those took
> > > almost exactly 300 nanoseconds and, in fact, was the limiting factor
> > > for the programmed I/O devices on the ISA bus.
> >
> > Amazing how they can do that with a bus clock that is much slower -)
>
> The ISA bus is not a clocked bus. It is entirely asynchronous. The
> fact that there is a clock on the bus is irrelevant. It is not used
> for any bus-related operations and, in fact, its phase is not guaranteed.
>
> Now, I read your previous post which claimed that what I stated was
> impossible and that I didn't measure anything useful. Now, I note
> with trepidation that you are in some kind of "real-time-Linux" business
> for which I am supposed to be shaking in my shoes. However, you are
> spewing much hype for which I have no countenance.
>
> Here is a re-write that takes your "min and max" rdtsc readings. You will
> note that even the time to write to memory is included in the numbers.
>
> Script started on Wed Aug 28 10:34:10 2002
> # cat usermode.c
> #include <stdio.h>
> #include <unistd.h>
> #include <signal.h>
> #include <asm/io.h>
>
> extern int iopl(int);
> extern long long tim(void);
>
> volatile int run=0;
> void timer(int unused)
> {
>     run = 0;
> }
>
> int main()
> {
>     unsigned long long ticks_sec;
>     unsigned long long calibrate;
>     unsigned long long ticks_port;
>     unsigned long long worst = 0;
>     unsigned long long best = ~0;
>     unsigned int count = 0;
>     double ns;
>     char foo[1];
>     (void)iopl(3);
>     (void)tim();
>     calibrate = tim(); /* Time to make the function call */
>     fprintf(stdout, "Wait.....");
>     fflush(stdout);
>     (void)signal(SIGALRM, timer);
>     (void)alarm(1);
>     run++;
>     (void)tim();
>     while(run)
>         ;
>     ticks_sec = tim() - calibrate;
>     for(;;)
>     {
>         __asm("cli");
>         (void)tim();
>         foo[0] = inb(0x378); /* Actually put into memory */
>         ticks_port = tim() - calibrate;
>         __asm("sti");
>         if(ticks_port > worst)
>             worst = ticks_port;
>         if(ticks_port < best)
>             best = ticks_port;
>         if(!(count++ % 1000000))
>         {
>             ns = (double)worst / (double)ticks_sec;
>             ns *= 1e9; /* Nanoseconds */
>             printf("Worse case ticks = %llu\n", worst);
>             printf("Worst port read took %f nanoseconds\n", ns);
>             printf("CPU ticks/second = %llu\n", ticks_sec);
>             printf("Best case ticks = %llu\n", best);
>             ns = (double)best / (double)ticks_sec;
>             ns *= 1e9; /* Nanoseconds */
>             printf("Best port read took %f nanoseconds\n", ns);
>             fflush(stdout);
>         }
>     }
>     return 0;
> }
>
> # cat rdtsc.S
> #
> #
> #
>
> .data
> lastl: .long 0
> lasth: .long 0
> .text
> .align 8
> .globl tim
> .type tim,@function
>
> #
> # Return the CPU clock difference between successive calls.
> #
> tim: pushl %ebx
> rdtsc
> movl lastl, %ebx # Get last low longword
> movl lasth, %ecx # Get last high longword
> movl %eax, lastl # Save current low longword
> movl %edx, lasth # Save current high longword
> subl %ebx, %eax # Current - last
> sbbl %ecx, %edx # Same with borrow
> popl %ebx
> ret
> .end
>
> # gcc -O2 -o usermode usermode.c rdtsc.S
> # ./usermode
> Wait.....Worse case ticks = 624
> Worst port read took 1572.445620 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 624
> Best port read took 1572.445620 nanoseconds
> Worse case ticks = 1121
> Worst port read took 2824.858237 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 529
> Best port read took 1333.050854 nanoseconds
> Worse case ticks = 1209
> Worst port read took 3046.613389 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 529
> Best port read took 1333.050854 nanoseconds
> Worse case ticks = 1209
> Worst port read took 3046.613389 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 529
> Best port read took 1333.050854 nanoseconds
> Worse case ticks = 1349
> Worst port read took 3399.405676 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 529
> Best port read took 1333.050854 nanoseconds
> Worse case ticks = 1417
> Worst port read took 3570.761929 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 529
> Best port read took 1333.050854 nanoseconds
> Worse case ticks = 1549
> Worst port read took 3903.394656 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 529
> Best port read took 1333.050854 nanoseconds
> Worse case ticks = 1549
> Worst port read took 3903.394656 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 529
> Best port read took 1333.050854 nanoseconds
> Worse case ticks = 1549
> Worst port read took 3903.394656 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 529
> Best port read took 1333.050854 nanoseconds
> Worse case ticks = 3165
> Worst port read took 7975.625621 nanoseconds
> CPU ticks/second = 396834073
> Best case ticks = 529
> Best port read took 1333.050854 nanoseconds
>
> # exit
> exit
>
> Script done on Wed Aug 28 10:35:35 2002
>
>
> The best case of 529 ticks seems stable, and is therefore likely what
> an inactive machine will produce: about 1,300 nanoseconds to
> get data from a port into memory.
>
> The worst case may not have happened yet even though the numbers
> are stable, but I show 8 microseconds, nowhere near 18 microseconds
> and, if I disconnect my network card so the PCI bus isn't
> continually grabbing everything via Bus Mastering, the best-case
> and the worst-case ticks are within 400 ticks of each other.
>
> FYI, if you make a real-time system, you must control the Bus Masters
> on the bus, otherwise you can't guarantee anything, and again, that
> is not "latency". It's something else.
>
>
> Cheers,
> Dick Johnson
> Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
> The US military has given us many words, FUBAR, SNAFU, now ENRON.
> Yes, top management were graduates of West Point and Annapolis.

--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
http://www.fsmlabs.com http://www.rtlinux.com

2002-08-28 15:25:06

by Alan

Subject: Re: interrupt latency

I would expect port 0x378 on any modern PC to be on the X-bus not on ISA


2002-08-28 15:32:14

by Richard B. Johnson

Subject: Re: interrupt latency

On 28 Aug 2002, Alan Cox wrote:

> I would expect port 0x378 on any modern PC to be on the X-bus not on ISA

Correct. All that stuff is in the Super-I/O chip nowadays, as I
previously stated. It's also called the GP bus (General Purpose).
It doesn't have slots and their attendant capacitance, so it's a lot
faster than ISA was.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.

2002-08-28 16:50:50

by Randy.Dunlap

Subject: Re: interrupt latency

On 28 Aug 2002, Alan Cox wrote:

| I would expect port 0x378 on any modern PC to be on the X-bus not on ISA

Yes, or what Intel calls the Low Pin Count (LPC) bus.

--
~Randy

2002-08-28 21:52:59

by Pavel Machek

Subject: Re: interrupt latency

Hi!

> > On Tue, Aug 27, 2002 at 02:01:43PM -0400, Richard B. Johnson wrote:
> > > This cannot be. A stock kernel-2.4.18, running a 133 MHz AMD-SC520,
> > > (like a i586) with a 33 MHz bus, handles interrupts off IRQ7 (the lowest
> > > priority), from the 'printer port' at well over 75,000 per second without
> > > skipping a beat or missing an edge. This means that latency is at least
> > > as good as 1/57,000 sec = 0.013 microseconds.
> >
> > Assuming you mean 75,000 then ...
> > That's 0.013 MILLISECONDS, which is 13 microseconds, and it's not likely.
>
> Yes 13 microseconds.
>
> > I bet that your data source drops data or looks at some handshake
> > pins on the parallel connect.
> >
>
> No. You can easily read into memory 75,000 bytes per second from the
> parallel port, hell RS-232C will do 22,400++ bytes per second (224,000
> baud) on a Windows machine, done all the while to feed a PROM burner. I
> never measured Linux RS-232C, but it's got to be at least as good.

There's a >16-byte FIFO on the RS-232 port, and the kernel uses flip
buffers so that it does not have to wake userspace each time.
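
To put rough numbers on that: the time available per interrupt scales
with the number of bytes drained per interrupt. A small sketch, where
irq_period_us is a made-up helper and 16 bytes per interrupt is only
an assumed figure for a FIFO-equipped UART (the real trigger level
depends on the chip and the driver):

```c
#include <assert.h>

/*
 * Microseconds between interrupts for a given byte rate, assuming each
 * interrupt drains bytes_per_irq bytes. Hypothetical helper for
 * illustration only.
 */
static double irq_period_us(double bytes_per_sec, unsigned bytes_per_irq)
{
    assert(bytes_per_sec > 0.0 && bytes_per_irq > 0);
    return 1e6 * (double)bytes_per_irq / bytes_per_sec;
}
```

irq_period_us(75000.0, 1) gives about 13.3 microseconds for the
one-interrupt-per-byte parallel port case quoted above, while
irq_period_us(22400.0, 16) gives about 714 microseconds for the serial
rate mentioned, under the 16-byte assumption.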

--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.