LinuxLists.cc - some questions using rdtsc in user space

2002-08-02 17:05:46

Subject: some questions using rdtsc in user space

Hi,

A friend and I have run into an interesting scenario; maybe someone can
help us.

We have to access a device connected to the parallel port, which works in
the following way: you send a byte to the port to turn some bits on
(reflected on some pins of the parallel port), which the device
interprets as a command. Then you are supposed to wait about 200ns
(maybe more, it just can't be much less), and then you send a byte which
the device receives as data pertinent to the command.

We wrote a program which accomplishes this by doing outb() to the
appropriate address(es), followed by usleep(1), but that seems to take
about 10 ms on average, which is far from good for our application.

I read somewhere that putting the process in real-time priority could
bring the average down to 2ms, but I had the thought that I could solve
this by using the rdtsc instruction, because as far as I know it won't
cause a trap to kernel mode, which may be expensive. Am I right?

I don't need to use real-time Linux (though I'm considering real-time
priority), nor do I have desperate time precision needs; what I don't
want is to have huge delays. I'd also rather not rely on the low-latency
patches, if possible (though I know they could help), because the
program will eventually run on standard kernels.

If using rdtsc is a good way, does anyone know how I could write some
sort of loop, converting the rdtsc difference (it is in CPU clocks,
right?) to nano/microseconds, and whether there could be bad behaviour
from this? (I believe there could be some SMP issues, but for now this
is irrelevant for us.)

Thanks!

Alexandre

P.S.: carbon-copy me, since I'm not subscribed to the list.

2002-08-02 19:02:46

by Alexandre Pereira Nunes

Subject: Re: some questions using rdtsc in user space

Chris Friesen wrote:

>[snip]
>
>>If using rdtsc is a good way, does anyone know how I could write some
>>sort of loop, converting the rdtsc difference (it is in CPU clocks,
>>right?) to nano/microseconds, and whether there could be bad behaviour
>>from this? (I believe there could be some SMP issues, but for now this
>>is irrelevant for us.)
>>
>>
>
>You'd be more portable looping on gettimeofday(), which returns back seconds and
>microseconds. The disadvantage is that if you change the system time while your
>program is running you can get screwed.
>

I guess ntpd would be the culprit here, because on at least some of the
machines I have here, it makes minor (sometimes not "that" minor)
corrections every half hour or so...

>
>If you want to loop on rdtsc (which as you say is often clockspeed, but that is
>not necessarily exactly what is advertised), then the first thing you need to do
>is to figure out how fast you're going. What I usually do is grab a timestamp,
>sleep for 10 seconds using select(), and grab another timestamp. Figure out the
>difference in timestamps, divide by 10, and you get ticks/sec. If it's close to
>your advertised clock rate, then round it up to the clock rate for ease of
>calculation. As an example, my 550MHz P3 gives 548629000 ticks/sec using this
>method, so for your purposes I'd round it up to 550000000.
>
>Once you have this information, just divide by 5 million to get the number of
>ticks in 200ns.
>
>
>
OK, that's exactly what I was trying to get at. I'll do some
experiments, thank you!

Alexandre
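
The calibration described in the quoted reply can be sketched roughly as follows. This is only an illustration under some assumptions: the helper names (read_ticks, calibrate_ticks_per_sec, ticks_for_ns) are made up for this sketch, the sleep uses nanosleep() rather than select(), and the tick count is divided by the time actually slept instead of by a fixed 10 seconds. On non-x86 builds the sketch falls back to the monotonic clock so that it still compiles.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* Read the CPU time-stamp counter (x86 only). */
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 builds: nanoseconds from the monotonic clock. */
static uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

/* Estimate counter ticks per second: read the counter, sleep for roughly
 * sleep_ms milliseconds, read it again, and divide the tick difference by
 * the wall time that actually elapsed. */
static uint64_t calibrate_ticks_per_sec(unsigned sleep_ms)
{
    struct timespec req, t0, t1;
    uint64_t c0, c1;
    double elapsed_ns;

    assert(sleep_ms > 0);
    req.tv_sec = sleep_ms / 1000;
    req.tv_nsec = (long)(sleep_ms % 1000) * 1000000L;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    c0 = read_ticks();
    nanosleep(&req, NULL);
    c1 = read_ticks();
    clock_gettime(CLOCK_MONOTONIC, &t1);

    elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
               + (double)(t1.tv_nsec - t0.tv_nsec);
    return (uint64_t)((double)(c1 - c0) * 1e9 / elapsed_ns);
}

/* Number of ticks covering at least ns nanoseconds (rounded up). */
static uint64_t ticks_for_ns(uint64_t ticks_per_sec, uint64_t ns)
{
    return (ticks_per_sec * ns + 999999999ULL) / 1000000000ULL;
}
```

With the figures from the quoted example, ticks_for_ns(550000000, 200) gives 110, which matches dividing 550000000 by 5 million. Rounding up errs on the side of waiting slightly longer, which suits a device that tolerates more than 200ns but not much less.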

2002-08-02 19:26:24

by Alexandre Pereira Nunes

Subject: Re: some questions using rdtsc in user space

Mark Hahn wrote:

>[snip]
>pretty gross. the reason is that it's not reasonable to use
>interrupt-based timers for anything close to a 200ns wait,
>(since programming a timer costs O(1us).) therefore, this kind
>of hardware requires either busy-waiting (eating 200ns each op),
>or using the existing timer, which is 100 Hz in normal kernels
>(faster in 2.5, and can be altered if you wish).
>
>
Doesn't raising it make the kernel scheduler itself take up too much
time? In my application that wouldn't matter much, since there are
presumably not many competing tasks, but I would only consider tuning
the kernel as a last resort, since the idea is to let the program run
fine on the customer's kernel, preferably any version/series...

>
>
>>We wrote a program which accomplishes this by doing outb() to the
>>appropriate address(es), followed by usleep(1), but that seems to take
>>about 10 ms on average, which is far from good for our application.
>>
>>
>
>what's your target rate?
>
>
The hardware is quite asynchronous. I need to go as fast as possible,
but I guess my worst acceptable average case would be something like
half a millisecond or so.

>>I read somewhere that putting the process in real-time priority could
>>bring the average down to 2ms, but I had the thought that I could solve
>>this by using the rdtsc instruction, because as far as I know it won't
>>cause a trap to kernel mode, which may be expensive. Am I right?
>>
>>
>
>you can easily busy-wait using rdtsc. I do this all the time in
>my realtime video code for presenting psychophysiological stimuli.
>(it often polls the video retrace register, as well.)
>
That was the original idea...

>you don't need RT prio to do busy-waits on rdtsc, though you will
>naturally get preempted sometimes. if you do use RT prio,
>then you can always do at most 200ns, and this might wind up
>being more efficient (depending on what else the system's doing.)
>
>
>

I was referring to the non-busy-wait case (usleep), but that didn't seem
good enough in the first place anyway.
I could live with preemption, because it happens very seldom (from this
application's perspective), but if I can avoid it, even better :-)

>I do something like this:
>
>
>
Exactly what I was looking for. I'll experiment with some different
approaches, but the main idea is just that.

Thanks!

Alexandre
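
The code from the quoted reply did not survive in this archive, so the following is not that code. It is a minimal sketch of the general idea discussed above: write the command byte, busy-wait on the tick counter, then write the data byte. The names read_ticks, spin_ticks, send_command and record_write are hypothetical, and the port write goes through a callback so the sketch can run without hardware or root; on real hardware the callback would presumably wrap outb() after obtaining port access.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>
/* Read the CPU time-stamp counter (x86 only); no system call involved. */
static uint64_t read_ticks(void) { return __rdtsc(); }
#else
/* Assumed fallback for non-x86 builds: nanoseconds from the monotonic clock. */
static uint64_t read_ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
#endif

/* Spin until at least `ticks` counter ticks have elapsed. */
static void spin_ticks(uint64_t ticks)
{
    uint64_t start = read_ticks();
    while (read_ticks() - start < ticks)
        ; /* busy-wait */
}

/* Hypothetical port-write callback; stands in for outb(value, port). */
typedef void (*port_write_fn)(unsigned char value, unsigned short port);

/* Write the command byte, wait gap_ticks, then write the data byte. */
static void send_command(port_write_fn write_port,
                         unsigned short cmd_port, unsigned short data_port,
                         unsigned char cmd, unsigned char data,
                         uint64_t gap_ticks)
{
    write_port(cmd, cmd_port);
    spin_ticks(gap_ticks);
    write_port(data, data_port);
}

/* Dry-run writer for trying the sketch without hardware: it only records
 * the first two bytes written and counts the calls. */
static unsigned char last_written[2];
static int writes_done;

static void record_write(unsigned char value, unsigned short port)
{
    (void)port;
    if (writes_done < 2)
        last_written[writes_done] = value;
    writes_done++;
}
```

If the process is preempted or interrupted inside spin_ticks(), the wait simply ends up longer than requested, which fits a device that needs at least about 200ns between the two writes. The SMP concern raised earlier in the thread still applies: if the process moves to another CPU mid-wait, the two counter reads may not be comparable.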

2002-08-02 20:55:35

by George Anzinger

Subject: Re: some questions using rdtsc in user space

"Alexandre P. Nunes" wrote:
>
> Mark Hahn wrote:
>
> >[snip]
> >pretty gross. the reason is that it's not reasonable to use
> >interrupt-based timers for anything close to a 200ns wait,
> >(since programming a timer costs O(1us).) therefore, this kind
> >of hardware requires either busy-waiting (eating 200ns each op),
> >or using the existing timer, which is 100 Hz in normal kernels
> >(faster in 2.5, and can be altered if you wish).
> >
> >
> Doesn't raising it make the kernel scheduler itself take up too much
> time? In my application that wouldn't matter much, since there are
> presumably not many competing tasks, but I would only consider tuning
> the kernel as a last resort, since the idea is to let the program run
> fine on the customer's kernel, preferably any version/series...
>
> >
> >
> >>We wrote a program which accomplishes this by doing outb() to the
> >>appropriate address(es), followed by usleep(1), but that seems to take
> >>about 10 ms on average, which is far from good for our application.
> >>
> >>
> >
> >what's your target rate?
> >
> >
> The hardware is quite asynchronous. I need to go as fast as possible,
> but I guess my worst acceptable average case would be something like
> half a millisecond or so.
>
> >>I read somewhere that putting the process in real-time priority could
> >>bring the average down to 2ms, but I had the thought that I could solve
> >>this by using the rdtsc instruction, because as far as I know it won't
> >>cause a trap to kernel mode, which may be expensive. Am I right?
> >>
> >>
> >
> >you can easily busy-wait using rdtsc. I do this all the time in
> >my realtime video code for presenting psychophysiological stimuli.
> >(it often polls the video retrace register, as well.)
> >
> That was the original idea...
>
> >you don't need RT prio to do busy-waits on rdtsc, though you will
> >naturally get preempted sometimes. if you do use RT prio,
> >then you can always do at most 200ns, and this might wind up
> >being more efficient (depending on what else the system's doing.)
> >
> >
> >
>
> I was referring to the non-busy-wait case (usleep), but that didn't seem
> good enough in the first place anyway.
> I could live with preemption, because it happens very seldom (from this
> application's perspective), but if I can avoid it, even better :-)

The only way to prevent preemption in user land is to be the
most important (highest priority) task on the system. Even
then interrupts can take the cpu away for a time.

If you only need to wait for ~200ns, a gettimeofday call is
the best bet in that it is VERY portable. It should give
you resolution down to the microsecond, so you will wait,
most likely, 1.5 microseconds. If this is too long, you
will have to either calibrate a user space delay loop (still
very portable) or use the TSC, which is not available on all
x86 platforms, let alone other platforms, i.e. it is not
portable.
>
> >I do something like this:
> >
> >
> >
> Exactly what I was looking for. I'll experiment with some different
> approaches, but the main idea is just that.
>
> Thanks!
>
> Alexandre

--
George Anzinger [email protected]
High-res-timers:
http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
Preemption patch:
http://www.kernel.org/pub/linux/kernel/people/rml
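
As an illustration of the point above about being the highest-priority task, a process can ask for the top SCHED_FIFO priority roughly like this. It is a sketch, not something posted in the thread: request_realtime_priority is a made-up name, and the call normally fails without root (or the equivalent privilege), in which case the function just reports the error.

```c
#include <sched.h>
#include <stdio.h>

/* Ask the kernel to run this process at the highest SCHED_FIFO priority.
 * Returns 0 on success and -1 on failure (typically lack of privilege). */
static int request_realtime_priority(void)
{
    struct sched_param sp;
    int max_prio = sched_get_priority_max(SCHED_FIFO);

    if (max_prio < 0) {
        perror("sched_get_priority_max");
        return -1;
    }
    sp.sched_priority = max_prio;

    /* A pid of 0 means "the calling process". */
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return -1;
    }
    return 0;
}
```

Even when this succeeds, interrupts can still take the CPU away for a time, as noted above, so it reduces preemption by other tasks rather than guaranteeing an upper bound on the delay. A busy-waiting task at this priority can also starve everything else on that CPU, so the spin should stay short.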

2002-08-03 04:28:39

by Ben Greear

Subject: Re: some questions using rdtsc in user space

Have you tried just spinning in a tight loop using gettimeofday
to get the time? It's not exactly efficient, but if you need that
kind of precision (200ns), that is about the only way. You'll still
be stuck with 1usec precision since that is as good as gettimeofday
can do, but that's a far sight better than 10ms or even 2ms.

Ben

Alexandre P. Nunes wrote:
> Hi,
>
> A friend and I have run into an interesting scenario; maybe someone can
> help us.
>
> We have to access a device connected to the parallel port, which works in
> the following way: you send a byte to the port to turn some bits on
> (reflected on some pins of the parallel port), which the device
> interprets as a command. Then you are supposed to wait about 200ns
> (maybe more, it just can't be much less), and then you send a byte which
> the device receives as data pertinent to the command.
>
> We wrote a program which accomplishes this by doing outb() to the
> appropriate address(es), followed by usleep(1), but that seems to take
> about 10 ms on average, which is far from good for our application.
>
> I read somewhere that putting the process in real-time priority could
> bring the average down to 2ms, but I had the thought that I could solve
> this by using the rdtsc instruction, because as far as I know it won't
> cause a trap to kernel mode, which may be expensive. Am I right?
>
> I don't need to use real-time Linux (though I'm considering real-time
> priority), nor do I have desperate time precision needs; what I don't
> want is to have huge delays. I'd also rather not rely on the low-latency
> patches, if possible (though I know they could help), because the
> program will eventually run on standard kernels.
>
> If using rdtsc is a good way, does anyone know how I could write some
> sort of loop, converting the rdtsc difference (it is in CPU clocks,
> right?) to nano/microseconds, and whether there could be bad behaviour
> from this? (I believe there could be some SMP issues, but for now this
> is irrelevant for us.)
>
>
> Thanks!
>
>
> Alexandre
>
> P.S.: carbon-copy me, since I'm not subscribed to the list.

--
Ben Greear <[email protected]> <Ben_Greear AT excite.com>
President of Candela Technologies Inc http://www.candelatech.com
ScryMUD: http://scry.wanfear.com http://scry.wanfear.com/~greear
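
The tight loop on gettimeofday suggested above might look roughly like this. It is a sketch only, and spin_usec is a hypothetical name. Because the timestamps are in whole microseconds, a measured difference of N microseconds only guarantees that more than N-1 microseconds really passed, so a caller who must not go below 200ns would probably ask for 2 rather than 1.

```c
#include <stddef.h>
#include <sys/time.h>

/* Busy-wait until gettimeofday() reports that at least `usec`
 * microseconds have passed since the call started. */
static void spin_usec(long usec)
{
    struct timeval start, now;
    long elapsed;

    gettimeofday(&start, NULL);
    do {
        gettimeofday(&now, NULL);
        elapsed = (long)(now.tv_sec - start.tv_sec) * 1000000L
                + (long)(now.tv_usec - start.tv_usec);
    } while (elapsed < usec);
}
```

As mentioned earlier in the thread, this depends on the system time not being changed during the wait; if the clock is stepped backwards, the loop spins for longer than intended.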

2002-08-03 18:16:41

by Alexandre Pereira Nunes

Subject: Re: some questions using rdtsc in user space

On Fri, 2 Aug 2002, Brian Evans wrote:

> In a similar case here we just added a PIC controller to buffer
> the commands and results. They are cheap and easy to program,
> and if you're using devices that are very 'dumb' they can take a lot
> of the headaches out of making sure timing is correct. Another
> advantage is you get easier use in all operating systems.
>
> Brian
>

My friend had this idea too, and we are considering switching to the
serial port in that case. We're just trying to see what we can do in
software first; if we get nothing but bad results, that (using a PIC)
will eventually become our main choice. The reason for not doing so right
away is that the device is somewhat tolerant, so if we get good average
results, maybe we'll keep it as is. But if we decide to use the serial
port, we'd better use the PIC solution anyway, solving both problems (the
parallel port is usually taken by a printer, while almost everyone has a
spare serial port). In the future we might consider USB; it seems there's
a PIC series with USB interfacing, but for now that's for the future.

Thanks for your help,

Alexandre