I saw the latest patch for 2.4.10 and noticed that people are still blindly
copying the code in serial.c (au1000), even though it uses save_flags() and cli().
Is anybody looking to replace these with either spinlocks or __cli() where
applicable? I do not mind spending some time looking into these issues.
I would request people to look at the global-spin-lock document at lse.sf.net
before doing any locking. Also please look at kernel-locking.tmpl (using db2pdf
or db2ps). Please understand how locking works before using it in your code.
Imagine a driver using save_flags(); cli(); and essentially serializing an entire
SMP system. Please do not do this unless it is absolutely necessary.
BTW, that brings me to another issue: once the kernel becomes preemptible, what
are the locking issues? How are semaphores and spin-locks affected? Has anybody
defined or come up with the rules/document yet?
I hope I have understood these issues :-D
Comments, flames?
Balbir Singh.
On Mon, Oct 08, 2001 at 07:59:05PM +0530, BALBIR SINGH wrote:
> BTW, that brings me to another issue: once the kernel becomes preemptible, what
> are the locking issues? How are semaphores and spin-locks affected? Has anybody
> defined or come up with the rules/document yet?
IF the kernel becomes preemptible it will be so slow, so buggy, and so painful
to maintain, that those issues won't matter.
Victor Yodaiken <[email protected]> writes:
> On Mon, Oct 08, 2001 at 07:59:05PM +0530, BALBIR SINGH wrote:
> > BTW, that brings me to another issue: once the kernel becomes preemptible, what
> > are the locking issues? How are semaphores and spin-locks affected? Has anybody
> > defined or come up with the rules/document yet?
>
> IF the kernel becomes preemptible it will be so slow, so buggy, and so painful
> to maintain, that those issues won't matter.
The preemptible kernel work just takes the current SMP code, and
allows it to work on a single processor. You are not interrupted if
you have a lock held. This reduces the number of distinct cases in
the kernel, and should improve maintenance as more people will be
affected by the SMP issues.
Right now there is a preemptible kernel patch being maintained
somewhere. I haven't had a chance to look recently. But the recent
threads on low latency mentioned it.
As for rules. They are the usual SMP rules. In earlier versions there
was a requirement that you use balanced constructs.
i.e.
spin_lock_irqsave
...
spin_unlock_irqrestore
and not.
spin_lock_irqsave
...
spin_unlock
..
restore_flags.
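As a rough sketch of what the balanced form looks like in a driver (the lock,
counter, and function names here are invented purely for illustration):

#include <linux/spinlock.h>

static spinlock_t mydev_lock = SPIN_LOCK_UNLOCKED;      /* hypothetical lock */
static int mydev_pending;                               /* state it protects */

static void mydev_queue_work(void)
{
        unsigned long flags;

        /* Save the interrupt state, disable local interrupts, take the lock. */
        spin_lock_irqsave(&mydev_lock, flags);
        mydev_pending++;
        /* Drop the lock and restore the saved state with the matching call. */
        spin_unlock_irqrestore(&mydev_lock, flags);
}

The point is that the lock and the interrupt state travel together, so the
unlock side cannot get out of sync with the save side.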
Eric
On Mon, Oct 08, 2001 at 09:11:57AM -0600, Eric W. Biederman wrote:
> > IF the kernel becomes preemptible it will be so slow, so buggy, and so painful
> > to maintain, that those issues won't matter.
>
> The preemptible kernel work just takes the current SMP code, and
> allows it to work on a single processor. You are not interrupted if
> you have a lock held. This reduces the number of distinct cases in
> the kernel, and should improve maintenance as more people will be
> affected by the SMP issues.
i.e. "since we are already committed to making the kernel more complex, slower, and
harder to maintain, there is no problem ... "
>
> Right now there is a preemptible kernel patch being maintained
> somewhere. I haven't had a chance to look recently. But the recent
> threads on low latency mentioned it.
Try it out. Try running a kernel compile while a POSIX SCHED_FIFO process
is running.
> As for rules. They are the usual SMP rules. In earlier versions there
> was a requirement that you use balanced constructs.
I'm sorry, but this is not correct. SMP is different from low-latency and
has different goals. You certainly can piggyback low-latency of a sort on
the finer-grained locking you get from SMP support, but if you optimize
a kernel for SMP support you don't necessarily look at the same lock issues
as you would if your goal was to reduce latency. E.g. for SMP, a design like:
each processor maintains a local cache of resource X. Getting X from
the local cache takes 100ns and only local locking.
there is a slow and expensive spin locked central resource for X
used to replenish local caches. Getting X from the central resource
takes 1 second.
Cache success rate is over 99.99%.
With 10000 accesses to X, total time is about 1.001 seconds for an average of
100 microseconds, and this overstates the case, since most processes never see
the 1 second delay and average 100ns per access.
But worst case is 1 second!
If you were to design for low latency, you'd prefer a design where
an elaborate resource control mechanism allows all processors to
share X and get X resources within 1 millisecond.
1000 times better latency, 10000 times worse average case.
You cannot escape a tradeoff by pretending it's not there.
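To make the shape of that SMP-friendly design concrete, here is a rough sketch
(the structure, sizes, and helper are invented for illustration; the point is
only that the fast path touches per-CPU data and the rare slow path serializes
everyone on a single global lock):

#include <linux/spinlock.h>
#include <linux/smp.h>
#include <linux/threads.h>

#define X_BATCH 64

struct x_cache {
        int nr;                         /* objects left in this CPU's cache */
        void *obj[X_BATCH];
};

static struct x_cache x_local[NR_CPUS];         /* per-CPU, no lock needed */
static spinlock_t x_central_lock = SPIN_LOCK_UNLOCKED;

/* Hypothetical slow path: replenish one CPU's cache from the shared pool.
 * This is the part that can take a very long time under contention. */
static void x_refill(struct x_cache *c)
{
        while (c->nr < X_BATCH)
                c->obj[c->nr++] = NULL;         /* placeholder object */
}

static void *x_get(void)
{
        struct x_cache *c = &x_local[smp_processor_id()];

        if (c->nr == 0) {
                /* Rare case: every CPU contends on one global lock, so the
                 * worst-case wait can be huge even though the average stays
                 * tiny.  This is exactly the tradeoff described above. */
                spin_lock(&x_central_lock);
                x_refill(c);
                spin_unlock(&x_central_lock);
        }
        return c->obj[--c->nr];
}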
Look we handle this all the time in RTLinux: we have to throw away heartbreakingly
beautiful solutions because worst case numbers are bad.
"Victor Yodaiken" <[email protected]> writes:
> On Mon, Oct 08, 2001 at 09:11:57AM -0600, Eric W. Biederman wrote:
> > > IF the kernel becomes preemptible it will be so slow, so buggy, and so painful
> > > to maintain, that those issues won't matter.
> >
> > The preemptible kernel work just takes the current SMP code, and
> > allows it to work on a single processor. You are not interrupted if
> > you have a lock held. This reduces the number of distinct cases in
> > the kernel, and should improve maintenance as more people will be
> > affected by the SMP issues.
>
> i.e. "since we are already committed to making the kernel more complex, slower,
> and harder to maintain, there is no problem ... "
Already committed? Already completed. Personally I think a model where
you have the cpu until you, or one of the functions you call, blocks is hard to
maintain because you can easily lose control by accident.
> > Right now there is a preemptible kernel patch being maintained
> > somewhere. I haven't had a chance to look recently. But the recent
> > threads on low latency mentioned it.
>
> Try it out. Try running a kernel compile while a POSIX SCHED_FIFO process
> is running.
My expectation would be that the SCHED_FIFO process would totally stop
the kernel compile. At least until it saw fit to sleep.
> > As for rules. They are the usual SMP rules. In earlier versions there
> > was a requirement that you use balanced constructs.
>
There are 3 different goals we are mentioning here.
1) SMP
2) Soft Realtime. (low-latency)
This is things like playing audio or video where it is o.k. to skip
but want it to happen as rarely as possible.
3) Hard Realtime. (guaranteed-latency)
All 3 are different.
> I'm sorry, but this is not correct. SMP is different from low-latency and
> has different goals.
SMP has the basic goal of having locks held for as short a time as possible,
and a preemptible kernel has the goal of keeping the kernel unpreemptible
for as short a time as possible. Both of these are compatible.
> You certainly can piggyback low-latency of a sort on
> the finer-grained locking you get from SMP support, but if you optimize
> a kernel for SMP support you don't necessarily look at the same lock issues
> as you would if your goal was to reduce latency. E.g. for SMP, a design like:
>
> each processor maintains a local cache of resource X. Getting X from
> the local cache takes 100ns and only local locking.
>
> there is a slow and expensive spin locked central resource for X
> used to replenish local caches. Getting X from the central resource
> takes 1 second.
>
> Cache success rate is over 99.99%.
>
> With 10000 accesses to X, total time is about 1.001 seconds for an average of
> 100 microseconds, and this overstates the case, since most processes never see
> the 1 second delay and average 100ns per access.
>
> But worst case is 1 second!
>
> If you were to design for low latency, you'd prefer a design where
> an elaborate resource control mechanism allows all processors to
> share X and get X resources within 1 millisecond.
>
> 1000 times better latency, 10000 times worse average case.
>
> You cannot escape a tradeoff by pretending it's not there.
Nope you can't. However in your example I would only prefer your design if
I was designing for guaranteed latency. Not just trying to keep the
latencies low.
I don't have a problem with my audio player skipping once every 2 or 3
years. A classic low-latency problem.
> Look we handle this all the time in RTLinux: we have to throw away heartbreakingly
> beautiful solutions because worst case numbers are bad.
That is because you care about guaranteed latency, which is a truly
different case from what the normal linux kernel deals with.
Personally I have a hard time buying hard real time code. Because
there are always cases where you cannot make guarantees. To
even have a clue what speed your code will really run at, you
need to take into account the exact platform your code will run on,
and you need to know the operating environment for that platform. And
even with the best analysis and planning something still goes wrong.
Eric
On Mon, Oct 08, 2001 at 10:45:19AM -0600, Eric W. Biederman wrote:
> "Victor Yodaiken" <[email protected]> writes:
>
> > On Mon, Oct 08, 2001 at 09:11:57AM -0600, Eric W. Biederman wrote:
> > > > IF the kernel becomes preemptible it will be so slow, so buggy, and so painful
> > > > to maintain, that those issues won't matter.
> > >
> > > The preemptible kernel work just takes the current SMP code, and
> > > allows it to work on a single processor. You are not interrupted if
> > > you have a lock held. This reduces the number of distinct cases in
> > > the kernel, and should improve maintenance as more people will be
> > > affected by the SMP issues.
> >
> > i.e. "since we are already committed to making the kernel more complex, slower,
> > and harder to maintain, there is no problem ... "
>
> Already committed? Already completed. Personally I think a model where
> you have the cpu until you, or one of the functions you call, blocks is hard to
> maintain because you can easily lose control by accident.
Really? I see totally otherwise. The rule: "kernel code runs until it does a blocking
call" makes sense. What's really hard is figuring out which 12 spinlocks you need
in what order.
> > > Right now there is a preemptible kernel patch being maintained
> > > somewhere. I haven't had a chance to look recently. But the recent
> > > threads on low latency mentioned it.
> >
> > Try it out. Try running a kernel compile while a POSIX SCHED_FIFO process
> > is running.
>
> My expectation would be that the SCHED_FIFO process would totally stop
> the kernel compile. At least until it saw fit to sleep.
Right - even for a periodic SCHED_FIFO process that gives up the processor a lot.
So how do you ensure that the router daemon runs, that network packets get processed ...
that all the other things that normally happen because kernel tasks run to completion
and because user programs all progress still happen?
>
> > > As for rules. They are the usual SMP rules. In earlier versions there
> > > was a requirement that you use balanced constructs.
> >
>
> There are 3 different goals we are mentioning here.
> 1) SMP
> 2) Soft Realtime. (low-latency)
> This is things like playing audio or video where it is o.k. to skip
> but want it to happen as rarely as possible.
> 3) Hard Realtime. (guaranteed-latency)
>
> All 3 are different.
Agree. People forget though. And "as rarely as possible" is pretty damn vague.
>
> > I'm sorry, but this is not correct. SMP is different from low-latency and
> > has different goals.
>
> SMP has the basic goal of having locks held for as short a time as possible,
It does? I thought SMP had the basic goal of optimizing average case use of multiple
processors and getting a decent ratio between throughput and responsiveness.
That's very different.
> and a preemptible kernel has the goal of keeping the kernel unpreemptible
> for as short a time as possible. Both of these are compatible.
Nope. You are taking an implementation rule of thumb "lock should not be held long"
which is often true for SMP and noting that it is compatible with the core design goal
of low-latency. The problem is that for low latency, "we often find it useful to keep
lock hold times short" does not get you very far.
> > You certainly can piggyback low-latency of a sort on
> > the finer-grained locking you get from SMP support, but if you optimize
> > a kernel for SMP support you don't necessarily look at the same lock issues
> > as you would if your goal was to reduce latency. E.g. for SMP, a design like:
> >
> > each processor maintains a local cache of resource X. Getting X from
> > the local cache takes 100ns and only local locking.
> >
> > there is a slow and expensive spin locked central resource for X
> > used to replenish local caches. Getting X from the central resource
> > takes 1 second.
> >
> > Cache success rate is over 99.99%.
> >
> > With 10000 accesses to X, total time is about 1.001 seconds for an average of
> > 100 microseconds, and this overstates the case, since most processes never see
> > the 1 second delay and average 100ns per access.
> >
> > But worst case is 1 second!
> >
> > If you were to design for low latency, you'd prefer a design where
> > an elaborate resource control mechanism allows all processors to
> > share X and get X resources within 1 millisecond.
> >
> > 1000 times better latency, 10000 times worse average case.
> >
> > You cannot escape a tradeoff by pretending it's not there.
>
> Nope you can't. However in your example I would only prefer your design if
> I was designing for guaranteed latency. Not just trying to keep the
> latencies low.
>
> I don't have a problem with my audio player skipping once every 2 or 3
> years. A classic low-latency problem.
Actually, soft RT usually means "good performance until I really need it".
> > Look we handle this all the time in RTLinux: we have to throw away heartbreakingly
> > beautiful solutions because worst case numbers are bad.
>
> That is because you care about guaranteed latency, which is a truly
> different case from what the normal linux kernel deals with.
Yes indeed.
> Personally I have a hard time buying hard real time code. Because
> there are always cases where you cannot make guarantees. To
> even have a clue what speed your code will really run at, you
> need to take into account the exact platform your code will run on,
> and you need to know the operating environment for that platform. And
> even with the best analysis and planning something still goes wrong.
Odd, I could have sworn it worked.
"Eric W. Biederman" wrote:
>
> Victor Yodaiken <[email protected]> writes:
>
> > On Mon, Oct 08, 2001 at 07:59:05PM +0530, BALBIR SINGH wrote:
> > > BTW, that brings me to another issue: once the kernel becomes preemptible, what
> > > are the locking issues? How are semaphores and spin-locks affected? Has anybody
> > > defined or come up with the rules/document yet?
> >
> > IF the kernel becomes preemptible it will be so slow, so buggy, and so painful
> > to maintain, that those issues won't matter.
>
> The preemptible kernel work just takes the current SMP code, and
> allows it to work on a single processor. You are not interrupted if
> you have a lock held. This reduces the number of distinct cases in
> the kernel, and should improve maintenance as more people will be
> affected by the SMP issues.
>
> Right now there is a preemptible kernel patch being maintained
> somewhere. I haven't had a chance to look recently. But the recent
> threads on low latency mentioned it.
>
> As for rules. They are the usual SMP rules. In earlier versions there
> was a requirement that you use balanced constructs.
>
> i.e.
> spin_lock_irqsave
> ...
> spin_unlock_irqrestore
>
> and not.
>
> spin_lock_irqsave
> ...
> spin_unlock
> ..
> restore_flags.
>
This rule is not there any more, but there are a few more:
SMP code that uses the cpu number (i.e. cpu data structures, etc.) and
thus depends on staying on that cpu, should be protected to prevent
preemption, which MAY move the task to another cpu. Currently there is
code in the preemption patch to prevent this movement; however, it would
be faster to allow it and protect the areas that care.
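A rough sketch of the kind of protection meant here, assuming a
preempt_disable()/preempt_enable() style interface around the per-cpu access
(the counter and function are invented for illustration):

#include <linux/smp.h>
#include <linux/threads.h>
#include <linux/preempt.h>      /* assumed home of preempt_disable()/preempt_enable() */

static int hits[NR_CPUS];       /* hypothetical per-cpu data */

static void count_hit(void)
{
        /* Without this, a preemption between reading the cpu number and
         * using it could migrate the task to another cpu, and we would
         * then update some other cpu's slot with no lock protecting it. */
        preempt_disable();
        hits[smp_processor_id()]++;
        preempt_enable();
}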
Also, areas that use hardware resources that are not saved on preemption
must be protected. On x86 this includes some of the MMX code, which
uses the fp registers.
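For that case, a hedged sketch, assuming the kernel_fpu_begin()/kernel_fpu_end()
helpers and an x86 build (the function and the header location are assumptions
made for illustration):

#include <asm/i387.h>   /* assumed header for kernel_fpu_begin()/kernel_fpu_end() */

/* Zero 16 bytes at dst using an MMX register. */
static void mmx_clear_16(void *dst)
{
        /* kernel_fpu_begin() saves the FPU/MMX state and keeps the task from
         * being preempted while it is clobbered; kernel_fpu_end() restores. */
        kernel_fpu_begin();
        __asm__ __volatile__(
                "pxor %%mm0, %%mm0\n\t"
                "movq %%mm0, (%0)\n\t"
                "movq %%mm0, 8(%0)\n\t"
                : : "r" (dst) : "memory");
        kernel_fpu_end();
}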
We also had some problems with the page info register on page faults;
however, this proved to be a hazard in systems without preemption and
thus was fixed.
George
"Victor Yodaiken" <[email protected]> writes:
> Really? I see totally otherwise. The rule: "kernel code runs until it does a
> blocking call" makes sense.
Oh, you mean the kernel can go away anytime it feels like it. Because this
happens potentially at every function call. When we can I'm very
much in favor of not using locks at all.
The problem with the run until block approach is that people forget that
in some rare cases they will block, or those cases are modified so that
they will block after your code is written. And since it never shows
up in testing everything looks good until you hit the real world.
If you start with the SMP premise that you have multiple cpus
executing in the kernel at the same time you start with a healthier
assumption.
Now I do agree that the whole nested lock issue is not a very healthy
solution to the problem case of multiple threads executing within the
kernel. I don't think we even have any of those except in the kernel
core. But for RTLinux this is most likely what you are working on,
the worst of it.
> What's really hard is figuring out which 12 spinlocks you need
> in what order.
I agree that anytime you get deeper than 1 spinlock there are issues.
I don't think the current kernel gets deeper than 3 locks. And yes
this is a nasty case no questions asked. This actually sounds like
the lazy page table mechanism...
> > > > Right now there is a preemptible kernel patch being maintained
> > > > somewhere. I haven't had a chance to look recently. But the recent
> > > > threads on low latency mentioned it.
> > >
> > > Try it out. Try running a kernel compile while a POSIX SCHED_FIFO process
> > > is running.
> >
> > My expectation would be that the SCHED_FIFO process would totally stop
> > the kernel compile. At least until it saw fit to sleep.
>
> Right - even for a periodic SCHED_FIFO process that gives up the processor a lot.
>
> So how do you ensure that the router daemon runs, that network packets get
> processed ... that all the other things that normally happen because kernel
> tasks run to completion and because user programs all progress still happen?
Progress is a hard requirement, especially when you give up fairness,
and especially when you want a specific amount of progress.
> > > > As for rules. They are the usual SMP rules. In earlier versions there
> > > > was a requirement that you use balanced constructs.
> > >
> >
> > There are 3 different goals we are mentioning here.
> > 1) SMP
> > 2) Soft Realtime. (low-latency)
> > This is things like playing audio or video where it is o.k. to skip
> > but want it to happen as rarely as possible.
> > 3) Hard Realtime. (guaranteed-latency)
> >
> > All 3 are different.
>
>
> Agree. People forget though. And "as rarely as possible" is pretty
> damn vague.
Agreed, it isn't the most clear. The core idea with soft realtime is that
it is o.k. to fail; nothing nasty will happen. Then you work on
reducing the latencies without penalizing other things. In that
case keeping the latencies down is not the most important thing.
> > > I'm sorry, but this is not correct. SMP is different from low-latency and
> > > has different goals.
> >
> > SMP has the basic goal of having locks held for as short a time as possible,
>
> It does? I thought SMP had the basic goal of optimizing average case use of
> multiple processors and getting a decent ratio between throughput and
> responsiveness. That's very different.
Yes overall. But in the point case of lock hold times it is true.
> > and a preemptible kernel has the goal of keeping the kernel unpreemptible
> > for as short a time as possible. Both of these are compatible.
>
> Nope. You are taking an implementation rule of thumb "lock should not be held long"
> which is often true for SMP and noting that it is compatible with the core design
> goal of low-latency. The problem is that for low latency, "we often find it useful
> to keep lock hold times short" does not get you very far.
It solves a lot of the corner cases, like long compute loops in the
kernel such as the swapoff path. For many of the other cases keeping
both latency low and throughput high is a general design goal. I
agree that preemption is not a silver bullet.
The obvious case this solves is copy_from_user. Which often is a real
bottleneck, but if you add code to see if it should preempt itself it
will run slower, and get less throughput. At the same time
copy_from_user does not have any locks held. In that case preemption
is the optimal approach.
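For contrast, a rough sketch of what the explicit-check alternative would look
like: chunk the copy and offer to reschedule between chunks. The chunk size and
the wrapper are invented for illustration, and the per-chunk need_resched test
is exactly the overhead a preemptible kernel avoids paying:

#include <linux/sched.h>
#include <linux/errno.h>
#include <asm/uaccess.h>

#define COPY_CHUNK 4096         /* invented chunk size */

static int chunked_copy_from_user(char *dst, const char *src, unsigned long len)
{
        while (len) {
                unsigned long n = len < COPY_CHUNK ? len : COPY_CHUNK;

                if (copy_from_user(dst, src, n))
                        return -EFAULT;
                dst += n;
                src += n;
                len -= n;

                /* Voluntarily give up the cpu if someone else needs it. */
                if (current->need_resched)
                        schedule();
        }
        return 0;
}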
Another case is the swapoff path. Which can run a very long time in
kernel mode, and when it doesn't find a page to do I/O on it doesn't
make a single blocking call. But it also has periods of time when
it doesn't hold locks.
So there are cases where preemption is a good solution. Reviewing the current
algorithms and finding a good way to handle things is also good.
> > Nope you can't. However in your example I would only prefer your design if
> > I was designing for guaranteed latency. Not just trying to keep the
> > latencies low.
> >
> > I don't have a problem with my audio player skipping once every 2 or 3
> > years. A classic low-latency problem.
>
>
> Actually, soft RT usually means "good performance until I really need it".
I admit a lot of times that does seem to be the implementation.
Because the people offering it don't have it as a goal. With a touch
of care the people doing the current low-latency work on linux should
be able to keep the kernel hackers honest on that score. Without
users who complain a lot of things look good until you use them.
Again, I aim soft real time type things at audio and video players,
not robot control. Skips there are obviously o.k., just not pleasant
to the ear.
> > > Personally I have a hard time buying hard real time code. Because
> > > there are always cases where you cannot make guarantees. To
> > > even have a clue what speed your code will really run at, you
> > > need to take into account the exact platform your code will run on,
> > > and you need to know the operating environment for that platform. And
> > > even with the best analysis and planning something still goes wrong.
>
>
> Odd, I could have sworn it worked.
Hmm. The mars probe problem notwithstanding? As I recall, in that
case it was only because they had a backup strategy of rebooting the
machine, for when the hard realtime requirements failed to be met, that
the system managed to land on mars.
Personally I think doing everything in your power to bound latencies
and do everything possible to ensure that something will happen in a
timely manner is the correct thing to do. However you still have to
keep the engineering assumption that something can still go wrong. A
gamma ray could hit your clock generator causing the cpu to run 5%
slower, you could be running on a subtly defective cpu, or a whole
host of things.
My problem with hard real time is that just with a skim of it I get
the feeling people are certain they can achieve their goals and not
just come within 99.9999999999% of them.
And as systems get more complex, the probability that the inevitable
failure is due to human error goes up. As I recall, with the mars lander it
was a bug in the implementation of priority inheritance.
Eric
On Mon, Oct 08, 2001 at 01:09:33PM -0600, Eric W. Biederman wrote:
> "Victor Yodaiken" <[email protected]> writes:
>
> > Really? I see totally otherwise. The rule: "kernel code runs until it does a
> > blocking call" makes sense.
>
> Oh, you mean the kernel can go away anytime it feels like it. Because this
> happens potentially at every function call. When we can I'm very
> much in favor of not using locks at all.
>
> The problem with the run until block approach is that people forget that
> in some rare cases they will block, or those cases are modified so that
> they will block after your code is written. And since it never shows
> up in testing everything looks good until you hit the real world.
The same problem appears with spin_locks:

        spin_lock(&A);
        f();
        ...

And next week a new version of f() contains:

        if (rare_condition) {
                spin_lock(&A);  /* deadlock: A is already held by the caller */
                ...
                spin_unlock(&A);
        }
Why do you think Linux now needs recursive mutexes and locks?
>
> If you start with the SMP premise that you have multiple cpus
> executing in the kernel at the same time you start with a healthier
> assumption.
And you lose a lot of potential optimizations and gain complexity.
>
> Now I do agree that the whole nested lock issue is not a very healthy
> solution to the problem case of multiple threads executing within the
> kernel. I don't think we even have any of those except in the kernel
You bet.
[...]
> I agree that anytime you get deeper than 1 spinlock there are issues.
> I don't think the current kernel gets deeper than 3 locks. And yes
> this is a nasty case no questions asked. This actually sounds like
> the lazy page table mechanism...
>
> > > > > Right now there is a preemptible kernel patch being maintained
> > > > > somewhere. I haven't had a chance to look recently. But the recent
> > > > > threads on low latency mentioned it.
> > > >
> > > > Try it out. Try running a kernel compile while a POSIX SCHED_FIFO process
> > > > is running.
> > >
> > > My expectation would be that the SCHED_FIFO process would totally stop
> > > the kernel compile. At least until it saw fit to sleep.
> >
> > Right - even for a periodic SCHED_FIFO process that gives up the processor a lot.
> >
> > So how do you ensure that the router daemon runs, that network packets get
> > processed ... that all the other things that normally happen because kernel
> > tasks run to completion and because user programs all progress still happen?
>
> Progress is a hard requirement, especially when you give up fairness,
> and especially when you want a specific amount of progress.
It's good to run a general purpose OS under the rule that all processes
eventually progress. But then you cannot add the rule that the highest
priority RT task runs within some small T time units.
These rules are not compatible.
[...]
> The obvious case this solves is copy_from_user. Which often is a real
> bottleneck, but if you add code to see if it should preempt itself it
> will run slower, and get less throughput. At the same time
> copy_from_user does not have any locks held. In that case preemption
> is the optimal approach.
Why is it optimal compared to just doing the copy and getting it over with?
copy data and blow cache
vs
copy some data and blow cache
get preempted
repeat N times
> > > Personally I have a hard time buying hard real time code. Because
> > > there are always cases where you cannot make guarantees. To
> > > even have a clue what speed your code will really run at, you
> > > need to take into account the exact platform your code will run on,
> > > and you need to know the operating environment for that platform. And
> > > even with the best analysis and planning something still goes wrong.
> >
> >
> > Odd, I could have sworn it worked.
>
> Hmm. The mars probe problem notwithstanding? As I recall, in that
Bug free software is not possible. Bug free hardware is not possible.
> case it was only because they had a backup strategy, of rebooting the
> machine, for when the hard realtime requirements failed to be met that
> the system managed to land on mars.
>
> Personally I think doing everything in your power to bound latencies
> and do everything possible to ensure that something will happen in a
> timely manner is the correct thing to do. However you still have to
> keep the engineering assumption that something can still go wrong. A
> gamma ray could hit your clock generator causing the cpu to run 5%
> slower, you could be running on a subtly defective cpu, or a whole
> host of things.
Sure. But consider:
Personally I have a hard time buying bridge designs to always
support max load. You could have defective cables, or a comet could
land on the bridge, or ...
>
> My problem with hard real time is that just with a skim of it I get
> the feeling people are certain they can achieve their goals and not
> just come within 99.9999999999% of them.
>
> And as systems get more complex, the probability that the inevitable
> failure is due to human error goes up. As I recall, with the mars lander it
> was a bug in the implementation of priority inheritance.
Actually it was a design error in VxWorks.