2009-11-13 22:04:03

by Thomas Gleixner

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

Jean,

On Thu, 12 Nov 2009, Jean Delvare wrote:
> As far as I can see, yield() doesn't have clear semantics attached.
> It's more of a utility function everyone could use as they see fit. In
> that respect, I understand your concerns about the coders' expectations
> and how they could be dismissed by RT. OTOH, I don't think that asking
> all developers to get rid of yield() because it _may not be_
> RT-compliant will lead you anywhere.
>
> In the 3 occurrences I've looked at, yield() is used as a way to
> busy-wait in a multitask-friendly way. What other use cases do you
> expect? I've never seen yield() used as a way to avoid deadlocks, which
> seems to be what you fear. I guess it could statistically be used that
> way, but obviously I wouldn't recommend it. Anything else?
>
> I would recommend that you audit the various use cases of yield(), and
> then offer replacements for the cases which are RT-unfriendly, leaving
> the rest alone and possibly clarifying the intended use of yield() to
> avoid future problems.
>
> In the i2c-algo-bit case, which started this thread, I can't really see
> what "something more explicit of an implementation" would be. yield()
> is as optimum as you can get, only delaying the processing if other
> tasks want to run. A sleep or a delay would delay the processing
> unconditionally, for an arbitrary amount of time, with no good reason.
> Removing yield() would increase the latency. This particular feature of
> i2c-algo-bit isn't necessarily terribly useful, but the code does the
> right thing, regardless of RT, so I just can't see any reason to change
> it.

The problem with yield() is not RT specific. Let's just look at the
yield semantics in mainline:

>From kernel/sched_fair.c

unsigned int __read_mostly sysctl_sched_compat_yield;
...
static void yield_task_fair(struct rq *rq)
{
if (likely(!sysctl_sched_compat_yield) && curr->policy != SCHED_BATCH) {
...
return;
}
}

So even in mainline yield() is doing nothing vs. latencies and is not
multitask friendly by default. It's just not complaining about it.

yield itself is a design failure in almost all of its aspects. It's
the poor mans replacement for proper async notification.

Q: Why put people yield() into their code ?
A: Because:
- it is less nasty than busy waiting for a long time
- it works better
- it solves the hang
- it happened to be in the driver they copied
- ....

At least 90% of the in kernel users of yield() are either a bug or a
design problem or lack of understanding or all of those.

I can see the idea of using yield() to let other tasks making progress
in situations where the hardware is a design failure as well,
e.g. bitbang devices. But if we have to deal with hardware which is
crap by design yield() is the worst of all answers simply due to its
horrible semantics.

Let's look at the code in question:

for (i = 0; i <= retries; i++) {
ret = i2c_outb(i2c_adap, addr);
if (ret == 1 || i == retries)
break;
bit_dbg(3, &i2c_adap->dev, "emitting stop condition\n");
i2c_stop(adap);
udelay(adap->udelay);
yield();

We are in the error path as we failed to communicate with the device
in the first place. So there a two scenarios:

1) the device is essential for the boot process:

you have hit a problem anyway

2) this is just some random device probing:

who cares ?

So in both cases the following change is a noop vs. latency and
whatever your concerns are:

- udelay(adap->udelay);
- yield();
+ msleep(1);

Thanks,

tglx


2009-11-14 18:02:05

by Jean Delvare

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

Hi Thomas,

On Fri, 13 Nov 2009 23:03:39 +0100 (CET), Thomas Gleixner wrote:
> Jean,
>
> On Thu, 12 Nov 2009, Jean Delvare wrote:
> > As far as I can see, yield() doesn't have clear semantics attached.
> > It's more of a utility function everyone could use as they see fit. In
> > that respect, I understand your concerns about the coders' expectations
> > and how they could be dismissed by RT. OTOH, I don't think that asking
> > all developers to get rid of yield() because it _may not be_
> > RT-compliant will lead you anywhere.
> >
> > In the 3 occurrences I've looked at, yield() is used as a way to
> > busy-wait in a multitask-friendly way. What other use cases do you
> > expect? I've never seen yield() used as a way to avoid deadlocks, which
> > seems to be what you fear. I guess it could statistically be used that
> > way, but obviously I wouldn't recommend it. Anything else?
> >
> > I would recommend that you audit the various use cases of yield(), and
> > then offer replacements for the cases which are RT-unfriendly, leaving
> > the rest alone and possibly clarifying the intended use of yield() to
> > avoid future problems.
> >
> > In the i2c-algo-bit case, which started this thread, I can't really see
> > what "something more explicit of an implementation" would be. yield()
> > is as optimum as you can get, only delaying the processing if other
> > tasks want to run. A sleep or a delay would delay the processing
> > unconditionally, for an arbitrary amount of time, with no good reason.
> > Removing yield() would increase the latency. This particular feature of
> > i2c-algo-bit isn't necessarily terribly useful, but the code does the
> > right thing, regardless of RT, so I just can't see any reason to change
> > it.
>
> The problem with yield() is not RT specific. Let's just look at the
> yield semantics in mainline:
>
> From kernel/sched_fair.c
>
> unsigned int __read_mostly sysctl_sched_compat_yield;
> ...
> static void yield_task_fair(struct rq *rq)
> {
> if (likely(!sysctl_sched_compat_yield) && curr->policy != SCHED_BATCH) {
> ...
> return;
> }
> }
>
> So even in mainline yield() is doing nothing vs. latencies and is not
> multitask friendly by default. It's just not complaining about it.

I admit I have no clue what you're talking about. What I see is:

/**
* yield - yield the current processor to other threads.
*
* This is a shortcut for kernel-space yielding - it marks the
* thread runnable and calls sys_sched_yield().
*/
void __sched yield(void)
{
set_current_state(TASK_RUNNING);
sys_sched_yield();
}

I have no clue where sys_sched_yield is defined, but I trust the
comment as to what sched() is doing.

> yield itself is a design failure in almost all of its aspects. It's
> the poor mans replacement for proper async notification.

All the use cases in the i2c subsystem have nothing to do with async
notification.

> Q: Why put people yield() into their code ?
> A: Because:
> - it is less nasty than busy waiting for a long time

That's what I've seen, yes.

> - it works better

This is vague.

> - it solves the hang

In which case it is definitely a design failure, granted.

> - it happened to be in the driver they copied
> - ....
>
> At least 90% of the in kernel users of yield() are either a bug or a
> design problem or lack of understanding or all of those.
>
> I can see the idea of using yield() to let other tasks making progress
> in situations where the hardware is a design failure as well,
> e.g. bitbang devices. But if we have to deal with hardware which is
> crap by design yield() is the worst of all answers simply due to its
> horrible semantics.
>
> Let's look at the code in question:
>
> for (i = 0; i <= retries; i++) {
> ret = i2c_outb(i2c_adap, addr);
> if (ret == 1 || i == retries)
> break;
> bit_dbg(3, &i2c_adap->dev, "emitting stop condition\n");
> i2c_stop(adap);
> udelay(adap->udelay);
> yield();
>
> We are in the error path as we failed to communicate with the device
> in the first place. So there a two scenarios:
>
> 1) the device is essential for the boot process:
>
> you have hit a problem anyway

No, you have not. If you exhaust all the retries, then yes you have a
problem. But if the first attempt fails and the second works, then all
is fine. And this can happen. This is the whole point of the retry loop!

>
> 2) this is just some random device probing:
>
> who cares ?

"Who cares" about what? Me, I certainly care about this probing not
taking too much time, to not slow down the boot process for example
(framebuffer drivers do such probes over the DDC bus.)

On top of this, this may not be "random device probing" but a real
access to a known device, which is busy and thus doesn't ack its
address. This is permitted by I2C. A common case is I2C EEPROMs, which
are busy when writing a page, and need some time before they will ack
their address again. But it will happen.

> So in both cases the following change is a noop vs. latency and
> whatever your concerns are:
>
> - udelay(adap->udelay);
> - yield();
> + msleep(1);

This introduces up to 20 ms of delay (at HZ=100) for each retry.
"retries" being 3 by default, this means up to 60 ms total per
transaction. So you can't claim it is equivalent to the original code,
it simply is not, timing-wise.

Then again, this particular piece of code may go away someday, because I
see no reason why i2c-algo-bit would retry automatically in this
particular case, when other I2C adapter drivers do not. It would seem
better to let the caller determine how long to wait and/or how many
times to retry, depending on the target device.

But my initial point still holds: if you are unhappy about yield,
discuss it with Ingo, Peter, Linus or anyone else who cares, and change
it and/or delete it. Asking all driver authors/maintainers to stop
using this function but leaving it defined without a deprecation
warning won't lead you anywhere. Developers will add new calls faster
than you remove them.

--
Jean Delvare

2009-11-16 15:56:10

by Mark Brown

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Fri, Nov 13, 2009 at 11:03:39PM +0100, Thomas Gleixner wrote:

> Q: Why put people yield() into their code ?
> A: Because:
> - it is less nasty than busy waiting for a long time
> - it works better

...

> I can see the idea of using yield() to let other tasks making progress
> in situations where the hardware is a design failure as well,
> e.g. bitbang devices. But if we have to deal with hardware which is
> crap by design yield() is the worst of all answers simply due to its
> horrible semantics.

What other options are there available for the first case (which is
often why things work better with the use of yield) that don't involve
sleeps, or is the idea that in situations like this drivers should
always sleep?

2009-11-18 00:50:04

by Leon Woestenberg

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

Hello,

On Mon, Nov 16, 2009 at 4:56 PM, Mark Brown
<[email protected]> wrote:
> On Fri, Nov 13, 2009 at 11:03:39PM +0100, Thomas Gleixner wrote:
>
>> Q: Why put people yield() into their code ?
>> A: Because:
>> ? ?- it is less nasty than busy waiting for a long time
>> ? ?- it works better
>
> ...
>
>> I can see the idea of using yield() to let other tasks making progress
>> in situations where the hardware is a design failure as well,
>> e.g. bitbang devices. But if we have to deal with hardware which is
>> crap by design yield() is the worst of all answers simply due to its
>> horrible semantics.
>
> What other options are there available for the first case (which is
> often why things work better with the use of yield) that don't involve
> sleeps, or is the idea that in situations like this drivers should
> always sleep?
>
Good point.

I think the yield()s in the device driver code means "I need a small
delay before the hardware is ready" which might translate to some
arbitrary "let me msleep()" or "do not select this task in the next
scheduler run, EVEN IF this task is highest priority".

Can we mark a task sleeping infinitely short, in such a way that the
scheduler does not schedule it at first resched?

During a next timer timeout check, the task would be marked schedulable again.

I assume this is rather dirty and has too much overhead on the timer interfaces.

Regards,

Leon.

--
Leon

2009-11-18 01:04:12

by Alan

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

> I think the yield()s in the device driver code means "I need a small
> delay before the hardware is ready" which might translate to some
> arbitrary "let me msleep()" or "do not select this task in the next
> scheduler run, EVEN IF this task is highest priority".

Yield() in a driver is almost always a bug. The reason for that is that
doing

do {
inb();
} while(!something);

which is what yield can end up as being if there is nothing else on that
CPU is extremely bad for bus performance on most systems. It's almost
always better to be using msleep() or even mdelay + a check to see if a
reschedule is needed/schedule().

> I assume this is rather dirty and has too much overhead on the timer interfaces.

Our timers are very efficient and some day we will need to make jiffies a
function and stop the timer ticking for best performance. At that point
timers are probably the most efficient way to do much of this.

Be that as it may, yield() in a driver is almost always the wrong thing
to do.

2009-11-18 16:28:51

by Leon Woestenberg

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

Hello,

On Wed, Nov 18, 2009 at 2:05 AM, Alan Cox <[email protected]> wrote:
>> I think the yield()s in the device driver code means "I need a small
>> delay before the hardware is ready" which might translate to some
>> arbitrary "let me msleep()" or "do not select this task in the next
>> scheduler run, EVEN IF this task is highest priority".
>
> Yield() in a driver is almost always a bug.
>

I know and that's exactly why I started this thread (and of course,
because I ran into the bug on my system).


> Our timers are very efficient and some day we will need to make jiffies a
> function and stop the timer ticking for best performance. At that point
> timers are probably the most efficient way to do much of this.
>
The problem with I2C bitbanged is the stringent timing, we need a way
to have fine-grained sleeping
mixed with real-time tasks in order to make this work.

As Thomas already said, the hardware is broken (in the sense that I2C
should really rely on hardware timers, i.e. an I2C host controller).

However, much of the cheaper/older/... embedded hardware is broken.
Given that I2C devices are relatively easy on the timing, we need
the least-dirty way that is not buggy in the kernel.

> Be that as it may, yield() in a driver is almost always the wrong thing
> to do.
>
Yes. What is your idea on removing those without breaking
functionality? Fine-graining sleep()ing?

Regards,
--
Leon

2009-11-18 16:52:01

by Jean Delvare

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Wed, 18 Nov 2009 17:28:53 +0100, Leon Woestenberg wrote:
> On Wed, Nov 18, 2009 at 2:05 AM, Alan Cox <[email protected]> wrote:
> > Our timers are very efficient and some day we will need to make jiffies a
> > function and stop the timer ticking for best performance. At that point
> > timers are probably the most efficient way to do much of this.
>
> The problem with I2C bitbanged is the stringent timing, we need a way
> to have fine-grained sleeping
> mixed with real-time tasks in order to make this work.

FWIW, the problem that was initially reported has nothing to do with
this. i2c-algo-bit used mdelay() during transactions, not yield().
yield() is used only in once place, _between_ transactions attempts.
There are no strict timing constraints there.

--
Jean Delvare

2009-11-18 20:37:13

by Thomas Gleixner

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Wed, 18 Nov 2009, Jean Delvare wrote:

> On Wed, 18 Nov 2009 17:28:53 +0100, Leon Woestenberg wrote:
> > On Wed, Nov 18, 2009 at 2:05 AM, Alan Cox <[email protected]> wrote:
> > > Our timers are very efficient and some day we will need to make jiffies a
> > > function and stop the timer ticking for best performance. At that point
> > > timers are probably the most efficient way to do much of this.
> >
> > The problem with I2C bitbanged is the stringent timing, we need a way
> > to have fine-grained sleeping
> > mixed with real-time tasks in order to make this work.
>
> FWIW, the problem that was initially reported has nothing to do with
> this. i2c-algo-bit used mdelay() during transactions, not yield().
> yield() is used only in once place, _between_ transactions attempts.
> There are no strict timing constraints there.

That still does not explain why yield() is necessary _between_ the
transaction attempts.

That code is fully preemptible, otherwise you could not call
yield(). And as I said before nobody even noticed that the yield()
default implementation was changed to a NOOP by default in the
scheduler.

Thanks,

tglx

2009-11-18 20:46:39

by Sven-Thorsten Dietrich

[permalink] [raw]
Subject: [PATCH] cleanup sched_yield (sys)call nesting.

On Wed, 2009-11-18 at 17:52 +0100, Jean Delvare wrote:
> On Wed, 18 Nov 2009 17:28:53 +0100, Leon Woestenberg wrote:
> > On Wed, Nov 18, 2009 at 2:05 AM, Alan Cox <[email protected]> wrote:
> > > Our timers are very efficient and some day we will need to make jiffies a
> > > function and stop the timer ticking for best performance. At that point
> > > timers are probably the most efficient way to do much of this.
> >
> > The problem with I2C bitbanged is the stringent timing, we need a way
> > to have fine-grained sleeping
> > mixed with real-time tasks in order to make this work.
>
> FWIW, the problem that was initially reported has nothing to do with
> this. i2c-algo-bit used mdelay() during transactions, not yield().
> yield() is used only in once place, _between_ transactions attempts.
> There are no strict timing constraints there.
>

I agree that dropping out sched_yield entirely should maybe start by
deprecating / flagging as a warning in sched_rt.c.

This is just a minimal cleanup I stumbled across while looking at it -
to get away from the uglyness of calling into the syscall interface from
inside the Kernel.

I'll generate something more substantial for discussion later.

Subject: clean up chaining in sched_yield()
From: Sven-Thorsten Dietrich <[email protected]>

The call to sys_sched_yield for in-Kernel is messy.
and the return code from sys_sched_yield is ignored when called from
in-kernel.

Signed-off-by: Sven-Thorsten Dietrich <[email protected]>

diff --git a/kernel/sched.c b/kernel/sched.c
index 3c11ae0..db2c0f9 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6647,12 +6647,12 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
}

/**
- * sys_sched_yield - yield the current processor to other threads.
+ * do_sched_yield - yield the current processor to other threads.
*
* This function yields the current CPU to other tasks. If there are no
* other threads running on this CPU then this function will return.
*/
-SYSCALL_DEFINE0(sched_yield)
+static inline void do_sched_yield(void)
{
struct rq *rq = this_rq_lock();

@@ -6669,6 +6669,11 @@ SYSCALL_DEFINE0(sched_yield)
preempt_enable_no_resched();

schedule();
+}
+
+SYSCALL_DEFINE0(sched_yield)
+{
+ do_sched_yield();

return 0;
}
@@ -6746,7 +6751,7 @@ EXPORT_SYMBOL(__cond_resched_softirq);
void __sched yield(void)
{
set_current_state(TASK_RUNNING);
- sys_sched_yield();
+ do_sched_yield();
}
EXPORT_SYMBOL(yield);


2009-11-18 20:57:11

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] cleanup sched_yield (sys)call nesting.

On Wed, 18 Nov 2009, Sven-Thorsten Dietrich wrote:
> On Wed, 2009-11-18 at 17:52 +0100, Jean Delvare wrote:
> > On Wed, 18 Nov 2009 17:28:53 +0100, Leon Woestenberg wrote:
> > > On Wed, Nov 18, 2009 at 2:05 AM, Alan Cox <[email protected]> wrote:
> > > > Our timers are very efficient and some day we will need to make jiffies a
> > > > function and stop the timer ticking for best performance. At that point
> > > > timers are probably the most efficient way to do much of this.
> > >
> > > The problem with I2C bitbanged is the stringent timing, we need a way
> > > to have fine-grained sleeping
> > > mixed with real-time tasks in order to make this work.
> >
> > FWIW, the problem that was initially reported has nothing to do with
> > this. i2c-algo-bit used mdelay() during transactions, not yield().
> > yield() is used only in once place, _between_ transactions attempts.
> > There are no strict timing constraints there.
> >
>
> I agree that dropping out sched_yield entirely should maybe start by
> deprecating / flagging as a warning in sched_rt.c.

Errm, that's unrelated to sched_rt.c.

yield() in the kernel in general is needs to be deprecated.

> This is just a minimal cleanup I stumbled across while looking at it -
> to get away from the uglyness of calling into the syscall interface from
> inside the Kernel.

And why exactly is that ugly ?

> I'll generate something more substantial for discussion later.
>
> Subject: clean up chaining in sched_yield()
> From: Sven-Thorsten Dietrich <[email protected]>
>
> The call to sys_sched_yield for in-Kernel is messy.
> and the return code from sys_sched_yield is ignored when called from
> in-kernel.

Which is completely irrelevant because the return code is always 0.

That patch adds just code bloat for no value.

Thanks,

tglx

2009-11-18 21:04:55

by Sven-Thorsten Dietrich

[permalink] [raw]
Subject: Re: [PATCH] cleanup sched_yield (sys)call nesting.

On Wed, 2009-11-18 at 21:56 +0100, Thomas Gleixner wrote:
> On Wed, 18 Nov 2009, Sven-Thorsten Dietrich wrote:
> > On Wed, 2009-11-18 at 17:52 +0100, Jean Delvare wrote:
> > > On Wed, 18 Nov 2009 17:28:53 +0100, Leon Woestenberg wrote:
> > > > On Wed, Nov 18, 2009 at 2:05 AM, Alan Cox <[email protected]> wrote:
> > > > > Our timers are very efficient and some day we will need to make jiffies a
> > > > > function and stop the timer ticking for best performance. At that point
> > > > > timers are probably the most efficient way to do much of this.
> > > >
> > > > The problem with I2C bitbanged is the stringent timing, we need a way
> > > > to have fine-grained sleeping
> > > > mixed with real-time tasks in order to make this work.
> > >
> > > FWIW, the problem that was initially reported has nothing to do with
> > > this. i2c-algo-bit used mdelay() during transactions, not yield().
> > > yield() is used only in once place, _between_ transactions attempts.
> > > There are no strict timing constraints there.
> > >
> >
> > I agree that dropping out sched_yield entirely should maybe start by
> > deprecating / flagging as a warning in sched_rt.c.
>
> Errm, that's unrelated to sched_rt.c.
>
> yield() in the kernel in general is needs to be deprecated.
>
> > This is just a minimal cleanup I stumbled across while looking at it -
> > to get away from the uglyness of calling into the syscall interface from
> > inside the Kernel.
>
> And why exactly is that ugly ?

Calling from a function returning void into a non-void function and then
ignoring the return code ?


2009-11-18 21:34:59

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] cleanup sched_yield (sys)call nesting.

On Wed, 18 Nov 2009, Sven-Thorsten Dietrich wrote:

> On Wed, 2009-11-18 at 21:56 +0100, Thomas Gleixner wrote:
> > On Wed, 18 Nov 2009, Sven-Thorsten Dietrich wrote:
> > > On Wed, 2009-11-18 at 17:52 +0100, Jean Delvare wrote:
> > > > On Wed, 18 Nov 2009 17:28:53 +0100, Leon Woestenberg wrote:
> > > > > On Wed, Nov 18, 2009 at 2:05 AM, Alan Cox <[email protected]> wrote:
> > > > > > Our timers are very efficient and some day we will need to make jiffies a
> > > > > > function and stop the timer ticking for best performance. At that point
> > > > > > timers are probably the most efficient way to do much of this.
> > > > >
> > > > > The problem with I2C bitbanged is the stringent timing, we need a way
> > > > > to have fine-grained sleeping
> > > > > mixed with real-time tasks in order to make this work.
> > > >
> > > > FWIW, the problem that was initially reported has nothing to do with
> > > > this. i2c-algo-bit used mdelay() during transactions, not yield().
> > > > yield() is used only in once place, _between_ transactions attempts.
> > > > There are no strict timing constraints there.
> > > >
> > >
> > > I agree that dropping out sched_yield entirely should maybe start by
> > > deprecating / flagging as a warning in sched_rt.c.
> >
> > Errm, that's unrelated to sched_rt.c.
> >
> > yield() in the kernel in general is needs to be deprecated.
> >
> > > This is just a minimal cleanup I stumbled across while looking at it -
> > > to get away from the uglyness of calling into the syscall interface from
> > > inside the Kernel.
> >
> > And why exactly is that ugly ?
>
> Calling from a function returning void into a non-void function and then
> ignoring the return code ?

Care to read what I wrote further down ?

>> Which is completely irrelevant because the return code is always 0.

Thanks,

tglx

2009-11-19 03:21:59

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH] cleanup sched_yield (sys)call nesting.


* Sven-Thorsten Dietrich <[email protected]> wrote:

> Subject: clean up chaining in sched_yield()
> From: Sven-Thorsten Dietrich <[email protected]>
>
> The call to sys_sched_yield for in-Kernel is messy.
> and the return code from sys_sched_yield is ignored when called from
> in-kernel.
>
> Signed-off-by: Sven-Thorsten Dietrich <[email protected]>
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 3c11ae0..db2c0f9 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -6647,12 +6647,12 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t, pid, unsigned int, len,
> }
>
> /**
> - * sys_sched_yield - yield the current processor to other threads.
> + * do_sched_yield - yield the current processor to other threads.
> *
> * This function yields the current CPU to other tasks. If there are no
> * other threads running on this CPU then this function will return.
> */
> -SYSCALL_DEFINE0(sched_yield)
> +static inline void do_sched_yield(void)
> {
> struct rq *rq = this_rq_lock();
>
> @@ -6669,6 +6669,11 @@ SYSCALL_DEFINE0(sched_yield)
> preempt_enable_no_resched();
>
> schedule();
> +}
> +
> +SYSCALL_DEFINE0(sched_yield)
> +{
> + do_sched_yield();
>
> return 0;
> }
> @@ -6746,7 +6751,7 @@ EXPORT_SYMBOL(__cond_resched_softirq);
> void __sched yield(void)
> {
> set_current_state(TASK_RUNNING);
> - sys_sched_yield();
> + do_sched_yield();
> }
> EXPORT_SYMBOL(yield);

Why do you consider an in-kernel call to sys_*() 'messy'? It is not -
and we rely on being able to do it with various syscalls.

Also, your patch bloats the scheduler a bit, for no good reason.

Thanks,

Ingo

2009-11-19 04:48:35

by Sven-Thorsten Dietrich

[permalink] [raw]
Subject: Re: [PATCH] cleanup sched_yield (sys)call nesting.

On Wed, 2009-11-18 at 22:34 +0100, Thomas Gleixner wrote:
> On Wed, 18 Nov 2009, Sven-Thorsten Dietrich wrote:
>
> > On Wed, 2009-11-18 at 21:56 +0100, Thomas Gleixner wrote:
> > > On Wed, 18 Nov 2009, Sven-Thorsten Dietrich wrote:
> > > > On Wed, 2009-11-18 at 17:52 +0100, Jean Delvare wrote:
> > > > > On Wed, 18 Nov 2009 17:28:53 +0100, Leon Woestenberg wrote:
> > > > > > On Wed, Nov 18, 2009 at 2:05 AM, Alan Cox <[email protected]> wrote:
> > > > > > > Our timers are very efficient and some day we will need to make jiffies a
> > > > > > > function and stop the timer ticking for best performance. At that point
> > > > > > > timers are probably the most efficient way to do much of this.
> > > > > >
> > > > > > The problem with I2C bitbanged is the stringent timing, we need a way
> > > > > > to have fine-grained sleeping
> > > > > > mixed with real-time tasks in order to make this work.
> > > > >
> > > > > FWIW, the problem that was initially reported has nothing to do with
> > > > > this. i2c-algo-bit used mdelay() during transactions, not yield().
> > > > > yield() is used only in once place, _between_ transactions attempts.
> > > > > There are no strict timing constraints there.
> > > > >
> > > >
> > > > I agree that dropping out sched_yield entirely should maybe start by
> > > > deprecating / flagging as a warning in sched_rt.c.
> > >
> > > Errm, that's unrelated to sched_rt.c.
> > >
> > > yield() in the kernel in general is needs to be deprecated.
> > >
> > > > This is just a minimal cleanup I stumbled across while looking at it -
> > > > to get away from the uglyness of calling into the syscall interface from
> > > > inside the Kernel.
> > >
> > > And why exactly is that ugly ?
> >
> > Calling from a function returning void into a non-void function and then
> > ignoring the return code ?
>
> Care to read what I wrote further down ?
>
> >> Which is completely irrelevant because the return code is always 0.
>


We are trying to get rid of __sched_yield calls from-inside-the-Kernel,
but sys_sched_yield() from user-space will remain.

This patch breaks out the in-Kernel interface for the yield()
functionality and deprecates it explicitly.

The sys_sched_yield() variety, however is not deprecated.

The objective is to deprecate *only* the in-kernel calls to
sched_yield(), in hopes of reducing new calls to sched_yield() being
added.

Eventually, when the in-Kernel calls are gone, the __sched_yield() would
be removed, and the first 2 hunks would essentially be reverted, leaving
only the user-space caller sys_sched_yield.

For the time being we add 2 lines and 2 braces of bulk, in hopes that
elsewhere this eliminates more __sched_yield() calls being added while
we work to eliminate the ones that exist already.

In regards to the return code, maybe we can talk about returning an
error when an RT task calls sys_sched_yield(). But that is another
topic.

Thanks,

Sven



Subject: Deprecate in-Kernel use of __sched_yield()
From: Sven-Thorsten Dietrich <[email protected]>

Break out the syscall interface from the deprecated in-kernel
sched_yield() interface that is to be removed.

Acked-by: Sven-Thorsten Dietrich <[email protected]>

---
kernel/sched.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)

Index: linux-2.6.31-openSUSE-11.2/kernel/sched.c
===================================================================
--- linux-2.6.31-openSUSE-11.2.orig/kernel/sched.c
+++ linux-2.6.31-openSUSE-11.2/kernel/sched.c
@@ -6566,12 +6566,12 @@ SYSCALL_DEFINE3(sched_getaffinity, pid_t
}

/**
- * sys_sched_yield - yield the current processor to other threads.
+ * do_sched_yield - yield the current processor to other threads.
*
* This function yields the current CPU to other tasks. If there are no
* other threads running on this CPU then this function will return.
*/
-SYSCALL_DEFINE0(sched_yield)
+static inline void do_sched_yield(void)
{
struct rq *rq = this_rq_lock();

@@ -6588,7 +6588,11 @@ SYSCALL_DEFINE0(sched_yield)
preempt_enable_no_resched();

schedule();
+}

+SYSCALL_DEFINE0(sched_yield)
+{
+ do_sched_yield();
return 0;
}

@@ -6670,10 +6674,10 @@ EXPORT_SYMBOL(cond_resched_softirq);
* This is a shortcut for kernel-space yielding - it marks the
* thread runnable and calls sys_sched_yield().
*/
-void __sched yield(void)
+void __sched __deprecated yield(void)
{
set_current_state(TASK_RUNNING);
- sys_sched_yield();
+ do_sched_yield();
}
EXPORT_SYMBOL(yield);


2009-11-19 10:37:10

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] cleanup sched_yield (sys)call nesting.

On Wed, 18 Nov 2009, Sven-Thorsten Dietrich wrote:
> We are trying to get rid of __sched_yield calls from-inside-the-Kernel,
> but sys_sched_yield() from user-space will remain.
>
> This patch breaks out the in-Kernel interface for the yield()
> functionality and deprecates it explicitly.
>
> The sys_sched_yield() variety, however is not deprecated.
>
> The objective is to deprecate *only* the in-kernel calls to
> sched_yield(), in hopes of reducing new calls to sched_yield() being
> added.

Nothing in the kernel calls sched_yield() because there is no such
function.

> Eventually, when the in-Kernel calls are gone, the __sched_yield() would
> be removed, and the first 2 hunks would essentially be reverted, leaving
> only the user-space caller sys_sched_yield.
>
> For the time being we add 2 lines and 2 braces of bulk, in hopes that
> elsewhere this eliminates more __sched_yield() calls being added while
> we work to eliminate the ones that exist already.

Err ? WTF do you need to fiddle in sched.c to deprecate a function ?

Nothing in the kernel calls sys_sched_yield() except the syscall and
the implementation of yield() in sched.c. The drivers,... call yield()
nothing else.

To deprecate yield() all you need is adding __deprecated to the
function prototype in sched.h. And that's the only way you alert users
because it warns when compiling code which _calls_ yield() not when
compiling the implementation in sched.c.

Sigh,

tglx

2009-11-19 12:05:25

by Jean Delvare

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Wed, 18 Nov 2009 21:36:48 +0100 (CET), Thomas Gleixner wrote:
> On Wed, 18 Nov 2009, Jean Delvare wrote:
>
> > On Wed, 18 Nov 2009 17:28:53 +0100, Leon Woestenberg wrote:
> > > On Wed, Nov 18, 2009 at 2:05 AM, Alan Cox <[email protected]> wrote:
> > > > Our timers are very efficient and some day we will need to make jiffies a
> > > > function and stop the timer ticking for best performance. At that point
> > > > timers are probably the most efficient way to do much of this.
> > >
> > > The problem with I2C bitbanged is the stringent timing, we need a way
> > > to have fine-grained sleeping
> > > mixed with real-time tasks in order to make this work.
> >
> > FWIW, the problem that was initially reported has nothing to do with
> > this. i2c-algo-bit used mdelay() during transactions, not yield().
> > yield() is used only in once place, _between_ transactions attempts.
> > There are no strict timing constraints there.
>
> That still does not explain why yield() is necessary _between_ the
> transaction attempts.

It is not _necessary_. We are just trying to be fair to other kernel
threads, because bit-banging is expensive and this is the only case
where we know we can tolerate a delay.

Just to clarify things... does (or did) yield() have anything to do
with CONFIG_PREEMPT_VOLUNTARY?

> That code is fully preemptible, otherwise you could not call
> yield().

How does one know what code is preemtible and what code is not? The
rest of the i2c-algo-bit code should definitely _not_ be preemtible, as
it is highly timing sensitive.

> And as I said before nobody even noticed that the yield()
> default implementation was changed to a NOOP by default in the
> scheduler.

Well, I guess only people monitoring system latency would notice, as
this is the only thing yield() was supposed to help with in the first
place.

You say "NOOP by default", does this imply there is a way to change
this?

Was yield() turned into NOOP by design, or was it a bug?

--
Jean Delvare

2009-11-19 12:57:45

by Alan

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

> Well, I guess only people monitoring system latency would notice, as
> this is the only thing yield() was supposed to help with in the first
> place.

if (need_resched())
schedule();

will make non-rt tasks act politely at the right moments. RT tasks will
likely immediately get to take the CPU again depending upon the
scheduling parameters in use.

2009-11-19 13:05:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Thu, 2009-11-19 at 12:59 +0000, Alan Cox wrote:
> > Well, I guess only people monitoring system latency would notice, as
> > this is the only thing yield() was supposed to help with in the first
> > place.
>
> if (need_resched())
> schedule();

aka.

cond_resched();

> will make non-rt tasks act politely at the right moments. RT tasks will
> likely immediately get to take the CPU again depending upon the
> scheduling parameters in use.

Right, FIFO will simply NOP it, since if it was the highest running
task, it will still be. RR could possibly run out of its slice and
schedule to another RR of the same prio.

2009-11-19 13:11:39

by Thomas Gleixner

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Thu, 19 Nov 2009, Jean Delvare wrote:
> > That still does not explain why yield() is necessary _between_ the
> > transaction attempts.
>
> It is not _necessary_. We are just trying to be fair to other kernel
> threads, because bit-banging is expensive and this is the only case
> where we know we can tolerate a delay.
>
> Just to clarify things... does (or did) yield() have anything to do
> with CONFIG_PREEMPT_VOLUNTARY?

No.

> > That code is fully preemptible, otherwise you could not call
> > yield().
>
> How does one know what code is preemtible and what code is not? The
> rest of the i2c-algo-bit code should definitely _not_ be preemtible, as
> it is highly timing sensitive.

Code is preemptible when preempt_count() == 0 and interrupts are
enabled. spin_lock() implicitely disables preemption.

> > And as I said before nobody even noticed that the yield()
> > default implementation was changed to a NOOP by default in the
> > scheduler.
>
> Well, I guess only people monitoring system latency would notice, as
> this is the only thing yield() was supposed to help with in the first
> place.
>
> You say "NOOP by default", does this imply there is a way to change
> this?

There is a sysctl: sysctl_sched_compat_yield

> Was yield() turned into NOOP by design, or was it a bug?

By design. The semantics of yield and the fairness approach of CFS are
not really working well together. Also yield() for SCHED_OTHER is not
really specified.

Thanks,

tglx

2009-11-19 13:17:17

by Peter Zijlstra

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Thu, 2009-11-19 at 13:05 +0100, Jean Delvare wrote:
>
> Was yield() turned into NOOP by design, or was it a bug?

Design.

yield() for SCHED_OTHER is not specified, so everything goes.

There's two possible 'sane' options for yield for (CFS's) SCHED_OTHER:

- place the task behind all other tasks of the same nice level

This is however an O(n) operation for CFS since we don't separate
things out based on nice level, hence we don't do that.

- service the task that is most starved of service

That fits nicely into the fairness thing, and is what we default to.
The thing is, that's current in 99% of the cases, otherwise we would
already be running another task.

So its not strictly a NOP, but in practice it is.


There is also another option, place the task behind _all_ other tasks,
but that also surprises people and causes regressions, because they
don't expect yield() to wait _that_ long.


And all this is irrespective of the question of whether yield() is ever
a sane thing to do (I say not).

2009-11-19 13:19:47

by Peter Zijlstra

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Thu, 2009-11-19 at 14:11 +0100, Thomas Gleixner wrote:
> > You say "NOOP by default", does this imply there is a way to change
> > this?
>
> There is a sysctl: sysctl_sched_compat_yield

This makes yield() place current behind all other tasks, and sucks too
for some workloads.

2009-11-19 13:23:18

by Thomas Gleixner

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Thu, 19 Nov 2009, Peter Zijlstra wrote:

> On Thu, 2009-11-19 at 14:11 +0100, Thomas Gleixner wrote:
> > > You say "NOOP by default", does this imply there is a way to change
> > > this?
> >
> > There is a sysctl: sysctl_sched_compat_yield
>
> This makes yield() place current behind all other tasks, and sucks too
> for some workloads.

yield() sucks anyway, so it depends which flavour of suckage you
prefer.

tglx

2009-11-19 14:00:08

by Jean Delvare

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

Hi Peter,

On Thu, 19 Nov 2009 14:06:54 +0100, Peter Zijlstra wrote:
> On Thu, 2009-11-19 at 12:59 +0000, Alan Cox wrote:
> > > Well, I guess only people monitoring system latency would notice, as
> > > this is the only thing yield() was supposed to help with in the first
> > > place.
> >
> > if (need_resched())
> > schedule();
>
> aka.
>
> cond_resched();

Are you saying that most calls to yield() should be replaced with calls
to cond_resched()?

I admit I a little skeptical. While the description of cond_resched()
("latency reduction via explicit rescheduling in places that are safe")
sounds promising, following the calls leads me to:

static inline int need_resched(void)
{
return unlikely(test_thread_flag(TIF_NEED_RESCHED));
}

So apparently the condition for need_resched() to do anything is
considered unlikely... suggesting that cond_resched() is a no-op in
most cases? I don't quite get the point of moving away from sched()
because it is a no-op, if we end up with a no-op under a different name.

--
Jean Delvare

2009-11-19 14:16:11

by Peter Zijlstra

[permalink] [raw]
Subject: Re: yield() in i2c non-happy paths hits BUG under -rt patch

On Thu, 2009-11-19 at 15:00 +0100, Jean Delvare wrote:

> > cond_resched();
>
> Are you saying that most calls to yield() should be replaced with calls
> to cond_resched()?

No, depends on the reason yield() is used. Some cases can be replaced by
locking constructs, such as a condition variable.

> I admit I a little skeptical. While the description of cond_resched()
> ("latency reduction via explicit rescheduling in places that are safe")
> sounds promising, following the calls leads me to:
>
> static inline int need_resched(void)
> {
> return unlikely(test_thread_flag(TIF_NEED_RESCHED));
> }
>
> So apparently the condition for need_resched() to do anything is
> considered unlikely... suggesting that cond_resched() is a no-op in
> most cases? I don't quite get the point of moving away from sched()
> because it is a no-op, if we end up with a no-op under a different name.

TIF_NEED_RESCHED gets set by the scheduler whenever it decides current
needs to get preempted, its unlikely() because that reduces the code
impact of cond_resched() and similar in the case we don't schedule, if
we do schedule() a mis-predicted branch isn't going to be noticed on the
overhead of scheduling.

So there's a few cases,

1) PREEMPT=n
2) Voluntary preempt
3) PREEMPT=y


1) non of this has any effect, if the scheduler wants to reschedule a
task that's in the kernel, it'll have to wait until it gets back to
user-space.

2) uses cond_resched() and similar to have explicit preemption points,
so we don't need to wait as long as 1).

3) preempts directly when !preempt_count(), when IRQs are disabled, the
IPI that will accompany TIF_NEED_RESCHED will be delayed and
local_irq_enable()/restore() will effect a reschedule due to the pending
IPI. If preemption was disabled while the IPI hit nothing will happen,
but preempt_enable() will do the reschedule once preempt_count reaches
0.