In development of RAMster, I have frequently been bitten
by indirect use of existing kernel subsystems that
unexpectedly sleep. As such, I have hacked the
following "debug" code fragments for use where I need to
ensure that doesn't happen.
/* per-cpu nesting count of "must not sleep" regions */
DEFINE_PER_CPU(int, mustnotsleep_count);

void mustnotsleep_start(void)
{
	int cpu = smp_processor_id();

	per_cpu(mustnotsleep_count, cpu)++;
}

void mustnotsleep_done(void)
{
	int cpu = smp_processor_id();

	per_cpu(mustnotsleep_count, cpu)--;
}

and in sched.c in schedule():

	if (per_cpu(mustnotsleep_count, smp_processor_id()))
		panic("scheduler called in mustnotsleep code");
This has enabled me to start identifying code that
is causing me problems. (I know this is a horrible
hack, but that's OK right now.)
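For context, a usage sketch (a hypothetical caller, not actual
RAMster code, just to show where the brackets go):

void example_tmem_put(void *page)
{
	mustnotsleep_start();
	/* synchronous tmem work on the page goes here; any path
	 * that reaches schedule() now trips the panic above */
	mustnotsleep_done();
}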
Rather than panic, an alternative would be for the
scheduler to check mustnotsleep_count and simply
schedule the same thread again (i.e. return from
schedule() immediately). I wasn't sure how to do that.
I know this is unusual, but still am wondering if there
is already some existing kernel mechanism for doing this?
Rationalization: Historically, CPUs were king, and an OS
was designed so that, if there was any work to do,
kernel code would yield (sleep) to keep those
precious CPUs free to do the work. With modern many-core
CPUs and inexpensive servers, it is often the case that CPU
availability is no longer the bottleneck, and some other
resource is.
The design of Transcendent Memory ("tmem") makes the
assumption that RAM is the bottleneck and that CPU cycles are
abundant and can be wasted as necessary. Specifically, tmem
interfaces are assumed to be synchronous... a CPU that is
performing a tmem operation (e.g. in-kernel compression,
access to hypervisor memory, or access to RAM on a different
physical machine) must NOT sleep and so must busy-wait
(in some cases with irqs and bottom halves enabled) for events
to occur.
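To make the busy-wait concrete, here is a rough sketch of the
pattern (not actual tmem/RAMster code; remote_op_done is an
invented completion flag set from an irq handler or bottom half):

/*
 * Caller has preemption disabled; irqs and bottom halves stay
 * enabled so the completion can still be delivered while we spin.
 */
static void wait_for_remote_op(volatile int *remote_op_done)
{
	while (!*remote_op_done)
		cpu_relax();	/* burn cycles rather than sleep */
}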
Comments welcome!
Dan
---
Thanks... for the memory!
I really could use more / my throughput's on the floor
The balloon is flat / my swap disk's fat / I've OOMs in store
Overcommitted so much
(with apologies to Bob Hope)
Dan Magenheimer <[email protected]> writes:
> In development of RAMster, I have frequently been bitten
> by indirect use of existing kernel subsystems that
> unexpectedly sleep. As such, I have hacked the
> following "debug" code fragments for use where I need to
> ensure that doesn't happen.
>
> DEFINE_PER_CPU(int, mustnotsleep_count);
>
> void mustnotsleep_start(void)
> {
> int cpu = smp_processor_id();
> per_cpu(mustnotsleep_count, cpu)++;
> }
>
> void mustnotsleep_done(void)
> {
> int cpu = smp_processor_id();
> per_cpu(mustnotsleep_count, cpu)--;
> }
>
> and in sched.c in schedule():
>
> if (per_cpu(mustnotsleep_count, smp_processor_id()))
> panic("scheduler called in mustnotsleep code");
>
> This has enabled me to start identifying code that
> is causing me problems. (I know this is a horrible
> hack, but that's OK right now.)
I'm pretty sure I'm missing something here but... what if you just use
CONFIG_DEBUG_PREEMPT? Isn't that good enough?
Cheers,
--
Luis Henriques
> From: Luis Henriques [mailto:[email protected]]
> Subject: Re: [RFC] "mustnotsleep"
>
> Dan Magenheimer <[email protected]> writes:
>
> > In development of RAMster, I have frequently been bitten
> > by indirect use of existing kernel subsystems that
> > unexpectedly sleep. As such, I have hacked the
> > following "debug" code fragments for use where I need to
> > ensure that doesn't happen.
> >
> > DEFINE_PER_CPU(int, mustnotsleep_count);
> >
> > void mustnotsleep_start(void)
> > {
> > int cpu = smp_processor_id();
> > per_cpu(mustnotsleep_count, cpu)++;
> > }
> >
> > void mustnotsleep_done(void)
> > {
> > int cpu = smp_processor_id();
> > per_cpu(mustnotsleep_count, cpu)--;
> > }
> >
> > and in sched.c in schedule():
> >
> > if (per_cpu(mustnotsleep_count, smp_processor_id()))
> > panic("scheduler called in mustnotsleep code");
> >
> > This has enabled me to start identifying code that
> > is causing me problems. (I know this is a horrible
> > hack, but that's OK right now.)
>
> I'm pretty sure I'm missing something here but... what if you just use
> CONFIG_DEBUG_PREEMPT? Isn't that good enough?
Thanks for the reply Luis.
Looking at the code enabled by CONFIG_DEBUG_PREEMPT,
I don't think it does what I'm looking for. I need
to ensure that no code called inside the boundaries
of mustnotsleep_start/done ever calls the scheduler,
e.g. cond_resched is never called etc.
On Thu, Jun 2, 2011 at 5:00 PM, Dan Magenheimer
<[email protected]> wrote:
>> From: Luis Henriques [mailto:[email protected]]
>> Subject: Re: [RFC] "mustnotsleep"
>>
>> Dan Magenheimer <[email protected]> writes:
>>
>> > In development of RAMster, I have frequently been bitten
>> > by indirect use of existing kernel subsystems that
>> > unexpectedly sleep. As such, I have hacked the
>> > following "debug" code fragments for use where I need to
>> > ensure that doesn't happen.
>> >
>> > DEFINE_PER_CPU(int, mustnotsleep_count);
>> >
>> > void mustnotsleep_start(void)
>> > {
>> > int cpu = smp_processor_id();
>> > per_cpu(mustnotsleep_count, cpu)++;
>> > }
>> >
>> > void mustnotsleep_done(void)
>> > {
>> > int cpu = smp_processor_id();
>> > per_cpu(mustnotsleep_count, cpu)--;
>> > }
>> >
>> > and in sched.c in schedule():
>> >
>> > if (per_cpu(mustnotsleep_count, smp_processor_id()))
>> > panic("scheduler called in mustnotsleep code");
>> >
>> > This has enabled me to start identifying code that
>> > is causing me problems. (I know this is a horrible
>> > hack, but that's OK right now.)
>>
>> I'm pretty sure I'm missing something here but... what if you just use
>> CONFIG_DEBUG_PREEMPT? Isn't that good enough?
>
> Thanks for the reply Luis.
>
> Looking at the code enabled by CONFIG_DEBUG_PREEMPT,
> I don't think it does what I'm looking for. I need
> to ensure that no code called inside the boundaries
> of mustnotsleep_start/done ever calls the scheduler,
> e.g. cond_resched is never called etc.
Luis is surely on the right lines, but maybe CONFIG_DEBUG_PREEMPT was
not the crucial option.
You should be building with CONFIG_PREEMPT and
CONFIG_DEBUG_SPINLOCK_SLEEP for best testing, and it would be good to
throw in CONFIG_DEBUG_PREEMPT too.
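i.e. something like this in your .config:

	CONFIG_PREEMPT=y
	CONFIG_DEBUG_SPINLOCK_SLEEP=y
	CONFIG_DEBUG_PREEMPT=y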
You need to preempt_disable() instead of mustnotsleep_start(), and
preempt_enable() instead of mustnotsleep_done(). You cannot even
safely use their smp_processor_id() without disabling preemption -
otherwise that task might be on a different cpu by the time you get to
use the cpu number it told you. It's essential to disable preemption if you
cannot bear to be rescheduled.
Then, if I've read the ifdefs correctly (please verify),
cond_resched() will include a __might_sleep() to report "BUG: sleeping
function called from invalid context..." from cond_resched() and other
useful places.
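Concretely, the replacement is just (a sketch, keeping your
original bracketing points):

void mustnotsleep_start(void)
{
	preempt_disable();	/* also makes smp_processor_id() safe */
}

void mustnotsleep_done(void)
{
	preempt_enable();
}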
But please don't disable preemption unnecessarily, nor for very long:
it's better to remain preemptible wherever possible.
Hugh
> Luis is surely on the right lines, but maybe CONFIG_DEBUG_PREEMPT was
> not the crucial option.
>
> You should be building with CONFIG_PREEMPT and
> CONFIG_DEBUG_SPINLOCK_SLEEP for best testing, and it would be good to
> throw in CONFIG_DEBUG_PREEMPT too.
>
> You need to preempt_disable() instead of mustnotsleep_start(), and
> preempt_enable() instead of mustnotsleep_done(). You cannot even
> safely use their smp_processor_id() without disabling preemption -
> otherwise that task might be on a different cpu by the time you get to
> use the cpu number it told you. It's essential to disable preemption if you
> cannot bear to be rescheduled.
>
> Then, if I've read the ifdefs correctly (please verify),
> cond_resched() will include a __might_sleep() to report "BUG: sleeping
> function called from invalid context..." from cond_resched() and other
> useful places.
Excellent! Thanks, I will give it a try! In my case,
preemption is indeed already disabled for all this code
(but irqs and bottom halves are enabled).
> But please don't disable preemption unnecessarily, nor for very long:
> it's better to remain preemptible wherever possible.
Sure, but what I'm arguing is that in many cases (and contrary
to conventional wisdom), a "somewhat fast" access to a page of data
with preemption DISabled is better than a "slow" disk read
with preemption ENabled. Naturally this depends on whether
all of the CPUs have useful work to do while waiting for the
disk read to complete... but as cores proliferate, CPU saturation
is becoming less likely.
Thanks again!
Dan