This is a bit late, but here goes anyway.
Having played with the x86 context tracking hooks for a while, I think
it would be nice if core code that needs to be aware of CPU context
(kernel, user, idle, guest, etc) could come up with a single,
comprehensible, easily validated set of hooks that arch code is
supposed to call.
Currently we have:
- RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
- Context tracking hooks. Only used by some arches. Calling these
calls the RCU hooks for you in most cases. They have weird
interactions with interrupts and they're slow.
- vtime. Beats the heck out of me.
- Whatever deferred things Christoph keeps reminding us about.
Honestly, I don't fully understand what all these hooks are supposed
to do, nor do I care all that much. From my perspective, the core
code should be able to do whatever it wants and rely on appropriate
notifications from arch code. It would be great if we could come up
with something straightforward that covers everything. For example:
user_mode_to_kernel_mode()
kernel_mode_to_user_mode()
kernel_mode_to_guest_mode()
in_a_periodic_tick()
starting_nmi()
ending_nmi()
may_i_turn_off_ticks_right_now()
or, better yet:
i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
Some arches may need:
i_am_lame_and_forgot_my_previous_context()
x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
no longer need that.
Paul says that some arches need something that goes straight from IRQ
to user mode (?) -- sigh.
etc.
It might make sense to get enough people who understand what's going
on behind the scenes together to hash out the requirements.
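To make the shape of such a hook set concrete, here is a minimal userspace
sketch. Everything in it is an assumption made for illustration: struct
ctx_state, enum cpu_ctx and ctx_switch() are hypothetical, and only the hook
names come from the list above. It shows how a single transition helper could
make the hooks easy to validate, since every hook states which context the
arch code claims to be leaving.

```c
#include <assert.h>

/* Hypothetical CPU contexts; illustrative only, not existing kernel API. */
enum cpu_ctx { CTX_KERNEL, CTX_USER, CTX_GUEST, CTX_IDLE };

/* Hypothetical per-CPU state that core code would own. */
struct ctx_state {
	enum cpu_ctx cur;		/* current non-NMI context */
	int nmi_depth;			/* NMIs may nest */
	unsigned long transitions;	/* number of validated transitions */
};

/* Single choke point: every hook names the context it expects to leave. */
static inline void ctx_switch(struct ctx_state *s, enum cpu_ctx from,
			      enum cpu_ctx to)
{
	assert(s->cur == from);	/* catches arch code calling the wrong hook */
	s->cur = to;
	s->transitions++;
}

static inline void user_mode_to_kernel_mode(struct ctx_state *s)
{
	ctx_switch(s, CTX_USER, CTX_KERNEL);
}

static inline void kernel_mode_to_user_mode(struct ctx_state *s)
{
	ctx_switch(s, CTX_KERNEL, CTX_USER);
}

static inline void kernel_mode_to_guest_mode(struct ctx_state *s)
{
	ctx_switch(s, CTX_KERNEL, CTX_GUEST);
}

/* NMIs do not change the underlying context; they only nest on top of it. */
static inline void starting_nmi(struct ctx_state *s)
{
	s->nmi_depth++;
}

static inline void ending_nmi(struct ctx_state *s)
{
	assert(s->nmi_depth > 0);
	s->nmi_depth--;
}
```

In this sketch, subsystems such as RCU or vtime would hang off ctx_switch()
rather than each growing its own set of arch-visible entry points.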
--Andy
On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> This is a bit late, but here goes anyway.
>
> Having played with the x86 context tracking hooks for awhile, I think
> it would be nice if core code that needs to be aware of CPU context
> (kernel, user, idle, guest, etc) could come up with single,
> comprehensible, easily validated set of hooks that arch code is
> supposed to call.
>
> Currently we have:
>
> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
Something about people yelling at me for waking up idle CPUs, thus
degrading their battery lifetimes. ;-)
> - Context tracking hooks. Only used by some arches. Calling these
> calls the RCU hooks for you in most cases. They have weird
> interactions with interrupts and they're slow.
Combining these would be good, but there are subtleties. For example,
some arches don't have context tracking, but RCU still needs to correctly
identify idle CPUs without in any way interrupting or awakening that CPU.
It would be good to make this faster, but it does have to work.
> - vtime. Beats the heck out of me.
>
> - Whatever deferred things Christoph keeps reminding us about.
>
> Honestly, I don't fully understand what all these hooks are supposed
> to do, nor do I care all that much. From my perspective, the core
> code should be able to do whatever it wants and rely on appropriate
> notifications from arch code. It would be great if we could come up
> with something straightforward that covers everything. For example:
>
> user_mode_to_kernel_mode()
> kernel_mode_to_user_mode()
> kernel_mode_to_guest_mode()
> in_a_periodic_tick()
> starting_nmi()
> ending_nmi()
kernel_mode_nonidle_to_idle()
kernel_mode_idle_to_nonidle()
> may_i_turn_off_ticks_right_now()
This is RCU if CONFIG_RCU_FAST_NO_HZ=n.
> or, better yet:
> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
This is RCU if CONFIG_RCU_FAST_NO_HZ=y. It would not be difficult to
make RCU do this if CONFIG_RCU_FAST_NO_HZ=n as well, but doing so would
increase to/from idle overhead.
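The difference between the two styles can be sketched in userspace as follows.
This is a toy model under stated assumptions, not how RCU is implemented:
struct toy_rcu, its fields and ticks_are_turning_off() are hypothetical names
invented for the example. The "ask" style lets the subsystem veto turning off
the tick; the "tell" style always proceeds and leaves the subsystem to arm its
own timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a subsystem with deferred per-CPU work. */
struct toy_rcu {
	int pending_callbacks;		/* work still waiting on this CPU */
	bool timer_armed;		/* subsystem's own wakeup timer */
	unsigned long timer_expires;	/* when that timer should fire */
};

/* "Ask" style: the subsystem may veto turning off the tick. */
static inline bool may_i_turn_off_ticks_right_now(const struct toy_rcu *r)
{
	return r->pending_callbacks == 0;
}

/*
 * "Tell" style: the tick is going away regardless, so the subsystem arms
 * its own timer if it still has work. This costs extra on each idle entry.
 */
static inline void ticks_are_turning_off(struct toy_rcu *r, unsigned long now,
					 unsigned long delay)
{
	if (r->pending_callbacks > 0) {
		r->timer_armed = true;
		r->timer_expires = now + delay;
	}
}
```

The extra branch and timer programming in the "tell" path is a small picture
of the to/from idle overhead mentioned above.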
> Some arches may need:
>
> i_am_lame_and_forgot_my_previous_context()
>
> x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
> no longer need that.
>
> Paul says that some arches need something that goes straight from IRQ
> to user mode (?) -- sigh.
Straight from IRQ to process-level kernel mode. I ran into this in
late 2011, and clearly should have documented exactly what code was
doing this. Something about invoking system calls from within the
kernel on some architectures.
Hey, if no architectures do this anymore, I could simplify RCU a bit! ;-)
> etc.
>
> It might make sense to get enough people who understand what's going
> on behind the scenes together to hash out the requirements.
Please count me in!
Thanx, Paul
On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> This is a bit late, but here goes anyway.
>
> Having played with the x86 context tracking hooks for awhile, I think
> it would be nice if core code that needs to be aware of CPU context
> (kernel, user, idle, guest, etc) could come up with single,
> comprehensible, easily validated set of hooks that arch code is
> supposed to call.
>
> Currently we have:
>
> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
>
> - Context tracking hooks. Only used by some arches. Calling these
> calls the RCU hooks for you in most cases. They have weird
> interactions with interrupts and they're slow.
>
> - vtime. Beats the heck out of me.
>
> - Whatever deferred things Christoph keeps reminding us about.
>
> Honestly, I don't fully understand what all these hooks are supposed
> to do, nor do I care all that much. From my perspective, the core
> code should be able to do whatever it wants and rely on appropriate
> notifications from arch code. It would be great if we could come up
> with something straightforward that covers everything. For example:
>
> user_mode_to_kernel_mode()
> kernel_mode_to_user_mode()
> kernel_mode_to_guest_mode()
> in_a_periodic_tick()
> starting_nmi()
> ending_nmi()
> may_i_turn_off_ticks_right_now()
> or, better yet:
> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
>
> Some arches may need:
>
> i_am_lame_and_forgot_my_previous_context()
Can all this information be generalized with some basic core hooks,
or could some of this contextual information vary depending
on the sequence we are in? It sounds like it's the latter and that's
the issue?
The reason I ask is that I've been working on a slightly different series of
arch problems lately, and it has gotten me wondering about the possibility of
adding a shared layer of hooks that some arch init code could use to relay back
other contextual information (in my case, yielding
execution in some paravirtualized scenarios; I only need this during
init sequences, though). My reasoning for considering this didn't seem
sufficient to add yet another layer or boilerplate code for arch init sequence
code, but if there is a slew of other contextual metadata which we
could use in arch code, perhaps this might make more sense. This of course
only makes sense for your use case if things really vary depending on the
sequence reaching out to check for any of the above. It would not need to be
tied down to init sequences alone. The way this could work, for instance, would
be for certain critical code to feed metadata about contextual information which
needs to be vetted and which we currently have sloppy or difficult ways of
retrieving. Then the onus would be on all of us to vet each critical section
carefully and to identify clearly all required contextual information.
Luis
On Tue, Aug 11, 2015 at 11:33 AM, Paul E. McKenney
<[email protected]> wrote:
> On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
>> This is a bit late, but here goes anyway.
>>
>> Having played with the x86 context tracking hooks for awhile, I think
>> it would be nice if core code that needs to be aware of CPU context
>> (kernel, user, idle, guest, etc) could come up with single,
>> comprehensible, easily validated set of hooks that arch code is
>> supposed to call.
>>
>> Currently we have:
>>
>> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
>
> Something about people yelling at me for waking up idle CPUs, thus
> degrading their battery lifetimes. ;-)
>
>> - Context tracking hooks. Only used by some arches. Calling these
>> calls the RCU hooks for you in most cases. They have weird
>> interactions with interrupts and they're slow.
>
> Combining these would be good, but there are subtleties. For example,
> some arches don't have context tracking, but RCU still needs to correctly
> identify idle CPUs without in any way interrupting or awakening that CPU.
> It would be good to make this faster, but it does have to work.
Could we maybe have one set of old RCU-only (no context tracking)
callbacks and a completely separate set of callbacks for arches that
support full context tracking? The implementation of the latter would
presumably call into RCU.
>> may_i_turn_off_ticks_right_now()
>
> This is RCU if CONFIG_RCU_FAST_NO_HZ=n.
>
>> or, better yet:
>> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
>
> This is RCU if CONFIG_RCU_FAST_NO_HZ=y. It would not be difficult to
> make RCU do this if CONFIG_RCU_FAST_NO_HZ=n as well, but doing so would
> increase to/from idle overhead.
If things actually end up using hrtimers, we might also want
get_off_my_lawn() aka "isolate this cpu now and try to do all the
deferred stuff right now and kill off those hrtimers".
Rik is (was?) trying to make some housekeeper CPU probe other CPUs'
state to eliminate the need for exact vtime accounting and thus speed
up transitions to/from user or idle. It would be really neat if we
could simultaneously have quick idle/user transitions *and* avoid
deferred per-cpu work interrupting idle/user mode.
Chris Metcalf seems quite excited about the kernel staying far away
from his CPU once he's ready :)
>
>> Some arches may need:
>>
>> i_am_lame_and_forgot_my_previous_context()
>>
>> x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
>> no longer need that.
>>
>> Paul says that some arches need something that goes straight from IRQ
>> to user mode (?) -- sigh.
>
> Straight from IRQ to process-level kernel mode. I ran into this in
> late 2011, and clearly should have documented exactly what code was
> doing this. Something about invoking system calls from within the
> kernel on some architectures.
>
> Hey, if no architectures do this anymore, I could simplify RCU a bit! ;-)
I wonder if whatever arches do this could do it in two steps: exit IRQ
and then enter normal kernel mode.
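A toy userspace sketch of the two-step idea, assuming a simple nesting counter
(struct toy_ctx and all toy_* helpers are hypothetical and only loosely
inspired by how IRQ nesting is usually tracked). Going through exit-IRQ first
keeps the nesting count balanced, so the "enter kernel" step can insist on
starting from process level.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU tracking state. */
struct toy_ctx {
	int irq_nesting;	/* 0 means process level */
	bool in_kernel;		/* process-level kernel mode entered */
};

static inline void toy_irq_enter(struct toy_ctx *c)
{
	c->irq_nesting++;
}

static inline void toy_irq_exit(struct toy_ctx *c)
{
	assert(c->irq_nesting > 0);	/* exits must pair with entries */
	c->irq_nesting--;
}

static inline void toy_enter_kernel(struct toy_ctx *c)
{
	assert(c->irq_nesting == 0);	/* only legal from process level */
	c->in_kernel = true;
}

/* The awkward "IRQ straight to process-level kernel" case, in two steps. */
static inline void toy_irq_to_process_kernel(struct toy_ctx *c)
{
	toy_irq_exit(c);
	toy_enter_kernel(c);
}
```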
--Andy
On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> This is a bit late, but here goes anyway.
>
> Having played with the x86 context tracking hooks for awhile, I think
> it would be nice if core code that needs to be aware of CPU context
> (kernel, user, idle, guest, etc) could come up with single,
> comprehensible, easily validated set of hooks that arch code is
> supposed to call.
>
> Currently we have:
>
> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
>
> - Context tracking hooks. Only used by some arches. Calling these
> calls the RCU hooks for you in most cases. They have weird
> interactions with interrupts and they're slow.
>
> - vtime. Beats the heck out of me.
>
> - Whatever deferred things Christoph keeps reminding us about.
>
> Honestly, I don't fully understand what all these hooks are supposed
> to do, nor do I care all that much. From my perspective, the core
> code should be able to do whatever it wants and rely on appropriate
> notifications from arch code. It would be great if we could come up
> with something straightforward that covers everything. For example:
>
> user_mode_to_kernel_mode()
> kernel_mode_to_user_mode()
> kernel_mode_to_guest_mode()
> in_a_periodic_tick()
> starting_nmi()
> ending_nmi()
> may_i_turn_off_ticks_right_now()
> or, better yet:
> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
>
> Some arches may need:
>
> i_am_lame_and_forgot_my_previous_context()
>
> x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
> no longer need that.
>
> Paul says that some arches need something that goes straight from IRQ
> to user mode (?) -- sigh.
>
> etc.
>
> It might make sense to get enough people who understand what's going
> on behind the scenes together to hash out the requirements.
I'm quite interested in this topic as well. In addition to the above
requirements, we also want to ensure that the kernel entry/exit fast
paths are as optimized as possible; everyone wants to hook those, and
very few things actually should.
- Josh Triplett
On Tue, Aug 11, 2015 at 10:49 AM, Andy Lutomirski <[email protected]> wrote:
> This is a bit late, but here goes anyway.
>
> Having played with the x86 context tracking hooks for awhile, I think
> it would be nice if core code that needs to be aware of CPU context
> (kernel, user, idle, guest, etc) could come up with single,
> comprehensible, easily validated set of hooks that arch code is
> supposed to call.
Having worked on both the arm and arm64 implementations of context
tracking, I'm interested in this as well.
Kevin
On Tue, Aug 11, 2015 at 12:07:54PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 11, 2015 at 11:33 AM, Paul E. McKenney
> <[email protected]> wrote:
> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> >> This is a bit late, but here goes anyway.
> >>
> >> Having played with the x86 context tracking hooks for awhile, I think
> >> it would be nice if core code that needs to be aware of CPU context
> >> (kernel, user, idle, guest, etc) could come up with single,
> >> comprehensible, easily validated set of hooks that arch code is
> >> supposed to call.
> >>
> >> Currently we have:
> >>
> >> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> >
> > Something about people yelling at me for waking up idle CPUs, thus
> > degrading their battery lifetimes. ;-)
> >
> >> - Context tracking hooks. Only used by some arches. Calling these
> >> calls the RCU hooks for you in most cases. They have weird
> >> interactions with interrupts and they're slow.
> >
> > Combining these would be good, but there are subtleties. For example,
> > some arches don't have context tracking, but RCU still needs to correctly
> > identify idle CPUs without in any way interrupting or awakening that CPU.
> > It would be good to make this faster, but it does have to work.
>
> Could we maybe have one set of old RCU-only (no context tracking)
> callbacks and a completely separate set of callbacks for arches that
> support full context tracking? The implementation of the latter would
> presumably call into RCU.
It should be possible for RCU to use context tracking if it is available
and to have RCU maintain its own state otherwise, if that is what you
are getting at. Assuming that the decision is global and made at either
build or boot time, anyway. Having some CPUs tracking context and others
not sounds like an invitation for subtle bugs.
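A minimal sketch of a global, build-time choice, under stated assumptions: the
macro ARCH_HAS_FULL_CONTEXT_TRACKING, struct toy_cpu and the toy_* functions
are all hypothetical names made up for this example. The point is only that
exactly one source of truth exists per build, so no CPU can disagree with
another about who tracks the state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical build-time switch, fixed per arch and not configurable. */
#ifndef ARCH_HAS_FULL_CONTEXT_TRACKING
#define ARCH_HAS_FULL_CONTEXT_TRACKING 1
#endif

struct toy_cpu {
	bool ct_in_user;	/* owned by context tracking, if present */
	bool rcu_in_eqs;	/* RCU's private fallback state otherwise */
};

static inline void toy_user_enter(struct toy_cpu *c)
{
#if ARCH_HAS_FULL_CONTEXT_TRACKING
	c->ct_in_user = true;	/* RCU will read this, no separate hook */
#else
	c->rcu_in_eqs = true;	/* RCU maintains its own state */
#endif
}

static inline void toy_user_exit(struct toy_cpu *c)
{
#if ARCH_HAS_FULL_CONTEXT_TRACKING
	c->ct_in_user = false;
#else
	c->rcu_in_eqs = false;
#endif
}

/* What RCU would consult to decide whether this CPU can be ignored. */
static inline bool toy_rcu_cpu_is_quiescent(const struct toy_cpu *c)
{
#if ARCH_HAS_FULL_CONTEXT_TRACKING
	return c->ct_in_user;
#else
	return c->rcu_in_eqs;
#endif
}
```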
> >> may_i_turn_off_ticks_right_now()
> >
> > This is RCU if CONFIG_RCU_FAST_NO_HZ=n.
> >
> >> or, better yet:
> >> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> >
> > This is RCU if CONFIG_RCU_FAST_NO_HZ=y. It would not be difficult to
> > make RCU do this if CONFIG_RCU_FAST_NO_HZ=n as well, but doing so would
> > increase to/from idle overhead.
>
> If things actually end up using hrtimers, we might also want
> get_off_my_lawn() aka "isolate this cpu now and try to do all the
> deferred stuff right now and kill off those hrtimers".
If too many different subsystems use hrtimers, then we might well
find ourselves worse off than if we used scheduler-clock interrupts.
I suppose we could have some way of multiplexing a single hrtimer,
which could be thought of as an on-demand scheduling-clock interrupt.
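One way to picture such multiplexing, as a userspace sketch only: each
subsystem registers once and posts a deadline, and a single underlying timer
is programmed for the earliest one. The names struct timer_mux, mux_* and
MUX_NO_DEADLINE are hypothetical, and no real hrtimer API is used here.

```c
#include <assert.h>
#include <stdint.h>

#define MUX_MAX_CLIENTS 8
#define MUX_NO_DEADLINE UINT64_MAX

/* One slot per subsystem sharing the single underlying timer. */
struct mux_client {
	uint64_t deadline;		/* MUX_NO_DEADLINE when nothing is wanted */
	void (*fn)(void *arg);
	void *arg;
};

struct timer_mux {
	struct mux_client clients[MUX_MAX_CLIENTS];
	int nr_clients;
};

/* Returns a client id, or -1 if the table is full. */
static inline int mux_register(struct timer_mux *m, void (*fn)(void *),
			       void *arg)
{
	if (m->nr_clients == MUX_MAX_CLIENTS)
		return -1;
	m->clients[m->nr_clients] =
		(struct mux_client){ MUX_NO_DEADLINE, fn, arg };
	return m->nr_clients++;
}

static inline void mux_request(struct timer_mux *m, int id, uint64_t deadline)
{
	assert(id >= 0 && id < m->nr_clients);
	m->clients[id].deadline = deadline;
}

/* The one expiry that would be programmed into the real timer. */
static inline uint64_t mux_next_expiry(const struct timer_mux *m)
{
	uint64_t next = MUX_NO_DEADLINE;

	for (int i = 0; i < m->nr_clients; i++)
		if (m->clients[i].deadline < next)
			next = m->clients[i].deadline;
	return next;
}

/* Called when the single timer fires; runs every client that is due. */
static inline int mux_fire(struct timer_mux *m, uint64_t now)
{
	int ran = 0;

	for (int i = 0; i < m->nr_clients; i++) {
		struct mux_client *c = &m->clients[i];

		if (c->deadline != MUX_NO_DEADLINE && c->deadline <= now) {
			c->deadline = MUX_NO_DEADLINE;
			c->fn(c->arg);
			ran++;
		}
	}
	return ran;
}

/* Demo callback: counts invocations through an int pointer. */
static inline void mux_count_cb(void *arg)
{
	++*(int *)arg;
}
```

With no pending deadlines, mux_next_expiry() returns MUX_NO_DEADLINE and the
CPU takes no timer interrupt at all, which is the "on-demand" part.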
> Rik is (was?) trying to make some housekeeper CPU probe other CPUs'
> state to eliminate the need for exact vtime accounting and thus speed
> up transitions to/from user or idle. It would be really neat if we
> could simultaneously have quick idle/user transitions *and* avoid
> deferred per-cpu work interrupting idle/user mode.
Careful here! Rik's vtime accounting is allowed to be approximate.
Using approximate accounting for RCU is an excellent way to sharply
increase your kernel's life-insurance premiums.
> Chris Metcalf seems quite excited about the kernel staying far away
> from his CPU once he's ready :)
Completely understandable. But I suspect that if push were to come to
shove, he would be even more excited about his kernel not crashing.
> >> Some arches may need:
> >>
> >> i_am_lame_and_forgot_my_previous_context()
> >>
> >> x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
> >> no longer need that.
> >>
> >> Paul says that some arches need something that goes straight from IRQ
> >> to user mode (?) -- sigh.
> >
> > Straight from IRQ to process-level kernel mode. I ran into this in
> > late 2011, and clearly should have documented exactly what code was
> > doing this. Something about invoking system calls from within the
> > kernel on some architectures.
> >
> > Hey, if no architectures do this anymore, I could simplify RCU a bit! ;-)
>
> I wonder if whatever arches do this could do it in two steps: exit IRQ
> and then enter normal kernel mode.
That certainly would make RCU's life easier! No idea on feasibility
otherwise, though.
Thanx, Paul
On Tue, Aug 11, 2015 at 08:42:58PM +0200, Luis R. Rodriguez wrote:
> On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> > This is a bit late, but here goes anyway.
> >
> > Having played with the x86 context tracking hooks for awhile, I think
> > it would be nice if core code that needs to be aware of CPU context
> > (kernel, user, idle, guest, etc) could come up with single,
> > comprehensible, easily validated set of hooks that arch code is
> > supposed to call.
> >
> > Currently we have:
> >
> > - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> >
> > - Context tracking hooks. Only used by some arches. Calling these
> > calls the RCU hooks for you in most cases. They have weird
> > interactions with interrupts and they're slow.
> >
> > - vtime. Beats the heck out of me.
> >
> > - Whatever deferred things Christoph keeps reminding us about.
> >
> > Honestly, I don't fully understand what all these hooks are supposed
> > to do, nor do I care all that much. From my perspective, the code
> > code should be able to do whatever it wants and rely on appropriate
> > notifications from arch code. It would be great if we could come up
> > with something straightforward that covers everything. For example:
> >
> > user_mode_to_kernel_mode()
> > kernel_mode_to_user_mode()
> > kernel_mode_to_guest_mode()
> > in_a_periodic_tick()
> > starting_nmi()
> > ending_nmi()
> > may_i_turn_off_ticks_right_now()
> > or, better yet:
> > i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> >
> > Some arches may need:
> >
> > i_am_lame_and_forgot_my_previous_context()
>
> Can all this information be generalized with some basic core hooks,
> or could some of this contextual information vary depending
> on the sequence we are in? It sounds like it's the latter and that's
> the issue?
Not sure exactly what you are suggesting, but given that many of these
need to be placed in fastpaths, I am not at all excited about having to
put switch statements in each of them.
> The reason I ask is that I've been working on a slightly different series of
> arch problems lately, and it has gotten me wondering about the possibility of
> adding a shared layer of hooks that some arch init code could use to relay back
> other contextual information (in my case, yielding
> execution in some paravirtualized scenarios; I only need this during
> init sequences, though). My reasoning for considering this didn't seem
> sufficient to add yet another layer or boilerplate code for arch init sequence
> code, but if there is a slew of other contextual metadata which we
> could use in arch code, perhaps this might make more sense. This of course
> only makes sense for your use case if things really vary depending on the
> sequence reaching out to check for any of the above. It would not need to be
> tied down to init sequences alone. The way this could work, for instance, would
> be for certain critical code to feed metadata about contextual information which
> needs to be vetted and which we currently have sloppy or difficult ways of
> retrieving. Then the onus would be on all of us to vet each critical section
> carefully and to identify clearly all required contextual information.
However, switch statements would probably be just fine for boot-time-only
code.
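The contrast can be sketched like this, with hypothetical names throughout
(enum ctx_event, struct toy_counts, ctx_notify() and ctx_user_enter() are
invented for illustration). A single generic hook has to dispatch on an event
code each time it is called, while a dedicated hook does only its own work.

```c
#include <assert.h>

/* Hypothetical event codes for a single generic notification hook. */
enum ctx_event { EV_USER_ENTER, EV_USER_EXIT, EV_NMI_ENTER, EV_NMI_EXIT };

struct toy_counts {
	unsigned long user_enter, user_exit, nmi_enter, nmi_exit;
};

/* Generic style: one entry point, dispatch via switch on every call. */
static inline void ctx_notify(struct toy_counts *c, enum ctx_event ev)
{
	switch (ev) {
	case EV_USER_ENTER:
		c->user_enter++;
		break;
	case EV_USER_EXIT:
		c->user_exit++;
		break;
	case EV_NMI_ENTER:
		c->nmi_enter++;
		break;
	case EV_NMI_EXIT:
		c->nmi_exit++;
		break;
	default:
		assert(0);	/* unknown event code */
	}
}

/* Dedicated style: no dispatch at all, suitable for an entry fastpath. */
static inline void ctx_user_enter(struct toy_counts *c)
{
	c->user_enter++;
}
```

A compiler can often fold the switch away when the event is a compile-time
constant, but that is not guaranteed once the call crosses a function-pointer
or module boundary.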
Thanx, Paul
On Tue, Aug 11, 2015 at 2:47 PM, Paul E. McKenney
<[email protected]> wrote:
> On Tue, Aug 11, 2015 at 12:07:54PM -0700, Andy Lutomirski wrote:
>> On Tue, Aug 11, 2015 at 11:33 AM, Paul E. McKenney
>> <[email protected]> wrote:
>> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
>> >> This is a bit late, but here goes anyway.
>> >>
>> >> Having played with the x86 context tracking hooks for awhile, I think
>> >> it would be nice if core code that needs to be aware of CPU context
>> >> (kernel, user, idle, guest, etc) could come up with single,
>> >> comprehensible, easily validated set of hooks that arch code is
>> >> supposed to call.
>> >>
>> >> Currently we have:
>> >>
>> >> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
>> >
>> > Something about people yelling at me for waking up idle CPUs, thus
>> > degrading their battery lifetimes. ;-)
>> >
>> >> - Context tracking hooks. Only used by some arches. Calling these
>> >> calls the RCU hooks for you in most cases. They have weird
>> >> interactions with interrupts and they're slow.
>> >
>> > Combining these would be good, but there are subtleties. For example,
>> > some arches don't have context tracking, but RCU still needs to correctly
>> > identify idle CPUs without in any way interrupting or awakening that CPU.
>> > It would be good to make this faster, but it does have to work.
>>
>> Could we maybe have one set of old RCU-only (no context tracking)
>> callbacks and a completely separate set of callbacks for arches that
>> support full context tracking? The implementation of the latter would
>> presumably call into RCU.
>
> It should be possible for RCU to use context tracking if it is available
> and to have RCU maintain its own state otherwise, if that is what you
> are getting at. Assuming that the decision is global and made at either
> build or boot time, anyway. Having some CPUs tracking context and others
> not sounds like an invitation for subtle bugs.
I think that, if this happens, the decision should be made at build
time, per arch, and not be configurable. If x86_64 uses context
tracking, then I think x86_64 shouldn't need additional RCU callbacks,
assuming that context tracking is comprehensive enough for RCU's
purposes.
--Andy
On Tue, Aug 11, 2015 at 02:52:59PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 11, 2015 at 2:47 PM, Paul E. McKenney
> <[email protected]> wrote:
> > On Tue, Aug 11, 2015 at 12:07:54PM -0700, Andy Lutomirski wrote:
> >> On Tue, Aug 11, 2015 at 11:33 AM, Paul E. McKenney
> >> <[email protected]> wrote:
> >> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> >> >> This is a bit late, but here goes anyway.
> >> >>
> >> >> Having played with the x86 context tracking hooks for awhile, I think
> >> >> it would be nice if core code that needs to be aware of CPU context
> >> >> (kernel, user, idle, guest, etc) could come up with single,
> >> >> comprehensible, easily validated set of hooks that arch code is
> >> >> supposed to call.
> >> >>
> >> >> Currently we have:
> >> >>
> >> >> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> >> >
> >> > Something about people yelling at me for waking up idle CPUs, thus
> >> > degrading their battery lifetimes. ;-)
> >> >
> >> >> - Context tracking hooks. Only used by some arches. Calling these
> >> >> calls the RCU hooks for you in most cases. They have weird
> >> >> interactions with interrupts and they're slow.
> >> >
> >> > Combining these would be good, but there are subtleties. For example,
> >> > some arches don't have context tracking, but RCU still needs to correctly
> >> > identify idle CPUs without in any way interrupting or awakening that CPU.
> >> > It would be good to make this faster, but it does have to work.
> >>
> >> Could we maybe have one set of old RCU-only (no context tracking)
> >> callbacks and a completely separate set of callbacks for arches that
> >> support full context tracking? The implementation of the latter would
> >> presumably call into RCU.
> >
> > It should be possible for RCU to use context tracking if it is available
> > and to have RCU maintain its own state otherwise, if that is what you
> > are getting at. Assuming that the decision is global and made at either
> > build or boot time, anyway. Having some CPUs tracking context and others
> > not sounds like an invitation for subtle bugs.
>
> I think that, if this happens, the decision should be made at build
> time, per arch, and not be configurable. If x86_64 uses context
> tracking, then I think x86_64 shouldn't need additional RCU callbacks,
> assuming that context tracking is comprehensive enough for RCU's
> purposes.
If by "shouldn't need additional RCU callbacks" you mean that x86_64
shouldn't need to call the existing rcu_user_enter() and rcu_user_exit()
functions, I agree. Ditto for rcu_irq_enter(), rcu_irq_exit(),
rcu_nmi_enter(), rcu_nmi_exit(), I would guess. But it would be necessary
to invoke rcu_idle_enter() and rcu_idle_exit(), especially for
CONFIG_NO_HZ_FULL_SYSIDLE=y kernels.
Thanx, Paul
On Tue, Aug 11, 2015 at 5:51 PM, Paul E. McKenney
<[email protected]> wrote:
> On Tue, Aug 11, 2015 at 02:52:59PM -0700, Andy Lutomirski wrote:
>> On Tue, Aug 11, 2015 at 2:47 PM, Paul E. McKenney
>> <[email protected]> wrote:
>> > On Tue, Aug 11, 2015 at 12:07:54PM -0700, Andy Lutomirski wrote:
>> >> On Tue, Aug 11, 2015 at 11:33 AM, Paul E. McKenney
>> >> <[email protected]> wrote:
>> >> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
>> >> >> This is a bit late, but here goes anyway.
>> >> >>
>> >> >> Having played with the x86 context tracking hooks for awhile, I think
>> >> >> it would be nice if core code that needs to be aware of CPU context
>> >> >> (kernel, user, idle, guest, etc) could come up with single,
>> >> >> comprehensible, easily validated set of hooks that arch code is
>> >> >> supposed to call.
>> >> >>
>> >> >> Currently we have:
>> >> >>
>> >> >> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
>> >> >
>> >> > Something about people yelling at me for waking up idle CPUs, thus
>> >> > degrading their battery lifetimes. ;-)
>> >> >
>> >> >> - Context tracking hooks. Only used by some arches. Calling these
>> >> >> calls the RCU hooks for you in most cases. They have weird
>> >> >> interactions with interrupts and they're slow.
>> >> >
>> >> > Combining these would be good, but there are subtleties. For example,
>> >> > some arches don't have context tracking, but RCU still needs to correctly
>> >> > identify idle CPUs without in any way interrupting or awakening that CPU.
>> >> > It would be good to make this faster, but it does have to work.
>> >>
>> >> Could we maybe have one set of old RCU-only (no context tracking)
>> >> callbacks and a completely separate set of callbacks for arches that
>> >> support full context tracking? The implementation of the latter would
>> >> presumably call into RCU.
>> >
>> > It should be possible for RCU to use context tracking if it is available
>> > and to have RCU maintain its own state otherwise, if that is what you
>> > are getting at. Assuming that the decision is global and made at either
>> > build or boot time, anyway. Having some CPUs tracking context and others
>> > not sounds like an invitation for subtle bugs.
>>
>> I think that, if this happens, the decision should be made at build
>> time, per arch, and not be configurable. If x86_64 uses context
>> tracking, then I think x86_64 shouldn't need additional RCU callbacks,
>> assuming that context tracking is comprehensive enough for RCU's
>> purposes.
>
> If by "shouldn't need additional RCU callbacks" you mean that x86_64
> shouldn't need to call the existing rcu_user_enter() and rcu_user_exit()
> functions, I agree. Ditto for rcu_irq_enter(), rcu_irq_exit(),
> rcu_nmi_enter(), rcu_nmi_exit(), I would guess. But it would be necessary
> to invoke rcu_idle_enter() and rcu_idle_exit(), especially for
> CONFIG_NO_HZ_FULL_SYSIDLE=y kernels.
Except that something wants vtime for idle, too, so maybe just
kernel_to_idle(). On the other hand, the idle loop is already fully
stocked with vtime stuff.
--Andy
On Wed, Aug 12, 2015 at 1:49 AM, Andy Lutomirski <[email protected]> wrote:
> This is a bit late, but here goes anyway.
>
> Having played with the x86 context tracking hooks for awhile, I think
> it would be nice if core code that needs to be aware of CPU context
> (kernel, user, idle, guest, etc) could come up with single,
> comprehensible, easily validated set of hooks that arch code is
> supposed to call.
>
> Currently we have:
>
> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
>
> - Context tracking hooks. Only used by some arches. Calling these
> calls the RCU hooks for you in most cases. They have weird
> interactions with interrupts and they're slow.
>
> - vtime. Beats the heck out of me.
>
> - Whatever deferred things Christoph keeps reminding us about.
>
> Honestly, I don't fully understand what all these hooks are supposed
> to do, nor do I care all that much. From my perspective, the core
> code should be able to do whatever it wants and rely on appropriate
> notifications from arch code. It would be great if we could come up
> with something straightforward that covers everything. For example:
>
> user_mode_to_kernel_mode()
> kernel_mode_to_user_mode()
> kernel_mode_to_guest_mode()
> in_a_periodic_tick()
> starting_nmi()
> ending_nmi()
> may_i_turn_off_ticks_right_now()
> or, better yet:
> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
>
> Some arches may need:
>
> i_am_lame_and_forgot_my_previous_context()
>
> x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
> no longer need that.
>
> Paul says that some arches need something that goes straight from IRQ
> to user mode (?) -- sigh.
>
> etc.
>
> It might make sense to get enough people who understand what's going
> on behind the scenes together to hash out the requirements.
>
I am also interested in the topic. I hope we can find a common
infrastructure to handle these callbacks. I am interested in
optimizing/simplifying these RCU callbacks as well.
Thanks,
Lai
> --Andy
On Tue, Aug 11, 2015 at 06:16:01PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 11, 2015 at 5:51 PM, Paul E. McKenney
> <[email protected]> wrote:
> > On Tue, Aug 11, 2015 at 02:52:59PM -0700, Andy Lutomirski wrote:
> >> On Tue, Aug 11, 2015 at 2:47 PM, Paul E. McKenney
> >> <[email protected]> wrote:
> >> > On Tue, Aug 11, 2015 at 12:07:54PM -0700, Andy Lutomirski wrote:
> >> >> On Tue, Aug 11, 2015 at 11:33 AM, Paul E. McKenney
> >> >> <[email protected]> wrote:
> >> >> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> >> >> >> This is a bit late, but here goes anyway.
> >> >> >>
> >> >> >> Having played with the x86 context tracking hooks for awhile, I think
> >> >> >> it would be nice if core code that needs to be aware of CPU context
> >> >> >> (kernel, user, idle, guest, etc) could come up with single,
> >> >> >> comprehensible, easily validated set of hooks that arch code is
> >> >> >> supposed to call.
> >> >> >>
> >> >> >> Currently we have:
> >> >> >>
> >> >> >> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> >> >> >
> >> >> > Something about people yelling at me for waking up idle CPUs, thus
> >> >> > degrading their battery lifetimes. ;-)
> >> >> >
> >> >> >> - Context tracking hooks. Only used by some arches. Calling these
> >> >> >> calls the RCU hooks for you in most cases. They have weird
> >> >> >> interactions with interrupts and they're slow.
> >> >> >
> >> >> > Combining these would be good, but there are subtleties. For example,
> >> >> > some arches don't have context tracking, but RCU still needs to correctly
> >> >> > identify idle CPUs without in any way interrupting or awakening that CPU.
> >> >> > It would be good to make this faster, but it does have to work.
> >> >>
> >> >> Could we maybe have one set of old RCU-only (no context tracking)
> >> >> callbacks and a completely separate set of callbacks for arches that
> >> >> support full context tracking? The implementation of the latter would
> >> >> presumably call into RCU.
> >> >
> >> > It should be possible for RCU to use context tracking if it is available
> >> > and to have RCU maintain its own state otherwise, if that is what you
> >> > are getting at. Assuming that the decision is global and made at either
> >> > build or boot time, anyway. Having some CPUs tracking context and others
> >> > not sounds like an invitation for subtle bugs.
> >>
> >> I think that, if this happens, the decision should be made at build
> >> time, per arch, and not be configurable. If x86_64 uses context
> >> tracking, then I think x86_64 shouldn't need additional RCU callbacks,
> >> assuming that context tracking is comprehensive enough for RCU's
> >> purposes.
> >
> > If by "shouldn't need additional RCU callbacks" you mean that x86_64
> > shouldn't need to call the existing rcu_user_enter() and rcu_user_exit()
> > functions, I agree. Ditto for rcu_irq_enter(), rcu_irq_exit(),
> > rcu_nmi_enter(), rcu_nmi_exit(), I would guess. But would be necessary
> > to invoke rcu_idle_enter() and rcu_idle_exit(), especially for
> > CONFIG_NO_HZ_FULL_SYSIDLE=y kernels.
>
> Except that something wants vtime for idle, too, so maybe just
> kernel_to_idle(). On the other hand, the idle loop is already fully
> stocked with vtime stuff.
But vtime can work with approximation, and RCU cannot. Also vtime
needs to measure time, and RCU needs to count transitions. So I am
having some difficulty seeing the benefit of unifying vtime's and RCU's
idle entry/exit mechanism.
Now, if you are instead arguing for co-location of these mechanisms,
that might well be a different issue.
Thanx, Paul
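To make the time-versus-transitions distinction above concrete, here is a minimal user-space sketch. It is not kernel code, and every name in it (`sketch_idle_stats`, `sketch_idle_enter()`, and so on) is hypothetical. The idle-time field could be estimated by sampling and still be useful, whereas the transition counter only means something if every single entry and exit is recorded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU bookkeeping; illustrative names only. */
struct sketch_idle_stats {
	uint64_t idle_ns;      /* vtime-like: accumulated idle time */
	uint64_t entry_ts_ns;  /* timestamp of the most recent idle entry */
	uint64_t transitions;  /* RCU-like: exact count of idle entries and exits */
};

/* Called on every idle entry: remember when we went idle, count the transition. */
static void sketch_idle_enter(struct sketch_idle_stats *s, uint64_t now_ns)
{
	s->entry_ts_ns = now_ns;
	s->transitions++;
}

/* Called on every idle exit: account the elapsed time, count the transition. */
static void sketch_idle_exit(struct sketch_idle_stats *s, uint64_t now_ns)
{
	s->idle_ns += now_ns - s->entry_ts_ns;
	s->transitions++;
}

/*
 * Convention of this sketch only: the counter starts at 0 (not idle),
 * so an odd value means "currently idle". One missed transition would
 * invert the answer permanently, which is why this side cannot be
 * approximated the way the time measurement can.
 */
static bool sketch_is_idle(const struct sketch_idle_stats *s)
{
	return (s->transitions & 1u) != 0;
}
```

The two fields are updated at the same call sites here, which is the co-location case; whether they can share one mechanism is a separate question.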
On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> This is a bit late, but here goes anyway.
>
> Having played with the x86 context tracking hooks for awhile, I think
> it would be nice if core code that needs to be aware of CPU context
> (kernel, user, idle, guest, etc) could come up with single,
> comprehensible, easily validated set of hooks that arch code is
> supposed to call.
>
> Currently we have:
>
> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
Given how special RCU is, I wonder if it's a good idea to make it use some general
purpose state tracking such as preempt_count. Such general purpose states are meant
to be per CPU and only used locally whereas RCU needs remote access with ordering.
Besides, RCU doesn't use them in all configs.
I'm sure we can do it but I'm not sure we'll be proud of the result.
>
> - Context tracking hooks. Only used by some arches. Calling these
> calls the RCU hooks for you in most cases. They have weird
> interactions with interrupts and they're slow.
Well, considering their interaction with irqs, I don't think it's so
bad. The irq hooks are simply in generic code.
> - vtime. Beats the heck out of me.
We are currently rethinking it. Not sure where we'll go.
>
> - Whatever deferred things Christoph keeps reminding us about.
>
> Honestly, I don't fully understand what all these hooks are supposed
> to do, nor do I care all that much. From my perspective, the code
> code should be able to do whatever it wants and rely on appropriate
> notifications from arch code. It would be great if we could come up
> with something straightforward that covers everything. For example:
>
> user_mode_to_kernel_mode()
> kernel_mode_to_user_mode()
> kernel_mode_to_guest_mode()
> in_a_periodic_tick()
> starting_nmi()
> ending_nmi()
> may_i_turn_off_ticks_right_now()
We have all these things already. But many of them are handled by the core code
already: NMIs, IRQS, guests, ticks. Archs shouldn't care about these.
Now probably all the preempt count stuff should belong to some global context tracking
subsystem. But since most of these calls are inlines...
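As a rough illustration of the preempt-count style of tracking mentioned above, here is a user-space sketch of a single counter with separate bit fields for nested hard IRQs and for NMIs. The bit layout and all `sk_` names are assumptions made for this example and are not the kernel's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout: bits 16-19 count nested hard IRQs, bit 20 marks an NMI. */
#define SK_HARDIRQ_SHIFT   16
#define SK_HARDIRQ_OFFSET  (1u << SK_HARDIRQ_SHIFT)
#define SK_HARDIRQ_MASK    (0xFu << SK_HARDIRQ_SHIFT)
#define SK_NMI_SHIFT       20
#define SK_NMI_OFFSET      (1u << SK_NMI_SHIFT)
#define SK_NMI_MASK        (0x1u << SK_NMI_SHIFT)

/* A real design would keep one of these per CPU; a single global keeps the sketch short. */
static unsigned int sk_count;

/* Entry and exit are plain, non-atomic adds: cheap, but only safe for local use. */
static void sk_irq_enter(void) { sk_count += SK_HARDIRQ_OFFSET; }
static void sk_irq_exit(void)  { sk_count -= SK_HARDIRQ_OFFSET; }
static void sk_nmi_enter(void) { sk_count += SK_NMI_OFFSET; }
static void sk_nmi_exit(void)  { sk_count -= SK_NMI_OFFSET; }

/* Queries are simple mask tests, which is why such helpers tend to be inlines. */
static bool sk_in_irq(void) { return (sk_count & SK_HARDIRQ_MASK) != 0; }
static bool sk_in_nmi(void) { return (sk_count & SK_NMI_MASK) != 0; }
```

Because the updates carry no ordering guarantees, a counter like this suits questions asked by the CPU about itself, not state read out by another CPU.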
> or, better yet:
> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
>
> Some arches may need:
>
> i_am_lame_and_forgot_my_previous_context()
I'm still not sure it's a good idea to mix up hard and soft tracking.
> x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
> no longer need that.
Syscalls should be fine if we have only one call to user_exit() and
user_enter(), assuming signals and rescheduling are handled in between.
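The shape described above can be sketched in user space as follows. This is a toy model under stated assumptions, not the x86 entry code: `toy_task`, `toy_syscall()` and the other `toy_` names are invented for the example, and the rescheduling and signal steps are reduced to clearing a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical task state; the call counters exist only so the sketch can be checked. */
struct toy_task {
	bool in_user;
	bool need_resched;
	bool signal_pending;
	int  user_exit_calls;
	int  user_enter_calls;
};

static void toy_user_exit(struct toy_task *t)
{
	t->in_user = false;
	t->user_exit_calls++;
}

static void toy_user_enter(struct toy_task *t)
{
	t->in_user = true;
	t->user_enter_calls++;
}

/*
 * One context transition on the way in, one on the way out. All exit
 * work (rescheduling, signals) runs in between, while the CPU is still
 * tracked as being in kernel context.
 */
static long toy_syscall(struct toy_task *t, long (*handler)(void))
{
	long ret;

	toy_user_exit(t);
	ret = handler();
	while (t->need_resched || t->signal_pending) {
		if (t->need_resched)
			t->need_resched = false;    /* stand-in for schedule() */
		if (t->signal_pending)
			t->signal_pending = false;  /* stand-in for signal delivery */
	}
	toy_user_enter(t);
	return ret;
}

/* Trivial handler used to exercise the path. */
static long toy_getpid(void)
{
	return 42;
}
```

The loop matters because handling a signal can itself set the reschedule flag again; the return to user context happens only once both are clear.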
On Tue, Aug 11, 2015 at 08:42:58PM +0200, Luis R. Rodriguez wrote:
> On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> > This is a bit late, but here goes anyway.
> >
> > Having played with the x86 context tracking hooks for awhile, I think
> > it would be nice if core code that needs to be aware of CPU context
> > (kernel, user, idle, guest, etc) could come up with single,
> > comprehensible, easily validated set of hooks that arch code is
> > supposed to call.
> >
> > Currently we have:
> >
> > - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> >
> > - Context tracking hooks. Only used by some arches. Calling these
> > calls the RCU hooks for you in most cases. They have weird
> > interactions with interrupts and they're slow.
> >
> > - vtime. Beats the heck out of me.
> >
> > - Whatever deferred things Christoph keeps reminding us about.
> >
> > Honestly, I don't fully understand what all these hooks are supposed
> > to do, nor do I care all that much. From my perspective, the code
> > code should be able to do whatever it wants and rely on appropriate
> > notifications from arch code. It would be great if we could come up
> > with something straightforward that covers everything. For example:
> >
> > user_mode_to_kernel_mode()
> > kernel_mode_to_user_mode()
> > kernel_mode_to_guest_mode()
> > in_a_periodic_tick()
> > starting_nmi()
> > ending_nmi()
> > may_i_turn_off_ticks_right_now()
> > or, better yet:
> > i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> >
> > Some arches may need:
> >
> > i_am_lame_and_forgot_my_previous_context()
>
> Can all this information be generalized with some basic core hooks
> or could some of this contextual informatioin typically vary depending
> on the sequence we are in ? It sounds like its the later and that's
> the issue ?
That's what we do with context tracking. It tracks the context (user/kernel)
and stores this information. And indeed the contextual information can vary
depending, for example, on whether an exception triggered in userspace or kernelspace.
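A minimal user-space sketch of that idea: the exception entry path records which context it interrupted and the exit path restores it. The `sketch_` names and the two-state enum are assumptions for illustration, and a single global stands in for per-CPU state.

```c
#include <assert.h>

/* Hypothetical context states; a real tracker has more (idle, guest, ...). */
enum sketch_ctx {
	SKETCH_CTX_KERNEL,
	SKETCH_CTX_USER,
};

/* Stand-in for a per-CPU variable. */
static enum sketch_ctx sketch_cpu_ctx = SKETCH_CTX_KERNEL;

static void sketch_user_enter(void) { sketch_cpu_ctx = SKETCH_CTX_USER; }
static void sketch_user_exit(void)  { sketch_cpu_ctx = SKETCH_CTX_KERNEL; }

/*
 * An exception can hit in either context. Save what was interrupted,
 * and leave user context only if we were actually in it.
 */
static enum sketch_ctx sketch_exception_enter(void)
{
	enum sketch_ctx prev = sketch_cpu_ctx;

	if (prev == SKETCH_CTX_USER)
		sketch_user_exit();
	return prev;
}

/* Restore the interrupted context; a kernel-mode exception changes nothing. */
static void sketch_exception_exit(enum sketch_ctx prev)
{
	if (prev == SKETCH_CTX_USER)
		sketch_user_enter();
}
```

The caller keeps `prev` on its own stack, so nested exceptions each restore exactly the context they found.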
On Tue, Aug 11, 2015 at 11:33:12AM -0700, Paul E. McKenney wrote:
> On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> > Some arches may need:
> >
> > i_am_lame_and_forgot_my_previous_context()
> >
> > x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
> > no longer need that.
> >
> > Paul says that some arches need something that goes straight from IRQ
> > to user mode (?) -- sigh.
>
> Straight from IRQ to process-level kernel mode. I ran into this in
> late 2011, and clearly should have documented exactly what code was
> doing this. Something about invoking system calls from within the
> kernel on some architectures.
>
> Hey, if no architectures do this anymore, I could simplify RCU a bit! ;-)
That issue has always been a bit foggy to me :-)
We never really stated what exactly the issue was. Just performing syscalls
from kernel mode shouldn't fiddle with the dynticks count.
IIUC, the issue was that some IRQs triggered and never returned. But we
certainly can't remove the safety code without clearly identifying the
issue...
On Tue, Aug 11, 2015 at 12:07:54PM -0700, Andy Lutomirski wrote:
> On Tue, Aug 11, 2015 at 11:33 AM, Paul E. McKenney
> <[email protected]> wrote:
> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> >> This is a bit late, but here goes anyway.
> >>
> >> Having played with the x86 context tracking hooks for awhile, I think
> >> it would be nice if core code that needs to be aware of CPU context
> >> (kernel, user, idle, guest, etc) could come up with single,
> >> comprehensible, easily validated set of hooks that arch code is
> >> supposed to call.
> >>
> >> Currently we have:
> >>
> >> - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> >
> > Something about people yelling at me for waking up idle CPUs, thus
> > degrading their battery lifetimes. ;-)
> >
> >> - Context tracking hooks. Only used by some arches. Calling these
> >> calls the RCU hooks for you in most cases. They have weird
> >> interactions with interrupts and they're slow.
> >
> > Combining these would be good, but there are subtleties. For example,
> > some arches don't have context tracking, but RCU still needs to correctly
> > identify idle CPUs without in any way interrupting or awakening that CPU.
> > It would be good to make this faster, but it does have to work.
>
> Could we maybe have one set of old RCU-only (no context tracking)
> callbacks and a completely separate set of callbacks for arches that
> support full context tracking? The implementation of the latter would
> presumably call into RCU.
That's already what we do I think.
rcu_idle_enter()/rcu_idle_exit() are the old RCU-only stuff and the rest
(rcu_user_exit()/enter()) use context tracking.
>
> >> may_i_turn_off_ticks_right_now()
> >
> > This is RCU if CONFIG_RCU_FAST_NO_HZ=n.
> >
> >> or, better yet:
> >> i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> >
> > This is RCU if CONFIG_RCU_FAST_NO_HZ=y. It would not be difficult to
> > make RCU do this if CONFIG_RCU_FAST_NO_HZ=n as well, but doing so would
> > increase to/from idle overhead.
>
> If things actually end up using hrtimers, we might also want
> get_off_my_lawn() aka "isolate this cpu now and try to do all the
> deferred stuff right now and kill off those hrtimers".
Yeah that's what we are trying to do. But hrtimers aren't special here,
they are noise just like any other.
>
> Rik is (was?) trying to make some housekeeper CPU probe other CPUs'
> state to eliminate the need for exact vtime accounting and thus speed
> up transitions to/from user or idle.
Only user. And that's only about vtime. RCU still needs to be handled
locally.
> It would be really neat if we
> could simultaneously have quick idle/user transitions *and* avoid
> deferred per-cpu work interrupting idle/user mode.
I think that's the goal. If we eventually offline the vtime accounting,
all that remains is RCU hooks on user/kernel transitions.
On Wed, Aug 12, 2015 at 04:38:21PM +0200, Frederic Weisbecker wrote:
> On Tue, Aug 11, 2015 at 11:33:12AM -0700, Paul E. McKenney wrote:
> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> > > Some arches may need:
> > >
> > > i_am_lame_and_forgot_my_previous_context()
> > >
> > > x86 will soon (4.3 or 4.4, depending on how my syscall cleanup goes)
> > > no longer need that.
> > >
> > > Paul says that some arches need something that goes straight from IRQ
> > > to user mode (?) -- sigh.
> >
> > Straight from IRQ to process-level kernel mode. I ran into this in
> > late 2011, and clearly should have documented exactly what code was
> > doing this. Something about invoking system calls from within the
> > kernel on some architectures.
> >
> > Hey, if no architectures do this anymore, I could simplify RCU a bit! ;-)
>
> That issue has always been a bit foggy to me :-)
>
> We never really stated what exactly the issue was. Just performing syscalls
> from kernel mode shouldn't fiddle with the dynticks count.
>
> IIUC, the issue was that some IRQs triggered and never returned. But we
> certainly can't remove the safety code without clearly identifying the
> issue...
This was not a theoretical problem -- there were real failures.
But yes, the safety code is there and seems to work OK, so I do need
confirmation of a change before removing it. I do recall someone
arguing that the half-interrupts should go away, but I never did hear
that they really did go away.
Adding linux-arch in the hope that someone can say for sure.
Thanx, Paul
On Wed, Aug 12, 2015 at 04:27:34PM +0200, Frederic Weisbecker wrote:
> On Tue, Aug 11, 2015 at 08:42:58PM +0200, Luis R. Rodriguez wrote:
> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> > > This is a bit late, but here goes anyway.
> > >
> > > Having played with the x86 context tracking hooks for awhile, I think
> > > it would be nice if core code that needs to be aware of CPU context
> > > (kernel, user, idle, guest, etc) could come up with single,
> > > comprehensible, easily validated set of hooks that arch code is
> > > supposed to call.
> > >
> > > Currently we have:
> > >
> > > - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> > >
> > > - Context tracking hooks. Only used by some arches. Calling these
> > > calls the RCU hooks for you in most cases. They have weird
> > > interactions with interrupts and they're slow.
> > >
> > > - vtime. Beats the heck out of me.
> > >
> > > - Whatever deferred things Christoph keeps reminding us about.
> > >
> > > Honestly, I don't fully understand what all these hooks are supposed
> > > to do, nor do I care all that much. From my perspective, the code
> > > code should be able to do whatever it wants and rely on appropriate
> > > notifications from arch code. It would be great if we could come up
> > > with something straightforward that covers everything. For example:
> > >
> > > user_mode_to_kernel_mode()
> > > kernel_mode_to_user_mode()
> > > kernel_mode_to_guest_mode()
> > > in_a_periodic_tick()
> > > starting_nmi()
> > > ending_nmi()
> > > may_i_turn_off_ticks_right_now()
> > > or, better yet:
> > > i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> > >
> > > Some arches may need:
> > >
> > > i_am_lame_and_forgot_my_previous_context()
> >
> > Can all this information be generalized with some basic core hooks
> > or could some of this contextual informatioin typically vary depending
> > on the sequence we are in ? It sounds like its the later and that's
> > the issue ?
>
> That's what we do with context tracking. It tracks the context (user/kernel)
> and stores these informations. And indeed the contextual informations can vary
> depending for example if an exception triggered in userspace or kernelspace.
Another question of interest is "Can things be arranged so that RCU uses
the context-tracking information directly in place of rcu_dynticks?"
In theory, the answer is clearly "yes", but the reason that RCU's
accounting is heavyweight is the need to get precise state readout on
other CPUs. So it is quite possible that making RCU directly use the
context-tracking information will make that tracking slower and more
complex, so that the overall effect will be zero net improvement.
But it does seem worth a look.
Thanx, Paul
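For readers unfamiliar with why remote readout makes the accounting heavyweight, here is a user-space sketch of a dynticks-style counter using C11 atomics. It is loosely modeled on the general idea only; the `sketch_` names and the parity convention are assumptions of this example, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical per-CPU counter. Convention in this sketch:
 * odd  = CPU is in a state RCU must watch,
 * even = CPU is in an extended quiescent state (idle or user).
 */
struct sketch_dynticks {
	atomic_uint count;
};

/* Start out "watched" (odd). */
static void sketch_dynticks_init(struct sketch_dynticks *d)
{
	atomic_init(&d->count, 1u);
}

/*
 * Both transitions are fully ordered atomic increments. That ordering
 * is the cost: it is what lets another CPU trust what it reads.
 */
static void sketch_eqs_enter(struct sketch_dynticks *d)
{
	atomic_fetch_add(&d->count, 1u);  /* odd -> even */
}

static void sketch_eqs_exit(struct sketch_dynticks *d)
{
	atomic_fetch_add(&d->count, 1u);  /* even -> odd */
}

/* Remote side, step 1: sample the counter without disturbing the target CPU. */
static unsigned int sketch_snapshot(struct sketch_dynticks *d)
{
	return atomic_load(&d->count);
}

/*
 * Remote side, step 2: the target has been through a quiescent state if
 * it was in one at snapshot time (even), or if the counter has moved
 * since, because any move from an odd value passes through an even one.
 */
static bool sketch_passed_quiescent_state(struct sketch_dynticks *d,
					  unsigned int snap)
{
	unsigned int cur = atomic_load(&d->count);

	return (snap & 1u) == 0 || cur != snap;
}
```

A plain local flag would be cheaper to update, but a remote reader could not tell "still busy" apart from "went idle and came back", which is the precision the paragraph above refers to.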
On Tue, Aug 11, 2015 at 02:50:29PM -0700, Paul E. McKenney wrote:
> On Tue, Aug 11, 2015 at 08:42:58PM +0200, Luis R. Rodriguez wrote:
> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> > > This is a bit late, but here goes anyway.
> > >
> > > Having played with the x86 context tracking hooks for awhile, I think
> > > it would be nice if core code that needs to be aware of CPU context
> > > (kernel, user, idle, guest, etc) could come up with single,
> > > comprehensible, easily validated set of hooks that arch code is
> > > supposed to call.
> > >
> > > Currently we have:
> > >
> > > - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> > >
> > > - Context tracking hooks. Only used by some arches. Calling these
> > > calls the RCU hooks for you in most cases. They have weird
> > > interactions with interrupts and they're slow.
> > >
> > > - vtime. Beats the heck out of me.
> > >
> > > - Whatever deferred things Christoph keeps reminding us about.
> > >
> > > Honestly, I don't fully understand what all these hooks are supposed
> > > to do, nor do I care all that much. From my perspective, the code
> > > code should be able to do whatever it wants and rely on appropriate
> > > notifications from arch code. It would be great if we could come up
> > > with something straightforward that covers everything. For example:
> > >
> > > user_mode_to_kernel_mode()
> > > kernel_mode_to_user_mode()
> > > kernel_mode_to_guest_mode()
> > > in_a_periodic_tick()
> > > starting_nmi()
> > > ending_nmi()
> > > may_i_turn_off_ticks_right_now()
> > > or, better yet:
> > > i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> > >
> > > Some arches may need:
> > >
> > > i_am_lame_and_forgot_my_previous_context()
> >
> > Can all this information be generalized with some basic core hooks
> > or could some of this contextual informatioin typically vary depending
> > on the sequence we are in ? It sounds like its the later and that's
> > the issue ?
>
> Not sure exactly what you are suggesting,
At this point I was not suggesting anything in particular but trying to verify
the type of problem and see if the contextual issues might be similar to the
contextual issues I have been looking into and see if the solutions that could
be drawn up for the above issues noted by Andy could be reused for the
problems I have been looking into.
In my case the issues come from the fact that paravirt hypervisors end up
initializing Linux through an alternate init sequence and assumptions vary,
both by version of hypervisor and hypervisor type. That and the fact that
we don't want to extend pv_ops further. pv_ops was designed to cope with
*multiple* hypervisors and let us end up with one binary, ie, it didn't
necessarily address required yielding by the OS for a slew of different
functionality. There are different ad-hoc solutions to the yielding problem
today but they are all reactive, not proactive, and I'm looking for a proactive
solution. Since we don't want to extend pv_ops even further I've been trying
to keep an open eye for similar types of further context-needing problems on
the kernel which could likely share a solution. If the above yielding issues
seem obscure, my apologies, I'll soon send something out to elaborate a bit
more on that which might help fill in context.
> but given that many of these
> need to be placed in fastpaths, I am not at all excited about having to
> put switch statements in each of them.
Sure.
> > Reason I ask is I've been working on a slightly different series of arch
> > problems lately but its gotten me wondering about the possibility over adding a
> > shared layer of hooks that some arch init code could use to relay back
> > information about some other contextual information (in my case yielding
> > execution in some paravirtualized scenerios, in my case I only need this during
> > init sequences though). My reasoning for considering this didn't seem
> > sufficient to add yet-another-layer or boilet-plate code for arch init sequence
> > code but if there is a slew of other meta data contextual information which we
> > could use in arch code perhaps this might make more sense then. This of course
> > only makes sense for your use case if things really vary depending on the
> > sequence reaching out to check for any of the above. It would not need to be
> > tied down to init sequences alone, the way this could work for instance could
> > be for certain critial code to feed meta data over contextual information which
> > needs to be vetted which we currently have sloppy, or difficult waays of
> > retrieving. Then the onus would be for all of us to vet each critial section
> > carefully and to identify clearly all required contextual information.
To answer your question above, so far I only have two leading ideas on this, and
frankly it's still fuzzy. One was to driver-tize critical sequences with hooks to
provide the required context. IMO this would introduce too much overhead
unless there were other users for the extra context information besides
paravirt yielding. Another idea is to override contextual information (perhaps
through CPU variable data) which would otherwise be looked at through other
means (perhaps a series of more complex branch checks on CPU variable data) or
would have some sort of defaults. In the ad hoc situations at random kernel run
times it seems, for instance, we end up using CPU variables to keep track of
certain context information, but if a path is known to have a static context,
can we introduce something to override or avoid that lookup? For
this to work, the context would need to be known at specific
points in time. For init sequences this seems likely for early init;
later on it's not clear to me how many areas like these would exist.
> However, switch statements would probably be just fine for boot-time-only
> code.
I'm actually all for avoiding these as well if possible, and since we
have binary patching, and it seems run time binary patching could in theory
work too, I'd hope some switches / branches could be patched
out *iff* certain contextual information could be guaranteed for certain areas
of the kernel.
Luis
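Binary patching cannot be shown in a few lines of portable C, so the sketch below swaps in a simpler technique with the same intent: the context is resolved once at init time and stored in a function pointer, so the fast path carries no switch statement. All `pv_sketch_` names, the platform enum, and the return values are hypothetical.

```c
#include <assert.h>

/* Hypothetical platform kinds, assumed to be known once during early init. */
enum pv_sketch_platform {
	PV_SKETCH_BARE_METAL,
	PV_SKETCH_PV_GUEST,
};

/* Bare metal: nothing to yield to. */
static int pv_sketch_yield_noop(void)
{
	return 0;
}

/* Placeholder: a real implementation would make a hypervisor call here. */
static int pv_sketch_yield_hypervisor(void)
{
	return 1;
}

/* Resolved once; defaults to the no-op until init says otherwise. */
static int (*pv_sketch_yield)(void) = pv_sketch_yield_noop;

/* The only place that branches on the platform. Runs once, off the fast path. */
static void pv_sketch_init(enum pv_sketch_platform p)
{
	switch (p) {
	case PV_SKETCH_PV_GUEST:
		pv_sketch_yield = pv_sketch_yield_hypervisor;
		break;
	case PV_SKETCH_BARE_METAL:
	default:
		pv_sketch_yield = pv_sketch_yield_noop;
		break;
	}
}

/* Fast path: a single indirect call, no per-call context lookup. */
static int pv_sketch_fastpath(void)
{
	return pv_sketch_yield();
}
```

This only works under the condition stated above, namely that the context is guaranteed not to change after the point where it is resolved. An indirect call is also not free, which is part of the appeal of patching the call site instead.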
On Thu, Aug 13, 2015 at 12:03 AM, Paul E. McKenney
<[email protected]> wrote:
> On Wed, Aug 12, 2015 at 04:27:34PM +0200, Frederic Weisbecker wrote:
>> On Tue, Aug 11, 2015 at 08:42:58PM +0200, Luis R. Rodriguez wrote:
>> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
>> > > This is a bit late, but here goes anyway.
>> > >
>> > > Having played with the x86 context tracking hooks for awhile, I think
>> > > it would be nice if core code that needs to be aware of CPU context
>> > > (kernel, user, idle, guest, etc) could come up with single,
>> > > comprehensible, easily validated set of hooks that arch code is
>> > > supposed to call.
>> > >
>> > > Currently we have:
>> > >
>> > > - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
>> > >
>> > > - Context tracking hooks. Only used by some arches. Calling these
>> > > calls the RCU hooks for you in most cases. They have weird
>> > > interactions with interrupts and they're slow.
>> > >
>> > > - vtime. Beats the heck out of me.
>> > >
>> > > - Whatever deferred things Christoph keeps reminding us about.
>> > >
>> > > Honestly, I don't fully understand what all these hooks are supposed
>> > > to do, nor do I care all that much. From my perspective, the code
>> > > code should be able to do whatever it wants and rely on appropriate
>> > > notifications from arch code. It would be great if we could come up
>> > > with something straightforward that covers everything. For example:
>> > >
>> > > user_mode_to_kernel_mode()
>> > > kernel_mode_to_user_mode()
>> > > kernel_mode_to_guest_mode()
>> > > in_a_periodic_tick()
>> > > starting_nmi()
>> > > ending_nmi()
>> > > may_i_turn_off_ticks_right_now()
>> > > or, better yet:
>> > > i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
>> > >
>> > > Some arches may need:
>> > >
>> > > i_am_lame_and_forgot_my_previous_context()
>> >
>> > Can all this information be generalized with some basic core hooks
>> > or could some of this contextual informatioin typically vary depending
>> > on the sequence we are in ? It sounds like its the later and that's
>> > the issue ?
>>
>> That's what we do with context tracking. It tracks the context (user/kernel)
>> and stores these informations. And indeed the contextual informations can vary
>> depending for example if an exception triggered in userspace or kernelspace.
>
> Another question of interest is "Can things be arranged so that RCU uses
> the context-tracking information directly in place of rcu_dynticks?"
> In theory, the answer is clearly "yes", but the reason that RCU's
> accounting is heavyweight is the need to get precise state readout on
> other CPUs. So it is quite possible that making RCU directly use the
> context-tracking information will make that tracking slower and more
> complex, so that the overall effect will be zero net improvement.
rcu_dynticks can be directly renamed and moved to context-tracking code.
^_^.
If any other code needs to access the context-tracking information,
rearranging the code will be better.
I once tried to use pure context-tracking information to
implement rcu_sys_is_idle(). Rearranging was needed,
and it was too complicated to continue. The current rcu_sys_is_idle()
is complicated, though.
>
> But it does seem worth a look.
>
> Thanx, Paul
>
On Wed, Aug 12, 2015 at 09:03:42AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 04:27:34PM +0200, Frederic Weisbecker wrote:
> > On Tue, Aug 11, 2015 at 08:42:58PM +0200, Luis R. Rodriguez wrote:
> > > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> > > > This is a bit late, but here goes anyway.
> > > >
> > > > Having played with the x86 context tracking hooks for awhile, I think
> > > > it would be nice if core code that needs to be aware of CPU context
> > > > (kernel, user, idle, guest, etc) could come up with single,
> > > > comprehensible, easily validated set of hooks that arch code is
> > > > supposed to call.
> > > >
> > > > Currently we have:
> > > >
> > > > - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> > > >
> > > > - Context tracking hooks. Only used by some arches. Calling these
> > > > calls the RCU hooks for you in most cases. They have weird
> > > > interactions with interrupts and they're slow.
> > > >
> > > > - vtime. Beats the heck out of me.
> > > >
> > > > - Whatever deferred things Christoph keeps reminding us about.
> > > >
> > > > Honestly, I don't fully understand what all these hooks are supposed
> > > > to do, nor do I care all that much. From my perspective, the code
> > > > code should be able to do whatever it wants and rely on appropriate
> > > > notifications from arch code. It would be great if we could come up
> > > > with something straightforward that covers everything. For example:
> > > >
> > > > user_mode_to_kernel_mode()
> > > > kernel_mode_to_user_mode()
> > > > kernel_mode_to_guest_mode()
> > > > in_a_periodic_tick()
> > > > starting_nmi()
> > > > ending_nmi()
> > > > may_i_turn_off_ticks_right_now()
> > > > or, better yet:
> > > > i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> > > >
> > > > Some arches may need:
> > > >
> > > > i_am_lame_and_forgot_my_previous_context()
> > >
> > > Can all this information be generalized with some basic core hooks
> > > or could some of this contextual informatioin typically vary depending
> > > on the sequence we are in ? It sounds like its the later and that's
> > > the issue ?
> >
> > That's what we do with context tracking. It tracks the context (user/kernel)
> > and stores these informations. And indeed the contextual informations can vary
> > depending for example if an exception triggered in userspace or kernelspace.
>
> Another question of interest is "Can things be arranged so that RCU uses
> the context-tracking information directly in place of rcu_dynticks?"
> In theory, the answer is clearly "yes", but the reason that RCU's
> accounting is heavyweight is the need to get precise state readout on
> other CPUs. So it is quite possible that making RCU directly use the
> context-tracking information will make that tracking slower and more
> complex, so that the overall effect will be zero net improvement.
Yeah, that's partly what I meant by "it's possible, but we might not be proud of
the result".
> But it does seem worth a look.
Sure.
On Thu, Aug 13, 2015 at 09:29:03AM +0800, Lai Jiangshan wrote:
> On Thu, Aug 13, 2015 at 12:03 AM, Paul E. McKenney
> <[email protected]> wrote:
> > On Wed, Aug 12, 2015 at 04:27:34PM +0200, Frederic Weisbecker wrote:
> >> On Tue, Aug 11, 2015 at 08:42:58PM +0200, Luis R. Rodriguez wrote:
> >> > On Tue, Aug 11, 2015 at 10:49:36AM -0700, Andy Lutomirski wrote:
> >> > > This is a bit late, but here goes anyway.
> >> > >
> >> > > Having played with the x86 context tracking hooks for awhile, I think
> >> > > it would be nice if core code that needs to be aware of CPU context
> >> > > (kernel, user, idle, guest, etc) could come up with single,
> >> > > comprehensible, easily validated set of hooks that arch code is
> >> > > supposed to call.
> >> > >
> >> > > Currently we have:
> >> > >
> >> > > - RCU hooks, which come in a wide variety to notify about IRQs, NMIs, etc.
> >> > >
> >> > > - Context tracking hooks. Only used by some arches. Calling these
> >> > > calls the RCU hooks for you in most cases. They have weird
> >> > > interactions with interrupts and they're slow.
> >> > >
> >> > > - vtime. Beats the heck out of me.
> >> > >
> >> > > - Whatever deferred things Christoph keeps reminding us about.
> >> > >
> >> > > Honestly, I don't fully understand what all these hooks are supposed
> >> > > to do, nor do I care all that much. From my perspective, the code
> >> > > code should be able to do whatever it wants and rely on appropriate
> >> > > notifications from arch code. It would be great if we could come up
> >> > > with something straightforward that covers everything. For example:
> >> > >
> >> > > user_mode_to_kernel_mode()
> >> > > kernel_mode_to_user_mode()
> >> > > kernel_mode_to_guest_mode()
> >> > > in_a_periodic_tick()
> >> > > starting_nmi()
> >> > > ending_nmi()
> >> > > may_i_turn_off_ticks_right_now()
> >> > > or, better yet:
> >> > > i_am_turning_off_ticks_right_now_and_register_your_own_darned_hrtimer_if_thats_a_problem()
> >> > >
> >> > > Some arches may need:
> >> > >
> >> > > i_am_lame_and_forgot_my_previous_context()
> >> >
> >> > Can all this information be generalized with some basic core hooks
> >> > or could some of this contextual informatioin typically vary depending
> >> > on the sequence we are in ? It sounds like its the later and that's
> >> > the issue ?
> >>
> >> That's what we do with context tracking. It tracks the context (user/kernel)
> >> and stores these informations. And indeed the contextual informations can vary
> >> depending for example if an exception triggered in userspace or kernelspace.
> >
> > Another question of interest is "Can things be arranged so that RCU uses
> > the context-tracking information directly in place of rcu_dynticks?"
> > In theory, the answer is clearly "yes", but the reason that RCU's
> > accounting is heavyweight is the need to get precise state readout on
> > other CPUs. So it is quite possible that making RCU directly use the
> > context-tracking information will make that tracking slower and more
> > complex, so that the overall effect will be zero net improvement.
>
> rcu_dynticks can be directly renamed and moved to context-tracking code.
> ^_^.
>
> If there any other code need to access the context-tracking information,
> rearranging the code will be better.
>
> I once tried to use pure context-tracking information to
> implement rcu_sys_is_idle(), the rearranging is needed,
> and it is to complicated to continue. Current rcu_sys_is_idle()
> is complicated though.
It's very complicated but that lockless state machine is fascinating :-)
Too bad that for now we aren't using it. In fact nobody complained about
the unoptimized power consumption by nohz full. There are so many things
that people are interested in first, and allowing housekeeping's dynticks idle
involves very complicated changes.
Thanks.
>
> >
> > But it does seem worth a look.
> >
> > Thanx, Paul
> >