2014-11-26 22:27:08

by Luis Chamberlain

[permalink] [raw]
Subject: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

From: "Luis R. Rodriguez" <[email protected]>

Some folks had reported that some xen hypercalls take a long time
to complete when issued from the userspace private ioctl mechanism,
this can happen for instance with some hypercalls that have many
sub-operations, this can happen for instance on hypercalls that use
multi-call feature whereby Xen lets one hypercall batch out a series
of other hypercalls on the hypervisor. At times such hypercalls can
even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
120 seconds), this a non-issue issue on preemptible kernels though as
the kernel may deschedule such long running tasks. Xen for instance
supports multicalls to be preempted as well, this is what Xen calls
continuation (see xen commit 42217cbc5b which introduced this [0]).
On systems without CONFIG_PREEMPT though -- a kernel with voluntary
or no preemption -- a long running hypercall will not be descheduled
until the hypercall is complete and the ioctl returns to user space.

To help with this David had originally implemented support for use
of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
solution never went upstream though and upon review to help refactor
this I've concluded that usage of preempt_schedule_irq() would be
a bit abussive of existing APIs -- for a few reasons:

0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels

1) we want try to consider solutions that might work for other
hypervisors for this same problem, and identify it its an issue
even present on other hypervisors or if this is a self
inflicted architectural issue caused by use of multicalls

2) there is no documentation or profiling of the exact hypercalls
that were causing these issues, nor do we have any context
to help evaluate this any further

I at least checked with kvm folks and it seems hypercall preemption
is not needed there. We can survey other hypervisors...

If 'something like preemption' is needed then CONFIG_PREEMPT
should just be enabled and encouraged, it seems we want to
encourage CONFIG_PREEMPT on xen, specially when multicalls are
used. In the meantime this tries to address a solution to help
xen on non CONFIG_PREEMPT kernels.

One option tested and evaluated was to put private hypercalls in
process context, however this would introduce complexities such
originating hypercalls from different contexts. Current xen
hypercall callback handlers would need to be changed per architecture,
for instance, we'd also incur the cost of switching states from
user / kernel (this cost is also present if preempt_schedule_irq()
is used). There may be other issues which could be introduced with
this strategy as well. The simplest *shared* alternative is instead
to just explicitly schedule() at the end of a private hypercall on non
preempt kernels. This forces our private hypercall call mechanism
to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
more context switch but keeps the private hypercall context intact.

[0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
[1] http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch

Cc: Davidlohr Bueso <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Jan Beulich <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Olaf Hering <[email protected]>
Cc: David Vrabel <[email protected]>
Signed-off-by: Luis R. Rodriguez <[email protected]>
---
drivers/xen/privcmd.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 569a13b..e29edba 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
hypercall.arg[0], hypercall.arg[1],
hypercall.arg[2], hypercall.arg[3],
hypercall.arg[4]);
+#ifndef CONFIG_PREEMPT
+ schedule();
+#endif

return ret;
}
--
2.1.1


2014-11-27 06:36:38

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> From: "Luis R. Rodriguez" <[email protected]>
>
> Some folks had reported that some xen hypercalls take a long time
> to complete when issued from the userspace private ioctl mechanism,
> this can happen for instance with some hypercalls that have many
> sub-operations, this can happen for instance on hypercalls that use
> multi-call feature whereby Xen lets one hypercall batch out a series
> of other hypercalls on the hypervisor. At times such hypercalls can
> even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
> 120 seconds), this a non-issue issue on preemptible kernels though as
> the kernel may deschedule such long running tasks. Xen for instance
> supports multicalls to be preempted as well, this is what Xen calls
> continuation (see xen commit 42217cbc5b which introduced this [0]).
> On systems without CONFIG_PREEMPT though -- a kernel with voluntary
> or no preemption -- a long running hypercall will not be descheduled
> until the hypercall is complete and the ioctl returns to user space.
>
> To help with this David had originally implemented support for use
> of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
> solution never went upstream though and upon review to help refactor
> this I've concluded that usage of preempt_schedule_irq() would be
> a bit abussive of existing APIs -- for a few reasons:
>
> 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels
>
> 1) we want try to consider solutions that might work for other
> hypervisors for this same problem, and identify it its an issue
> even present on other hypervisors or if this is a self
> inflicted architectural issue caused by use of multicalls
>
> 2) there is no documentation or profiling of the exact hypercalls
> that were causing these issues, nor do we have any context
> to help evaluate this any further
>
> I at least checked with kvm folks and it seems hypercall preemption
> is not needed there. We can survey other hypervisors...
>
> If 'something like preemption' is needed then CONFIG_PREEMPT
> should just be enabled and encouraged, it seems we want to
> encourage CONFIG_PREEMPT on xen, specially when multicalls are
> used. In the meantime this tries to address a solution to help
> xen on non CONFIG_PREEMPT kernels.
>
> One option tested and evaluated was to put private hypercalls in
> process context, however this would introduce complexities such
> originating hypercalls from different contexts. Current xen
> hypercall callback handlers would need to be changed per architecture,
> for instance, we'd also incur the cost of switching states from
> user / kernel (this cost is also present if preempt_schedule_irq()
> is used). There may be other issues which could be introduced with
> this strategy as well. The simplest *shared* alternative is instead
> to just explicitly schedule() at the end of a private hypercall on non
> preempt kernels. This forces our private hypercall call mechanism
> to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
> more context switch but keeps the private hypercall context intact.
>
> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
> [1] http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>
> Cc: Davidlohr Bueso <[email protected]>
> Cc: Joerg Roedel <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: Konrad Rzeszutek Wilk <[email protected]>
> Cc: Jan Beulich <[email protected]>
> Cc: Juergen Gross <[email protected]>
> Cc: Olaf Hering <[email protected]>
> Cc: David Vrabel <[email protected]>
> Signed-off-by: Luis R. Rodriguez <[email protected]>
> ---
> drivers/xen/privcmd.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 569a13b..e29edba 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> hypercall.arg[0], hypercall.arg[1],
> hypercall.arg[2], hypercall.arg[3],
> hypercall.arg[4]);
> +#ifndef CONFIG_PREEMPT
> + schedule();
> +#endif
>
> return ret;
> }
>

Sorry, I don't think this will solve anything. You're calling schedule()
right after the long running hypercall just nanoseconds before returning
to the user.

I suppose you were mislead by the "int 0x82" in [0]. This is the
hypercall from the kernel into the hypervisor, e.g. inside of
privcmd_call().


Juergen

2014-11-27 18:36:20

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>> From: "Luis R. Rodriguez" <[email protected]>
>>
>> Some folks had reported that some xen hypercalls take a long time
>> to complete when issued from the userspace private ioctl mechanism,
>> this can happen for instance with some hypercalls that have many
>> sub-operations, this can happen for instance on hypercalls that use
>> multi-call feature whereby Xen lets one hypercall batch out a series
>> of other hypercalls on the hypervisor. At times such hypercalls can
>> even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
>> 120 seconds), this a non-issue issue on preemptible kernels though as
>> the kernel may deschedule such long running tasks. Xen for instance
>> supports multicalls to be preempted as well, this is what Xen calls
>> continuation (see xen commit 42217cbc5b which introduced this [0]).
>> On systems without CONFIG_PREEMPT though -- a kernel with voluntary
>> or no preemption -- a long running hypercall will not be descheduled
>> until the hypercall is complete and the ioctl returns to user space.
>>
>> To help with this David had originally implemented support for use
>> of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
>> solution never went upstream though and upon review to help refactor
>> this I've concluded that usage of preempt_schedule_irq() would be
>> a bit abussive of existing APIs -- for a few reasons:
>>
>> 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels
>>
>> 1) we want try to consider solutions that might work for other
>> hypervisors for this same problem, and identify it its an issue
>> even present on other hypervisors or if this is a self
>> inflicted architectural issue caused by use of multicalls
>>
>> 2) there is no documentation or profiling of the exact hypercalls
>> that were causing these issues, nor do we have any context
>> to help evaluate this any further
>>
>> I at least checked with kvm folks and it seems hypercall preemption
>> is not needed there. We can survey other hypervisors...
>>
>> If 'something like preemption' is needed then CONFIG_PREEMPT
>> should just be enabled and encouraged, it seems we want to
>> encourage CONFIG_PREEMPT on xen, specially when multicalls are
>> used. In the meantime this tries to address a solution to help
>> xen on non CONFIG_PREEMPT kernels.
>>
>> One option tested and evaluated was to put private hypercalls in
>> process context, however this would introduce complexities such
>> originating hypercalls from different contexts. Current xen
>> hypercall callback handlers would need to be changed per architecture,
>> for instance, we'd also incur the cost of switching states from
>> user / kernel (this cost is also present if preempt_schedule_irq()
>> is used). There may be other issues which could be introduced with
>> this strategy as well. The simplest *shared* alternative is instead
>> to just explicitly schedule() at the end of a private hypercall on non
>> preempt kernels. This forces our private hypercall call mechanism
>> to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
>> more context switch but keeps the private hypercall context intact.
>>
>> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
>> [1] http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>>
>> Cc: Davidlohr Bueso <[email protected]>
>> Cc: Joerg Roedel <[email protected]>
>> Cc: Borislav Petkov <[email protected]>
>> Cc: Konrad Rzeszutek Wilk <[email protected]>
>> Cc: Jan Beulich <[email protected]>
>> Cc: Juergen Gross <[email protected]>
>> Cc: Olaf Hering <[email protected]>
>> Cc: David Vrabel <[email protected]>
>> Signed-off-by: Luis R. Rodriguez <[email protected]>
>> ---
>> drivers/xen/privcmd.c | 3 +++
>> 1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 569a13b..e29edba 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>> hypercall.arg[0], hypercall.arg[1],
>> hypercall.arg[2], hypercall.arg[3],
>> hypercall.arg[4]);
>> +#ifndef CONFIG_PREEMPT
>> + schedule();
>> +#endif
>>
>> return ret;
>> }
>>
>
> Sorry, I don't think this will solve anything. You're calling schedule()
> right after the long running hypercall just nanoseconds before returning
> to the user.

Yeah, well that is what [1] tried as well only it tried using
preempt_schedule_irq() on the hypercall callback...

> I suppose you were mislead by the "int 0x82" in [0]. This is the
> hypercall from the kernel into the hypervisor, e.g. inside of
> privcmd_call().

Nope, you have to consider what was done in [1], I was trying to
do something similar but less complex that didn't involve mucking
with the callbacks but also not abusing APIs.

I'm afraid we don't have much leg room.

Luis

2014-11-27 18:46:57

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Thu, Nov 27, 2014 at 1:36 PM, Luis R. Rodriguez <[email protected]> wrote:
> I'm afraid we don't have much leg room.

Let me be clear, I still think putting some hypercalls in process
context *might help* but because of notes 1) and 2) I highlighted I
think this is the best we can do, with more information we should be
able to consider weighing pros / cons with actual metrics from
alternatives, without more information we're just shooting in the dark
and the last thing I want is to see APIs abused or setting precedents.

Luis

2014-11-27 18:50:37

by Andrew Cooper

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 27/11/14 18:36, Luis R. Rodriguez wrote:
> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>> From: "Luis R. Rodriguez" <[email protected]>
>>>
>>> Some folks had reported that some xen hypercalls take a long time
>>> to complete when issued from the userspace private ioctl mechanism,
>>> this can happen for instance with some hypercalls that have many
>>> sub-operations, this can happen for instance on hypercalls that use
>>> multi-call feature whereby Xen lets one hypercall batch out a series
>>> of other hypercalls on the hypervisor. At times such hypercalls can
>>> even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
>>> 120 seconds), this a non-issue issue on preemptible kernels though as
>>> the kernel may deschedule such long running tasks. Xen for instance
>>> supports multicalls to be preempted as well, this is what Xen calls
>>> continuation (see xen commit 42217cbc5b which introduced this [0]).
>>> On systems without CONFIG_PREEMPT though -- a kernel with voluntary
>>> or no preemption -- a long running hypercall will not be descheduled
>>> until the hypercall is complete and the ioctl returns to user space.
>>>
>>> To help with this David had originally implemented support for use
>>> of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
>>> solution never went upstream though and upon review to help refactor
>>> this I've concluded that usage of preempt_schedule_irq() would be
>>> a bit abussive of existing APIs -- for a few reasons:
>>>
>>> 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels
>>>
>>> 1) we want try to consider solutions that might work for other
>>> hypervisors for this same problem, and identify it its an issue
>>> even present on other hypervisors or if this is a self
>>> inflicted architectural issue caused by use of multicalls
>>>
>>> 2) there is no documentation or profiling of the exact hypercalls
>>> that were causing these issues, nor do we have any context
>>> to help evaluate this any further
>>>
>>> I at least checked with kvm folks and it seems hypercall preemption
>>> is not needed there. We can survey other hypervisors...
>>>
>>> If 'something like preemption' is needed then CONFIG_PREEMPT
>>> should just be enabled and encouraged, it seems we want to
>>> encourage CONFIG_PREEMPT on xen, specially when multicalls are
>>> used. In the meantime this tries to address a solution to help
>>> xen on non CONFIG_PREEMPT kernels.
>>>
>>> One option tested and evaluated was to put private hypercalls in
>>> process context, however this would introduce complexities such
>>> originating hypercalls from different contexts. Current xen
>>> hypercall callback handlers would need to be changed per architecture,
>>> for instance, we'd also incur the cost of switching states from
>>> user / kernel (this cost is also present if preempt_schedule_irq()
>>> is used). There may be other issues which could be introduced with
>>> this strategy as well. The simplest *shared* alternative is instead
>>> to just explicitly schedule() at the end of a private hypercall on non
>>> preempt kernels. This forces our private hypercall call mechanism
>>> to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
>>> more context switch but keeps the private hypercall context intact.
>>>
>>> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
>>> [1] http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>>>
>>> Cc: Davidlohr Bueso <[email protected]>
>>> Cc: Joerg Roedel <[email protected]>
>>> Cc: Borislav Petkov <[email protected]>
>>> Cc: Konrad Rzeszutek Wilk <[email protected]>
>>> Cc: Jan Beulich <[email protected]>
>>> Cc: Juergen Gross <[email protected]>
>>> Cc: Olaf Hering <[email protected]>
>>> Cc: David Vrabel <[email protected]>
>>> Signed-off-by: Luis R. Rodriguez <[email protected]>
>>> ---
>>> drivers/xen/privcmd.c | 3 +++
>>> 1 file changed, 3 insertions(+)
>>>
>>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>>> index 569a13b..e29edba 100644
>>> --- a/drivers/xen/privcmd.c
>>> +++ b/drivers/xen/privcmd.c
>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>> hypercall.arg[0], hypercall.arg[1],
>>> hypercall.arg[2], hypercall.arg[3],
>>> hypercall.arg[4]);
>>> +#ifndef CONFIG_PREEMPT
>>> + schedule();
>>> +#endif
>>>
>>> return ret;
>>> }
>>>
>> Sorry, I don't think this will solve anything. You're calling schedule()
>> right after the long running hypercall just nanoseconds before returning
>> to the user.
> Yeah, well that is what [1] tried as well only it tried using
> preempt_schedule_irq() on the hypercall callback...
>
>> I suppose you were mislead by the "int 0x82" in [0]. This is the
>> hypercall from the kernel into the hypervisor, e.g. inside of
>> privcmd_call().
> Nope, you have to consider what was done in [1], I was trying to
> do something similar but less complex that didn't involve mucking
> with the callbacks but also not abusing APIs.
>
> I'm afraid we don't have much leg room.

XenServer uses

https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch

to deal with these issues. That patch is based on 3.10.

I can remember whether this has been submitted upstream before (and
there were outstanding issues), or whether it fell at an inconvenient
time with our development cycles.

David: do you recall?

~Andrew

2014-11-28 04:49:24

by Jürgen Groß

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 11/27/2014 07:50 PM, Andrew Cooper wrote:
> On 27/11/14 18:36, Luis R. Rodriguez wrote:
>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>>> From: "Luis R. Rodriguez" <[email protected]>
>>>>
>>>> Some folks had reported that some xen hypercalls take a long time
>>>> to complete when issued from the userspace private ioctl mechanism,
>>>> this can happen for instance with some hypercalls that have many
>>>> sub-operations, this can happen for instance on hypercalls that use
>>>> multi-call feature whereby Xen lets one hypercall batch out a series
>>>> of other hypercalls on the hypervisor. At times such hypercalls can
>>>> even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
>>>> 120 seconds), this a non-issue issue on preemptible kernels though as
>>>> the kernel may deschedule such long running tasks. Xen for instance
>>>> supports multicalls to be preempted as well, this is what Xen calls
>>>> continuation (see xen commit 42217cbc5b which introduced this [0]).
>>>> On systems without CONFIG_PREEMPT though -- a kernel with voluntary
>>>> or no preemption -- a long running hypercall will not be descheduled
>>>> until the hypercall is complete and the ioctl returns to user space.
>>>>
>>>> To help with this David had originally implemented support for use
>>>> of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
>>>> solution never went upstream though and upon review to help refactor
>>>> this I've concluded that usage of preempt_schedule_irq() would be
>>>> a bit abussive of existing APIs -- for a few reasons:
>>>>
>>>> 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels
>>>>
>>>> 1) we want try to consider solutions that might work for other
>>>> hypervisors for this same problem, and identify it its an issue
>>>> even present on other hypervisors or if this is a self
>>>> inflicted architectural issue caused by use of multicalls
>>>>
>>>> 2) there is no documentation or profiling of the exact hypercalls
>>>> that were causing these issues, nor do we have any context
>>>> to help evaluate this any further
>>>>
>>>> I at least checked with kvm folks and it seems hypercall preemption
>>>> is not needed there. We can survey other hypervisors...
>>>>
>>>> If 'something like preemption' is needed then CONFIG_PREEMPT
>>>> should just be enabled and encouraged, it seems we want to
>>>> encourage CONFIG_PREEMPT on xen, specially when multicalls are
>>>> used. In the meantime this tries to address a solution to help
>>>> xen on non CONFIG_PREEMPT kernels.
>>>>
>>>> One option tested and evaluated was to put private hypercalls in
>>>> process context, however this would introduce complexities such
>>>> originating hypercalls from different contexts. Current xen
>>>> hypercall callback handlers would need to be changed per architecture,
>>>> for instance, we'd also incur the cost of switching states from
>>>> user / kernel (this cost is also present if preempt_schedule_irq()
>>>> is used). There may be other issues which could be introduced with
>>>> this strategy as well. The simplest *shared* alternative is instead
>>>> to just explicitly schedule() at the end of a private hypercall on non
>>>> preempt kernels. This forces our private hypercall call mechanism
>>>> to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
>>>> more context switch but keeps the private hypercall context intact.
>>>>
>>>> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
>>>> [1] http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>>>>
>>>> Cc: Davidlohr Bueso <[email protected]>
>>>> Cc: Joerg Roedel <[email protected]>
>>>> Cc: Borislav Petkov <[email protected]>
>>>> Cc: Konrad Rzeszutek Wilk <[email protected]>
>>>> Cc: Jan Beulich <[email protected]>
>>>> Cc: Juergen Gross <[email protected]>
>>>> Cc: Olaf Hering <[email protected]>
>>>> Cc: David Vrabel <[email protected]>
>>>> Signed-off-by: Luis R. Rodriguez <[email protected]>
>>>> ---
>>>> drivers/xen/privcmd.c | 3 +++
>>>> 1 file changed, 3 insertions(+)
>>>>
>>>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>>>> index 569a13b..e29edba 100644
>>>> --- a/drivers/xen/privcmd.c
>>>> +++ b/drivers/xen/privcmd.c
>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>>> hypercall.arg[0], hypercall.arg[1],
>>>> hypercall.arg[2], hypercall.arg[3],
>>>> hypercall.arg[4]);
>>>> +#ifndef CONFIG_PREEMPT
>>>> + schedule();
>>>> +#endif
>>>>
>>>> return ret;
>>>> }
>>>>
>>> Sorry, I don't think this will solve anything. You're calling schedule()
>>> right after the long running hypercall just nanoseconds before returning
>>> to the user.
>> Yeah, well that is what [1] tried as well only it tried using
>> preempt_schedule_irq() on the hypercall callback...
>>
>>> I suppose you were mislead by the "int 0x82" in [0]. This is the
>>> hypercall from the kernel into the hypervisor, e.g. inside of
>>> privcmd_call().
>> Nope, you have to consider what was done in [1], I was trying to
>> do something similar but less complex that didn't involve mucking
>> with the callbacks but also not abusing APIs.
>>
>> I'm afraid we don't have much leg room.
>
> XenServer uses
>
> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>
> to deal with these issues. That patch is based on 3.10.

Clever. :-)

>
> I can remember whether this has been submitted upstream before (and
> there were outstanding issues), or whether it fell at an inconvenient
> time with our development cycles.

I found

http://lists.xen.org/archives/html/xen-devel/2014-02/msg02540.html

and nothing else.


Juergen

2014-11-28 21:51:07

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Thu, Nov 27, 2014 at 06:50:31PM +0000, Andrew Cooper wrote:
> On 27/11/14 18:36, Luis R. Rodriguez wrote:
> > On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
> >> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> >>> From: "Luis R. Rodriguez" <[email protected]>
> >>>
> >>> Some folks had reported that some xen hypercalls take a long time
> >>> to complete when issued from the userspace private ioctl mechanism,
> >>> this can happen for instance with some hypercalls that have many
> >>> sub-operations, this can happen for instance on hypercalls that use
> >>> multi-call feature whereby Xen lets one hypercall batch out a series
> >>> of other hypercalls on the hypervisor. At times such hypercalls can
> >>> even end up triggering the TASK_UNINTERRUPTIBLE hanger check (default
> >>> 120 seconds), this a non-issue issue on preemptible kernels though as
> >>> the kernel may deschedule such long running tasks. Xen for instance
> >>> supports multicalls to be preempted as well, this is what Xen calls
> >>> continuation (see xen commit 42217cbc5b which introduced this [0]).
> >>> On systems without CONFIG_PREEMPT though -- a kernel with voluntary
> >>> or no preemption -- a long running hypercall will not be descheduled
> >>> until the hypercall is complete and the ioctl returns to user space.
> >>>
> >>> To help with this David had originally implemented support for use
> >>> of preempt_schedule_irq() [1] for non CONFIG_PREEMPT kernels. This
> >>> solution never went upstream though and upon review to help refactor
> >>> this I've concluded that usage of preempt_schedule_irq() would be
> >>> a bit abussive of existing APIs -- for a few reasons:
> >>>
> >>> 0) we want to avoid spreading its use on non CONFIG_PREEMPT kernels
> >>>
> >>> 1) we want try to consider solutions that might work for other
> >>> hypervisors for this same problem, and identify it its an issue
> >>> even present on other hypervisors or if this is a self
> >>> inflicted architectural issue caused by use of multicalls
> >>>
> >>> 2) there is no documentation or profiling of the exact hypercalls
> >>> that were causing these issues, nor do we have any context
> >>> to help evaluate this any further
> >>>
> >>> I at least checked with kvm folks and it seems hypercall preemption
> >>> is not needed there. We can survey other hypervisors...
> >>>
> >>> If 'something like preemption' is needed then CONFIG_PREEMPT
> >>> should just be enabled and encouraged, it seems we want to
> >>> encourage CONFIG_PREEMPT on xen, specially when multicalls are
> >>> used. In the meantime this tries to address a solution to help
> >>> xen on non CONFIG_PREEMPT kernels.
> >>>
> >>> One option tested and evaluated was to put private hypercalls in
> >>> process context, however this would introduce complexities such
> >>> originating hypercalls from different contexts. Current xen
> >>> hypercall callback handlers would need to be changed per architecture,
> >>> for instance, we'd also incur the cost of switching states from
> >>> user / kernel (this cost is also present if preempt_schedule_irq()
> >>> is used). There may be other issues which could be introduced with
> >>> this strategy as well. The simplest *shared* alternative is instead
> >>> to just explicitly schedule() at the end of a private hypercall on non
> >>> preempt kernels. This forces our private hypercall call mechanism
> >>> to try to be fair only on non CONFIG_PREEMPT kernels at the cost of
> >>> more context switch but keeps the private hypercall context intact.
> >>>
> >>> [0] http://xenbits.xen.org/gitweb/?p=xen.git;a=commitdiff;h=42217cbc5b3e84b8c145d8cfb62dd5de0134b9e8;hp=3a0b9c57d5c9e82c55dd967c84dd06cb43c49ee9
> >>> [1] http://ftp.suse.com/pub/people/mcgrof/xen-preempt-hypercalls/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
> >>>
> >>> Cc: Davidlohr Bueso <[email protected]>
> >>> Cc: Joerg Roedel <[email protected]>
> >>> Cc: Borislav Petkov <[email protected]>
> >>> Cc: Konrad Rzeszutek Wilk <[email protected]>
> >>> Cc: Jan Beulich <[email protected]>
> >>> Cc: Juergen Gross <[email protected]>
> >>> Cc: Olaf Hering <[email protected]>
> >>> Cc: David Vrabel <[email protected]>
> >>> Signed-off-by: Luis R. Rodriguez <[email protected]>
> >>> ---
> >>> drivers/xen/privcmd.c | 3 +++
> >>> 1 file changed, 3 insertions(+)
> >>>
> >>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> >>> index 569a13b..e29edba 100644
> >>> --- a/drivers/xen/privcmd.c
> >>> +++ b/drivers/xen/privcmd.c
> >>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> >>> hypercall.arg[0], hypercall.arg[1],
> >>> hypercall.arg[2], hypercall.arg[3],
> >>> hypercall.arg[4]);
> >>> +#ifndef CONFIG_PREEMPT
> >>> + schedule();
> >>> +#endif
> >>>
> >>> return ret;
> >>> }
> >>>
> >> Sorry, I don't think this will solve anything. You're calling schedule()
> >> right after the long running hypercall just nanoseconds before returning
> >> to the user.
> > Yeah, well that is what [1] tried as well only it tried using
> > preempt_schedule_irq() on the hypercall callback...
> >
> >> I suppose you were mislead by the "int 0x82" in [0]. This is the
> >> hypercall from the kernel into the hypervisor, e.g. inside of
> >> privcmd_call().
> > Nope, you have to consider what was done in [1], I was trying to
> > do something similar but less complex that didn't involve mucking
> > with the callbacks but also not abusing APIs.
> >
> > I'm afraid we don't have much leg room.
>
> XenServer uses
>
> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>
> to deal with these issues. That patch is based on 3.10.
>
> I can remember whether this has been submitted upstream before (and
> there were outstanding issues), or whether it fell at an inconvenient
> time with our development cycles.
>
> David: do you recall?

This was precicely the patch I reviewed in [1].

Luis

2014-12-01 11:01:27

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 28/11/14 04:49, Juergen Gross wrote:
> On 11/27/2014 07:50 PM, Andrew Cooper wrote:
>>
>> XenServer uses
>>
>> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>>
>>
>> to deal with these issues. That patch is based on 3.10.
>
> Clever. :-)
>
>>
>> I can remember whether this has been submitted upstream before (and
>> there were outstanding issues), or whether it fell at an inconvenient
>> time with our development cycles.
>
> I found
>
> http://lists.xen.org/archives/html/xen-devel/2014-02/msg02540.html
>
> and nothing else.

I dropped it because it copy-and-paste a bunch of otherwise generic x86
assembler and looked unlikely to get an x86 maintainer ack. If you
think otherwise, feel free to pick it up and run with it.

David

2014-12-01 11:11:48

by David Vrabel

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 27/11/14 18:36, Luis R. Rodriguez wrote:
> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>> From: "Luis R. Rodriguez" <[email protected]>
>>>
>>> Some folks had reported that some xen hypercalls take a long time
>>> to complete when issued from the userspace private ioctl mechanism,
>>> this can happen for instance with some hypercalls that have many
>>> sub-operations, this can happen for instance on hypercalls that use
[...]
>>> --- a/drivers/xen/privcmd.c
>>> +++ b/drivers/xen/privcmd.c
>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>> hypercall.arg[0], hypercall.arg[1],
>>> hypercall.arg[2], hypercall.arg[3],
>>> hypercall.arg[4]);
>>> +#ifndef CONFIG_PREEMPT
>>> + schedule();
>>> +#endif

As Juergen points out, this does nothing. You need to schedule while in
the middle of the hypercall.

Remember that Xen's hypercall preemption only preempts the hypercall to
run interrupts in the guest.

>>>
>>> return ret;
>>> }
>>>
>>
>> Sorry, I don't think this will solve anything. You're calling schedule()
>> right after the long running hypercall just nanoseconds before returning
>> to the user.
>
> Yeah, well that is what [1] tried as well only it tried using
> preempt_schedule_irq() on the hypercall callback...

No. My patch added a schedule point in the middle of a hypercall on the
return from an interrupt (e.g., the timer interrupt).

David

2014-12-01 13:32:48

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Mon, Dec 01, 2014 at 11:01:18AM +0000, David Vrabel wrote:
> On 28/11/14 04:49, Juergen Gross wrote:
> > On 11/27/2014 07:50 PM, Andrew Cooper wrote:
> >>
> >> XenServer uses
> >>
> >> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
> >>
> >>
> >> to deal with these issues. That patch is based on 3.10.
> >
> > Clever. :-)
> >
> >>
> >> I can remember whether this has been submitted upstream before (and
> >> there were outstanding issues), or whether it fell at an inconvenient
> >> time with our development cycles.
> >
> > I found
> >
> > http://lists.xen.org/archives/html/xen-devel/2014-02/msg02540.html
> >
> > and nothing else.
>
> I dropped it because it copy-and-paste a bunch of otherwise generic x86
> assembler and looked unlikely to get an x86 maintainer ack. If you
> think otherwise, feel free to pick it up and run with it.

I was trying to run with it, but my biggest gripe with this was
the use of preempt_schedule_irq(), but we can review that on the
other thread.

Luis

2014-12-01 14:42:39

by Jürgen Groß

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 12/01/2014 02:32 PM, Luis R. Rodriguez wrote:
> On Mon, Dec 01, 2014 at 11:01:18AM +0000, David Vrabel wrote:
>> On 28/11/14 04:49, Juergen Gross wrote:
>>> On 11/27/2014 07:50 PM, Andrew Cooper wrote:
>>>>
>>>> XenServer uses
>>>>
>>>> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>>>>
>>>>
>>>> to deal with these issues. That patch is based on 3.10.
>>>
>>> Clever. :-)
>>>
>>>>
>>>> I can remember whether this has been submitted upstream before (and
>>>> there were outstanding issues), or whether it fell at an inconvenient
>>>> time with our development cycles.
>>>
>>> I found
>>>
>>> http://lists.xen.org/archives/html/xen-devel/2014-02/msg02540.html
>>>
>>> and nothing else.
>>
>> I dropped it because it copy-and-paste a bunch of otherwise generic x86
>> assembler and looked unlikely to get an x86 maintainer ack. If you
>> think otherwise, feel free to pick it up and run with it.
>
> I was trying to run with it, but my biggest gripe with this was
> the use of preempt_schedule_irq(), but we can review that on the
> other thread.

I think this can be handled completely inside xen_evtchn_do_upcall().

xen_preemptible_hcall_begin() (possibly with another name) could take
the pointer of a function as parameter which is used as continuation
point after an asynchronous interrupt in the critical section.
Parameter for that function would be the exception frame of the
original interrupt to be able to continue at the interrupted position
after e.g. calling schedule().


Juergen

2014-12-01 15:05:54

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
> On 27/11/14 18:36, Luis R. Rodriguez wrote:
> > On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
> >> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> >>> From: "Luis R. Rodriguez" <[email protected]>
> >>>
> >>> Some folks had reported that some xen hypercalls take a long time
> >>> to complete when issued from the userspace private ioctl mechanism,
> >>> this can happen for instance with some hypercalls that have many
> >>> sub-operations, this can happen for instance on hypercalls that use
> [...]
> >>> --- a/drivers/xen/privcmd.c
> >>> +++ b/drivers/xen/privcmd.c
> >>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> >>> hypercall.arg[0], hypercall.arg[1],
> >>> hypercall.arg[2], hypercall.arg[3],
> >>> hypercall.arg[4]);
> >>> +#ifndef CONFIG_PREEMPT
> >>> + schedule();
> >>> +#endif
>
> As Juergen points out, this does nothing. You need to schedule while in
> the middle of the hypercall.
>
> Remember that Xen's hypercall preemption only preempts the hypercall to
> run interrupts in the guest.

How is it ensured that when the kernel preempts on this code path on
CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?

> >>>
> >>> return ret;
> >>> }
> >>>
> >>
> >> Sorry, I don't think this will solve anything. You're calling schedule()
> >> right after the long running hypercall just nanoseconds before returning
> >> to the user.
> >
> > Yeah, well that is what [1] tried as well only it tried using
> > preempt_schedule_irq() on the hypercall callback...
>
> No. My patch added a schedule point in the middle of a hypercall on the
> return from an interrupt (e.g., the timer interrupt).

OK that provides much better context and given that I do see the above hunk as
pointless. I was completely misrepresenting what the callback was for. Now --
just to address my issues with the use of preempt_schedule_irq(). If the above
is addressed that I think should address most of my concerns, if we can figure
out a way to not deal with it to be arch specific that'd be neat, and if we
could not have to ifdef around stuff even better.

Luis

2014-12-01 15:18:39

by David Vrabel

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 01/12/14 15:05, Luis R. Rodriguez wrote:
> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>>>> From: "Luis R. Rodriguez" <[email protected]>
>>>>>
>>>>> Some folks had reported that some xen hypercalls take a long time
>>>>> to complete when issued from the userspace private ioctl mechanism,
>>>>> this can happen for instance with some hypercalls that have many
>>>>> sub-operations, this can happen for instance on hypercalls that use
>> [...]
>>>>> --- a/drivers/xen/privcmd.c
>>>>> +++ b/drivers/xen/privcmd.c
>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>>>> hypercall.arg[0], hypercall.arg[1],
>>>>> hypercall.arg[2], hypercall.arg[3],
>>>>> hypercall.arg[4]);
>>>>> +#ifndef CONFIG_PREEMPT
>>>>> + schedule();
>>>>> +#endif
>>
>> As Juergen points out, this does nothing. You need to schedule while in
>> the middle of the hypercall.
>>
>> Remember that Xen's hypercall preemption only preempts the hypercall to
>> run interrupts in the guest.
>
> How is it ensured that when the kernel preempts on this code path on
> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?

Sorry, I really didn't describe this very well.

If a hypercall needs a continuation, Xen returns to the guest with the
IP set to the hypercall instruction, and on the way back to the guest
Xen may schedule a different VCPU or it will do any upcalls (as per normal).

The guest is free to return from the upcall to the original task
(continuing the hypercall) or to a different one.

David

2014-12-01 15:45:15

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <[email protected]> wrote:
> On 01/12/14 15:05, Luis R. Rodriguez wrote:
>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>>>>> From: "Luis R. Rodriguez" <[email protected]>
>>>>>>
>>>>>> Some folks had reported that some xen hypercalls take a long time
>>>>>> to complete when issued from the userspace private ioctl mechanism,
>>>>>> this can happen for instance with some hypercalls that have many
>>>>>> sub-operations, this can happen for instance on hypercalls that use
>>> [...]
>>>>>> --- a/drivers/xen/privcmd.c
>>>>>> +++ b/drivers/xen/privcmd.c
>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>>>>> hypercall.arg[0], hypercall.arg[1],
>>>>>> hypercall.arg[2], hypercall.arg[3],
>>>>>> hypercall.arg[4]);
>>>>>> +#ifndef CONFIG_PREEMPT
>>>>>> + schedule();
>>>>>> +#endif
>>>
>>> As Juergen points out, this does nothing. You need to schedule while in
>>> the middle of the hypercall.
>>>
>>> Remember that Xen's hypercall preemption only preempts the hypercall to
>>> run interrupts in the guest.
>>
>> How is it ensured that when the kernel preempts on this code path on
>> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?
>
> Sorry, I really didn't describe this very well.
>
> If a hypercall needs a continuation, Xen returns to the guest with the
> IP set to the hypercall instruction, and on the way back to the guest
> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
>
> The guest is free to return from the upcall to the original task
> (continuing the hypercall) or to a different one.

OK so that addresses what Xen will do when using continuation and
hypercall preemption, my concern here was that using
preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
hypercall on the return from an interrupt (e.g., the timer interrupt)
would still let the kernel preempt to tasks other than those related
to Xen.

My gripe was that if this was being done it'd be a bit abusive of the
API even if its safe.

Luis

2014-12-01 15:50:39

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Mon, Dec 01, 2014 at 03:42:33PM +0100, Juergen Gross wrote:
> On 12/01/2014 02:32 PM, Luis R. Rodriguez wrote:
>> On Mon, Dec 01, 2014 at 11:01:18AM +0000, David Vrabel wrote:
>>> On 28/11/14 04:49, Juergen Gross wrote:
>>>> On 11/27/2014 07:50 PM, Andrew Cooper wrote:
>>>>>
>>>>> XenServer uses
>>>>>
>>>>> https://github.com/xenserver/linux-3.x.pg/blob/master/master/0001-x86-xen-allow-privcmd-hypercalls-to-be-preempted.patch
>>>>>
>>>>>
>>>>> to deal with these issues. That patch is based on 3.10.
>>>>
>>>> Clever. :-)
>>>>
>>>>>
>>>>> I can remember whether this has been submitted upstream before (and
>>>>> there were outstanding issues), or whether it fell at an inconvenient
>>>>> time with our development cycles.
>>>>
>>>> I found
>>>>
>>>> http://lists.xen.org/archives/html/xen-devel/2014-02/msg02540.html
>>>>
>>>> and nothing else.
>>>
>>> I dropped it because it copy-and-paste a bunch of otherwise generic x86
>>> assembler and looked unlikely to get an x86 maintainer ack. If you
>>> think otherwise, feel free to pick it up and run with it.
>>
>> I was trying to run with it, but my biggest gripe with this was
>> the use of preempt_schedule_irq(), but we can review that on the
>> other thread.

So much for the other thread :)

> I think this can be handled completely inside xen_evtchn_do_upcall().
>
> xen_preemptible_hcall_begin() (possibly with another name) could take
> the pointer of a function as parameter which is used as continuation
> point after an asynchronous interrupt in the critical section.

Oh so we'd only preempt to one specific task?

> Parameter for that function would be the exception frame of the
> original interrupt to be able to continue at the interrupted position
> after e.g. calling schedule().

Interesting... if reasonable I wonder if this should be generalized.
Are there other CONFIG_PREEMPT=n use cases for such excemptions?

Luis

2014-12-01 15:54:31

by David Vrabel

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 01/12/14 15:44, Luis R. Rodriguez wrote:
> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <[email protected]> wrote:
>> On 01/12/14 15:05, Luis R. Rodriguez wrote:
>>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
>>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
>>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>>>>>> From: "Luis R. Rodriguez" <[email protected]>
>>>>>>>
>>>>>>> Some folks had reported that some xen hypercalls take a long time
>>>>>>> to complete when issued from the userspace private ioctl mechanism,
>>>>>>> this can happen for instance with some hypercalls that have many
>>>>>>> sub-operations, this can happen for instance on hypercalls that use
>>>> [...]
>>>>>>> --- a/drivers/xen/privcmd.c
>>>>>>> +++ b/drivers/xen/privcmd.c
>>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>>>>>> hypercall.arg[0], hypercall.arg[1],
>>>>>>> hypercall.arg[2], hypercall.arg[3],
>>>>>>> hypercall.arg[4]);
>>>>>>> +#ifndef CONFIG_PREEMPT
>>>>>>> + schedule();
>>>>>>> +#endif
>>>>
>>>> As Juergen points out, this does nothing. You need to schedule while in
>>>> the middle of the hypercall.
>>>>
>>>> Remember that Xen's hypercall preemption only preempts the hypercall to
>>>> run interrupts in the guest.
>>>
>>> How is it ensured that when the kernel preempts on this code path on
>>> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?
>>
>> Sorry, I really didn't describe this very well.
>>
>> If a hypercall needs a continuation, Xen returns to the guest with the
>> IP set to the hypercall instruction, and on the way back to the guest
>> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
>>
>> The guest is free to return from the upcall to the original task
>> (continuing the hypercall) or to a different one.
>
> OK so that addresses what Xen will do when using continuation and
> hypercall preemption, my concern here was that using
> preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
> hypercall on the return from an interrupt (e.g., the timer interrupt)
> would still let the kernel preempt to tasks other than those related
> to Xen.

Um. Why would that be a problem? We do want to switch to any task the
Linux scheduler thinks is best.

David

2014-12-01 16:19:12

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Mon, Dec 01, 2014 at 03:54:24PM +0000, David Vrabel wrote:
> On 01/12/14 15:44, Luis R. Rodriguez wrote:
> > On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <[email protected]> wrote:
> >> On 01/12/14 15:05, Luis R. Rodriguez wrote:
> >>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
> >>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
> >>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
> >>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> >>>>>>> From: "Luis R. Rodriguez" <[email protected]>
> >>>>>>>
> >>>>>>> Some folks had reported that some xen hypercalls take a long time
> >>>>>>> to complete when issued from the userspace private ioctl mechanism,
> >>>>>>> this can happen for instance with some hypercalls that have many
> >>>>>>> sub-operations, this can happen for instance on hypercalls that use
> >>>> [...]
> >>>>>>> --- a/drivers/xen/privcmd.c
> >>>>>>> +++ b/drivers/xen/privcmd.c
> >>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> >>>>>>> hypercall.arg[0], hypercall.arg[1],
> >>>>>>> hypercall.arg[2], hypercall.arg[3],
> >>>>>>> hypercall.arg[4]);
> >>>>>>> +#ifndef CONFIG_PREEMPT
> >>>>>>> + schedule();
> >>>>>>> +#endif
> >>>>
> >>>> As Juergen points out, this does nothing. You need to schedule while in
> >>>> the middle of the hypercall.
> >>>>
> >>>> Remember that Xen's hypercall preemption only preempts the hypercall to
> >>>> run interrupts in the guest.
> >>>
> >>> How is it ensured that when the kernel preempts on this code path on
> >>> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?
> >>
> >> Sorry, I really didn't describe this very well.
> >>
> >> If a hypercall needs a continuation, Xen returns to the guest with the
> >> IP set to the hypercall instruction, and on the way back to the guest
> >> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
> >>
> >> The guest is free to return from the upcall to the original task
> >> (continuing the hypercall) or to a different one.
> >
> > OK so that addresses what Xen will do when using continuation and
> > hypercall preemption, my concern here was that using
> > preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
> > hypercall on the return from an interrupt (e.g., the timer interrupt)
> > would still let the kernel preempt to tasks other than those related
> > to Xen.
>
> Um. Why would that be a problem? We do want to switch to any task the
> Linux scheduler thinks is best.

Its safe but -- it technically is doing kernel preemption, unless we want
to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
was my original concern with the use of preempt_schedule_irq() to do this.
I am afraid of setting precedents without being clear or wider review and
acceptance.

Luis

2014-12-01 17:08:02

by Jürgen Groß

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 12/01/2014 05:19 PM, Luis R. Rodriguez wrote:
> On Mon, Dec 01, 2014 at 03:54:24PM +0000, David Vrabel wrote:
>> On 01/12/14 15:44, Luis R. Rodriguez wrote:
>>> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <[email protected]> wrote:
>>>> On 01/12/14 15:05, Luis R. Rodriguez wrote:
>>>>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
>>>>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
>>>>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>>>>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>>>>>>>> From: "Luis R. Rodriguez" <[email protected]>
>>>>>>>>>
>>>>>>>>> Some folks had reported that some xen hypercalls take a long time
>>>>>>>>> to complete when issued from the userspace private ioctl mechanism,
>>>>>>>>> this can happen for instance with some hypercalls that have many
>>>>>>>>> sub-operations, this can happen for instance on hypercalls that use
>>>>>> [...]
>>>>>>>>> --- a/drivers/xen/privcmd.c
>>>>>>>>> +++ b/drivers/xen/privcmd.c
>>>>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>>>>>>>> hypercall.arg[0], hypercall.arg[1],
>>>>>>>>> hypercall.arg[2], hypercall.arg[3],
>>>>>>>>> hypercall.arg[4]);
>>>>>>>>> +#ifndef CONFIG_PREEMPT
>>>>>>>>> + schedule();
>>>>>>>>> +#endif
>>>>>>
>>>>>> As Juergen points out, this does nothing. You need to schedule while in
>>>>>> the middle of the hypercall.
>>>>>>
>>>>>> Remember that Xen's hypercall preemption only preempts the hypercall to
>>>>>> run interrupts in the guest.
>>>>>
>>>>> How is it ensured that when the kernel preempts on this code path on
>>>>> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?
>>>>
>>>> Sorry, I really didn't describe this very well.
>>>>
>>>> If a hypercall needs a continuation, Xen returns to the guest with the
>>>> IP set to the hypercall instruction, and on the way back to the guest
>>>> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
>>>>
>>>> The guest is free to return from the upcall to the original task
>>>> (continuing the hypercall) or to a different one.
>>>
>>> OK so that addresses what Xen will do when using continuation and
>>> hypercall preemption, my concern here was that using
>>> preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
>>> hypercall on the return from an interrupt (e.g., the timer interrupt)
>>> would still let the kernel preempt to tasks other than those related
>>> to Xen.
>>
>> Um. Why would that be a problem? We do want to switch to any task the
>> Linux scheduler thinks is best.
>
> Its safe but -- it technically is doing kernel preemption, unless we want
> to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
> was my original concern with the use of preempt_schedule_irq() to do this.
> I am afraid of setting precedents without being clear or wider review and
> acceptance.

I wonder whether it would be more acceptable to add (or completely
switch to) another preemption model: PREEMPT_SWITCHABLE. This would be
similar to CONFIG_PREEMPT, but the "normal" value of __preempt_count
would be settable via kernel parameter (default 2):

0: preempt
1: preempt_voluntary
2: preempt_none

The kernel would run with preemption enabled. cond_sched() would
reschedule if __preempt_count <= 1. And in case of long running kernel
activities (like the hypercall case or other stuff requiring schedule()
calls to avoid hangups) we would just set __preempt_count to 0 during
these periods and restore the old value afterwards.

This would be a rather intrusive but clean change IMO.

Any thoughts?


Juergen

2014-12-01 17:52:28

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Mon, Dec 01, 2014 at 06:07:48PM +0100, Juergen Gross wrote:
> On 12/01/2014 05:19 PM, Luis R. Rodriguez wrote:
>> On Mon, Dec 01, 2014 at 03:54:24PM +0000, David Vrabel wrote:
>>> On 01/12/14 15:44, Luis R. Rodriguez wrote:
>>>> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <[email protected]> wrote:
>>>>> On 01/12/14 15:05, Luis R. Rodriguez wrote:
>>>>>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
>>>>>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
>>>>>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>>>>>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>>>>>>>>> From: "Luis R. Rodriguez" <[email protected]>
>>>>>>>>>>
>>>>>>>>>> Some folks had reported that some xen hypercalls take a long time
>>>>>>>>>> to complete when issued from the userspace private ioctl mechanism,
>>>>>>>>>> this can happen for instance with some hypercalls that have many
>>>>>>>>>> sub-operations, this can happen for instance on hypercalls that use
>>>>>>> [...]
>>>>>>>>>> --- a/drivers/xen/privcmd.c
>>>>>>>>>> +++ b/drivers/xen/privcmd.c
>>>>>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>>>>>>>>> hypercall.arg[0], hypercall.arg[1],
>>>>>>>>>> hypercall.arg[2], hypercall.arg[3],
>>>>>>>>>> hypercall.arg[4]);
>>>>>>>>>> +#ifndef CONFIG_PREEMPT
>>>>>>>>>> + schedule();
>>>>>>>>>> +#endif
>>>>>>>
>>>>>>> As Juergen points out, this does nothing. You need to schedule while in
>>>>>>> the middle of the hypercall.
>>>>>>>
>>>>>>> Remember that Xen's hypercall preemption only preempts the hypercall to
>>>>>>> run interrupts in the guest.
>>>>>>
>>>>>> How is it ensured that when the kernel preempts on this code path on
>>>>>> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?
>>>>>
>>>>> Sorry, I really didn't describe this very well.
>>>>>
>>>>> If a hypercall needs a continuation, Xen returns to the guest with the
>>>>> IP set to the hypercall instruction, and on the way back to the guest
>>>>> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
>>>>>
>>>>> The guest is free to return from the upcall to the original task
>>>>> (continuing the hypercall) or to a different one.
>>>>
>>>> OK so that addresses what Xen will do when using continuation and
>>>> hypercall preemption, my concern here was that using
>>>> preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
>>>> hypercall on the return from an interrupt (e.g., the timer interrupt)
>>>> would still let the kernel preempt to tasks other than those related
>>>> to Xen.
>>>
>>> Um. Why would that be a problem? We do want to switch to any task the
>>> Linux scheduler thinks is best.
>>
>> Its safe but -- it technically is doing kernel preemption, unless we want
>> to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
>> was my original concern with the use of preempt_schedule_irq() to do this.
>> I am afraid of setting precedents without being clear or wider review and
>> acceptance.
>
> I wonder whether it would be more acceptable to add (or completely
> switch to) another preemption model: PREEMPT_SWITCHABLE. This would be
> similar to CONFIG_PREEMPT, but the "normal" value of __preempt_count
> would be settable via kernel parameter (default 2):
>
> 0: preempt
> 1: preempt_voluntary
> 2: preempt_none
>
> The kernel would run with preemption enabled. cond_sched() would
> reschedule if __preempt_count <= 1. And in case of long running kernel
> activities (like the hypercall case or other stuff requiring schedule()
> calls to avoid hangups) we would just set __preempt_count to 0 during
> these periods and restore the old value afterwards.
>
> This would be a rather intrusive but clean change IMO.
>
> Any thoughts?

I like the idea of dynamically changing at run time the preemption model and
personally find this reasonable, however I am not certain if this would
introduce a series of issues hard to address. Thoughts by others who linger
deep in the cold lonely scheduler caves ?

Luis

2014-12-01 18:16:36

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 01/12/14 16:19, Luis R. Rodriguez wrote:
> On Mon, Dec 01, 2014 at 03:54:24PM +0000, David Vrabel wrote:
>> On 01/12/14 15:44, Luis R. Rodriguez wrote:
>>> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <[email protected]> wrote:
>>>> On 01/12/14 15:05, Luis R. Rodriguez wrote:
>>>>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
>>>>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
>>>>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
>>>>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
>>>>>>>>> From: "Luis R. Rodriguez" <[email protected]>
>>>>>>>>>
>>>>>>>>> Some folks had reported that some xen hypercalls take a long time
>>>>>>>>> to complete when issued from the userspace private ioctl mechanism,
>>>>>>>>> this can happen for instance with some hypercalls that have many
>>>>>>>>> sub-operations, this can happen for instance on hypercalls that use
>>>>>> [...]
>>>>>>>>> --- a/drivers/xen/privcmd.c
>>>>>>>>> +++ b/drivers/xen/privcmd.c
>>>>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
>>>>>>>>> hypercall.arg[0], hypercall.arg[1],
>>>>>>>>> hypercall.arg[2], hypercall.arg[3],
>>>>>>>>> hypercall.arg[4]);
>>>>>>>>> +#ifndef CONFIG_PREEMPT
>>>>>>>>> + schedule();
>>>>>>>>> +#endif
>>>>>>
>>>>>> As Juergen points out, this does nothing. You need to schedule while in
>>>>>> the middle of the hypercall.
>>>>>>
>>>>>> Remember that Xen's hypercall preemption only preempts the hypercall to
>>>>>> run interrupts in the guest.
>>>>>
>>>>> How is it ensured that when the kernel preempts on this code path on
>>>>> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?
>>>>
>>>> Sorry, I really didn't describe this very well.
>>>>
>>>> If a hypercall needs a continuation, Xen returns to the guest with the
>>>> IP set to the hypercall instruction, and on the way back to the guest
>>>> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
>>>>
>>>> The guest is free to return from the upcall to the original task
>>>> (continuing the hypercall) or to a different one.
>>>
>>> OK so that addresses what Xen will do when using continuation and
>>> hypercall preemption, my concern here was that using
>>> preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
>>> hypercall on the return from an interrupt (e.g., the timer interrupt)
>>> would still let the kernel preempt to tasks other than those related
>>> to Xen.
>>
>> Um. Why would that be a problem? We do want to switch to any task the
>> Linux scheduler thinks is best.
>
> Its safe but -- it technically is doing kernel preemption, unless we want
> to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
> was my original concern with the use of preempt_schedule_irq() to do this.
> I am afraid of setting precedents without being clear or wider review and
> acceptance.

It's voluntary preemption at a well defined point. It's no different to
a cond_resched() call.

Note that we're not trying to fix this for the non-voluntary-preempt
kernels.

David

2014-12-01 22:36:15

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Mon, Dec 01, 2014 at 06:16:28PM +0000, David Vrabel wrote:
> On 01/12/14 16:19, Luis R. Rodriguez wrote:
> > On Mon, Dec 01, 2014 at 03:54:24PM +0000, David Vrabel wrote:
> >> On 01/12/14 15:44, Luis R. Rodriguez wrote:
> >>> On Mon, Dec 1, 2014 at 10:18 AM, David Vrabel <[email protected]> wrote:
> >>>> On 01/12/14 15:05, Luis R. Rodriguez wrote:
> >>>>> On Mon, Dec 01, 2014 at 11:11:43AM +0000, David Vrabel wrote:
> >>>>>> On 27/11/14 18:36, Luis R. Rodriguez wrote:
> >>>>>>> On Thu, Nov 27, 2014 at 07:36:31AM +0100, Juergen Gross wrote:
> >>>>>>>> On 11/26/2014 11:26 PM, Luis R. Rodriguez wrote:
> >>>>>>>>> From: "Luis R. Rodriguez" <[email protected]>
> >>>>>>>>>
> >>>>>>>>> Some folks had reported that some xen hypercalls take a long time
> >>>>>>>>> to complete when issued from the userspace private ioctl mechanism,
> >>>>>>>>> this can happen for instance with some hypercalls that have many
> >>>>>>>>> sub-operations, this can happen for instance on hypercalls that use
> >>>>>> [...]
> >>>>>>>>> --- a/drivers/xen/privcmd.c
> >>>>>>>>> +++ b/drivers/xen/privcmd.c
> >>>>>>>>> @@ -60,6 +60,9 @@ static long privcmd_ioctl_hypercall(void __user *udata)
> >>>>>>>>> hypercall.arg[0], hypercall.arg[1],
> >>>>>>>>> hypercall.arg[2], hypercall.arg[3],
> >>>>>>>>> hypercall.arg[4]);
> >>>>>>>>> +#ifndef CONFIG_PREEMPT
> >>>>>>>>> + schedule();
> >>>>>>>>> +#endif
> >>>>>>
> >>>>>> As Juergen points out, this does nothing. You need to schedule while in
> >>>>>> the middle of the hypercall.
> >>>>>>
> >>>>>> Remember that Xen's hypercall preemption only preempts the hypercall to
> >>>>>> run interrupts in the guest.
> >>>>>
> >>>>> How is it ensured that when the kernel preempts on this code path on
> >>>>> CONFIG_PREEMPT=n kernel that only interrupts in the guest are run?
> >>>>
> >>>> Sorry, I really didn't describe this very well.
> >>>>
> >>>> If a hypercall needs a continuation, Xen returns to the guest with the
> >>>> IP set to the hypercall instruction, and on the way back to the guest
> >>>> Xen may schedule a different VCPU or it will do any upcalls (as per normal).
> >>>>
> >>>> The guest is free to return from the upcall to the original task
> >>>> (continuing the hypercall) or to a different one.
> >>>
> >>> OK so that addresses what Xen will do when using continuation and
> >>> hypercall preemption, my concern here was that using
> >>> preempt_schedule_irq() on CONFIG_PREEMPT=n kernels in the middle of a
> >>> hypercall on the return from an interrupt (e.g., the timer interrupt)
> >>> would still let the kernel preempt to tasks other than those related
> >>> to Xen.
> >>
> >> Um. Why would that be a problem? We do want to switch to any task the
> >> Linux scheduler thinks is best.
> >
> > Its safe but -- it technically is doing kernel preemption, unless we want
> > to adjust the definition of CONFIG_PREEMPT=n to exclude hypercalls. This
> > was my original concern with the use of preempt_schedule_irq() to do this.
> > I am afraid of setting precedents without being clear or wider review and
> > acceptance.
>
> It's voluntary preemption at a well defined point.

Its voluntarily preempting the kernel even for CONFIG_PREEMPT=n kernels...

> It's no different to a cond_resched() call.

Then I do agree its a fair analogy (and find this obviously odd that how
widespread cond_resched() is), we just don't have an equivalent for IRQ
context, why not avoid the special check then and use this all the time in the
middle of a hypercall on the return from an interrupt (e.g., the timer
interrupt)?

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5e344bb..e60b5a1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2759,6 +2759,12 @@ static inline int signal_pending_state(long state, struct task_struct *p)
*/
extern int _cond_resched(void);

+/*
+ * Voluntarily preempting the kernel even for CONFIG_PREEMPT=n kernels
+ * on very special circumstances.
+ */
+extern int cond_resched_irq(void);
+
#define cond_resched() ({ \
__might_sleep(__FILE__, __LINE__, 0); \
_cond_resched(); \
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 240157c..1c4d443 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4264,6 +4264,16 @@ int __sched _cond_resched(void)
}
EXPORT_SYMBOL(_cond_resched);

+int __sched cond_resched_irq(void)
+{
+ if (should_resched()) {
+ preempt_schedule_irq();
+ return 1;
+ }
+ return 0;
+}
+EXPORT_SYMBOL_GPL(cond_resched_irq);
+
/*
* __cond_resched_lock() - if a reschedule is pending, drop the given lock,
* call schedule, and on return reacquire the lock.
Luis

2014-12-02 11:11:24

by David Vrabel

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 01/12/14 22:36, Luis R. Rodriguez wrote:
>
> Then I do agree its a fair analogy (and find this obviously odd that how
> widespread cond_resched() is), we just don't have an equivalent for IRQ
> context, why not avoid the special check then and use this all the time in the
> middle of a hypercall on the return from an interrupt (e.g., the timer
> interrupt)?

http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html

David

2014-12-03 02:28:38

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Tue, Dec 02, 2014 at 11:11:18AM +0000, David Vrabel wrote:
> On 01/12/14 22:36, Luis R. Rodriguez wrote:
> >
> > Then I do agree its a fair analogy (and find this obviously odd that how
> > widespread cond_resched() is), we just don't have an equivalent for IRQ
> > context, why not avoid the special check then and use this all the time in the
> > middle of a hypercall on the return from an interrupt (e.g., the timer
> > interrupt)?
>
> http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html

OK thanks! That explains why we need some asm code but in that submission you
still also had used is_preemptible_hypercall(regs) and in the new
implementation you use a CPU variable xen_in_preemptible_hcall prior to calling
preempt_schedule_irq(). I believe you added the CPU variable because
preempt_schedule_irq() will preempt first without any checks if it should, I'm
asking why not do something like cond_resched_irq() where we check with
should_resched() prior to preempting and that way we can avoid having to use
the CPU variable?

Luis

2014-12-03 04:37:56

by Jürgen Groß

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On 12/03/2014 03:28 AM, Luis R. Rodriguez wrote:
> On Tue, Dec 02, 2014 at 11:11:18AM +0000, David Vrabel wrote:
>> On 01/12/14 22:36, Luis R. Rodriguez wrote:
>>>
>>> Then I do agree its a fair analogy (and find this obviously odd that how
>>> widespread cond_resched() is), we just don't have an equivalent for IRQ
>>> context, why not avoid the special check then and use this all the time in the
>>> middle of a hypercall on the return from an interrupt (e.g., the timer
>>> interrupt)?
>>
>> http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html
>
> OK thanks! That explains why we need some asm code but in that submission you
> still also had used is_preemptible_hypercall(regs) and in the new
> implementation you use a CPU variable xen_in_preemptible_hcall prior to calling
> preempt_schedule_irq(). I believe you added the CPU variable because
> preempt_schedule_irq() will preempt first without any checks if it should, I'm
> asking why not do something like cond_resched_irq() where we check with
> should_resched() prior to preempting and that way we can avoid having to use
> the CPU variable?

Because that could preempt at any asynchronous interrupt making the
no-preempt kernel fully preemptive. How would you know you are just
doing a critical hypercall which should be preempted?

Juergen

2014-12-03 19:39:51

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Wed, Dec 03, 2014 at 05:37:51AM +0100, Juergen Gross wrote:
> On 12/03/2014 03:28 AM, Luis R. Rodriguez wrote:
>> On Tue, Dec 02, 2014 at 11:11:18AM +0000, David Vrabel wrote:
>>> On 01/12/14 22:36, Luis R. Rodriguez wrote:
>>>>
>>>> Then I do agree its a fair analogy (and find this obviously odd that how
>>>> widespread cond_resched() is), we just don't have an equivalent for IRQ
>>>> context, why not avoid the special check then and use this all the time in the
>>>> middle of a hypercall on the return from an interrupt (e.g., the timer
>>>> interrupt)?
>>>
>>> http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html
>>
>> OK thanks! That explains why we need some asm code but in that submission you
>> still also had used is_preemptible_hypercall(regs) and in the new
>> implementation you use a CPU variable xen_in_preemptible_hcall prior to calling
>> preempt_schedule_irq(). I believe you added the CPU variable because
>> preempt_schedule_irq() will preempt first without any checks if it should, I'm
>> asking why not do something like cond_resched_irq() where we check with
>> should_resched() prior to preempting and that way we can avoid having to use
>> the CPU variable?
>
> Because that could preempt at any asynchronous interrupt making the
> no-preempt kernel fully preemptive.

OK yeah I see. That still doesn't negate the value of using something
like cond_resched_irq() with a should_resched() on only critical hypercalls.
The current implementation (patch by David) forces preemption without
checking for should_resched() so it would preempt unnecessarily at least
once.

> How would you know you are just
> doing a critical hypercall which should be preempted?

You would not, you're right. I was just trying to see if we could generalize
an API for this to avoid having users having to create their own CPU variables
but this all seems very specialized as we want to use this on the timer
so if we do generalize a cond_resched_irq() perhaps the documentation can
warn about this type of case or abuse.

Luis

2014-12-05 16:20:55

by Luis Chamberlain

[permalink] [raw]
Subject: Re: [Xen-devel] [PATCH] xen: privcmd: schedule() after private hypercall when non CONFIG_PREEMPT

On Wed, Dec 03, 2014 at 08:39:47PM +0100, Luis R. Rodriguez wrote:
> On Wed, Dec 03, 2014 at 05:37:51AM +0100, Juergen Gross wrote:
> > On 12/03/2014 03:28 AM, Luis R. Rodriguez wrote:
> >> On Tue, Dec 02, 2014 at 11:11:18AM +0000, David Vrabel wrote:
> >>> On 01/12/14 22:36, Luis R. Rodriguez wrote:
> >>>>
> >>>> Then I do agree its a fair analogy (and find this obviously odd that how
> >>>> widespread cond_resched() is), we just don't have an equivalent for IRQ
> >>>> context, why not avoid the special check then and use this all the time in the
> >>>> middle of a hypercall on the return from an interrupt (e.g., the timer
> >>>> interrupt)?
> >>>
> >>> http://lists.xen.org/archives/html/xen-devel/2014-02/msg01101.html
> >>
> >> OK thanks! That explains why we need some asm code but in that submission you
> >> still also had used is_preemptible_hypercall(regs) and in the new
> >> implementation you use a CPU variable xen_in_preemptible_hcall prior to calling
> >> preempt_schedule_irq(). I believe you added the CPU variable because
> >> preempt_schedule_irq() will preempt first without any checks if it should, I'm
> >> asking why not do something like cond_resched_irq() where we check with
> >> should_resched() prior to preempting and that way we can avoid having to use
> >> the CPU variable?
> >
> > Because that could preempt at any asynchronous interrupt making the
> > no-preempt kernel fully preemptive.
>
> OK yeah I see. That still doesn't negate the value of using something
> like cond_resched_irq() with a should_resched() on only critical hypercalls.
> The current implementation (patch by David) forces preemption without
> checking for should_resched() so it would preempt unnecessarily at least
> once.
>
> > How would you know you are just
> > doing a critical hypercall which should be preempted?
>
> You would not, you're right. I was just trying to see if we could generalize
> an API for this to avoid having users having to create their own CPU variables
> but this all seems very specialized as we want to use this on the timer
> so if we do generalize a cond_resched_irq() perhaps the documentation can
> warn about this type of case or abuse.

David's patch had the check only it was x86 based, if we use cond_resched_irq()
we can leave that aspect out to be done through asm inlines or it'll use the
generic shoudl_resched(), that should save some code on the asm implementations.

I have some patches now which generalizees this, I also have more information
about this can happen exactly, and a way to triggger it on small systems with
some hacks to emulate possibly backend behaviour on larger systems. In the worst
case this can be a dangerious situation to be in. I'll send some new RFTs.

Luis