From: Zide Chen <[email protected]>
Implement a new "system s2idle" hypercall that allows the guest to
notify the hypervisor that it is entering the s2idle power state.
Without this hypercall, the hypervisor cannot trap on any register
write or any other VM exit while the guest is entering the s2idle
state.
Co-developed-by: Peter Fang <[email protected]>
Signed-off-by: Peter Fang <[email protected]>
Co-developed-by: Tomasz Nowicki <[email protected]>
Signed-off-by: Tomasz Nowicki <[email protected]>
Signed-off-by: Zide Chen <[email protected]>
Co-developed-by: Grzegorz Jaszczyk <[email protected]>
Signed-off-by: Grzegorz Jaszczyk <[email protected]>
---
Documentation/virt/kvm/x86/hypercalls.rst | 7 +++++++
arch/x86/kvm/x86.c | 3 +++
drivers/acpi/x86/s2idle.c | 8 ++++++++
include/linux/suspend.h | 1 +
include/uapi/linux/kvm_para.h | 1 +
kernel/power/suspend.c | 4 ++++
6 files changed, 24 insertions(+)
diff --git a/Documentation/virt/kvm/x86/hypercalls.rst b/Documentation/virt/kvm/x86/hypercalls.rst
index e56fa8b9cfca..9d1836c837e3 100644
--- a/Documentation/virt/kvm/x86/hypercalls.rst
+++ b/Documentation/virt/kvm/x86/hypercalls.rst
@@ -190,3 +190,10 @@ the KVM_CAP_EXIT_HYPERCALL capability. Userspace must enable that capability
before advertising KVM_FEATURE_HC_MAP_GPA_RANGE in the guest CPUID. In
addition, if the guest supports KVM_FEATURE_MIGRATION_CONTROL, userspace
must also set up an MSR filter to process writes to MSR_KVM_MIGRATION_CONTROL.
+
+9. KVM_HC_SYSTEM_S2IDLE
+------------------------
+
+:Architecture: x86
+:Status: active
+:Purpose: Notify the hypervisor that the guest is entering s2idle state.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e9473c7c7390..6ed4bd6e762b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9306,6 +9306,9 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
vcpu->arch.complete_userspace_io = complete_hypercall_exit;
return 0;
}
+ case KVM_HC_SYSTEM_S2IDLE:
+ ret = 0;
+ break;
default:
ret = -KVM_ENOSYS;
break;
diff --git a/drivers/acpi/x86/s2idle.c b/drivers/acpi/x86/s2idle.c
index 2963229062f8..0ae5e11380d2 100644
--- a/drivers/acpi/x86/s2idle.c
+++ b/drivers/acpi/x86/s2idle.c
@@ -18,6 +18,7 @@
#include <linux/acpi.h>
#include <linux/device.h>
#include <linux/suspend.h>
+#include <uapi/linux/kvm_para.h>
#include "../sleep.h"
@@ -520,10 +521,17 @@ void acpi_s2idle_restore_early(void)
lps0_dsm_func_mask, lps0_dsm_guid);
}
+static void s2idle_hypervisor_notify(void)
+{
+ if (static_cpu_has(X86_FEATURE_HYPERVISOR))
+ kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
+}
+
static const struct platform_s2idle_ops acpi_s2idle_ops_lps0 = {
.begin = acpi_s2idle_begin,
.prepare = acpi_s2idle_prepare,
.prepare_late = acpi_s2idle_prepare_late,
+ .hypervisor_notify = s2idle_hypervisor_notify,
.wake = acpi_s2idle_wake,
.restore_early = acpi_s2idle_restore_early,
.restore = acpi_s2idle_restore,
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index 70f2921e2e70..42e04e0fe8b1 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -191,6 +191,7 @@ struct platform_s2idle_ops {
int (*begin)(void);
int (*prepare)(void);
int (*prepare_late)(void);
+ void (*hypervisor_notify)(void);
bool (*wake)(void);
void (*restore_early)(void);
void (*restore)(void);
diff --git a/include/uapi/linux/kvm_para.h b/include/uapi/linux/kvm_para.h
index 960c7e93d1a9..072e77e40f89 100644
--- a/include/uapi/linux/kvm_para.h
+++ b/include/uapi/linux/kvm_para.h
@@ -30,6 +30,7 @@
#define KVM_HC_SEND_IPI 10
#define KVM_HC_SCHED_YIELD 11
#define KVM_HC_MAP_GPA_RANGE 12
+#define KVM_HC_SYSTEM_S2IDLE 13
/*
* hypercalls use architecture specific
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 827075944d28..c641c643290b 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -100,6 +100,10 @@ static void s2idle_enter(void)
/* Push all the CPUs into the idle loop. */
wake_up_all_idle_cpus();
+
+ if (s2idle_ops && s2idle_ops->hypervisor_notify)
+ s2idle_ops->hypervisor_notify();
+
/* Make the current CPU wait so it can enter the idle loop too. */
swait_event_exclusive(s2idle_wait_head,
s2idle_state == S2IDLE_STATE_WAKE);
--
2.36.1.476.g0c4daa206d-goog
On 6/9/22 04:03, Grzegorz Jaszczyk wrote:
> Co-developed-by: Peter Fang <[email protected]>
> Signed-off-by: Peter Fang <[email protected]>
> Co-developed-by: Tomasz Nowicki <[email protected]>
> Signed-off-by: Tomasz Nowicki <[email protected]>
> Signed-off-by: Zide Chen <[email protected]>
> Co-developed-by: Grzegorz Jaszczyk <[email protected]>
> Signed-off-by: Grzegorz Jaszczyk <[email protected]>
> ---
> Documentation/virt/kvm/x86/hypercalls.rst | 7 +++++++
> arch/x86/kvm/x86.c | 3 +++
> drivers/acpi/x86/s2idle.c | 8 ++++++++
> include/linux/suspend.h | 1 +
> include/uapi/linux/kvm_para.h | 1 +
> kernel/power/suspend.c | 4 ++++
> 6 files changed, 24 insertions(+)
What's the deal with these emails?
[email protected]
I see a smattering of those in the git logs, but never for Intel folks.
I'll also say that I'm a bit suspicious of a patch that includes 5
authors for 24 lines of code. Did it really take five of you to write
24 lines of code?
On Thu, Jun 09, 2022, Grzegorz Jaszczyk wrote:
> +9. KVM_HC_SYSTEM_S2IDLE
> +------------------------
> +
> +:Architecture: x86
> +:Status: active
> +:Purpose: Notify the hypervisor that the guest is entering s2idle state.
What about exiting s2idle? E.g.
1. VM0 enters s2idle
2. host notes that VM0 is in s2idle
3. VM0 exits s2idle
4. host still thinks VM0 is in s2idle
5. VM1 enters s2idle
6. host thinks all VMs are in s2idle, suspends the system
> +static void s2idle_hypervisor_notify(void)
> +{
> + if (static_cpu_has(X86_FEATURE_HYPERVISOR))
> + kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
Checking the HYPERVISOR flag is not remotely sufficient. The hypervisor may not
be KVM, and if it is KVM, it may be an older version of KVM that doesn't support
the hypercall. The latter scenario won't be fatal unless KVM has been modified,
but blindly doing a hypercall for a different hypervisor could have disastrous
results, e.g. the register ABIs are different, so the above will make a random
request depending on what is in other GPRs.
The bigger question is, why is KVM involved at all? KVM is just a dumb pipe out
to userspace, and not a very good one at that. There are multiple well established
ways to communicate with the VMM without custom hypercalls.
I bet if you're clever this can even be done without any guest changes, e.g. I
gotta imagine acpi_sleep_run_lps0_dsm() triggers MMIO/PIO with the right ACPI
configuration.
On Thu, Jun 9, 2022 at 16:27 Dave Hansen <[email protected]> wrote:
>
> On 6/9/22 04:03, Grzegorz Jaszczyk wrote:
> > Co-developed-by: Peter Fang <[email protected]>
> > Signed-off-by: Peter Fang <[email protected]>
> > Co-developed-by: Tomasz Nowicki <[email protected]>
> > Signed-off-by: Tomasz Nowicki <[email protected]>
> > Signed-off-by: Zide Chen <[email protected]>
> > Co-developed-by: Grzegorz Jaszczyk <[email protected]>
> > Signed-off-by: Grzegorz Jaszczyk <[email protected]>
> > ---
> > Documentation/virt/kvm/x86/hypercalls.rst | 7 +++++++
> > arch/x86/kvm/x86.c | 3 +++
> > drivers/acpi/x86/s2idle.c | 8 ++++++++
> > include/linux/suspend.h | 1 +
> > include/uapi/linux/kvm_para.h | 1 +
> > kernel/power/suspend.c | 4 ++++
> > 6 files changed, 24 insertions(+)
>
> What's the deal with these emails?
>
> [email protected]
>
> I see a smattering of those in the git logs, but never for Intel folks.
I've kept the emails as they were in the original patch and I don't
think I should change them. This is what Zide and Peter originally used.
>
> I'll also say that I'm a bit suspicious of a patch that includes 5
> authors for 24 lines of code. Did it really take five of you to write
> 24 lines of code?
This patch was built iteratively: the original patch comes from Zide
and Peter; I squashed it with Tomasz's later changes and reworked it
myself for upstream. I didn't want to take credit away from any of the
above, so I ended up with Zide as the author and three co-developers.
Please let me know if that's an issue.
Best regards,
Grzegorz
On Thu, Jun 9, 2022 at 16:55 Sean Christopherson <[email protected]> wrote:
>
> On Thu, Jun 09, 2022, Grzegorz Jaszczyk wrote:
> > +9. KVM_HC_SYSTEM_S2IDLE
> > +------------------------
> > +
> > +:Architecture: x86
> > +:Status: active
> > +:Purpose: Notify the hypervisor that the guest is entering s2idle state.
>
> What about exiting s2idle? E.g.
>
> 1. VM0 enters s2idle
> 2. host notes that VM0 is in s2idle
> 3. VM0 exits s2idle
> 4. host still thinks VM0 is in s2idle
> 5. VM1 enters s2idle
> 6. host thinks all VMs are in s2idle, suspends the system
I don't think this problem could be solved by adding a notification
about exiting s2idle. Please consider (even after simplifying your
example to one VM):
1. VM0 enters s2idle
2. Host notes that VM0 is in s2idle
3. Host continues with system suspension, but in the meantime VM0 exits
s2idle and sends the notification; it is already too late (the VM might
not even manage to send the notification in time).
The above could actually be prevented if the VMM had control over
guest resumption, e.g. after the VMM receives the notification about
the guest entering the s2idle state, it would park the vCPU, preventing
it from exiting s2idle without VMM intervention.
>
> > +static void s2idle_hypervisor_notify(void)
> > +{
> > + if (static_cpu_has(X86_FEATURE_HYPERVISOR))
> > + kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
>
> Checking the HYPERVISOR flag is not remotely sufficient. The hypervisor may not
> be KVM, and if it is KVM, it may be an older version of KVM that doesn't support
> the hypercall. The latter scenario won't be fatal unless KVM has been modified,
> but blindly doing a hypercall for a different hypervisor could have disastrous
> results, e.g. the registers ABIs are different, so the above will make a random
> request depending on what is in other GPRs.
Good point: we've actually thought about not confusing/breaking VMMs,
so I introduced the KVM_CAP_X86_SYSTEM_S2IDLE VM capability in the
second patch, but not breaking other hypervisors is another story.
Would hiding it behind a new 's2idle_notify_kvm' module parameter work
for upstream?:
+static bool s2idle_notify_kvm __read_mostly;
+module_param(s2idle_notify_kvm, bool, 0644);
+MODULE_PARM_DESC(s2idle_notify_kvm, "Notify hypervisor about guest entering s2idle state");
+
...
+static void s2idle_hypervisor_notify(void)
+{
+	if (static_cpu_has(X86_FEATURE_HYPERVISOR) && s2idle_notify_kvm)
+		kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
+}
+
+
>
> The bigger question is, why is KVM involved at all? KVM is just a dumb pipe out
> to userspace, and not a very good one at that. There are multiple well established
> ways to communicate with the VMM without custom hypercalls.
Could you please advise on the recommended way of communicating with
the VMM, taking into account that we want to send this notification
just before entering the s2idle state (please see also the answer to
the next comment), which is at a very late stage of the suspend
process, with a lot of functionality already suspended?
>
>
> I bet if you're clever this can even be done without any guest changes, e.g. I
> gotta imagine acpi_sleep_run_lps0_dsm() triggers MMIO/PIO with the right ACPI
> configuration.
The problem is that between acpi_sleep_run_lps0_dsm() and the place
where we introduced the hypercall there are several places where the
suspend can still be cancelled, so trapping on acpi_sleep_run_lps0_dsm()
triggering MMIO/PIO would be premature.
The other reason for doing it in this place is that s2idle_enter() is
called from an infinite loop inside s2idle_loop(), which can be
interrupted by e.g. an ACPI EC GPE that is not a wakeup event; in that
case s2idle_ops->wake() returns false and s2idle_enter() is triggered
again. We would then want the hypervisor to be notified each time the
guest actually (re-)enters the s2idle state, which would not be
possible if we relied on acpi_sleep_run_lps0_dsm().
Best regards,
Grzegorz
On 6/10/22 04:36, Grzegorz Jaszczyk wrote:
> On Thu, Jun 9, 2022 at 16:27 Dave Hansen <[email protected]> wrote:
>> On 6/9/22 04:03, Grzegorz Jaszczyk wrote:
>>> Co-developed-by: Peter Fang <[email protected]>
>>> Signed-off-by: Peter Fang <[email protected]>
>>> Co-developed-by: Tomasz Nowicki <[email protected]>
>>> Signed-off-by: Tomasz Nowicki <[email protected]>
>>> Signed-off-by: Zide Chen <[email protected]>
>>> Co-developed-by: Grzegorz Jaszczyk <[email protected]>
>>> Signed-off-by: Grzegorz Jaszczyk <[email protected]>
>>> ---
>>> Documentation/virt/kvm/x86/hypercalls.rst | 7 +++++++
>>> arch/x86/kvm/x86.c | 3 +++
>>> drivers/acpi/x86/s2idle.c | 8 ++++++++
>>> include/linux/suspend.h | 1 +
>>> include/uapi/linux/kvm_para.h | 1 +
>>> kernel/power/suspend.c | 4 ++++
>>> 6 files changed, 24 insertions(+)
>> What's the deal with these emails?
>>
>> [email protected]
>>
>> I see a smattering of those in the git logs, but never for Intel folks.
> I've kept emails as they were in the original patch and I do not think
> I should change them. This is what Zide and Peter originally used.
"Original patch"? Where did you get this from?
>> I'll also say that I'm a bit suspicious of a patch that includes 5
>> authors for 24 lines of code. Did it really take five of you to write
>> 24 lines of code?
> This patch was built iteratively: original patch comes from Zide and
> Peter, I've squashed it with Tomasz later changes and reworked by
> myself for upstream. I didn't want to take credentials from any of the
> above so ended up with Zide as an author and 3 co-developers. Please
> let me know if that's an issue.
It just looks awfully fishy.
If it were me, and I'd put enough work into it to believe I deserved
credit as an *author* (again, of ~13 lines of actual code), I'd probably
just zap all the other SoB's and mention them in the changelog. I'd
also explain where the code came from.
Your text above wouldn't be horrible context to add to a cover letter.
On Fri, Jun 10, 2022, Grzegorz Jaszczyk wrote:
> On Thu, Jun 9, 2022 at 16:55 Sean Christopherson <[email protected]> wrote:
> Above could be actually prevented if the VMM had control over the
> guest resumption. E.g. after VMM receives notification about guest
> entering s2idle state, it would park the vCPU actually preventing it
> from exiting s2idle without VMM intervention.
Ah, so you avoid races by assuming the VM wakes itself from s2idle any time a vCPU
is run, even if the vCPU doesn't actually have a wake event. That would be very
useful info to put in the changelog.
> > > +static void s2idle_hypervisor_notify(void)
> > > +{
> > > + if (static_cpu_has(X86_FEATURE_HYPERVISOR))
> > > + kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
> >
> > Checking the HYPERVISOR flag is not remotely sufficient. The hypervisor may not
> > be KVM, and if it is KVM, it may be an older version of KVM that doesn't support
> > the hypercall. The latter scenario won't be fatal unless KVM has been modified,
> > but blindly doing a hypercall for a different hypervisor could have disastrous
> > results, e.g. the registers ABIs are different, so the above will make a random
> > request depending on what is in other GPRs.
>
> Good point: we've actually thought about not confusing/breaking VMMs
> so I've introduced KVM_CAP_X86_SYSTEM_S2IDLE VM capability in the
> second patch, but not breaking different hypervisors is another story.
> Would hiding it under new 's2idle_notify_kvm' module parameter work
> for upstream?:
No, enumerating support via KVM_CPUID_FEATURES is the correct way to do something
like this, e.g. see KVM_FEATURE_CLOCKSOURCE2. But honestly I wouldn't spend too
much time understanding how all of that works, because I still feel quite strongly
that getting KVM involved is completely unnecessary. A solution that isn't KVM
specific is preferable as it can then be implemented by any VMM that enumerates
s2idle support to the guest.
> > The bigger question is, why is KVM involved at all? KVM is just a dumb pipe out
> > to userspace, and not a very good one at that. There are multiple well established
> > ways to communicate with the VMM without custom hypercalls.
>
> Could you please kindly advise about the recommended way of
> communication with VMM, taking into account that we want to send this
> notification just before entering s2idle state (please see also answer
> to next comment), which is at a very late stage of the suspend process
> with a lot of functionality already suspended?
MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
function? E.g. something like
static void s2idle_hypervisor_notify(void)
{
	if (lps0_dsm_func_mask > 0)
		acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY,
					lps0_dsm_func_mask, lps0_dsm_guid);
}
On 6/10/22 07:49, Dave Hansen wrote:
> On 6/10/22 04:36, Grzegorz Jaszczyk wrote:
>> On Thu, Jun 9, 2022 at 16:27 Dave Hansen <[email protected]> wrote:
>>> On 6/9/22 04:03, Grzegorz Jaszczyk wrote:
>>>> Co-developed-by: Peter Fang <[email protected]>
>>>> Signed-off-by: Peter Fang <[email protected]>
>>>> Co-developed-by: Tomasz Nowicki <[email protected]>
>>>> Signed-off-by: Tomasz Nowicki <[email protected]>
>>>> Signed-off-by: Zide Chen <[email protected]>
>>>> Co-developed-by: Grzegorz Jaszczyk <[email protected]>
>>>> Signed-off-by: Grzegorz Jaszczyk <[email protected]>
>>>> ---
>>>> Documentation/virt/kvm/x86/hypercalls.rst | 7 +++++++
>>>> arch/x86/kvm/x86.c | 3 +++
>>>> drivers/acpi/x86/s2idle.c | 8 ++++++++
>>>> include/linux/suspend.h | 1 +
>>>> include/uapi/linux/kvm_para.h | 1 +
>>>> kernel/power/suspend.c | 4 ++++
>>>> 6 files changed, 24 insertions(+)
>>> What's the deal with these emails?
>>>
>>> [email protected]
>>>
>>> I see a smattering of those in the git logs, but never for Intel folks.
>> I've kept emails as they were in the original patch and I do not think
>> I should change them. This is what Zide and Peter originally used.
>
> "Original patch"? Where did you get this from?
Is this perhaps coming from Chromium Gerrit? If so, I think you should
include a link to the Gerrit code review discussion.
If it's not a public discussion/patch originally perhaps Suggested-by:
might be a better tag to use.
>
>>> I'll also say that I'm a bit suspicious of a patch that includes 5
>>> authors for 24 lines of code. Did it really take five of you to write
>>> 24 lines of code?
>> This patch was built iteratively: original patch comes from Zide and
>> Peter, I've squashed it with Tomasz later changes and reworked by
>> myself for upstream. I didn't want to take credentials from any of the
>> above so ended up with Zide as an author and 3 co-developers. Please
>> let me know if that's an issue.
>
> It just looks awfully fishy.
>
> If it were me, and I'd put enough work into it to believe I deserved
> credit as an *author* (again, of ~13 lines of actual code), I'd probably
> just zap all the other SoB's and mention them in the changelog. I'd
> also explain where the code came from.
>
> Your text above wouldn't be horrible context to add to a cover letter.
On Mon, Jun 13, 2022 at 07:03 Mario Limonciello
<[email protected]> wrote:
>
> On 6/10/22 07:49, Dave Hansen wrote:
> > On 6/10/22 04:36, Grzegorz Jaszczyk wrote:
> >> On Thu, Jun 9, 2022 at 16:27 Dave Hansen <[email protected]> wrote:
> >>> On 6/9/22 04:03, Grzegorz Jaszczyk wrote:
> >>>> Co-developed-by: Peter Fang <[email protected]>
> >>>> Signed-off-by: Peter Fang <[email protected]>
> >>>> Co-developed-by: Tomasz Nowicki <[email protected]>
> >>>> Signed-off-by: Tomasz Nowicki <[email protected]>
> >>>> Signed-off-by: Zide Chen <[email protected]>
> >>>> Co-developed-by: Grzegorz Jaszczyk <[email protected]>
> >>>> Signed-off-by: Grzegorz Jaszczyk <[email protected]>
> >>>> ---
> >>>> Documentation/virt/kvm/x86/hypercalls.rst | 7 +++++++
> >>>> arch/x86/kvm/x86.c | 3 +++
> >>>> drivers/acpi/x86/s2idle.c | 8 ++++++++
> >>>> include/linux/suspend.h | 1 +
> >>>> include/uapi/linux/kvm_para.h | 1 +
> >>>> kernel/power/suspend.c | 4 ++++
> >>>> 6 files changed, 24 insertions(+)
> >>> What's the deal with these emails?
> >>>
> >>> [email protected]
> >>>
> >>> I see a smattering of those in the git logs, but never for Intel folks.
> >> I've kept emails as they were in the original patch and I do not think
> >> I should change them. This is what Zide and Peter originally used.
> >
> > "Original patch"? Where did you get this from?
>
> Is this perhaps coming from Chromium Gerrit? If so, I think you should
> include a link to the Gerrit code review discussion.
Yes, the original patch comes from the Chromium Gerrit:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3482475/4
and after reworking it, but before sending it to the mailing list, I
asked everyone involved for an ack; that was done internally on Gerrit:
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/3666997
>
> If it's not a public discussion/patch originally perhaps Suggested-by:
> might be a better tag to use.
>
> >
> >>> I'll also say that I'm a bit suspicious of a patch that includes 5
> >>> authors for 24 lines of code. Did it really take five of you to write
> >>> 24 lines of code?
> >> This patch was built iteratively: original patch comes from Zide and
> >> Peter, I've squashed it with Tomasz later changes and reworked by
> >> myself for upstream. I didn't want to take credentials from any of the
> >> above so ended up with Zide as an author and 3 co-developers. Please
> >> let me know if that's an issue.
> >
> > It just looks awfully fishy.
> >
> > If it were me, and I'd put enough work into it to believe I deserved
> > credit as an *author* (again, of ~13 lines of actual code), I'd probably
> > just zap all the other SoB's and mention them in the changelog. I'd
> > also explain where the code came from.
> >
> > Your text above wouldn't be horrible context to add to a cover letter.
Actually it may not be an issue for the next version, since the
approach suggested by Sean is quite different, so I will most likely
end up with a reduced SoB/Co-developed-by list in the next version.
Best regards,
Grzegorz
On Fri, Jun 10, 2022 at 16:30 Sean Christopherson <[email protected]> wrote:
>
> On Fri, Jun 10, 2022, Grzegorz Jaszczyk wrote:
> > On Thu, Jun 9, 2022 at 16:55 Sean Christopherson <[email protected]> wrote:
> > Above could be actually prevented if the VMM had control over the
> > guest resumption. E.g. after VMM receives notification about guest
> > entering s2idle state, it would park the vCPU actually preventing it
> > from exiting s2idle without VMM intervention.
>
> Ah, so you avoid races by assuming the VM wakes itself from s2idle any time a vCPU
> is run, even if the vCPU doesn't actually have a wake event. That would be very
> useful info to put in the changelog.
Just to clarify: I assumed that the VM may wake from s2idle any time a
vCPU is running and receives a wake event. So going back to the
previous example:
1. VM0 enters s2idle
2. The VMM gets the notification that VM0 is in s2idle, and during the
handling of this notification the vCPU that issued it is not running
(we are in the middle of handling a vCPU exit in the VMM). So even if
some wakeup event arrives, it cannot cause that vCPU to exit s2idle.
The pending wakeup event will not wake VM0 until the VMM unparks the
vCPU, and the VMM has control over that.
>
> > > > +static void s2idle_hypervisor_notify(void)
> > > > +{
> > > > + if (static_cpu_has(X86_FEATURE_HYPERVISOR))
> > > > + kvm_hypercall0(KVM_HC_SYSTEM_S2IDLE);
> > >
> > > Checking the HYPERVISOR flag is not remotely sufficient. The hypervisor may not
> > > be KVM, and if it is KVM, it may be an older version of KVM that doesn't support
> > > the hypercall. The latter scenario won't be fatal unless KVM has been modified,
> > > but blindly doing a hypercall for a different hypervisor could have disastrous
> > > results, e.g. the registers ABIs are different, so the above will make a random
> > > request depending on what is in other GPRs.
> >
> > Good point: we've actually thought about not confusing/breaking VMMs
> > so I've introduced KVM_CAP_X86_SYSTEM_S2IDLE VM capability in the
> > second patch, but not breaking different hypervisors is another story.
> > Would hiding it under new 's2idle_notify_kvm' module parameter work
> > for upstream?:
>
> No, enumerating support via KVM_CPUID_FEATURES is the correct way to do something
> like this, e.g. see KVM_FEATURE_CLOCKSOURCE2. But honestly I wouldn't spend too
> much time understanding how all of that works, because I still feel quite strongly
> that getting KVM involved is completely unnecessary. A solution that isn't KVM
> specific is preferable as it can then be implemented by any VMM that enumerates
> s2idle support to the guest.
Sure, thank you for the explanation and an example.
>
> > > The bigger question is, why is KVM involved at all? KVM is just a dumb pipe out
> > > to userspace, and not a very good one at that. There are multiple well established
> > > ways to communicate with the VMM without custom hypercalls.
> >
> > Could you please kindly advise about the recommended way of
> > communication with VMM, taking into account that we want to send this
> > notification just before entering s2idle state (please see also answer
> > to next comment), which is at a very late stage of the suspend process
> > with a lot of functionality already suspended?
>
> MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
> enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
> function? E.g. something like
>
> static void s2idle_hypervisor_notify(void)
> {
> if (lps0_dsm_func_mask > 0)
> acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY
> lps0_dsm_func_mask, lps0_dsm_guid);
> }
Great, thank you for your suggestion! I will try this approach and
come back. Since this will be the main change in the next version,
would it be OK with you if I add a Suggested-by: Sean Christopherson
<[email protected]> tag?
Best regards,
Grzegorz
On Wed, Jun 15, 2022, Grzegorz Jaszczyk wrote:
> On Fri, Jun 10, 2022 at 16:30 Sean Christopherson <[email protected]> wrote:
> > MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
> > enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
> > function? E.g. something like
> >
> > static void s2idle_hypervisor_notify(void)
> > {
> > if (lps0_dsm_func_mask > 0)
> > acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY
> > lps0_dsm_func_mask, lps0_dsm_guid);
> > }
>
> Great, thank you for your suggestion! I will try this approach and
> come back. Since this will be the main change in the next version,
> will it be ok for you to add Suggested-by: Sean Christopherson
> <[email protected]> tag?
If you want, but there's certainly no need to do so. But I assume you or someone
at Intel will need to get formal approval for adding another ACPI LPS0 function?
I.e. isn't there work to be done outside of the kernel before any patches can be
merged?
On 6/16/2022 11:48, Sean Christopherson wrote:
> On Wed, Jun 15, 2022, Grzegorz Jaszczyk wrote:
>> On Fri, Jun 10, 2022 at 16:30 Sean Christopherson <[email protected]> wrote:
>>> MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
>>> enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
>>> function? E.g. something like
>>>
>>> static void s2idle_hypervisor_notify(void)
>>> {
>>> if (lps0_dsm_func_mask > 0)
>>> acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY
>>> lps0_dsm_func_mask, lps0_dsm_guid);
>>> }
>>
>> Great, thank you for your suggestion! I will try this approach and
>> come back. Since this will be the main change in the next version,
>> will it be ok for you to add Suggested-by: Sean Christopherson
>> <[email protected]> tag?
>
> If you want, but there's certainly no need to do so. But I assume you or someone
> at Intel will need to get formal approval for adding another ACPI LPS0 function?
> I.e. isn't there work to be done outside of the kernel before any patches can be
> merged?
There are 3 different LPS0 GUIDs in use: an Intel one, an AMD (legacy)
one, and a Microsoft one. They all have their own specs, so if this
were to be added, I think all three would need to be updated.
As this is Linux-specific hypervisor behavior, I don't know that you
would be able to convince Microsoft to update theirs, either.
How about using acpi_s2idle_dev_ops? There is a prepare() call and a
restore() call that is set for each handler. The only consumer of this
at the moment that I'm aware of is the amd-pmc driver, but it's done
like a notification chain so that a bunch of drivers can hook in if
they need to.
Then you can have this notification path and the associated ACPI device
it calls out to be its own driver.
On Thu, Jun 16, 2022 at 18:58 Limonciello, Mario
<[email protected]> wrote:
>
> On 6/16/2022 11:48, Sean Christopherson wrote:
> > On Wed, Jun 15, 2022, Grzegorz Jaszczyk wrote:
> >> On Fri, Jun 10, 2022 at 16:30 Sean Christopherson <[email protected]> wrote:
> >>> MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
> >>> enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
> >>> function? E.g. something like
> >>>
> >>> static void s2idle_hypervisor_notify(void)
> >>> {
> >>> if (lps0_dsm_func_mask > 0)
> >>> acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY
> >>> lps0_dsm_func_mask, lps0_dsm_guid);
> >>> }
> >>
> >> Great, thank you for your suggestion! I will try this approach and
> >> come back. Since this will be the main change in the next version,
> >> will it be ok for you to add Suggested-by: Sean Christopherson
> >> <[email protected]> tag?
> >
> > If you want, but there's certainly no need to do so. But I assume you or someone
> > at Intel will need to get formal approval for adding another ACPI LPS0 function?
> > I.e. isn't there work to be done outside of the kernel before any patches can be
> > merged?
>
> There are 3 different LPS0 GUIDs in use. An Intel one, an AMD (legacy)
> one, and a Microsoft one. They all have their own specs, and so if this
> was to be added I think all 3 need to be updated.
Yes this will not be easy to achieve I think.
>
> As this is Linux specific hypervisor behavior, I don't know you would be
> able to convince Microsoft to update theirs' either.
>
> How about using s2idle_devops? There is a prepare() call and a
> restore() call that is set for each handler. The only consumer of this
> ATM I'm aware of is the amd-pmc driver, but it's done like a
> notification chain so that a bunch of drivers can hook in if they need to.
>
> Then you can have this notification path and the associated ACPI device
> it calls out to be it's own driver.
Thank you for your suggestion; just to be sure I've understood your
idea correctly:
1) it will require extending acpi_s2idle_dev_ops with something like a
hypervisor_notify() call, since the existing prepare() is called from
the end of acpi_s2idle_prepare_late(), which is too early, as described
in one of the previous messages (between acpi_s2idle_prepare_late() and
the place where we use the hypercall there are several places where the
suspend can be cancelled; otherwise we could probably try to trap on
another acpi_sleep_run_lps0_dsm() occurrence from
acpi_s2idle_prepare_late()).
2) using the newly introduced acpi_s2idle_dev_ops hypervisor_notify()
call will allow registering a handler from the Intel
x86/intel/pmc/core.c driver and/or the AMD x86/amd-pmc.c driver.
Therefore we would need only Intel's and/or AMD's approval for
extending the ACPI LPS0 _DSM method, correct?
I wonder if this will be feasible, so I'm just thinking out loud:
is there another mechanism that could be suggested and used upstream
to notify the hypervisor/VMM about the guest entering the s2idle
state? Especially since such a _DSM function would be introduced only
to trap on some fake MMIO/PIO access and would be useful only in guest
ACPI tables.
Thank you,
Grzegorz
On 6/20/2022 10:43, Grzegorz Jaszczyk wrote:
> On Thu, Jun 16, 2022 at 18:58, Limonciello, Mario
> <[email protected]> wrote:
>>
>> On 6/16/2022 11:48, Sean Christopherson wrote:
>>> On Wed, Jun 15, 2022, Grzegorz Jaszczyk wrote:
>>>> On Fri, Jun 10, 2022 at 16:30, Sean Christopherson <[email protected]> wrote:
>>>>> MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
>>>>> enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
>>>>> function? E.g. something like
>>>>>
>>>>> static void s2idle_hypervisor_notify(void)
>>>>> {
>>>>>         if (lps0_dsm_func_mask > 0)
>>>>>                 acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY,
>>>>>                                         lps0_dsm_func_mask, lps0_dsm_guid);
>>>>> }
>>>>
>>>> Great, thank you for your suggestion! I will try this approach and
>>>> come back. Since this will be the main change in the next version,
>>>> will it be ok for you to add Suggested-by: Sean Christopherson
>>>> <[email protected]> tag?
>>>
>>> If you want, but there's certainly no need to do so. But I assume you or someone
>>> at Intel will need to get formal approval for adding another ACPI LPS0 function?
>>> I.e. isn't there work to be done outside of the kernel before any patches can be
>>> merged?
>>
>> There are 3 different LPS0 GUIDs in use. An Intel one, an AMD (legacy)
>> one, and a Microsoft one. They all have their own specs, and so if this
>> was to be added I think all 3 need to be updated.
>
> Yes this will not be easy to achieve I think.
>
>>
>> As this is Linux specific hypervisor behavior, I don't know you would be
>> able to convince Microsoft to update theirs' either.
>>
>> How about using s2idle_devops? There is a prepare() call and a
>> restore() call that is set for each handler. The only consumer of this
>> ATM I'm aware of is the amd-pmc driver, but it's done like a
>> notification chain so that a bunch of drivers can hook in if they need to.
>>
>> Then you can have this notification path and the associated ACPI device
>> it calls out to be it's own driver.
>
> Thank you for your suggestion, just to be sure that I've understand
> your idea correctly:
> 1) it will require to extend acpi_s2idle_dev_ops about something like
> hypervisor_notify() call, since existing prepare() is called from end
> of acpi_s2idle_prepare_late so it is too early as it was described in
> one of previous message (between acpi_s2idle_prepare_late and place
> where we use hypercall there are several places where the suspend
> could be canceled, otherwise we could probably try to trap on other
> acpi_sleep_run_lps0_dsm occurrence from acpi_s2idle_prepare_late).
>
The idea for prepare() was that it would be the absolute last thing
before the s2idle loop is run. Are you sure that's too early? It's
basically the same thing as having a last-stage new _DSM call.
What about adding a new abort() extension to acpi_s2idle_dev_ops? Then
you could catch the cancelled suspend case still and take corrective
action (if that action is different than what restore() would do).
> 2) using newly introduced acpi_s2idle_dev_ops hypervisor_notify() call
> will allow to register handler from Intel x86/intel/pmc/core.c driver
> and/or AMD x86/amd-pmc.c driver. Therefore we will need to get only
> Intel and/or AMD approval about extending the ACPI LPS0 _DSM method,
> correct?
>
Right now the only thing that hooks prepare()/restore() is the amd-pmc
driver (unless Intel's PMC had a change I didn't catch yet).
I don't think you should be changing any existing drivers but rather
introduce another platform driver for this specific case.
So it would be something like this:
acpi_s2idle_prepare_late
-> prepare()
--> AMD: amd_pmc handler for prepare()
--> Intel: intel_pmc handler for prepare() (conceptual)
--> HYPE0001 device: new driver's prepare() routine
So the platform driver would match the HYPE0001 device to load, and it
wouldn't do anything other than provide a prepare()/restore() handler
for your case.
You don't need to change any existing specs. If anything a new spec to
go with this new ACPI device would be made. Someone would need to
reserve the ID and such for it, but I think you can mock it up in advance.
> I wonder if this will be affordable so just re-thinking loudly if
> there is no other mechanism that could be suggested and used upstream
> so we could notify hypervisor/vmm about guest entering s2idle state?
> Especially that such _DSM function will be introduced only to trap on
> some fake MMIO/PIO access and will be useful only for guest ACPI
> tables?
>
Do you need to worry about Microsoft guests using Modern Standby too or
is that out of the scope of your problem set? I think you'll be a lot
more limited in how this can behave and where you can modify things if so.
On Mon, Jun 20, 2022 at 18:32, Limonciello, Mario
<[email protected]> wrote:
>
> On 6/20/2022 10:43, Grzegorz Jaszczyk wrote:
> > On Thu, Jun 16, 2022 at 18:58, Limonciello, Mario
> > <[email protected]> wrote:
> >>
> >> On 6/16/2022 11:48, Sean Christopherson wrote:
> >>> On Wed, Jun 15, 2022, Grzegorz Jaszczyk wrote:
> >>>> On Fri, Jun 10, 2022 at 16:30, Sean Christopherson <[email protected]> wrote:
> >>>>> MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
> >>>>> enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
> >>>>> function? E.g. something like
> >>>>>
> >>>>> static void s2idle_hypervisor_notify(void)
> >>>>> {
> >>>>>         if (lps0_dsm_func_mask > 0)
> >>>>>                 acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY,
> >>>>>                                         lps0_dsm_func_mask, lps0_dsm_guid);
> >>>>> }
> >>>>
> >>>> Great, thank you for your suggestion! I will try this approach and
> >>>> come back. Since this will be the main change in the next version,
> >>>> will it be ok for you to add Suggested-by: Sean Christopherson
> >>>> <[email protected]> tag?
> >>>
> >>> If you want, but there's certainly no need to do so. But I assume you or someone
> >>> at Intel will need to get formal approval for adding another ACPI LPS0 function?
> >>> I.e. isn't there work to be done outside of the kernel before any patches can be
> >>> merged?
> >>
> >> There are 3 different LPS0 GUIDs in use. An Intel one, an AMD (legacy)
> >> one, and a Microsoft one. They all have their own specs, and so if this
> >> was to be added I think all 3 need to be updated.
> >
> > Yes this will not be easy to achieve I think.
> >
> >>
> >> As this is Linux specific hypervisor behavior, I don't know you would be
> >> able to convince Microsoft to update theirs' either.
> >>
> >> How about using s2idle_devops? There is a prepare() call and a
> >> restore() call that is set for each handler. The only consumer of this
> >> ATM I'm aware of is the amd-pmc driver, but it's done like a
> >> notification chain so that a bunch of drivers can hook in if they need to.
> >>
> >> Then you can have this notification path and the associated ACPI device
> >> it calls out to be it's own driver.
> >
> > Thank you for your suggestion, just to be sure that I've understand
> > your idea correctly:
> > 1) it will require to extend acpi_s2idle_dev_ops about something like
> > hypervisor_notify() call, since existing prepare() is called from end
> > of acpi_s2idle_prepare_late so it is too early as it was described in
> > one of previous message (between acpi_s2idle_prepare_late and place
> > where we use hypercall there are several places where the suspend
> > could be canceled, otherwise we could probably try to trap on other
> > acpi_sleep_run_lps0_dsm occurrence from acpi_s2idle_prepare_late).
> >
>
> The idea for prepare() was it would be the absolute last thing before
> the s2idle loop was run. You're sure that's too early? It's basically
> the same thing as having a last stage new _DSM call.
>
> What about adding a new abort() extension to acpi_s2idle_dev_ops? Then
> you could catch the cancelled suspend case still and take corrective
> action (if that action is different than what restore() would do).
It will be problematic, since the abort/restore notification could
arrive too late, and the whole system would then go to suspend thinking
that the guest is in the desired s2idle state. Also, in this case it
would be impossible to prevent races and to actually make sure whether
the guest is suspended or not. We already had a similar discussion with
Sean earlier in this thread about why the notification has to be sent
just before swait_event_exclusive(s2idle_wait_head, s2idle_state ==
S2IDLE_STATE_WAKE) and why the VMM has to have control over guest
resumption.
Nevertheless, if extending acpi_s2idle_dev_ops is possible, why not
extend it with hypervisor_notify() and use it in the same place where
the hypercall is used in this patch? Do you see any issue with that?
>
> > 2) using newly introduced acpi_s2idle_dev_ops hypervisor_notify() call
> > will allow to register handler from Intel x86/intel/pmc/core.c driver
> > and/or AMD x86/amd-pmc.c driver. Therefore we will need to get only
> > Intel and/or AMD approval about extending the ACPI LPS0 _DSM method,
> > correct?
> >
>
> Right now the only thing that hooks prepare()/restore() is the amd-pmc
> driver (unless Intel's PMC had a change I didn't catch yet).
>
> I don't think you should be changing any existing drivers but rather
> introduce another platform driver for this specific case.
>
> So it would be something like this:
>
> acpi_s2idle_prepare_late
> -> prepare()
> --> AMD: amd_pmc handler for prepare()
> --> Intel: intel_pmc handler for prepare() (conceptual)
> --> HYPE0001 device: new driver's prepare() routine
>
> So the platform driver would match the HYPE0001 device to load, and it
> wouldn't do anything other than provide a prepare()/restore() handler
> for your case.
>
> You don't need to change any existing specs. If anything a new spec to
> go with this new ACPI device would be made. Someone would need to
> reserve the ID and such for it, but I think you can mock it up in advance.
Thank you for your explanation. This means that I should register
"HYPE" through https://uefi.org/PNP_ACPI_Registry before introducing
this new driver to Linux.
I have no experience with the above, so I wonder who should be
responsible for maintaining such an ACPI ID, since it will not belong
to any specific vendor. There is an example of e.g. the COREBOOT
PROJECT using the "BOOT" ACPI ID [1], which seems similar in terms of
naming not a specific vendor but rather the project as the responsible
entity. Maybe you have some recommendations?
I am also not sure if and where a specification describing such a
device has to be maintained. Since "HYPE0001" will have its own _DSM,
will it be required to document it somewhere, rather than just using it
in the driver and preparing the proper ACPI tables for the guest?
>
> > I wonder if this will be affordable so just re-thinking loudly if
> > there is no other mechanism that could be suggested and used upstream
> > so we could notify hypervisor/vmm about guest entering s2idle state?
> > Especially that such _DSM function will be introduced only to trap on
> > some fake MMIO/PIO access and will be useful only for guest ACPI
> > tables?
> >
>
> Do you need to worry about Microsoft guests using Modern Standby too or
> is that out of the scope of your problem set? I think you'll be a lot
> more limited in how this can behave and where you can modify things if so.
>
I do not need to worry about Microsoft guests.
[1] https://uefi.org/acpi_id_list
Thank you,
Grzegorz
On 6/22/2022 04:53, Grzegorz Jaszczyk wrote:
> On Mon, Jun 20, 2022 at 18:32, Limonciello, Mario
> <[email protected]> wrote:
>>
>> On 6/20/2022 10:43, Grzegorz Jaszczyk wrote:
>>> On Thu, Jun 16, 2022 at 18:58, Limonciello, Mario
>>> <[email protected]> wrote:
>>>>
>>>> On 6/16/2022 11:48, Sean Christopherson wrote:
>>>>> On Wed, Jun 15, 2022, Grzegorz Jaszczyk wrote:
>>>>>> On Fri, Jun 10, 2022 at 16:30, Sean Christopherson <[email protected]> wrote:
>>>>>>> MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
>>>>>>> enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
>>>>>>> function? E.g. something like
>>>>>>>
>>>>>>>>> static void s2idle_hypervisor_notify(void)
>>>>>>>>> {
>>>>>>>>>         if (lps0_dsm_func_mask > 0)
>>>>>>>>>                 acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY,
>>>>>>>>>                                         lps0_dsm_func_mask, lps0_dsm_guid);
>>>>>>>>> }
>>>>>>
>>>>>> Great, thank you for your suggestion! I will try this approach and
>>>>>> come back. Since this will be the main change in the next version,
>>>>>> will it be ok for you to add Suggested-by: Sean Christopherson
>>>>>> <[email protected]> tag?
>>>>>
>>>>> If you want, but there's certainly no need to do so. But I assume you or someone
>>>>> at Intel will need to get formal approval for adding another ACPI LPS0 function?
>>>>> I.e. isn't there work to be done outside of the kernel before any patches can be
>>>>> merged?
>>>>
>>>> There are 3 different LPS0 GUIDs in use. An Intel one, an AMD (legacy)
>>>> one, and a Microsoft one. They all have their own specs, and so if this
>>>> was to be added I think all 3 need to be updated.
>>>
>>> Yes this will not be easy to achieve I think.
>>>
>>>>
>>>> As this is Linux specific hypervisor behavior, I don't know you would be
>>>> able to convince Microsoft to update theirs' either.
>>>>
>>>> How about using s2idle_devops? There is a prepare() call and a
>>>> restore() call that is set for each handler. The only consumer of this
>>>> ATM I'm aware of is the amd-pmc driver, but it's done like a
>>>> notification chain so that a bunch of drivers can hook in if they need to.
>>>>
>>>> Then you can have this notification path and the associated ACPI device
>>>> it calls out to be it's own driver.
>>>
>>> Thank you for your suggestion, just to be sure that I've understand
>>> your idea correctly:
>>> 1) it will require to extend acpi_s2idle_dev_ops about something like
>>> hypervisor_notify() call, since existing prepare() is called from end
>>> of acpi_s2idle_prepare_late so it is too early as it was described in
>>> one of previous message (between acpi_s2idle_prepare_late and place
>>> where we use hypercall there are several places where the suspend
>>> could be canceled, otherwise we could probably try to trap on other
>>> acpi_sleep_run_lps0_dsm occurrence from acpi_s2idle_prepare_late).
>>>
>>
>> The idea for prepare() was it would be the absolute last thing before
>> the s2idle loop was run. You're sure that's too early? It's basically
>> the same thing as having a last stage new _DSM call.
>>
>> What about adding a new abort() extension to acpi_s2idle_dev_ops? Then
>> you could catch the cancelled suspend case still and take corrective
>> action (if that action is different than what restore() would do).
>
> It will be problematic since the abort/restore notification could
> arrive too late and therefore the whole system will go to suspend
> thinking that the guest is in desired s2idle state. Also in this case
> it would be impossible to prevent races and actually making sure that
> the guest is suspended or not. We already had similar discussion with
> Sean earlier in this thread why the notification have to be send just
> before swait_event_exclusive(s2idle_wait_head, s2idle_state ==
> S2IDLE_STATE_WAKE) and that the VMM have to have control over guest
> resumption.
>
> Nevertheless if extending acpi_s2idle_dev_ops is possible, why not
> extend it about the hypervisor_notify() and use it in the same place
> where the hypercall is used in this patch? Do you see any issue with
> that?
If this needs to be a hypercall and the hypercall needs to go at that
specific time, I wouldn't bother with extending acpi_s2idle_dev_ops.
The whole idea there was that this would be less custom and could follow
a spec.
TBH - given the strong dependency on being the very last command and
this being all Linux specific (you won't need to do something similar
with Windows) - I think the way you already did it makes the most sense.
It seems to me the ACPI device model doesn't really work well for this
scenario.
>
>>
>>> 2) using newly introduced acpi_s2idle_dev_ops hypervisor_notify() call
>>> will allow to register handler from Intel x86/intel/pmc/core.c driver
>>> and/or AMD x86/amd-pmc.c driver. Therefore we will need to get only
>>> Intel and/or AMD approval about extending the ACPI LPS0 _DSM method,
>>> correct?
>>>
>>
>> Right now the only thing that hooks prepare()/restore() is the amd-pmc
>> driver (unless Intel's PMC had a change I didn't catch yet).
>>
>> I don't think you should be changing any existing drivers but rather
>> introduce another platform driver for this specific case.
>>
>> So it would be something like this:
>>
>> acpi_s2idle_prepare_late
>> -> prepare()
>> --> AMD: amd_pmc handler for prepare()
>> --> Intel: intel_pmc handler for prepare() (conceptual)
>> --> HYPE0001 device: new driver's prepare() routine
>>
>> So the platform driver would match the HYPE0001 device to load, and it
>> wouldn't do anything other than provide a prepare()/restore() handler
>> for your case.
>>
>> You don't need to change any existing specs. If anything a new spec to
>> go with this new ACPI device would be made. Someone would need to
>> reserve the ID and such for it, but I think you can mock it up in advance.
>
> Thank you for your explanation. This means that I should register
> "HYPE" through https://uefi.org/PNP_ACPI_Registry before introducing
> this new driver to Linux.
> I have no experience with the above, so I wonder who should be
> responsible for maintaining such ACPI ID since it will not belong to
> any specific vendor? There is an example of e.g. COREBOOT PROJECT
> using "BOOT" ACPI ID [1], which seems similar in terms of not
> specifying any vendor but rather the project as a responsible entity.
> Maybe you have some recommendations?
Maybe the LF could own a namespace and ID? But I would suggest you make
a mockup and confirm everything works this way before you go explore
too much. Also make sure Rafael is aligned with your mockup.
>
> I am also not sure if and where a specification describing such a
> device has to be maintained. Since "HYPE0001" will have its own _DSM
> so will it be required to document it somewhere rather than just using
> it in the driver and preparing proper ACPI tables for guest?
>
>>
>>> I wonder if this will be affordable so just re-thinking loudly if
>>> there is no other mechanism that could be suggested and used upstream
>>> so we could notify hypervisor/vmm about guest entering s2idle state?
>>> Especially that such _DSM function will be introduced only to trap on
>>> some fake MMIO/PIO access and will be useful only for guest ACPI
>>> tables?
>>>
>>
>> Do you need to worry about Microsoft guests using Modern Standby too or
>> is that out of the scope of your problem set? I think you'll be a lot
>> more limited in how this can behave and where you can modify things if so.
>>
>
> I do not need to worry about Microsoft guests.
Makes life a lot easier :)
>
> [1] https://uefi.org/acpi_id_list
>
> Thank you,
> Grzegorz
On 6/23/2022 11:50, Grzegorz Jaszczyk wrote:
> On Wed, Jun 22, 2022 at 23:50, Limonciello, Mario
> <[email protected]> wrote:
>>
>> On 6/22/2022 04:53, Grzegorz Jaszczyk wrote:
>>> On Mon, Jun 20, 2022 at 18:32, Limonciello, Mario
>>> <[email protected]> wrote:
>>>>
>>>> On 6/20/2022 10:43, Grzegorz Jaszczyk wrote:
>>>>> On Thu, Jun 16, 2022 at 18:58, Limonciello, Mario
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> On 6/16/2022 11:48, Sean Christopherson wrote:
>>>>>>> On Wed, Jun 15, 2022, Grzegorz Jaszczyk wrote:
>>>>>>>> On Fri, Jun 10, 2022 at 16:30, Sean Christopherson <[email protected]> wrote:
>>>>>>>>> MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
>>>>>>>>> enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
>>>>>>>>> function? E.g. something like
>>>>>>>>>
>>>>>>>>> static void s2idle_hypervisor_notify(void)
>>>>>>>>> {
>>>>>>>>>         if (lps0_dsm_func_mask > 0)
>>>>>>>>>                 acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY,
>>>>>>>>>                                         lps0_dsm_func_mask, lps0_dsm_guid);
>>>>>>>>> }
>>>>>>>>
>>>>>>>> Great, thank you for your suggestion! I will try this approach and
>>>>>>>> come back. Since this will be the main change in the next version,
>>>>>>>> will it be ok for you to add Suggested-by: Sean Christopherson
>>>>>>>> <[email protected]> tag?
>>>>>>>
>>>>>>> If you want, but there's certainly no need to do so. But I assume you or someone
>>>>>>> at Intel will need to get formal approval for adding another ACPI LPS0 function?
>>>>>>> I.e. isn't there work to be done outside of the kernel before any patches can be
>>>>>>> merged?
>>>>>>
>>>>>> There are 3 different LPS0 GUIDs in use. An Intel one, an AMD (legacy)
>>>>>> one, and a Microsoft one. They all have their own specs, and so if this
>>>>>> was to be added I think all 3 need to be updated.
>>>>>
>>>>> Yes this will not be easy to achieve I think.
>>>>>
>>>>>>
>>>>>> As this is Linux specific hypervisor behavior, I don't know you would be
>>>>>> able to convince Microsoft to update theirs' either.
>>>>>>
>>>>>> How about using s2idle_devops? There is a prepare() call and a
>>>>>> restore() call that is set for each handler. The only consumer of this
>>>>>> ATM I'm aware of is the amd-pmc driver, but it's done like a
>>>>>> notification chain so that a bunch of drivers can hook in if they need to.
>>>>>>
>>>>>> Then you can have this notification path and the associated ACPI device
>>>>>> it calls out to be it's own driver.
>>>>>
>>>>> Thank you for your suggestion, just to be sure that I've understand
>>>>> your idea correctly:
>>>>> 1) it will require to extend acpi_s2idle_dev_ops about something like
>>>>> hypervisor_notify() call, since existing prepare() is called from end
>>>>> of acpi_s2idle_prepare_late so it is too early as it was described in
>>>>> one of previous message (between acpi_s2idle_prepare_late and place
>>>>> where we use hypercall there are several places where the suspend
>>>>> could be canceled, otherwise we could probably try to trap on other
>>>>> acpi_sleep_run_lps0_dsm occurrence from acpi_s2idle_prepare_late).
>>>>>
>>>>
>>>> The idea for prepare() was it would be the absolute last thing before
>>>> the s2idle loop was run. You're sure that's too early? It's basically
>>>> the same thing as having a last stage new _DSM call.
>>>>
>>>> What about adding a new abort() extension to acpi_s2idle_dev_ops? Then
>>>> you could catch the cancelled suspend case still and take corrective
>>>> action (if that action is different than what restore() would do).
>>>
>>> It will be problematic since the abort/restore notification could
>>> arrive too late and therefore the whole system will go to suspend
>>> thinking that the guest is in desired s2idle state. Also in this case
>>> it would be impossible to prevent races and actually making sure that
>>> the guest is suspended or not. We already had similar discussion with
>>> Sean earlier in this thread why the notification have to be send just
>>> before swait_event_exclusive(s2idle_wait_head, s2idle_state ==
>>> S2IDLE_STATE_WAKE) and that the VMM have to have control over guest
>>> resumption.
>>>
>>> Nevertheless if extending acpi_s2idle_dev_ops is possible, why not
>>> extend it about the hypervisor_notify() and use it in the same place
>>> where the hypercall is used in this patch? Do you see any issue with
>>> that?
>>
>> If this needs to be a hypercall and the hypercall needs to go at that
>> specific time, I wouldn't bother with extending acpi_s2idle_dev_ops.
>> The whole idea there was that this would be less custom and could follow
>> a spec.
>
> Just to clarify - it probably doesn't need to be a hypercall. I've
> probably misled you by copy-pasting a handler name from the current
> patch while aiming for your and Sean's ACPI-like approach.
Ah... Yeah I was quite confused.
> What I meant is
> something like:
> - extend acpi_s2idle_dev_ops with notify()
> - implement notify() handler for acpi_s2idle_dev_ops in HYPE0001
> driver (without hypercall):
> static void s2idle_notify(void)
> {
>         acpi_evaluate_dsm(acpi_handle, guid_of_HYPE0001, 0,
>                           ACPI_HYPE_NOTIFY, NULL);
> }
>
> - register it via acpi_register_lps0_dev() from HYPE0001 driver
> - use it just before swait_event_exclusive(s2idle_wait_head..) as it
> is with original patch (the name of the function will be different):
> static void s2idle_hypervisor_notify(void)
> {
>         struct acpi_s2idle_dev_ops *handler;
> ...
>         list_for_each_entry(handler, &lps0_s2idle_devops_head, list_node) {
>                 if (handler->notify)
>                         handler->notify();
>         }
> }
>
> so it will be like:
> -> s2idle_enter (just before swait_event_exclusive(s2idle_wait_head,.. )
> --> s2idle_hypervisor_notify (as platform_s2idle_ops)
> ---> notify (as acpi_s2idle_dev_ops)
> ----> HYPE0001 device driver's notify () routine
>
> It will probably be easier to understand it if I actually implement
> it.
Yeah; A lot of times seeing the mocked up code makes it easier to follow.
> Nevertheless this way we ensure that:
> - notification will be triggered at very last command before actually
> entering s2idle
> - we can trap on MMIO/PIO by implementing HYPE0001 specific _DSM
> method and therefore this implementation will not become hypervisor
> specific and also not use KVM as "dumb pipe out to userspace" as Sean
> suggested
> - we will not have to change the existing Intel/AMD/Windows specs (3
> different LPS0 GUIDs) but, thanks to HYPE0001's acpi_s2idle_dev_ops
> involvement, only care about the new HYPE0001 spec
>
I think your proposal is reasonable. Please include me on the RFC when
you've got it ready as well.
>>
>> TBH - given the strong dependency on being the very last command and
>> this being all Linux specific (you won't need to do something similar
>> with Windows) - I think the way you already did it makes the most sense.
>> It seems to me the ACPI device model doesn't really work well for this
>> scenario.
>>
>>>
>>>>
>>>>> 2) using newly introduced acpi_s2idle_dev_ops hypervisor_notify() call
>>>>> will allow to register handler from Intel x86/intel/pmc/core.c driver
>>>>> and/or AMD x86/amd-pmc.c driver. Therefore we will need to get only
>>>>> Intel and/or AMD approval about extending the ACPI LPS0 _DSM method,
>>>>> correct?
>>>>>
>>>>
>>>> Right now the only thing that hooks prepare()/restore() is the amd-pmc
>>>> driver (unless Intel's PMC had a change I didn't catch yet).
>>>>
>>>> I don't think you should be changing any existing drivers but rather
>>>> introduce another platform driver for this specific case.
>>>>
>>>> So it would be something like this:
>>>>
>>>> acpi_s2idle_prepare_late
>>>> -> prepare()
>>>> --> AMD: amd_pmc handler for prepare()
>>>> --> Intel: intel_pmc handler for prepare() (conceptual)
>>>> --> HYPE0001 device: new driver's prepare() routine
>>>>
>>>> So the platform driver would match the HYPE0001 device to load, and it
>>>> wouldn't do anything other than provide a prepare()/restore() handler
>>>> for your case.
>>>>
>>>> You don't need to change any existing specs. If anything a new spec to
>>>> go with this new ACPI device would be made. Someone would need to
>>>> reserve the ID and such for it, but I think you can mock it up in advance.
>>>
>>> Thank you for your explanation. This means that I should register
>>> "HYPE" through https://uefi.org/PNP_ACPI_Registry before introducing
>>> this new driver to Linux.
>>> I have no experience with the above, so I wonder who should be
>>> responsible for maintaining such ACPI ID since it will not belong to
>>> any specific vendor? There is an example of e.g. COREBOOT PROJECT
>>> using "BOOT" ACPI ID [1], which seems similar in terms of not
>>> specifying any vendor but rather the project as a responsible entity.
>>> Maybe you have some recommendations?
>>
>> Maybe LF could own a namespace and ID? But I would suggest you make a
>> mockup that everything works this way before you go explore too much.
>
> Yeah, sure.
>
>>
>> Also make sure Rafael is aligned with your mockup.
>
> Agree.
>
>>
>>>
>>> I am also not sure if and where a specification describing such a
>>> device has to be maintained. Since "HYPE0001" will have its own _DSM
>>> so will it be required to document it somewhere rather than just using
>>> it in the driver and preparing proper ACPI tables for guest?
>>>
>>>>
>>>>> I wonder if this will be affordable so just re-thinking loudly if
>>>>> there is no other mechanism that could be suggested and used upstream
>>>>> so we could notify hypervisor/vmm about guest entering s2idle state?
>>>>> Especially that such _DSM function will be introduced only to trap on
>>>>> some fake MMIO/PIO access and will be useful only for guest ACPI
>>>>> tables?
>>>>>
>>>>
>>>> Do you need to worry about Microsoft guests using Modern Standby too or
>>>> is that out of the scope of your problem set? I think you'll be a lot
>>>> more limited in how this can behave and where you can modify things if so.
>>>>
>>>
>>> I do not need to worry about Microsoft guests.
>>
>> Makes life a lot easier :)
>
> Agree :) and thank you for all your feedback,
> Grzegorz
Sure.
On Wed, Jun 22, 2022 at 23:50, Limonciello, Mario
<[email protected]> wrote:
>
> On 6/22/2022 04:53, Grzegorz Jaszczyk wrote:
> > On Mon, Jun 20, 2022 at 18:32, Limonciello, Mario
> > <[email protected]> wrote:
> >>
> >> On 6/20/2022 10:43, Grzegorz Jaszczyk wrote:
> >>> On Thu, Jun 16, 2022 at 18:58, Limonciello, Mario
> >>> <[email protected]> wrote:
> >>>>
> >>>> On 6/16/2022 11:48, Sean Christopherson wrote:
> >>>>> On Wed, Jun 15, 2022, Grzegorz Jaszczyk wrote:
> >>>>>> On Fri, Jun 10, 2022 at 16:30, Sean Christopherson <[email protected]> wrote:
> >>>>>>> MMIO or PIO for the actual exit, there's nothing special about hypercalls. As for
> >>>>>>> enumerating to the guest that it should do something, why not add a new ACPI_LPS0_*
> >>>>>>> function? E.g. something like
> >>>>>>>
> >>>>>>> static void s2idle_hypervisor_notify(void)
> >>>>>>> {
> >>>>>>>         if (lps0_dsm_func_mask > 0)
> >>>>>>>                 acpi_sleep_run_lps0_dsm(ACPI_LPS0_EXIT_HYPERVISOR_NOTIFY,
> >>>>>>>                                         lps0_dsm_func_mask, lps0_dsm_guid);
> >>>>>>> }
> >>>>>>
> >>>>>> Great, thank you for your suggestion! I will try this approach and
> >>>>>> come back. Since this will be the main change in the next version,
> >>>>>> will it be ok for you to add Suggested-by: Sean Christopherson
> >>>>>> <[email protected]> tag?
> >>>>>
> >>>>> If you want, but there's certainly no need to do so. But I assume you or someone
> >>>>> at Intel will need to get formal approval for adding another ACPI LPS0 function?
> >>>>> I.e. isn't there work to be done outside of the kernel before any patches can be
> >>>>> merged?
> >>>>
> >>>> There are 3 different LPS0 GUIDs in use. An Intel one, an AMD (legacy)
> >>>> one, and a Microsoft one. They all have their own specs, and so if this
> >>>> was to be added I think all 3 need to be updated.
> >>>
> >>> Yes this will not be easy to achieve I think.
> >>>
> >>>>
> >>>> As this is Linux-specific hypervisor behavior, I don't know that you would
> >>>> be able to convince Microsoft to update theirs either.
> >>>>
> >>>> How about using s2idle_devops? There is a prepare() call and a
> >>>> restore() call that is set for each handler. The only consumer of this
> >>>> ATM I'm aware of is the amd-pmc driver, but it's done like a
> >>>> notification chain so that a bunch of drivers can hook in if they need to.
> >>>>
> >>>> Then you can have this notification path and the associated ACPI device
> >>>> it calls out to be its own driver.
> >>>
> >>> Thank you for your suggestion, just to be sure that I've understand
> >>> your idea correctly:
> >>> 1) it will require extending acpi_s2idle_dev_ops with something like a
> >>> hypervisor_notify() call, since the existing prepare() is called at the
> >>> end of acpi_s2idle_prepare_late, so it is too early, as described in one
> >>> of the previous messages (between acpi_s2idle_prepare_late and the place
> >>> where we use the hypercall there are several places where the suspend
> >>> could be canceled; otherwise we could probably try to trap on another
> >>> acpi_sleep_run_lps0_dsm occurrence from acpi_s2idle_prepare_late).
> >>>
> >>
> >> The idea for prepare() was that it would be the absolute last thing before
> >> the s2idle loop is run. Are you sure that's too early? It's basically
> >> the same thing as having a new last-stage _DSM call.
> >>
> >> What about adding a new abort() extension to acpi_s2idle_dev_ops? Then
> >> you could catch the cancelled suspend case still and take corrective
> >> action (if that action is different than what restore() would do).
> >
> > It will be problematic since the abort/restore notification could
> > arrive too late and therefore the whole system would go to suspend
> > thinking that the guest is in the desired s2idle state. Also, in this
> > case it would be impossible to prevent races and to actually make sure
> > whether the guest is suspended or not. We already had a similar
> > discussion with Sean earlier in this thread about why the notification
> > has to be sent just before swait_event_exclusive(s2idle_wait_head,
> > s2idle_state == S2IDLE_STATE_WAKE) and why the VMM has to have control
> > over guest resumption.
> >
> > Nevertheless, if extending acpi_s2idle_dev_ops is possible, why not
> > extend it with the hypervisor_notify() and use it in the same place
> > where the hypercall is used in this patch? Do you see any issue with
> > that?
>
> If this needs to be a hypercall and the hypercall needs to go at that
> specific time, I wouldn't bother with extending acpi_s2idle_dev_ops.
> The whole idea there was that this would be less custom and could follow
> a spec.
Just to clarify - it probably doesn't need to be a hypercall. I've
probably misled you by copy-pasting a handler name from the current
patch while aiming for your and Sean's ACPI-like approach. What I meant
is something like:
- extend acpi_s2idle_dev_ops with notify()
- implement a notify() handler for acpi_s2idle_dev_ops in the HYPE0001
driver (without a hypercall):
static void s2idle_notify(void)
{
        acpi_evaluate_dsm(acpi_handle, guid_of_HYPE0001, 0,
                          ACPI_HYPE_NOTIFY, NULL);
}
- register it via acpi_register_lps0_dev() from the HYPE0001 driver
- use it just before swait_event_exclusive(s2idle_wait_head..) as in
the original patch (the name of the function will be different):
static void s2idle_hypervisor_notify(void)
{
        struct acpi_s2idle_dev_ops *handler;
        ...
        list_for_each_entry(handler, &lps0_s2idle_devops_head, list_node) {
                if (handler->notify)
                        handler->notify();
        }
}
so it will be like:
-> s2idle_enter (just before swait_event_exclusive(s2idle_wait_head,.. )
--> s2idle_hypervisor_notify (as platform_s2idle_ops)
---> notify (as acpi_s2idle_dev_ops)
----> HYPE0001 device driver's notify() routine
It will probably be easier to understand once I actually implement
it. Nevertheless, this way we ensure that:
- the notification will be triggered as the very last command before
actually entering s2idle
- we can trap on MMIO/PIO by implementing a HYPE0001-specific _DSM
method, so this implementation will not become hypervisor specific
and will not use KVM as a "dumb pipe out to userspace", as Sean
suggested
- we will not have to change the existing Intel/AMD/Windows specs (3
different LPS0 GUIDs) but, thanks to HYPE0001's acpi_s2idle_dev_ops
involvement, only care about the new HYPE0001 spec
>
> TBH - given the strong dependency on being the very last command and
> this being all Linux specific (you won't need to do something similar
> with Windows) - I think the way you already did it makes the most sense.
> It seems to me the ACPI device model doesn't really work well for this
> scenario.
>
> >
> >>
> >>> 2) using the newly introduced acpi_s2idle_dev_ops hypervisor_notify() call
> >>> will allow registering a handler from the Intel x86/intel/pmc/core.c driver
> >>> and/or the AMD x86/amd-pmc.c driver. Therefore we will only need to get
> >>> Intel's and/or AMD's approval for extending the ACPI LPS0 _DSM method,
> >>> correct?
> >>>
> >>
> >> Right now the only thing that hooks prepare()/restore() is the amd-pmc
> >> driver (unless Intel's PMC had a change I didn't catch yet).
> >>
> >> I don't think you should be changing any existing drivers but rather
> >> introduce another platform driver for this specific case.
> >>
> >> So it would be something like this:
> >>
> >> acpi_s2idle_prepare_late
> >> -> prepare()
> >> --> AMD: amd_pmc handler for prepare()
> >> --> Intel: intel_pmc handler for prepare() (conceptual)
> >> --> HYPE0001 device: new driver's prepare() routine
> >>
> >> So the platform driver would match the HYPE0001 device to load, and it
> >> wouldn't do anything other than provide a prepare()/restore() handler
> >> for your case.
> >>
> >> You don't need to change any existing specs. If anything a new spec to
> >> go with this new ACPI device would be made. Someone would need to
> >> reserve the ID and such for it, but I think you can mock it up in advance.
> >
> > Thank you for your explanation. This means that I should register
> > "HYPE" through https://uefi.org/PNP_ACPI_Registry before introducing
> > this new driver to Linux.
> > I have no experience with the above, so I wonder who should be
> > responsible for maintaining such ACPI ID since it will not belong to
> > any specific vendor? There is an example of e.g. COREBOOT PROJECT
> > using "BOOT" ACPI ID [1], which seems similar in terms of not
> > specifying any vendor but rather the project as a responsible entity.
> > Maybe you have some recommendations?
>
> Maybe LF could own a namespace and ID? But I would suggest you make a
> mockup confirming that everything works this way before you explore too much.
Yeah, sure.
>
> Also make sure Rafael is aligned with your mockup.
Agree.
>
> >
> > I am also not sure if and where a specification describing such a
> > device has to be maintained. Since "HYPE0001" will have its own _DSM,
> > will it be required to document it somewhere rather than just using
> > it in the driver and preparing proper ACPI tables for the guest?
> >
> >>
> >>> I wonder if this will be affordable so just re-thinking loudly if
> >>> there is no other mechanism that could be suggested and used upstream
> >>> so we could notify hypervisor/vmm about guest entering s2idle state?
> >>> Especially that such _DSM function will be introduced only to trap on
> >>> some fake MMIO/PIO access and will be useful only for guest ACPI
> >>> tables?
> >>>
> >>
> >> Do you need to worry about Microsoft guests using Modern Standby too or
> >> is that out of the scope of your problem set? I think you'll be a lot
> >> more limited in how this can behave and where you can modify things if so.
> >>
> >
> > I do not need to worry about Microsoft guests.
>
> Makes life a lot easier :)
Agree :) and thank you for all your feedback,
Grzegorz
On Wed, Jun 22, 2022, Limonciello, Mario wrote:
> On 6/22/2022 04:53, Grzegorz Jaszczyk wrote:
> > It will be problematic since the abort/restore notification could
> > arrive too late and therefore the whole system would go to suspend
> > thinking that the guest is in the desired s2idle state. Also, in this
> > case it would be impossible to prevent races and to actually make sure
> > whether the guest is suspended or not. We already had a similar
> > discussion with Sean earlier in this thread about why the notification
> > has to be sent just before swait_event_exclusive(s2idle_wait_head,
> > s2idle_state == S2IDLE_STATE_WAKE) and why the VMM has to have control
> > over guest resumption.
> >
> > Nevertheless, if extending acpi_s2idle_dev_ops is possible, why not
> > extend it with the hypervisor_notify() and use it in the same place
> > where the hypercall is used in this patch? Do you see any issue with
> > that?
>
> If this needs to be a hypercall and the hypercall needs to go at that
> specific time, I wouldn't bother with extending acpi_s2idle_dev_ops. The
> whole idea there was that this would be less custom and could follow a spec.
It doesn't need to be a hypercall though. PIO and MMIO provide the same "exit to
host userspace" behavior, and there is zero reason to get KVM involved since KVM
(on x86) doesn't deal with platform-scoped power management.

I get that squeezing this into the ACPI device model is awkward, but forcing KVM
into the picture isn't any better.
> TBH - given the strong dependency on being the very last command and this
> being all Linux specific (you won't need to do something similar with
> Windows) - I think the way you already did it makes the most sense.
> It seems to me the ACPI device model doesn't really work well for this
> scenario.