LinuxLists.cc - [PATCH v2 00/11] Fix PM hibernation in Xen guests

2020-07-02 18:24:52

Subject: [PATCH v2 00/11] Fix PM hibernation in Xen guests

Hello,
This series fixes PM hibernation for HVM guests running on the Xen hypervisor.
A running guest can now be hibernated and resumed successfully at a
later time. The fixes for PM hibernation are added to the block and
network device drivers, i.e. xen-blkfront and xen-netfront. Any other driver
that needs S4 support and does not have it yet can follow the same method of
introducing freeze/thaw/restore callbacks.
The patches have been tested against the upstream kernel and Xen 4.11. Large
scale testing was also done on Xen-based Amazon EC2 instances. All of this testing
involved running a memory-exhausting workload in the background.

Guest hibernation does not require any support from the hypervisor, so
the guest has complete control over its state. Infrastructure
restrictions on saving guest state can be overcome by guest-initiated
hibernation.

These patches were sent out as an RFC before, and all the feedback has been
incorporated in the patches. v1 can be found here:

[v1]: https://lkml.org/lkml/2020/5/19/1312
All comments and feedback from v1 have been incorporated in the v2 series.
Any comments/suggestions are welcome.

Known issues:
1. KASLR causes intermittent hibernation failures. The VM fails to resume and
has to be restarted. I will investigate this issue separately; it shouldn't
be a blocker for this patch series.
2. During hibernation, I sometimes observed that freezing of tasks fails due
to a busy XFS workqueue [xfs-cil/xfs-sync]. This is also intermittent, maybe 1
out of 200 runs, and hibernation is aborted in this case. Retrying hibernation
may work. This is a known issue with hibernation on some
filesystems such as XFS; it has been discussed by the community for years without
an effective resolution at this point.

Testing How to:
---------------
1. Set up the Xen hypervisor on a physical machine [I used Ubuntu 16.04 + upstream
xen-4.11].
2. Bring up an HVM guest with a kernel compiled with the hibernation patches
[I used Ubuntu 18.04 netboot bionic images and also Amazon Linux on-prem images].
3. Create a swap file with size >= RAM size.
4. Update the grub parameters and reboot.
5. Trigger PM hibernation from within the VM.

Example:
Set up a file-backed swap space. Swap file size>=Total memory on the system
sudo dd if=/dev/zero of=/swap bs=$(( 1024 * 1024 )) count=4096 # 4096MiB
sudo chmod 600 /swap
sudo mkswap /swap
sudo swapon /swap
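
Since the swap file has to be at least as large as RAM, the dd count can be derived from MemTotal instead of being hard-coded. This is only a minimal sketch, assuming 1 MiB dd blocks as in the example above; the helper names are hypothetical and not part of this series.

```python
def parse_mem_total_kib(meminfo_text):
    """Return the MemTotal value (in KiB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line format: "MemTotal:  <value> kB"
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def dd_count_mib(mem_total_kib):
    """Number of 1 MiB blocks needed so the swap file is >= RAM (rounds up)."""
    return (mem_total_kib + 1023) // 1024
```

On the guest, dd_count_mib(parse_mem_total_kib(open("/proc/meminfo").read())) would give the value to pass as count= to dd.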

Update resume device/resume offset in grub if using swap file:
resume=/dev/xvda1 resume_offset=200704 no_console_suspend=1
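
As far as I know, resume_offset is expressed in PAGE_SIZE units from the start of the partition, while FIBMAP reports a block number in filesystem block units, so the two only match when the filesystem block size equals the page size. A hedged sketch of the conversion follows; FIGETBSZ and the function names are my additions, not something this series uses, and the ioctls need root.

```python
import array
import fcntl
import os

FIBMAP = 0x01    # _IO(0x00, 1): map a logical file block to a device block
FIGETBSZ = 0x02  # _IO(0x00, 2): filesystem block size in bytes


def blocks_to_pages(block, block_size, page_size):
    """Convert a block number in block_size units to page_size units."""
    byte_offset = block * block_size
    if byte_offset % page_size:
        raise ValueError("swap file does not start on a page boundary")
    return byte_offset // page_size


def resume_offset(path):
    """Offset of the first block of `path` in pages (hypothetical helper)."""
    with open(path, "rb") as f:
        bsz = array.array("i", [0])
        fcntl.ioctl(f.fileno(), FIGETBSZ, bsz)   # kernel fills in the block size
        blk = array.array("i", [0])              # logical block 0 in, physical out
        fcntl.ioctl(f.fileno(), FIBMAP, blk)
    return blocks_to_pages(blk[0], bsz[0], os.sysconf("SC_PAGE_SIZE"))
```

With a 4 KiB block size and 4 KiB pages the conversion is the identity, which is the case the example value above assumes.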

Execute:
--------
sudo pm-hibernate
OR
echo reboot > /sys/power/disk && echo disk > /sys/power/state

Compute resume offset code:
"
#!/usr/bin/env python3
import sys
import array
import fcntl

# swap file
f = open(sys.argv[1], 'rb')
buf = array.array('L', [0])

# FIBMAP: takes a logical block index in buf, returns the physical block number
ret = fcntl.ioctl(f.fileno(), 0x01, buf)
f.close()
print(buf[0])
"

Aleksei Besogonov (1):
PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA

Anchal Agarwal (4):
x86/xen: Introduce new function to map HYPERVISOR_shared_info on
Resume
x86/xen: save and restore steal clock during PM hibernation
xen: Introduce wrapper for save/restore sched clock offset
xen: Update sched clock offset to avoid system instability in
hibernation

Munehisa Kamata (5):
xen/manage: keep track of the on-going suspend mode
xenbus: add freeze/thaw/restore callbacks support
x86/xen: add system core suspend and resume callbacks
xen-blkfront: add callbacks for PM suspend and hibernation
xen-netfront: add callbacks for PM suspend and hibernation

Thomas Gleixner (1):
genirq: Shutdown irq chips in suspend/resume during hibernation

arch/x86/xen/enlighten_hvm.c | 7 ++
arch/x86/xen/suspend.c | 53 +++++++++++++
arch/x86/xen/time.c | 15 +++-
arch/x86/xen/xen-ops.h | 3 +
drivers/block/xen-blkfront.c | 122 +++++++++++++++++++++++++++++-
drivers/net/xen-netfront.c | 98 +++++++++++++++++++++++-
drivers/xen/events/events_base.c | 1 +
drivers/xen/manage.c | 60 +++++++++++++++
drivers/xen/xenbus/xenbus_probe.c | 96 +++++++++++++++++++----
include/linux/irq.h | 2 +
include/xen/xen-ops.h | 3 +
include/xen/xenbus.h | 3 +
kernel/irq/chip.c | 2 +-
kernel/irq/internals.h | 1 +
kernel/irq/pm.c | 31 +++++---
kernel/power/user.c | 6 +-
16 files changed, 470 insertions(+), 33 deletions(-)

--
2.20.1

2020-07-02 18:24:57

by Anchal Agarwal

Subject: [PATCH v2 08/11] x86/xen: save and restore steal clock during PM hibernation

Save/restore steal times in syscore suspend/resume during PM
hibernation. Commit '5e25f5db6abb9: ("xen/time: do not
decrease steal time after live migration on xen")' fixes xen
guest steal time handling during migration. A similar issue is seen
during PM hibernation.
Currently, the steal time accounting code in the scheduler expects the steal clock
callback to provide a monotonically increasing value. If the accounting
code receives a smaller value than the previous one, it uses a negative
value to calculate steal time, which results in incorrectly updated idle
and steal time accounting. This breaks userspace tools which read
/proc/stat.

top - 08:05:35 up 2:12, 3 users, load average: 0.00, 0.07, 0.23
Tasks: 80 total, 1 running, 79 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,30100.0%id, 0.0%wa, 0.0%hi, 0.0%si,-1253874204672.0%st

This can actually happen when a Xen PVHVM guest gets restored from
hibernation, because such a restored guest is just a fresh domain from
Xen perspective and the time information in runstate info starts over
from scratch.
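
To illustrate the problem described above, here is a toy model (not kernel code) of a steal clock whose hypervisor-side counter restarts after restore. The class and its methods are hypothetical; they only mimic the idea of snapshotting the runstate time on suspend and carrying it over on resume.

```python
def steal_delta(prev, now):
    """Delta the accounting code would use; negative if the clock went back."""
    return now - prev


class StealClock:
    """Toy model of a steal clock that survives a domain restore."""

    def __init__(self):
        self.runstate_ns = 0   # counter reported by Xen for the current domain
        self.base_ns = 0       # time carried over from before hibernation
        self._snapshot = 0

    def read(self):
        return self.base_ns + self.runstate_ns

    def suspend(self):
        # Remember the total seen so far, before the domain goes away.
        self._snapshot = self.read()

    def resume(self):
        # The new domain's counter starts from zero, so add the old total back.
        self.base_ns = self._snapshot
```

Without the suspend/resume pair, a reader that saw 5000 ns before hibernation and 100 ns afterwards would compute a delta of -4900; with it, the delta is 100.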

Changelog:
v1->v2: Removed patches that introduced new function calls for saving/restoring
sched clock offset; use the existing ones that are used during live migration (LM)

Signed-off-by: Anchal Agarwal <[email protected]>
---
arch/x86/xen/suspend.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index e8c924e93fc5..10cd14326472 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -94,10 +94,9 @@ static int xen_syscore_suspend(void)
int ret;

gnttab_suspend();
-
+ xen_manage_runstate_time(-1);
xrfp.domid = DOMID_SELF;
xrfp.gpfn = __pa(HYPERVISOR_shared_info) >> PAGE_SHIFT;
-
ret = HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrfp);
if (!ret)
HYPERVISOR_shared_info = &xen_dummy_shared_info;
@@ -111,7 +110,7 @@ static void xen_syscore_resume(void)
xen_hvm_map_shared_info();

pvclock_resume();
-
+ xen_manage_runstate_time(0);
gnttab_resume();
}

--
2.20.1

2020-07-02 18:25:10

by Anchal Agarwal

Subject: [PATCH v2 09/11] xen: Introduce wrapper for save/restore sched clock offset

Introduce wrappers for save/restore xen_sched_clock_offset to be
used by PM hibernation code to avoid system instability during resume.

Signed-off-by: Anchal Agarwal <[email protected]>
---
arch/x86/xen/time.c | 15 +++++++++++++--
arch/x86/xen/xen-ops.h | 2 ++
2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index c8897aad13cd..676950eb0cb5 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -386,12 +386,23 @@ static const struct pv_time_ops xen_time_ops __initconst = {
static struct pvclock_vsyscall_time_info *xen_clock __read_mostly;
static u64 xen_clock_value_saved;

+/*This is needed to maintain a monotonic clock value during PM hibernation */
+void xen_save_sched_clock_offset(void)
+{
+ xen_clock_value_saved = xen_clocksource_read() - xen_sched_clock_offset;
+}
+
+void xen_restore_sched_clock_offset(void)
+{
+ xen_sched_clock_offset = xen_clocksource_read() - xen_clock_value_saved;
+}
+
void xen_save_time_memory_area(void)
{
struct vcpu_register_time_memory_area t;
int ret;

- xen_clock_value_saved = xen_clocksource_read() - xen_sched_clock_offset;
+ xen_save_sched_clock_offset();

if (!xen_clock)
return;
@@ -434,7 +445,7 @@ void xen_restore_time_memory_area(void)
out:
/* Need pvclock_resume() before using xen_clocksource_read(). */
pvclock_resume();
- xen_sched_clock_offset = xen_clocksource_read() - xen_clock_value_saved;
+ xen_restore_sched_clock_offset();
}

static void xen_setup_vsyscall_time_info(void)
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 41e9e9120f2d..f4b78b19493b 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -70,6 +70,8 @@ void xen_save_time_memory_area(void);
void xen_restore_time_memory_area(void);
void xen_init_time_ops(void);
void xen_hvm_init_time_ops(void);
+void xen_save_sched_clock_offset(void);
+void xen_restore_sched_clock_offset(void);

irqreturn_t xen_debug_interrupt(int irq, void *dev_id);

--
2.20.1

2020-07-02 18:25:17

by Anchal Agarwal

Subject: [PATCH v2 10/11] xen: Update sched clock offset to avoid system instability in hibernation

Save/restore xen_sched_clock_offset in syscore suspend/resume during PM
hibernation. Commit '867cefb4cb1012: ("xen: Fix x86 sched_clock() interface
for xen")' fixes xen guest time handling during migration. A similar issue
is seen during PM hibernation when the system runs a CPU-intensive workload.
After resume, pvclock resets the value to 0; however, xen_sched_clock_offset
is never updated. System instability is seen during resume from hibernation
when the system is under heavy CPU load. Since xen_sched_clock_offset is not
updated, the system does not see a monotonic clock value, and the scheduler
then thinks that heavy CPU hog tasks need more time on the CPU, causing
the system to freeze.
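
The arithmetic behind the save/restore wrappers can be shown with a small model. This is only a sketch of the two assignments from the wrapper patch, with the clocksource replaced by a plain attribute; the class name is hypothetical.

```python
class SchedClockModel:
    """Models sched_clock = clocksource - offset across a clocksource reset."""

    def __init__(self, clocksource_ns=0):
        self.clocksource_ns = clocksource_ns
        self.offset = clocksource_ns   # so that sched_clock() starts at zero
        self.saved = 0

    def sched_clock(self):
        return self.clocksource_ns - self.offset

    def save_offset(self):
        # Same expression as xen_save_sched_clock_offset() in the patch.
        self.saved = self.clocksource_ns - self.offset

    def restore_offset(self):
        # Same expression as xen_restore_sched_clock_offset() in the patch.
        self.offset = self.clocksource_ns - self.saved
```

If the clocksource restarts from 0 after resume and the offset is left alone, sched_clock() jumps backwards; recomputing the offset makes it continue from the saved value.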

Signed-off-by: Anchal Agarwal <[email protected]>
---
arch/x86/xen/suspend.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/arch/x86/xen/suspend.c b/arch/x86/xen/suspend.c
index 10cd14326472..4d8b1d2390b9 100644
--- a/arch/x86/xen/suspend.c
+++ b/arch/x86/xen/suspend.c
@@ -95,6 +95,7 @@ static int xen_syscore_suspend(void)

gnttab_suspend();
xen_manage_runstate_time(-1);
+ xen_save_sched_clock_offset();
xrfp.domid = DOMID_SELF;
xrfp.gpfn = __pa(HYPERVISOR_shared_info) >> PAGE_SHIFT;
ret = HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &xrfp);
@@ -110,6 +111,12 @@ static void xen_syscore_resume(void)
xen_hvm_map_shared_info();

pvclock_resume();
+ /*
+ * Restore xen_sched_clock_offset during resume to maintain
+ * monotonic clock value
+ */
+ xen_restore_sched_clock_offset();
+
xen_manage_runstate_time(0);
gnttab_resume();
}
--
2.20.1

2020-07-02 18:25:27

by Anchal Agarwal

Subject: [PATCH v2 03/11] x86/xen: Introduce new function to map HYPERVISOR_shared_info on Resume

Introduce a small function which re-uses the shared page's PA allocated
during guest initialization in reserve_shared_info(), instead of
allocating a new page during the resume flow.
It also maps the shared_info page by calling
xen_hvm_init_shared_info().

Changelog:
v1->v2: Remove extra check for shared_info_pfn to be NULL

Signed-off-by: Anchal Agarwal <[email protected]>
---
arch/x86/xen/enlighten_hvm.c | 6 ++++++
arch/x86/xen/xen-ops.h | 1 +
2 files changed, 7 insertions(+)

diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index 3e89b0067ff0..d91099928746 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -28,6 +28,12 @@

static unsigned long shared_info_pfn;

+void xen_hvm_map_shared_info(void)
+{
+ xen_hvm_init_shared_info();
+ HYPERVISOR_shared_info = __va(PFN_PHYS(shared_info_pfn));
+}
+
void xen_hvm_init_shared_info(void)
{
struct xen_add_to_physmap xatp;
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 53b224fd6177..41e9e9120f2d 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -54,6 +54,7 @@ void xen_enable_sysenter(void);
void xen_enable_syscall(void);
void xen_vcpu_restore(void);

+void xen_hvm_map_shared_info(void);
void xen_hvm_init_shared_info(void);
void xen_unplug_emulated_devices(void);

--
2.20.1

2020-07-02 18:26:34

by Anchal Agarwal

Subject: [PATCH v2 11/11] PM / hibernate: update the resume offset on SNAPSHOT_SET_SWAP_AREA

From: Aleksei Besogonov <[email protected]>

The SNAPSHOT_SET_SWAP_AREA is supposed to be used to set the hibernation
offset on a running kernel to enable hibernating to a swap file.
However, it doesn't actually update the swsusp_resume_block variable. As
a result, the hibernation fails at the last step (after all the data is
written out) in the validation of the swap signature in
mark_swapfiles().

Before this patch, the command line processing was the only place where
swsusp_resume_block was set.
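
For readers who want to exercise this path from userspace, the following sketch builds the ioctl request number and argument for SNAPSHOT_SET_SWAP_AREA. The magic '3', command number 13, the packed { loff_t offset; u32 dev; } layout and the device-number encoding are written from my recollection of the uapi headers and should be verified against the kernel tree; the helper names are hypothetical.

```python
import struct

SNAPSHOT_IOC_MAGIC = ord("3")   # assumption: matches the uapi suspend ioctl header


def _iow(magic, nr, size):
    """_IOW() as laid out on x86: dir (2 bits) | size (14) | type (8) | nr (8)."""
    ioc_write = 1
    return (ioc_write << 30) | (size << 16) | (magic << 8) | nr


# Packed struct resume_swap_area: 64-bit offset followed by 32-bit dev.
RESUME_SWAP_AREA = struct.Struct("=qI")
SNAPSHOT_SET_SWAP_AREA = _iow(SNAPSHOT_IOC_MAGIC, 13, RESUME_SWAP_AREA.size)


def encode_dev(major, minor):
    """Device number encoding assumed to be what the kernel decodes here."""
    return (minor & 0xFF) | (major << 8) | ((minor & ~0xFF) << 12)


def swap_area_arg(major, minor, offset_pages):
    """Bytes to pass as the ioctl argument on an open /dev/snapshot fd."""
    return RESUME_SWAP_AREA.pack(offset_pages, encode_dev(major, minor))
```

A caller would open /dev/snapshot and pass swap_area_arg(...) to fcntl.ioctl() with SNAPSHOT_SET_SWAP_AREA; with this patch the kernel then also updates swsusp_resume_block.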

[Anchal Agarwal: Changelog: Resolved patch conflict as the code was split out into
snapshot_set_swap_area()]

Signed-off-by: Aleksei Besogonov <[email protected]>
Signed-off-by: Munehisa Kamata <[email protected]>
Signed-off-by: Anchal Agarwal <[email protected]>
---
kernel/power/user.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/power/user.c b/kernel/power/user.c
index d5eedc2baa2a..e1209cefc103 100644
--- a/kernel/power/user.c
+++ b/kernel/power/user.c
@@ -242,8 +242,12 @@ static int snapshot_set_swap_area(struct snapshot_data *data,
return -EINVAL;
}
data->swap = swap_type_of(swdev, offset, &bdev);
- if (data->swap < 0)
+ if (data->swap < 0) {
return -ENODEV;
+ } else {
+ swsusp_resume_device = swdev;
+ swsusp_resume_block = offset;
+ }

data->bd_inode = bdev->bd_inode;
bdput(bdev);
--
2.20.1

2020-07-02 18:26:34

by Anchal Agarwal

Subject: [PATCH v2 02/11] xenbus: add freeze/thaw/restore callbacks support

From: Munehisa Kamata <[email protected]>

Since commit b3e96c0c7562 ("xen: use freeze/restore/thaw PM events for
suspend/resume/chkpt"), xenbus uses PMSG_FREEZE, PMSG_THAW and
PMSG_RESTORE events for Xen suspend. However, they're actually assigned
to xenbus_dev_suspend(), xenbus_dev_cancel() and xenbus_dev_resume()
respectively, and only suspend and resume callbacks are supported at
driver level. To support PM suspend and PM hibernation, modify the bus
level PM callbacks to invoke not only device driver's suspend/resume but
also freeze/thaw/restore.

Note that we'll use freeze/restore callbacks even for PM suspend, whereas
suspend/resume callbacks are normally used in that case, because the
existing xenbus device drivers already have suspend/resume callbacks
specifically designed for Xen suspend. So we can allow the device
drivers to keep the existing callbacks without modification.
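
The dispatch rule described above can be summarized with a small model. It is a simplification of the bus-level callbacks in the diff below (no otherend handling, no logging), and the Driver class is a hypothetical stand-in for struct xenbus_driver.

```python
class Driver:
    """Stand-in for struct xenbus_driver; callbacks that are not set are None."""

    def __init__(self, **callbacks):
        for name in ("suspend", "resume", "freeze", "thaw", "restore"):
            setattr(self, name, callbacks.get(name))


def bus_suspend(drv, dev, xen_suspend):
    # Xen suspend keeps using the driver's suspend(); PM paths use freeze().
    cb = drv.suspend if xen_suspend else drv.freeze
    return cb(dev) if cb else 0


def bus_resume(drv, dev, xen_suspend):
    # Xen suspend keeps using resume(); PM paths use restore().
    cb = drv.resume if xen_suspend else drv.restore
    return cb(dev) if cb else 0


def bus_cancel(drv, dev, xen_suspend):
    # Cancel stays a no-op for Xen suspend; PM paths call thaw().
    if xen_suspend:
        return 0
    return drv.thaw(dev) if drv.thaw else 0
```

A driver that only implements suspend/resume therefore behaves exactly as before under Xen suspend and is simply skipped on the PM paths.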

[Anchal Agarwal: Changelog]:
RFC v1->v2: Refactored the callbacks code
v1->v2: Use dev_warn instead of pr_warn, naming/initialization
conventions
Signed-off-by: Anchal Agarwal <[email protected]>
Signed-off-by: Munehisa Kamata <[email protected]>
---
drivers/xen/xenbus/xenbus_probe.c | 96 ++++++++++++++++++++++++++-----
include/xen/xenbus.h | 3 +
2 files changed, 84 insertions(+), 15 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c
index 38725d97d909..715919aacd28 100644
--- a/drivers/xen/xenbus/xenbus_probe.c
+++ b/drivers/xen/xenbus/xenbus_probe.c
@@ -50,6 +50,7 @@
#include <linux/io.h>
#include <linux/slab.h>
#include <linux/module.h>
+#include <linux/suspend.h>

#include <asm/page.h>
#include <asm/xen/hypervisor.h>
@@ -599,16 +600,33 @@ int xenbus_dev_suspend(struct device *dev)
struct xenbus_driver *drv;
struct xenbus_device *xdev
= container_of(dev, struct xenbus_device, dev);
+ bool xen_suspend = xen_is_xen_suspend();

DPRINTK("%s", xdev->nodename);

if (dev->driver == NULL)
return 0;
drv = to_xenbus_driver(dev->driver);
- if (drv->suspend)
- err = drv->suspend(xdev);
- if (err)
- dev_warn(dev, "suspend failed: %i\n", err);
+ if (xen_suspend) {
+ if (drv->suspend)
+ err = drv->suspend(xdev);
+ } else {
+ if (drv->freeze) {
+ err = drv->freeze(xdev);
+ if (!err) {
+ free_otherend_watch(xdev);
+ free_otherend_details(xdev);
+ return 0;
+ }
+ }
+ }
+
+ if (err) {
+ dev_warn(&xdev->dev, "%s %s failed: %d\n", xen_suspend ?
+ "suspend" : "freeze", xdev->nodename, err);
+ return err;
+ }
+
return 0;
}
EXPORT_SYMBOL_GPL(xenbus_dev_suspend);
@@ -619,6 +637,7 @@ int xenbus_dev_resume(struct device *dev)
struct xenbus_driver *drv;
struct xenbus_device *xdev
= container_of(dev, struct xenbus_device, dev);
+ bool xen_suspend = xen_is_xen_suspend();

DPRINTK("%s", xdev->nodename);

@@ -627,23 +646,34 @@ int xenbus_dev_resume(struct device *dev)
drv = to_xenbus_driver(dev->driver);
err = talk_to_otherend(xdev);
if (err) {
- dev_warn(dev, "resume (talk_to_otherend) failed: %i\n", err);
+ dev_warn(&xdev->dev, "%s (talk_to_otherend) %s failed: %d\n",
+ xen_suspend ? "resume" : "restore",
+ xdev->nodename, err);
return err;
}

- xdev->state = XenbusStateInitialising;
+ if (xen_suspend) {
+ xdev->state = XenbusStateInitialising;
+ if (drv->resume)
+ err = drv->resume(xdev);
+ } else {
+ if (drv->restore)
+ err = drv->restore(xdev);
+ }

- if (drv->resume) {
- err = drv->resume(xdev);
- if (err) {
- dev_warn(dev, "resume failed: %i\n", err);
- return err;
- }
+ if (err) {
+ dev_warn(&xdev->dev, "%s %s failed: %d\n",
+ xen_suspend ? "resume" : "restore",
+ xdev->nodename, err);
+ return err;
}

err = watch_otherend(xdev);
if (err) {
- dev_warn(dev, "resume (watch_otherend) failed: %d\n", err);
+ dev_warn(&xdev->dev, "%s (watch_otherend) %s failed: %d.\n",
+ xen_suspend ? "resume" : "restore",
+ xdev->nodename, err);
+
return err;
}

@@ -653,8 +683,44 @@ EXPORT_SYMBOL_GPL(xenbus_dev_resume);

int xenbus_dev_cancel(struct device *dev)
{
- /* Do nothing */
- DPRINTK("cancel");
+ int err;
+ struct xenbus_driver *drv;
+ struct xenbus_device *xendev = to_xenbus_device(dev);
+ bool xen_suspend = xen_is_xen_suspend();
+
+ if (xen_suspend) {
+ /* Do nothing */
+ DPRINTK("cancel");
+ return 0;
+ }
+
+ DPRINTK("%s", xendev->nodename);
+
+ if (dev->driver == NULL)
+ return 0;
+ drv = to_xenbus_driver(dev->driver);
+ err = talk_to_otherend(xendev);
+ if (err) {
+ dev_warn(&xendev->dev, "thaw (talk_to_otherend) %s failed: %d.\n",
+ xendev->nodename, err);
+ return err;
+ }
+
+ if (drv->thaw) {
+ err = drv->thaw(xendev);
+ if (err) {
+ dev_warn(&xendev->dev, "thaw %s failed: %d\n", xendev->nodename, err);
+ return err;
+ }
+ }
+
+ err = watch_otherend(xendev);
+ if (err) {
+ dev_warn(&xendev->dev, "thaw (watch_otherend) %s failed: %d.\n",
+ xendev->nodename, err);
+ return err;
+ }
+
return 0;
}
EXPORT_SYMBOL_GPL(xenbus_dev_cancel);
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 5a8315e6d8a6..8da964763255 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -104,6 +104,9 @@ struct xenbus_driver {
int (*remove)(struct xenbus_device *dev);
int (*suspend)(struct xenbus_device *dev);
int (*resume)(struct xenbus_device *dev);
+ int (*freeze)(struct xenbus_device *dev);
+ int (*thaw)(struct xenbus_device *dev);
+ int (*restore)(struct xenbus_device *dev);
int (*uevent)(struct xenbus_device *, struct kobj_uevent_env *);
struct device_driver driver;
int (*read_otherend_details)(struct xenbus_device *dev);
--
2.20.1

2020-07-10 18:20:09

by Anchal Agarwal

Subject: Re: [PATCH v2 00/11] Fix PM hibernation in Xen guests

Gentle ping on this series.

--
Anchal

2020-07-13 19:49:15

by Boris Ostrovsky

Subject: Re: [PATCH v2 00/11] Fix PM hibernation in Xen guests

On 7/10/20 2:17 PM, Agarwal, Anchal wrote:
> Gentle ping on this series.

Have you tested save/restore?

-boris

2020-07-15 19:53:46

by Anchal Agarwal

Subject: Re: [PATCH v2 00/11] Fix PM hibernation in Xen guests

On Mon, Jul 13, 2020 at 03:43:33PM -0400, Boris Ostrovsky wrote:
> On 7/10/20 2:17 PM, Agarwal, Anchal wrote:
> > Gentle ping on this series.
>
>
> Have you tested save/restore?
>
No, not with the last few series. But a good point, I will test that and get
back to you. Do you see anything specific in the series that suggests otherwise?

Thanks,
Anchal
>
> -boris
>
>
>

2020-07-15 20:54:20

by Boris Ostrovsky

Subject: Re: [PATCH v2 00/11] Fix PM hibernation in Xen guests

On 7/15/20 3:49 PM, Anchal Agarwal wrote:
> On Mon, Jul 13, 2020 at 03:43:33PM -0400, Boris Ostrovsky wrote:
>> On 7/10/20 2:17 PM, Agarwal, Anchal wrote:
>>> Gentle ping on this series.
>>
>> Have you tested save/restore?
>>
> No, not with the last few series. But a good point, I will test that and get
> back to you. Do you see anything specific in the series that suggests otherwise?

root@ovs104> xl save pvh saved
Saving to saved new xl format (info 0x3/0x0/1699)
xc: info: Saving domain 3, type x86 HVM
xc: Frames: 1044480/1044480 100%
xc: End of stream: 0/0 0%
root@ovs104> xl restore saved
Loading new save file saved (new xl fmt info 0x3/0x0/1699)
Savefile contains xl domain config in JSON format
Parsing config from <saved>
xc: info: Found x86 HVM domain from Xen 4.13
xc: info: Restoring domain
xc: info: Restore successful
xc: info: XenStore: mfn 0xfeffc, dom 0, evt 1
xc: info: Console: mfn 0xfefff, dom 0, evt 2
root@ovs104> xl console pvh
[ 139.943872] ------------[ cut here ]------------
[ 139.943872] kernel BUG at arch/x86/xen/enlighten.c:205!
[ 139.943872] invalid opcode: 0000 [#1] SMP PTI
[ 139.943872] CPU: 0 PID: 11 Comm: migration/0 Not tainted 5.8.0-rc5 #26
[ 139.943872] RIP: 0010:xen_vcpu_setup+0x16d/0x180
[ 139.943872] Code: 4a 8b 14 f5 40 c9 1b 82 48 89 d8 48 89 2c 02 8b 05
a4 d4 40 01 85 c0 0f 85 15 ff ff ff 4a 8b 04 f5 40 c9 1b 82 e9 f4 fe ff
ff <0f> 0b b8 ed ff ff ff e9 14 ff ff ff e8 12 4f 86 00 66 90 66 66 66
[ 139.943872] RSP: 0018:ffffc9000006bdb0 EFLAGS: 00010046
[ 139.943872] RAX: 0000000000000000 RBX: ffffc9000014fe00 RCX:
0000000000000000
[ 139.943872] RDX: ffff88803fc00000 RSI: 0000000000016128 RDI:
0000000000000000
[ 139.943872] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[ 139.943872] R10: ffffffff826174a0 R11: ffffc9000006bcb4 R12:
0000000000016120
[ 139.943872] R13: 0000000000016120 R14: 0000000000016128 R15:
0000000000000000
[ 139.943872] FS: 0000000000000000(0000) GS:ffff88803fc00000(0000)
knlGS:0000000000000000
[ 139.943872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 139.943872] CR2: 00007f704be8b000 CR3: 000000003901a004 CR4:
00000000000606f0
[ 139.943872] Call Trace:
[ 139.943872] ? __kmalloc+0x167/0x260
[ 139.943872] ? xen_manage_runstate_time+0x14a/0x170
[ 139.943872] xen_vcpu_restore+0x134/0x170
[ 139.943872] xen_hvm_post_suspend+0x1d/0x30
[ 139.943872] xen_arch_post_suspend+0x13/0x30
[ 139.943872] xen_suspend+0x87/0x190
[ 139.943872] multi_cpu_stop+0x6d/0x110
[ 139.943872] ? stop_machine_yield+0x10/0x10
[ 139.943872] cpu_stopper_thread+0x47/0x100
[ 139.943872] smpboot_thread_fn+0xc5/0x160
[ 139.943872] ? sort_range+0x20/0x20
[ 139.943872] kthread+0xfe/0x140
[ 139.943872] ? kthread_park+0x90/0x90
[ 139.943872] ret_from_fork+0x22/0x30
[ 139.943872] Modules linked in:
[ 139.943872] ---[ end trace 74716859a6b4f0a8 ]---
[ 139.943872] RIP: 0010:xen_vcpu_setup+0x16d/0x180
[ 139.943872] Code: 4a 8b 14 f5 40 c9 1b 82 48 89 d8 48 89 2c 02 8b 05
a4 d4 40 01 85 c0 0f 85 15 ff ff ff 4a 8b 04 f5 40 c9 1b 82 e9 f4 fe ff
ff <0f> 0b b8 ed ff ff ff e9 14 ff ff ff e8 12 4f 86 00 66 90 66 66 66
[ 139.943872] RSP: 0018:ffffc9000006bdb0 EFLAGS: 00010046
[ 139.943872] RAX: 0000000000000000 RBX: ffffc9000014fe00 RCX:
0000000000000000
[ 139.943872] RDX: ffff88803fc00000 RSI: 0000000000016128 RDI:
0000000000000000
[ 139.943872] RBP: 0000000000000000 R08: 0000000000000000 R09:
0000000000000000
[ 139.943872] R10: ffffffff826174a0 R11: ffffc9000006bcb4 R12:
0000000000016120
[ 139.943872] R13: 0000000000016120 R14: 0000000000016128 R15:
0000000000000000
[ 139.943872] FS: 0000000000000000(0000) GS:ffff88803fc00000(0000)
knlGS:0000000000000000
[ 139.943872] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 139.943872] CR2: 00007f704be8b000 CR3: 000000003901a004 CR4:
00000000000606f0
[ 139.943872] Kernel panic - not syncing: Fatal exception
[ 139.943872] Shutting down cpus with NMI
[ 143.927559] Kernel Offset: disabled
root@ovs104>

2020-07-16 23:29:27

by Anchal Agarwal

Subject: Re: [PATCH v2 00/11] Fix PM hibernation in Xen guests

On Wed, Jul 15, 2020 at 04:49:57PM -0400, Boris Ostrovsky wrote:
> On 7/15/20 3:49 PM, Anchal Agarwal wrote:
> > On Mon, Jul 13, 2020 at 03:43:33PM -0400, Boris Ostrovsky wrote:
> >> On 7/10/20 2:17 PM, Agarwal, Anchal wrote:
> >>> Gentle ping on this series.
> >>
> >> Have you tested save/restore?
> >>
> > No, not with the last few series. But a good point, I will test that and get
> > back to you. Do you see anything specific in the series that suggests otherwise?
I think I may have found a bug. There were no issues with the v1 series that I
sent; however, there are issues with v2. I tested both series and found xl
save/restore to be working in v1 but not in v2. I should have tested it.
Anyway, it looks like the issue comes from executing the syscore ops registered for
the hibernation use case during the call to xen_suspend().
I remember your earlier comment where you asked why we need to
check the xen_suspend mode in xen_syscore_suspend() [patch 04], and I removed that check based
on my theoretical understanding of your suggestion that, since the lock_system_sleep() lock
is taken, we cannot initiate hibernation. I failed to check the part of the
code where, during xen_suspend(), all registered syscore_ops suspend callbacks are
called. Hence the ones registered for PM hibernation will also be called.
With no check there on the suspend mode, execution fails to return from the function;
these callbacks should never be executed in the case of Xen suspend.
I will restore part of that check in patch 04 from v1 and send an updated patch with
the fix.
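
To make the failure mode concrete, here is a toy model of the guard I intend to restore. It is not the actual patch: the state dictionary, the mode strings and the function names are hypothetical, and the only point is that every registered syscore suspend callback runs on Xen suspend too, so the hibernation-only one has to check the mode itself.

```python
def make_hibernation_callback(state, guarded):
    """Build a syscore-style suspend callback meant only for PM hibernation."""
    def callback():
        if guarded and state["mode"] == "xen_suspend":
            return 0                      # not our transition: do nothing
        state["hibernation_work_done"] = True
        return 0
    return callback


def run_syscore_suspend(callbacks):
    """Call every registered suspend callback in order; stop on first error."""
    for cb in callbacks:
        err = cb()
        if err:
            return err
    return 0
```

With guarded=False the hibernation work runs during a Xen save/restore cycle, which is the v2 behaviour; with guarded=True it is skipped there and still runs for PM hibernation.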

Thanks,
Anchal