2023-02-21 02:36:11

by Sia Jee Heng

Subject: [PATCH v4 0/4] RISC-V Hibernation Support

This series adds RISC-V hibernation/suspend-to-disk support.
Low-level arch functions were created to support hibernation.
swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
the CPU state onto the stack, then calls swsusp_save() to save the memory
image.

An arch-specific hibernation header is implemented and is utilized by the
arch_hibernation_header_restore() and arch_hibernation_header_save()
functions. The arch-specific hibernation header consists of the satp, hartid,
and the cpu_resume address. The kernel build version also needs to be
saved into the hibernation image header to make sure that only the same
kernel is restored on resume.
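
For reference, the header introduced in patch 4 is roughly:

    struct arch_hibernate_hdr {
            struct arch_hibernate_hdr_invariants invariants; /* kernel build version */
            unsigned long hartid;           /* hart we hibernated on */
            unsigned long saved_satp;       /* page table of the hibernated image */
            unsigned long restore_cpu_addr; /* address of __hibernate_cpu_resume() */
    };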

swsusp_arch_resume() creates a temporary page table that covers only
the linear map. It copies the restore code to a 'safe' page, then starts to
restore the memory image. Once completed, it restores the original
kernel's page table. It then calls into __hibernate_cpu_resume()
to restore the CPU context. Finally, it follows the normal hibernation
path back to the hibernation core.
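
A simplified sketch of this flow (error handling omitted, see patch 4 for
the full implementation):

    int swsusp_arch_resume(void)
    {
            /* Temporary page table covering only the linear map. */
            resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
            temp_pgtable_mapping(resume_pg_dir);

            /* Copy the restore code to a 'safe' page and map it executable. */
            relocated_restore_code = relocate_restore_code();

            /* Map __hibernate_cpu_resume() so the final jump stays in the same address space. */
            temp_pgtable_text_mapping(resume_pg_dir, resume_hdr.restore_cpu_addr);

            /* Switch to the temporary table, restore the image, then resume the CPU. */
            hibernate_restore_image(resume_hdr.saved_satp,
                                    PFN_DOWN(__pa(resume_pg_dir)) | satp_mode,
                                    resume_hdr.restore_cpu_addr);
            return 0;
    }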

To enable hibernation/suspend to disk on RISC-V, the below configs
need to be enabled:
- CONFIG_ARCH_HIBERNATION_HEADER
- CONFIG_ARCH_HIBERNATION_POSSIBLE
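
With these options enabled, and a swap device plus the standard resume=
kernel parameter configured, hibernation can be exercised in the usual
way, for example:

    # echo disk > /sys/power/state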

At a high level, this series includes the following changes:
1) Change suspend_save_csrs() and suspend_restore_csrs()
to public functions, as these functions are common to
suspend/hibernation. (patch 1)
2) Refactor the common code shared by the __cpu_resume_enter() and
__hibernate_cpu_resume() functions. The common code is used by
both hibernation and suspend. (patch 2)
3) Enhance the kernel_page_present() function to support huge pages. (patch 3)
4) Add arch/riscv low-level functions to support
hibernation/suspend to disk. (patch 4)

The above patches are based on kernel v6.2 and are tested on the
StarFive VF2 SBC board and QEMU.
ACPI platform mode is not supported in this series.

Changes since v3:
- Rebased to kernel v6.2
- Refactored the temporary page table code with reference to arm64
- Resolved typos and grammar issues
- Resolved documentation errors
- Resolved clang build issue
- Removed unnecessary comments
- Used kzalloc instead of kcalloc

Changes since v2:
- Rebased to kernel v6.2-rc5
- Refactored the common code used by hibernation and suspend
- Created the copy_page macro
- Addressed other comments from Andrew and Conor

Changes since v1:
- Rebased to kernel v6.2-rc3
- Fixed bot's compilation error

Sia Jee Heng (4):
RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public
function
RISC-V: Factor out common code of __cpu_resume_enter()
RISC-V: mm: Enable huge page support to kernel_page_present() function
RISC-V: Add arch functions to support hibernation/suspend-to-disk

arch/riscv/Kconfig | 7 +
arch/riscv/include/asm/assembler.h | 82 ++++++
arch/riscv/include/asm/suspend.h | 22 ++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 5 +
arch/riscv/kernel/hibernate-asm.S | 77 +++++
arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
arch/riscv/kernel/suspend.c | 4 +-
arch/riscv/kernel/suspend_entry.S | 34 +--
arch/riscv/mm/pageattr.c | 8 +
10 files changed, 654 insertions(+), 33 deletions(-)
create mode 100644 arch/riscv/include/asm/assembler.h
create mode 100644 arch/riscv/kernel/hibernate-asm.S
create mode 100644 arch/riscv/kernel/hibernate.c


base-commit: db77b8502a4071a59c9424d95f87fe20bdb52c3a
--
2.34.1



2023-02-21 02:36:15

by Sia Jee Heng

Subject: [PATCH v4 1/4] RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function

Currently the suspend_save_csrs() and suspend_restore_csrs() functions are
statically defined in suspend.c. Change the functions' attributes
to public so that the functions can be used by hibernation as well.

Signed-off-by: Sia Jee Heng <[email protected]>
Reviewed-by: Ley Foon Tan <[email protected]>
Reviewed-by: Mason Huo <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
---
arch/riscv/include/asm/suspend.h | 3 +++
arch/riscv/kernel/suspend.c | 4 ++--
2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
index 8be391c2aecb..75419c5ca272 100644
--- a/arch/riscv/include/asm/suspend.h
+++ b/arch/riscv/include/asm/suspend.h
@@ -33,4 +33,7 @@ int cpu_suspend(unsigned long arg,
/* Low-level CPU resume entry function */
int __cpu_resume_enter(unsigned long hartid, unsigned long context);

+/* Used to save and restore the csr */
+void suspend_save_csrs(struct suspend_context *context);
+void suspend_restore_csrs(struct suspend_context *context);
#endif
diff --git a/arch/riscv/kernel/suspend.c b/arch/riscv/kernel/suspend.c
index 9ba24fb8cc93..3c89b8ec69c4 100644
--- a/arch/riscv/kernel/suspend.c
+++ b/arch/riscv/kernel/suspend.c
@@ -8,7 +8,7 @@
#include <asm/csr.h>
#include <asm/suspend.h>

-static void suspend_save_csrs(struct suspend_context *context)
+void suspend_save_csrs(struct suspend_context *context)
{
context->scratch = csr_read(CSR_SCRATCH);
context->tvec = csr_read(CSR_TVEC);
@@ -29,7 +29,7 @@ static void suspend_save_csrs(struct suspend_context *context)
#endif
}

-static void suspend_restore_csrs(struct suspend_context *context)
+void suspend_restore_csrs(struct suspend_context *context)
{
csr_write(CSR_SCRATCH, context->scratch);
csr_write(CSR_TVEC, context->tvec);
--
2.34.1


2023-02-21 02:36:18

by Sia Jee Heng

Subject: [PATCH v4 2/4] RISC-V: Factor out common code of __cpu_resume_enter()

The CPU resume code is very similar for the suspend-to-disk and
suspend-to-RAM cases. Factor out the common code into the restore_csr
and restore_reg macros.

Signed-off-by: Sia Jee Heng <[email protected]>
---
arch/riscv/include/asm/assembler.h | 62 ++++++++++++++++++++++++++++++
arch/riscv/kernel/suspend_entry.S | 34 ++--------------
2 files changed, 65 insertions(+), 31 deletions(-)
create mode 100644 arch/riscv/include/asm/assembler.h

diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
new file mode 100644
index 000000000000..727a97735493
--- /dev/null
+++ b/arch/riscv/include/asm/assembler.h
@@ -0,0 +1,62 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2023 StarFive Technology Co., Ltd.
+ *
+ * Author: Jee Heng Sia <[email protected]>
+ */
+
+#ifndef __ASSEMBLY__
+#error "Only include this from assembly code"
+#endif
+
+#ifndef __ASM_ASSEMBLER_H
+#define __ASM_ASSEMBLER_H
+
+#include <asm/asm.h>
+#include <asm/asm-offsets.h>
+#include <asm/csr.h>
+
+/*
+ * restore_csr - restore hart's CSR value
+ */
+ .macro restore_csr
+ REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
+ csrw CSR_EPC, t0
+ REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
+ csrw CSR_STATUS, t0
+ REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
+ csrw CSR_TVAL, t0
+ REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
+ csrw CSR_CAUSE, t0
+ .endm
+
+/*
+ * restore_reg - Restore registers (except A0 and T0-T6)
+ */
+ .macro restore_reg
+ REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
+ REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
+ REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
+ REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
+ REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
+ REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
+ REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
+ REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
+ REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
+ REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
+ REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
+ REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
+ REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
+ REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
+ REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
+ REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
+ REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
+ REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
+ REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
+ REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
+ REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
+ REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
+ REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
+ .endm
+
+#endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/riscv/kernel/suspend_entry.S b/arch/riscv/kernel/suspend_entry.S
index aafcca58c19d..74a8fab8e0f6 100644
--- a/arch/riscv/kernel/suspend_entry.S
+++ b/arch/riscv/kernel/suspend_entry.S
@@ -7,6 +7,7 @@
#include <linux/linkage.h>
#include <asm/asm.h>
#include <asm/asm-offsets.h>
+#include <asm/assembler.h>
#include <asm/csr.h>
#include <asm/xip_fixup.h>

@@ -83,39 +84,10 @@ ENTRY(__cpu_resume_enter)
add a0, a1, zero

/* Restore CSRs */
- REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
- csrw CSR_EPC, t0
- REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
- csrw CSR_STATUS, t0
- REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
- csrw CSR_TVAL, t0
- REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
- csrw CSR_CAUSE, t0
+ restore_csr

/* Restore registers (except A0 and T0-T6) */
- REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
- REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
- REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
- REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
- REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
- REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
- REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
- REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
- REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
- REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
- REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
- REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
- REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
- REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
- REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
- REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
- REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
- REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
- REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
- REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
- REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
- REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
- REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
+ restore_reg

/* Return zero value */
add a0, zero, zero
--
2.34.1


2023-02-21 02:36:20

by Sia Jee Heng

Subject: [PATCH v4 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function

Currently the kernel_page_present() function doesn't support huge page
detection, which causes the function to mistakenly return false to the
hibernation core.

Add huge page detection to the function to solve the problem.

Fixes: 9e953cda5cdf ("riscv: Introduce huge page support for 32/64bit kernel")

Signed-off-by: Sia Jee Heng <[email protected]>
Reviewed-by: Ley Foon Tan <[email protected]>
Reviewed-by: Mason Huo <[email protected]>
---
arch/riscv/mm/pageattr.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 86c56616e5de..ea3d61de065b 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -217,18 +217,26 @@ bool kernel_page_present(struct page *page)
pgd = pgd_offset_k(addr);
if (!pgd_present(*pgd))
return false;
+ if (pgd_leaf(*pgd))
+ return true;

p4d = p4d_offset(pgd, addr);
if (!p4d_present(*p4d))
return false;
+ if (p4d_leaf(*p4d))
+ return true;

pud = pud_offset(p4d, addr);
if (!pud_present(*pud))
return false;
+ if (pud_leaf(*pud))
+ return true;

pmd = pmd_offset(pud, addr);
if (!pmd_present(*pmd))
return false;
+ if (pmd_leaf(*pmd))
+ return true;

pte = pte_offset_kernel(pmd, addr);
return pte_present(*pte);
--
2.34.1


2023-02-21 02:36:22

by Sia Jee Heng

Subject: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

Low-level arch functions were created to support hibernation.
swsusp_arch_suspend() relies on code from __cpu_suspend_enter() to write
the CPU state onto the stack, then calls swsusp_save() to save the memory
image.

An arch-specific hibernation header is implemented and is utilized by the
arch_hibernation_header_restore() and arch_hibernation_header_save()
functions. The arch-specific hibernation header consists of the satp, hartid,
and the cpu_resume address. The kernel build version also needs to be
saved into the hibernation image header to make sure that only the same
kernel is restored on resume.

swsusp_arch_resume() creates a temporary page table that covers only
the linear map. It copies the restore code to a 'safe' page, then starts
to restore the memory image. Once completed, it restores the original
kernel's page table. It then calls into __hibernate_cpu_resume()
to restore the CPU context. Finally, it follows the normal hibernation
path back to the hibernation core.

To enable hibernation/suspend to disk on RISC-V, the below configs
need to be enabled:
- CONFIG_ARCH_HIBERNATION_HEADER
- CONFIG_ARCH_HIBERNATION_POSSIBLE

Signed-off-by: Sia Jee Heng <[email protected]>
Reviewed-by: Ley Foon Tan <[email protected]>
Reviewed-by: Mason Huo <[email protected]>
---
arch/riscv/Kconfig | 7 +
arch/riscv/include/asm/assembler.h | 20 ++
arch/riscv/include/asm/suspend.h | 19 ++
arch/riscv/kernel/Makefile | 1 +
arch/riscv/kernel/asm-offsets.c | 5 +
arch/riscv/kernel/hibernate-asm.S | 77 +++++
arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
7 files changed, 576 insertions(+)
create mode 100644 arch/riscv/kernel/hibernate-asm.S
create mode 100644 arch/riscv/kernel/hibernate.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index e2b656043abf..4555848a817f 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -690,6 +690,13 @@ menu "Power management options"

source "kernel/power/Kconfig"

+config ARCH_HIBERNATION_POSSIBLE
+ def_bool y
+
+config ARCH_HIBERNATION_HEADER
+ def_bool y
+ depends on HIBERNATION
+
endmenu # "Power management options"

menu "CPU Power Management"
diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
index 727a97735493..68c46c0e0ea8 100644
--- a/arch/riscv/include/asm/assembler.h
+++ b/arch/riscv/include/asm/assembler.h
@@ -59,4 +59,24 @@
REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
.endm

+/*
+ * copy_page - copy 1 page (4KB) of data from source to destination
+ * @a0 - destination
+ * @a1 - source
+ */
+ .macro copy_page a0, a1
+ lui a2, 0x1
+ add a2, a2, a0
+1 :
+ REG_L t0, 0(a1)
+ REG_L t1, SZREG(a1)
+
+ REG_S t0, 0(a0)
+ REG_S t1, SZREG(a0)
+
+ addi a0, a0, 2 * SZREG
+ addi a1, a1, 2 * SZREG
+ bne a2, a0, 1b
+ .endm
+
#endif /* __ASM_ASSEMBLER_H */
diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
index 75419c5ca272..3362da56a9d8 100644
--- a/arch/riscv/include/asm/suspend.h
+++ b/arch/riscv/include/asm/suspend.h
@@ -21,6 +21,11 @@ struct suspend_context {
#endif
};

+/*
+ * Used by hibernation core and cleared during resume sequence
+ */
+extern int in_suspend;
+
/* Low-level CPU suspend entry function */
int __cpu_suspend_enter(struct suspend_context *context);

@@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
/* Used to save and restore the csr */
void suspend_save_csrs(struct suspend_context *context);
void suspend_restore_csrs(struct suspend_context *context);
+
+/* Low-level API to support hibernation */
+int swsusp_arch_suspend(void);
+int swsusp_arch_resume(void);
+int arch_hibernation_header_save(void *addr, unsigned int max_size);
+int arch_hibernation_header_restore(void *addr);
+int __hibernate_cpu_resume(void);
+
+/* Used to resume on the CPU we hibernated on */
+int hibernate_resume_nonboot_cpu_disable(void);
+
+asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
+ unsigned long cpu_resume);
+asmlinkage int hibernate_core_restore_code(void);
#endif
diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
index 4cf303a779ab..daab341d55e4 100644
--- a/arch/riscv/kernel/Makefile
+++ b/arch/riscv/kernel/Makefile
@@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o

obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
+obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o

obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
index df9444397908..d6a75aac1d27 100644
--- a/arch/riscv/kernel/asm-offsets.c
+++ b/arch/riscv/kernel/asm-offsets.c
@@ -9,6 +9,7 @@
#include <linux/kbuild.h>
#include <linux/mm.h>
#include <linux/sched.h>
+#include <linux/suspend.h>
#include <asm/kvm_host.h>
#include <asm/thread_info.h>
#include <asm/ptrace.h>
@@ -116,6 +117,10 @@ void asm_offsets(void)

OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);

+ OFFSET(HIBERN_PBE_ADDR, pbe, address);
+ OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
+ OFFSET(HIBERN_PBE_NEXT, pbe, next);
+
OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
new file mode 100644
index 000000000000..846affe4dced
--- /dev/null
+++ b/arch/riscv/kernel/hibernate-asm.S
@@ -0,0 +1,77 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Hibernation low level support for RISCV.
+ *
+ * Copyright (C) 2023 StarFive Technology Co., Ltd.
+ *
+ * Author: Jee Heng Sia <[email protected]>
+ */
+
+#include <asm/asm.h>
+#include <asm/asm-offsets.h>
+#include <asm/assembler.h>
+#include <asm/csr.h>
+
+#include <linux/linkage.h>
+
+/*
+ * int __hibernate_cpu_resume(void)
+ * Switch back to the hibernated image's page table prior to restoring the CPU
+ * context.
+ *
+ * Always returns 0
+ */
+ENTRY(__hibernate_cpu_resume)
+ /* switch to hibernated image's page table. */
+ csrw CSR_SATP, s0
+ sfence.vma
+
+ REG_L a0, hibernate_cpu_context
+
+ restore_csr
+ restore_reg
+
+ /* Return zero value. */
+ add a0, zero, zero
+
+ ret
+END(__hibernate_cpu_resume)
+
+/*
+ * Prepare to restore the image.
+ * a0: satp of saved page tables.
+ * a1: satp of temporary page tables.
+ * a2: cpu_resume.
+ */
+ENTRY(hibernate_restore_image)
+ mv s0, a0
+ mv s1, a1
+ mv s2, a2
+ REG_L s4, restore_pblist
+ REG_L a1, relocated_restore_code
+
+ jalr a1
+END(hibernate_restore_image)
+
+/*
+ * The below code will be executed from a 'safe' page.
+ * It first switches to the temporary page table, then starts to copy the pages
+ * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
+ * to restore the CPU context.
+ */
+ENTRY(hibernate_core_restore_code)
+ /* switch to temp page table. */
+ csrw satp, s1
+ sfence.vma
+.Lcopy:
+ /* The below code will restore the hibernated image. */
+ REG_L a1, HIBERN_PBE_ADDR(s4)
+ REG_L a0, HIBERN_PBE_ORIG(s4)
+
+ copy_page a0, a1
+
+ REG_L s4, HIBERN_PBE_NEXT(s4)
+ bnez s4, .Lcopy
+
+ jalr s2
+END(hibernate_core_restore_code)
diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
new file mode 100644
index 000000000000..46a2f470db6e
--- /dev/null
+++ b/arch/riscv/kernel/hibernate.c
@@ -0,0 +1,447 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Hibernation support for RISCV
+ *
+ * Copyright (C) 2023 StarFive Technology Co., Ltd.
+ *
+ * Author: Jee Heng Sia <[email protected]>
+ */
+
+#include <asm/barrier.h>
+#include <asm/cacheflush.h>
+#include <asm/mmu_context.h>
+#include <asm/page.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+#include <asm/sections.h>
+#include <asm/set_memory.h>
+#include <asm/smp.h>
+#include <asm/suspend.h>
+
+#include <linux/cpu.h>
+#include <linux/memblock.h>
+#include <linux/pm.h>
+#include <linux/sched.h>
+#include <linux/suspend.h>
+#include <linux/utsname.h>
+
+/* The logical cpu number we should resume on, initialised to a non-cpu number. */
+static int sleep_cpu = -EINVAL;
+
+/* Pointer to the temporary resume page table. */
+static pgd_t *resume_pg_dir;
+
+/* CPU context to be saved. */
+struct suspend_context *hibernate_cpu_context;
+EXPORT_SYMBOL_GPL(hibernate_cpu_context);
+
+unsigned long relocated_restore_code;
+EXPORT_SYMBOL_GPL(relocated_restore_code);
+
+/**
+ * struct arch_hibernate_hdr_invariants - container to store kernel build version.
+ * @uts_version: to save the build number and date so that we do not resume with
+ * a different kernel.
+ */
+struct arch_hibernate_hdr_invariants {
+ char uts_version[__NEW_UTS_LEN + 1];
+};
+
+/**
+ * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
+ * @invariants: container to store kernel build version.
+ * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
+ * @saved_satp: original page table used by the hibernated image.
+ * @restore_cpu_addr: the kernel's image address to restore the CPU context.
+ */
+static struct arch_hibernate_hdr {
+ struct arch_hibernate_hdr_invariants invariants;
+ unsigned long hartid;
+ unsigned long saved_satp;
+ unsigned long restore_cpu_addr;
+} resume_hdr;
+
+static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
+{
+ memset(i, 0, sizeof(*i));
+ memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
+}
+
+/*
+ * Check if the given pfn is in the 'nosave' section.
+ */
+int pfn_is_nosave(unsigned long pfn)
+{
+ unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
+ unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
+
+ return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
+}
+
+void notrace save_processor_state(void)
+{
+ WARN_ON(num_online_cpus() != 1);
+}
+
+void notrace restore_processor_state(void)
+{
+}
+
+/*
+ * Helper parameters need to be saved to the hibernation image header.
+ */
+int arch_hibernation_header_save(void *addr, unsigned int max_size)
+{
+ struct arch_hibernate_hdr *hdr = addr;
+
+ if (max_size < sizeof(*hdr))
+ return -EOVERFLOW;
+
+ arch_hdr_invariants(&hdr->invariants);
+
+ hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
+ hdr->saved_satp = csr_read(CSR_SATP);
+ hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
+
+/*
+ * Retrieve the helper parameters from the hibernation image header.
+ */
+int arch_hibernation_header_restore(void *addr)
+{
+ struct arch_hibernate_hdr_invariants invariants;
+ struct arch_hibernate_hdr *hdr = addr;
+ int ret = 0;
+
+ arch_hdr_invariants(&invariants);
+
+ if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
+ pr_crit("Hibernate image not generated by this kernel!\n");
+ return -EINVAL;
+ }
+
+ sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
+ if (sleep_cpu < 0) {
+ pr_crit("Hibernated on a CPU not known to this kernel!\n");
+ sleep_cpu = -EINVAL;
+ return -EINVAL;
+ }
+
+#ifdef CONFIG_SMP
+ ret = bringup_hibernate_cpu(sleep_cpu);
+ if (ret) {
+ sleep_cpu = -EINVAL;
+ return ret;
+ }
+#endif
+ resume_hdr = *hdr;
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
+
+int swsusp_arch_suspend(void)
+{
+ int ret = 0;
+
+ if (__cpu_suspend_enter(hibernate_cpu_context)) {
+ sleep_cpu = smp_processor_id();
+ suspend_save_csrs(hibernate_cpu_context);
+ ret = swsusp_save();
+ } else {
+ suspend_restore_csrs(hibernate_cpu_context);
+ flush_tlb_all();
+ flush_icache_all();
+
+ /*
+ * Tell the hibernation core that we've just restored the memory.
+ */
+ in_suspend = 0;
+ sleep_cpu = -EINVAL;
+ }
+
+ return ret;
+}
+
+static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
+ unsigned long addr, pgprot_t prot)
+{
+ pte_t pte = READ_ONCE(*src_ptep);
+
+ if (pte_present(pte))
+ set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
+
+ return 0;
+}
+
+static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
+ unsigned long start, unsigned long end,
+ pgprot_t prot)
+{
+ unsigned long addr = start;
+ pte_t *src_ptep;
+ pte_t *dst_ptep;
+
+ if (pmd_none(READ_ONCE(*dst_pmdp))) {
+ dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
+ if (!dst_ptep)
+ return -ENOMEM;
+
+ pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
+ }
+
+ dst_ptep = pte_offset_kernel(dst_pmdp, start);
+ src_ptep = pte_offset_kernel(src_pmdp, start);
+
+ do {
+ _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);
+ } while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);
+
+ return 0;
+}
+
+static unsigned long temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp,
+ unsigned long start, unsigned long end,
+ pgprot_t prot)
+{
+ unsigned long addr = start;
+ unsigned long next;
+ unsigned long ret;
+ pmd_t *src_pmdp;
+ pmd_t *dst_pmdp;
+
+ if (pud_none(READ_ONCE(*dst_pudp))) {
+ dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
+ if (!dst_pmdp)
+ return -ENOMEM;
+
+ pud_populate(NULL, dst_pudp, dst_pmdp);
+ }
+
+ dst_pmdp = pmd_offset(dst_pudp, start);
+ src_pmdp = pmd_offset(src_pudp, start);
+
+ do {
+ pmd_t pmd = READ_ONCE(*src_pmdp);
+
+ next = pmd_addr_end(addr, end);
+
+ if (pmd_none(pmd))
+ continue;
+
+ if (pmd_leaf(pmd)) {
+ set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot)));
+ } else {
+ ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, addr, next, prot);
+ if (ret)
+ return -ENOMEM;
+ }
+ } while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
+
+ return 0;
+}
+
+static unsigned long temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp,
+ unsigned long start,
+ unsigned long end, pgprot_t prot)
+{
+ unsigned long addr = start;
+ unsigned long next;
+ unsigned long ret;
+ pud_t *dst_pudp;
+ pud_t *src_pudp;
+
+ if (p4d_none(READ_ONCE(*dst_p4dp))) {
+ dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
+ if (!dst_pudp)
+ return -ENOMEM;
+
+ p4d_populate(NULL, dst_p4dp, dst_pudp);
+ }
+
+ dst_pudp = pud_offset(dst_p4dp, start);
+ src_pudp = pud_offset(src_p4dp, start);
+
+ do {
+ pud_t pud = READ_ONCE(*src_pudp);
+
+ next = pud_addr_end(addr, end);
+
+ if (pud_none(pud))
+ continue;
+
+ if (pud_leaf(pud)) {
+ set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot)));
+ } else {
+ ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, addr, next, prot);
+ if (ret)
+ return -ENOMEM;
+ }
+ } while (dst_pudp++, src_pudp++, addr = next, addr != end);
+
+ return 0;
+}
+
+static unsigned long temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
+ unsigned long start, unsigned long end,
+ pgprot_t prot)
+{
+ unsigned long addr = start;
+ unsigned long next;
+ unsigned long ret;
+ p4d_t *dst_p4dp;
+ p4d_t *src_p4dp;
+
+ if (pgd_none(READ_ONCE(*dst_pgdp))) {
+ dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC);
+ if (!dst_p4dp)
+ return -ENOMEM;
+
+ pgd_populate(NULL, dst_pgdp, dst_p4dp);
+ }
+
+ dst_p4dp = p4d_offset(dst_pgdp, start);
+ src_p4dp = p4d_offset(src_pgdp, start);
+
+ do {
+ p4d_t p4d = READ_ONCE(*src_p4dp);
+
+ next = p4d_addr_end(addr, end);
+
+ if (p4d_none(READ_ONCE(*src_p4dp)))
+ continue;
+
+ if (p4d_leaf(p4d)) {
+ set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot)));
+ } else {
+ ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, addr, next, prot);
+ if (ret)
+ return -ENOMEM;
+ }
+ } while (dst_p4dp++, src_p4dp++, addr = next, addr != end);
+
+ return 0;
+}
+
+static unsigned long temp_pgtable_mapping(pgd_t *pgdp)
+{
+ unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn);
+ unsigned long addr = PAGE_OFFSET;
+ pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
+ pgd_t *src_pgdp = pgd_offset_k(addr);
+ unsigned long next;
+
+ do {
+ next = pgd_addr_end(addr, end);
+ if (pgd_none(READ_ONCE(*src_pgdp)))
+ continue;
+
+ if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next, PAGE_KERNEL))
+ return -ENOMEM;
+ } while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
+
+ return 0;
+}
+
+static unsigned long temp_pgtable_text_mapping(pgd_t *pgdp, unsigned long addr)
+{
+ pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
+ pgd_t *src_pgdp = pgd_offset_k(addr);
+
+ if (pgd_none(READ_ONCE(*src_pgdp)))
+ return -EFAULT;
+
+ if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, addr, PAGE_KERNEL_EXEC))
+ return -ENOMEM;
+
+ return 0;
+}
+
+static unsigned long relocate_restore_code(void)
+{
+ unsigned long ret;
+ void *page = (void *)get_safe_page(GFP_ATOMIC);
+
+ if (!page)
+ return -ENOMEM;
+
+ copy_page(page, hibernate_core_restore_code);
+
+ /* Make the page containing the relocated code executable. */
+ set_memory_x((unsigned long)page, 1);
+
+ ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)page);
+ if (ret)
+ return ret;
+
+ return (unsigned long)page;
+}
+
+int swsusp_arch_resume(void)
+{
+ unsigned long ret;
+
+ /*
+ * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
+ * we don't need to free it here.
+ */
+ resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
+ if (!resume_pg_dir)
+ return -ENOMEM;
+
+ /*
+ * The pages need to be writable when restoring the image.
+ * Create a second copy of page table just for the linear map.
+ * Use this temporary page table to restore the image.
+ */
+ ret = temp_pgtable_mapping(resume_pg_dir);
+ if (ret)
+ return (int)ret;
+
+ /* Move the restore code to a new page so that it doesn't get overwritten by itself. */
+ relocated_restore_code = relocate_restore_code();
+ if (relocated_restore_code == -ENOMEM)
+ return -ENOMEM;
+
+ /*
+ * Map the __hibernate_cpu_resume() address to the temporary page table so that the
+ * restore code can jump to it after it has finished restoring the image. This way, the
+ * code executed next does not find itself in a different address space after switching
+ * over to the original page table used by the hibernated image.
+ */
+ ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr);
+ if (ret)
+ return ret;
+
+ hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
+ resume_hdr.restore_cpu_addr);
+
+ return 0;
+}
+
+#ifdef CONFIG_PM_SLEEP_SMP
+int hibernate_resume_nonboot_cpu_disable(void)
+{
+ if (sleep_cpu < 0) {
+ pr_err("Failing to resume from hibernate on an unknown CPU\n");
+ return -ENODEV;
+ }
+
+ return freeze_secondary_cpus(sleep_cpu);
+}
+#endif
+
+static int __init riscv_hibernate_init(void)
+{
+ hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL);
+
+ if (WARN_ON(!hibernate_cpu_context))
+ return -ENOMEM;
+
+ return 0;
+}
+
+early_initcall(riscv_hibernate_init);
--
2.34.1


2023-02-23 06:39:45

by Andrew Jones

Subject: Re: [PATCH v4 1/4] RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function

On Tue, Feb 21, 2023 at 10:35:20AM +0800, Sia Jee Heng wrote:
> Currently suspend_save_csrs() and suspend_restore_csrs() functions are
> statically defined in the suspend.c. Change the function's attribute
> to public so that the functions can be used by hibernation as well.
>
> Signed-off-by: Sia Jee Heng <[email protected]>
> Reviewed-by: Ley Foon Tan <[email protected]>
> Reviewed-by: Mason Huo <[email protected]>
> Reviewed-by: Conor Dooley <[email protected]>
> ---
> arch/riscv/include/asm/suspend.h | 3 +++
> arch/riscv/kernel/suspend.c | 4 ++--
> 2 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> index 8be391c2aecb..75419c5ca272 100644
> --- a/arch/riscv/include/asm/suspend.h
> +++ b/arch/riscv/include/asm/suspend.h
> @@ -33,4 +33,7 @@ int cpu_suspend(unsigned long arg,
> /* Low-level CPU resume entry function */
> int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>
> +/* Used to save and restore the csr */

s/the csr/CSRs/

> +void suspend_save_csrs(struct suspend_context *context);
> +void suspend_restore_csrs(struct suspend_context *context);
> #endif
> diff --git a/arch/riscv/kernel/suspend.c b/arch/riscv/kernel/suspend.c
> index 9ba24fb8cc93..3c89b8ec69c4 100644
> --- a/arch/riscv/kernel/suspend.c
> +++ b/arch/riscv/kernel/suspend.c
> @@ -8,7 +8,7 @@
> #include <asm/csr.h>
> #include <asm/suspend.h>
>
> -static void suspend_save_csrs(struct suspend_context *context)
> +void suspend_save_csrs(struct suspend_context *context)
> {
> context->scratch = csr_read(CSR_SCRATCH);
> context->tvec = csr_read(CSR_TVEC);
> @@ -29,7 +29,7 @@ static void suspend_save_csrs(struct suspend_context *context)
> #endif
> }
>
> -static void suspend_restore_csrs(struct suspend_context *context)
> +void suspend_restore_csrs(struct suspend_context *context)
> {
> csr_write(CSR_SCRATCH, context->scratch);
> csr_write(CSR_TVEC, context->tvec);
> --
> 2.34.1
>

Otherwise,

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew

2023-02-23 06:52:04

by Andrew Jones

Subject: Re: [PATCH v4 2/4] RISC-V: Factor out common code of __cpu_resume_enter()

On Tue, Feb 21, 2023 at 10:35:21AM +0800, Sia Jee Heng wrote:
> The cpu_resume() function is very similar for the suspend to disk and
> suspend to ram cases. Factor out the common code into restore_csr macro
> and restore_reg macro.
>
> Signed-off-by: Sia Jee Heng <[email protected]>
> ---
> arch/riscv/include/asm/assembler.h | 62 ++++++++++++++++++++++++++++++
> arch/riscv/kernel/suspend_entry.S | 34 ++--------------
> 2 files changed, 65 insertions(+), 31 deletions(-)
> create mode 100644 arch/riscv/include/asm/assembler.h
>
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> new file mode 100644
> index 000000000000..727a97735493
> --- /dev/null
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <[email protected]>
> + */
> +
> +#ifndef __ASSEMBLY__
> +#error "Only include this from assembly code"
> +#endif
> +
> +#ifndef __ASM_ASSEMBLER_H
> +#define __ASM_ASSEMBLER_H
> +
> +#include <asm/asm.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/csr.h>
> +
> +/*
> + * restore_csr - restore hart's CSR value
> + */
> + .macro restore_csr

Since there are more than one, 'restore_csrs' would be more appropriate
and s/CSR value/CSRs/

> + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> + csrw CSR_EPC, t0
> + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> + csrw CSR_STATUS, t0
> + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> + csrw CSR_TVAL, t0
> + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> + csrw CSR_CAUSE, t0
> + .endm
> +
> +/*
> + * restore_reg - Restore registers (except A0 and T0-T6)
> + */
> + .macro restore_reg

restore_regs

> + REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> + REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> + REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> + REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> + REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> + REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> + REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> + REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> + REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> + REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> + REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> + REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> + REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> + REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> + REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> + REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> + REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> + REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> + REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> + REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> + REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> + REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> + REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> + .endm
> +
> +#endif /* __ASM_ASSEMBLER_H */
> diff --git a/arch/riscv/kernel/suspend_entry.S b/arch/riscv/kernel/suspend_entry.S
> index aafcca58c19d..74a8fab8e0f6 100644
> --- a/arch/riscv/kernel/suspend_entry.S
> +++ b/arch/riscv/kernel/suspend_entry.S
> @@ -7,6 +7,7 @@
> #include <linux/linkage.h>
> #include <asm/asm.h>
> #include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> #include <asm/csr.h>
> #include <asm/xip_fixup.h>
>
> @@ -83,39 +84,10 @@ ENTRY(__cpu_resume_enter)
> add a0, a1, zero
>
> /* Restore CSRs */
> - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> - csrw CSR_EPC, t0
> - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> - csrw CSR_STATUS, t0
> - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> - csrw CSR_TVAL, t0
> - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> - csrw CSR_CAUSE, t0
> + restore_csr
>
> /* Restore registers (except A0 and T0-T6) */
> - REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> - REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> - REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> - REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> - REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> - REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> - REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> - REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> - REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> - REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> - REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> - REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> - REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> - REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> - REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> - REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> - REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> - REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> - REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> - REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> - REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> - REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> - REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> + restore_reg
>
> /* Return zero value */
> add a0, zero, zero
> --
> 2.34.1
>

Otherwise,

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew


2023-02-23 06:57:30

by Andrew Jones

Subject: Re: [PATCH v4 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function

On Tue, Feb 21, 2023 at 10:35:22AM +0800, Sia Jee Heng wrote:
> Currently kernel_page_present() function doesn't support huge page
> detection causes the function to mistakenly return false to the
> hibernation core.
>
> Add huge page detection to the function to solve the problem.
>
> Fixes tag: commit 9e953cda5cdf ("riscv:
> Introduce huge page support for 32/64bit kernel")

This should be formatted as below (no line wrap and no 'tag' in the tag)

Fixes: 9e953cda5cdf ("riscv: Introduce huge page support for 32/64bit kernel")

>
> Signed-off-by: Sia Jee Heng <[email protected]>
> Reviewed-by: Ley Foon Tan <[email protected]>
> Reviewed-by: Mason Huo <[email protected]>
> ---
> arch/riscv/mm/pageattr.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
> index 86c56616e5de..ea3d61de065b 100644
> --- a/arch/riscv/mm/pageattr.c
> +++ b/arch/riscv/mm/pageattr.c
> @@ -217,18 +217,26 @@ bool kernel_page_present(struct page *page)
> pgd = pgd_offset_k(addr);
> if (!pgd_present(*pgd))
> return false;
> + if (pgd_leaf(*pgd))
> + return true;
>
> p4d = p4d_offset(pgd, addr);
> if (!p4d_present(*p4d))
> return false;
> + if (p4d_leaf(*p4d))
> + return true;
>
> pud = pud_offset(p4d, addr);
> if (!pud_present(*pud))
> return false;
> + if (pud_leaf(*pud))
> + return true;
>
> pmd = pmd_offset(pud, addr);
> if (!pmd_present(*pmd))
> return false;
> + if (pmd_leaf(*pmd))
> + return true;
>
> pte = pte_offset_kernel(pmd, addr);
> return pte_present(*pte);
> --
> 2.34.1
>
>

Otherwise,

Reviewed-by: Andrew Jones <[email protected]>

Thanks,
drew

2023-02-23 18:08:36

by Andrew Jones

Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> Low level Arch functions were created to support hibernation.
> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> cpu state onto the stack, then calling swsusp_save() to save the memory
> image.
>
> Arch specific hibernation header is implemented and is utilized by the
> arch_hibernation_header_restore() and arch_hibernation_header_save()
> functions. The arch specific hibernation header consists of satp, hartid,
> and the cpu_resume address. The kernel built version is also need to be
> saved into the hibernation image header to making sure only the same
> kernel is restore when resume.
>
> swsusp_arch_resume() creates a temporary page table that covering only
> the linear map. It copies the restore code to a 'safe' page, then start
> to restore the memory image. Once completed, it restores the original
> kernel's page table. It then calls into __hibernate_cpu_resume()
> to restore the CPU context. Finally, it follows the normal hibernation
> path back to the hibernation core.
>
> To enable hibernation/suspend to disk into RISCV, the below config
> need to be enabled:
> - CONFIG_ARCH_HIBERNATION_HEADER
> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>
> Signed-off-by: Sia Jee Heng <[email protected]>
> Reviewed-by: Ley Foon Tan <[email protected]>
> Reviewed-by: Mason Huo <[email protected]>
> ---
> arch/riscv/Kconfig | 7 +
> arch/riscv/include/asm/assembler.h | 20 ++
> arch/riscv/include/asm/suspend.h | 19 ++
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/asm-offsets.c | 5 +
> arch/riscv/kernel/hibernate-asm.S | 77 +++++
> arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> 7 files changed, 576 insertions(+)
> create mode 100644 arch/riscv/kernel/hibernate-asm.S
> create mode 100644 arch/riscv/kernel/hibernate.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..4555848a817f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -690,6 +690,13 @@ menu "Power management options"
>
> source "kernel/power/Kconfig"
>
> +config ARCH_HIBERNATION_POSSIBLE
> + def_bool y
> +
> +config ARCH_HIBERNATION_HEADER
> + def_bool y
> + depends on HIBERNATION

nit: I think this can be simplified as def_bool HIBERNATION
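
i.e. something like:

	config ARCH_HIBERNATION_HEADER
		def_bool HIBERNATION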

> +
> endmenu # "Power management options"
>
> menu "CPU Power Management"
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> index 727a97735493..68c46c0e0ea8 100644
> --- a/arch/riscv/include/asm/assembler.h
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -59,4 +59,24 @@
> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> .endm
>
> +/*
> + * copy_page - copy 1 page (4KB) of data from source to destination
> + * @a0 - destination
> + * @a1 - source
> + */
> + .macro copy_page a0, a1
> + lui a2, 0x1
> + add a2, a2, a0
> +1 :
^ please remove this space

> + REG_L t0, 0(a1)
> + REG_L t1, SZREG(a1)
> +
> + REG_S t0, 0(a0)
> + REG_S t1, SZREG(a0)
> +
> + addi a0, a0, 2 * SZREG
> + addi a1, a1, 2 * SZREG
> + bne a2, a0, 1b
> + .endm
> +
> #endif /* __ASM_ASSEMBLER_H */
> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> index 75419c5ca272..3362da56a9d8 100644
> --- a/arch/riscv/include/asm/suspend.h
> +++ b/arch/riscv/include/asm/suspend.h
> @@ -21,6 +21,11 @@ struct suspend_context {
> #endif
> };
>
> +/*
> + * Used by hibernation core and cleared during resume sequence
> + */
> +extern int in_suspend;
> +
> /* Low-level CPU suspend entry function */
> int __cpu_suspend_enter(struct suspend_context *context);
>
> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> /* Used to save and restore the csr */
> void suspend_save_csrs(struct suspend_context *context);
> void suspend_restore_csrs(struct suspend_context *context);
> +
> +/* Low-level API to support hibernation */
> +int swsusp_arch_suspend(void);
> +int swsusp_arch_resume(void);
> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> +int arch_hibernation_header_restore(void *addr);
> +int __hibernate_cpu_resume(void);
> +
> +/* Used to resume on the CPU we hibernated on */
> +int hibernate_resume_nonboot_cpu_disable(void);
> +
> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> + unsigned long cpu_resume);
> +asmlinkage int hibernate_core_restore_code(void);
> #endif
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 4cf303a779ab..daab341d55e4 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
>
> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
>
> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index df9444397908..d6a75aac1d27 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -9,6 +9,7 @@
> #include <linux/kbuild.h>
> #include <linux/mm.h>
> #include <linux/sched.h>
> +#include <linux/suspend.h>
> #include <asm/kvm_host.h>
> #include <asm/thread_info.h>
> #include <asm/ptrace.h>
> @@ -116,6 +117,10 @@ void asm_offsets(void)
>
> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>
> + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> +
> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> new file mode 100644
> index 000000000000..846affe4dced
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate-asm.S
> @@ -0,0 +1,77 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Hibernation low level support for RISCV.
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <[email protected]>
> + */
> +
> +#include <asm/asm.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> +#include <asm/csr.h>
> +
> +#include <linux/linkage.h>
> +
> +/*
> + * int __hibernate_cpu_resume(void)
> + * Switch back to the hibernated image's page table prior to restoring the CPU
> + * context.
> + *
> + * Always returns 0
> + */
> +ENTRY(__hibernate_cpu_resume)
> + /* switch to hibernated image's page table. */
> + csrw CSR_SATP, s0
> + sfence.vma
> +
> + REG_L a0, hibernate_cpu_context
> +
> + restore_csr
> + restore_reg
> +
> + /* Return zero value. */
> + add a0, zero, zero

nit: mv a0, zero

> +
> + ret
> +END(__hibernate_cpu_resume)
> +
> +/*
> + * Prepare to restore the image.
> + * a0: satp of saved page tables.
> + * a1: satp of temporary page tables.
> + * a2: cpu_resume.
> + */
> +ENTRY(hibernate_restore_image)
> + mv s0, a0
> + mv s1, a1
> + mv s2, a2
> + REG_L s4, restore_pblist
> + REG_L a1, relocated_restore_code
> +
> + jalr a1
> +END(hibernate_restore_image)
> +
> +/*
> + * The below code will be executed from a 'safe' page.
> + * It first switches to the temporary page table, then starts to copy the pages
> + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> + * to restore the CPU context.
> + */
> +ENTRY(hibernate_core_restore_code)
> + /* switch to temp page table. */
> + csrw satp, s1
> + sfence.vma
> +.Lcopy:
> + /* The below code will restore the hibernated image. */
> + REG_L a1, HIBERN_PBE_ADDR(s4)
> + REG_L a0, HIBERN_PBE_ORIG(s4)

Are we sure restore_pblist will never be NULL?
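
If it can be NULL, a guard before the copy loop might be needed, e.g.
(untested):

	beqz	s4, 2f		/* nothing to restore */
.Lcopy:
	...
2:
	jalr	s2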

> +
> + copy_page a0, a1
> +
> + REG_L s4, HIBERN_PBE_NEXT(s4)
> + bnez s4, .Lcopy
> +
> + jalr s2
> +END(hibernate_core_restore_code)
> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> new file mode 100644
> index 000000000000..46a2f470db6e
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate.c
> @@ -0,0 +1,447 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Hibernation support for RISCV
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <[email protected]>
> + */
> +
> +#include <asm/barrier.h>
> +#include <asm/cacheflush.h>
> +#include <asm/mmu_context.h>
> +#include <asm/page.h>
> +#include <asm/pgalloc.h>
> +#include <asm/pgtable.h>
> +#include <asm/sections.h>
> +#include <asm/set_memory.h>
> +#include <asm/smp.h>
> +#include <asm/suspend.h>
> +
> +#include <linux/cpu.h>
> +#include <linux/memblock.h>
> +#include <linux/pm.h>
> +#include <linux/sched.h>
> +#include <linux/suspend.h>
> +#include <linux/utsname.h>
> +
> +/* The logical cpu number we should resume on, initialised to a non-cpu number. */
> +static int sleep_cpu = -EINVAL;
> +
> +/* Pointer to the temporary resume page table. */
> +static pgd_t *resume_pg_dir;
> +
> +/* CPU context to be saved. */
> +struct suspend_context *hibernate_cpu_context;
> +EXPORT_SYMBOL_GPL(hibernate_cpu_context);
> +
> +unsigned long relocated_restore_code;
> +EXPORT_SYMBOL_GPL(relocated_restore_code);
> +
> +/**
> + * struct arch_hibernate_hdr_invariants - container to store kernel build version.
> + * @uts_version: to save the build number and date so that the we do not resume with
> + * a different kernel.
> + */
> +struct arch_hibernate_hdr_invariants {
> + char uts_version[__NEW_UTS_LEN + 1];
> +};
> +
> +/**
> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
> + * @invariants: container to store kernel build version.
> + * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
> + * @saved_satp: original page table used by the hibernated image.
> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> + */
> +static struct arch_hibernate_hdr {
> + struct arch_hibernate_hdr_invariants invariants;
> + unsigned long hartid;
> + unsigned long saved_satp;
> + unsigned long restore_cpu_addr;
> +} resume_hdr;
> +
> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> +{
> + memset(i, 0, sizeof(*i));
> + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> +}
> +
> +/*
> + * Check if the given pfn is in the 'nosave' section.
> + */
> +int pfn_is_nosave(unsigned long pfn)
> +{
> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> +
> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> +}
> +
> +void notrace save_processor_state(void)
> +{
> + WARN_ON(num_online_cpus() != 1);
> +}
> +
> +void notrace restore_processor_state(void)
> +{
> +}
> +
> +/*
> + * Helper parameters need to be saved to the hibernation image header.
> + */
> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> +{
> + struct arch_hibernate_hdr *hdr = addr;
> +
> + if (max_size < sizeof(*hdr))
> + return -EOVERFLOW;
> +
> + arch_hdr_invariants(&hdr->invariants);
> +
> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> + hdr->saved_satp = csr_read(CSR_SATP);
> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
> +
> +/*
> + * Retrieve the helper parameters from the hibernation image header.
> + */
> +int arch_hibernation_header_restore(void *addr)
> +{
> + struct arch_hibernate_hdr_invariants invariants;
> + struct arch_hibernate_hdr *hdr = addr;
> + int ret = 0;
> +
> + arch_hdr_invariants(&invariants);
> +
> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> + pr_crit("Hibernate image not generated by this kernel!\n");
> + return -EINVAL;
> + }
> +
> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> + if (sleep_cpu < 0) {
> + pr_crit("Hibernated on a CPU not known to this kernel!\n");
> + sleep_cpu = -EINVAL;
> + return -EINVAL;
> + }
> +
> +#ifdef CONFIG_SMP
> + ret = bringup_hibernate_cpu(sleep_cpu);
> + if (ret) {
> + sleep_cpu = -EINVAL;
> + return ret;
> + }
> +#endif
> + resume_hdr = *hdr;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
> +
> +int swsusp_arch_suspend(void)
> +{
> + int ret = 0;
> +
> + if (__cpu_suspend_enter(hibernate_cpu_context)) {
> + sleep_cpu = smp_processor_id();
> + suspend_save_csrs(hibernate_cpu_context);
> + ret = swsusp_save();
> + } else {
> + suspend_restore_csrs(hibernate_cpu_context);
> + flush_tlb_all();
> + flush_icache_all();
> +
> + /*
> + * Tell the hibernation core that we've just restored the memory.
> + */
> + in_suspend = 0;
> + sleep_cpu = -EINVAL;
> + }
> +
> + return ret;
> +}
> +
> +static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
> + unsigned long addr, pgprot_t prot)
> +{
> + pte_t pte = READ_ONCE(*src_ptep);
> +
> + if (pte_present(pte))
> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
> + unsigned long start, unsigned long end,
> + pgprot_t prot)
> +{
> + unsigned long addr = start;
> + pte_t *src_ptep;
> + pte_t *dst_ptep;
> +
> + if (pmd_none(READ_ONCE(*dst_pmdp))) {
> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_ptep)
> + return -ENOMEM;
> +
> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
> + }
> +
> + dst_ptep = pte_offset_kernel(dst_pmdp, start);
> + src_ptep = pte_offset_kernel(src_pmdp, start);
> +
> + do {
> + _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);

I think I'd rather have the body of _temp_pgtable_map_pte() here and drop
the helper, because the helper does (pte_val(pte) | pgprot_val(prot))
which looks strange, until seeing here that 'pte' is only the address
bits, so OR'ing in new prot bits without clearing old prot bits makes
sense.
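
i.e. something like:

	do {
		pte_t pte = READ_ONCE(*src_ptep);

		if (pte_present(pte))
			set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
	} while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);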

> + } while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp,
> + unsigned long start, unsigned long end,
> + pgprot_t prot)
> +{
> + unsigned long addr = start;
> + unsigned long next;
> + unsigned long ret;
> + pmd_t *src_pmdp;
> + pmd_t *dst_pmdp;
> +
> + if (pud_none(READ_ONCE(*dst_pudp))) {
> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_pmdp)
> + return -ENOMEM;
> +
> + pud_populate(NULL, dst_pudp, dst_pmdp);
> + }
> +
> + dst_pmdp = pmd_offset(dst_pudp, start);
> + src_pmdp = pmd_offset(src_pudp, start);
> +
> + do {
> + pmd_t pmd = READ_ONCE(*src_pmdp);
> +
> + next = pmd_addr_end(addr, end);
> +
> + if (pmd_none(pmd))
> + continue;
> +
> + if (pmd_leaf(pmd)) {
> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot)));
> + } else {
> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, addr, next, prot);
> + if (ret)
> + return -ENOMEM;
> + }
> + } while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp,
> + unsigned long start,
> + unsigned long end, pgprot_t prot)
> +{
> + unsigned long addr = start;
> + unsigned long next;
> + unsigned long ret;
> + pud_t *dst_pudp;
> + pud_t *src_pudp;
> +
> + if (p4d_none(READ_ONCE(*dst_p4dp))) {
> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_pudp)
> + return -ENOMEM;
> +
> + p4d_populate(NULL, dst_p4dp, dst_pudp);
> + }
> +
> + dst_pudp = pud_offset(dst_p4dp, start);
> + src_pudp = pud_offset(src_p4dp, start);
> +
> + do {
> + pud_t pud = READ_ONCE(*src_pudp);
> +
> + next = pud_addr_end(addr, end);
> +
> + if (pud_none(pud))
> + continue;
> +
> + if (pud_leaf(pud)) {
> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot)));
> + } else {
> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, addr, next, prot);
> + if (ret)
> + return -ENOMEM;
> + }
> + } while (dst_pudp++, src_pudp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
> + unsigned long start, unsigned long end,
> + pgprot_t prot)
> +{
> + unsigned long addr = start;
> + unsigned long next;
> + unsigned long ret;
> + p4d_t *dst_p4dp;
> + p4d_t *src_p4dp;
> +
> + if (pgd_none(READ_ONCE(*dst_pgdp))) {
> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_p4dp)
> + return -ENOMEM;
> +
> + pgd_populate(NULL, dst_pgdp, dst_p4dp);
> + }
> +
> + dst_p4dp = p4d_offset(dst_pgdp, start);
> + src_p4dp = p4d_offset(src_pgdp, start);
> +
> + do {
> + p4d_t p4d = READ_ONCE(*src_p4dp);
> +
> + next = p4d_addr_end(addr, end);
> +
> + if (p4d_none(READ_ONCE(*src_p4dp)))
> + continue;
> +
> + if (p4d_leaf(p4d)) {
> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot)));
> + } else {
> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, addr, next, prot);
> + if (ret)
> + return -ENOMEM;
> + }
> + } while (dst_p4dp++, src_p4dp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp)
> +{
> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn);
> + unsigned long addr = PAGE_OFFSET;
> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> + pgd_t *src_pgdp = pgd_offset_k(addr);
> + unsigned long next;
> +
> + do {
> + next = pgd_addr_end(addr, end);
> + if (pgd_none(READ_ONCE(*src_pgdp)))
> + continue;
> +
> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next, PAGE_KERNEL))
> + return -ENOMEM;
> + } while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_text_mapping(pgd_t *pgdp, unsigned long addr)
> +{
> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> + pgd_t *src_pgdp = pgd_offset_k(addr);
> +
> + if (pgd_none(READ_ONCE(*src_pgdp)))
> + return -EFAULT;
> +
> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, addr, PAGE_KERNEL_EXEC))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static unsigned long relocate_restore_code(void)
> +{
> + unsigned long ret;
> + void *page = (void *)get_safe_page(GFP_ATOMIC);
> +
> + if (!page)
> + return -ENOMEM;
> +
> + copy_page(page, hibernate_core_restore_code);
> +
> + /* Make the page containing the relocated code executable. */
> + set_memory_x((unsigned long)page, 1);
> +
> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)page);
> + if (ret)
> + return ret;
> +
> + return (unsigned long)page;
> +}
> +
> +int swsusp_arch_resume(void)
> +{
> + unsigned long ret;
> +
> + /*
> + * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> + * we don't need to free it here.
> + */
> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> + if (!resume_pg_dir)
> + return -ENOMEM;
> +
> + /*
> + * The pages need to be writable when restoring the image.
> + * Create a second copy of page table just for the linear map.
> + * Use this temporary page table to restore the image.
> + */
> + ret = temp_pgtable_mapping(resume_pg_dir);
> + if (ret)
> + return (int)ret;
> +
> + /* Move the restore code to a new page so that it doesn't get overwritten by itself. */
> + relocated_restore_code = relocate_restore_code();
> + if (relocated_restore_code == -ENOMEM)
> + return -ENOMEM;
> +
> + /*
> + * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> + * restore code can jumps to it after finished restore the image. The next execution
> + * code doesn't find itself in a different address space after switching over to the
> + * original page table used by the hibernated image.
> + */
> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr);
> + if (ret)
> + return ret;
> +
> + hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> + resume_hdr.restore_cpu_addr);
> +
> + return 0;
> +}
> +
> +#ifdef CONFIG_PM_SLEEP_SMP
> +int hibernate_resume_nonboot_cpu_disable(void)
> +{
> + if (sleep_cpu < 0) {
> + pr_err("Failing to resume from hibernate on an unknown CPU\n");
> + return -ENODEV;
> + }
> +
> + return freeze_secondary_cpus(sleep_cpu);
> +}
> +#endif
> +
> +static int __init riscv_hibernate_init(void)
> +{
> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL);
> +
> + if (WARN_ON(!hibernate_cpu_context))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +early_initcall(riscv_hibernate_init);
> --
> 2.34.1
>

Thanks,
drew

2023-02-24 01:33:46

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 1/4] RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Thursday, 23 February, 2023 2:40 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>; Conor Dooley
> <[email protected]>
> Subject: Re: [PATCH v4 1/4] RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function
>
> On Tue, Feb 21, 2023 at 10:35:20AM +0800, Sia Jee Heng wrote:
> > Currently suspend_save_csrs() and suspend_restore_csrs() functions are
> > statically defined in the suspend.c. Change the function's attribute
> > to public so that the functions can be used by hibernation as well.
> >
> > Signed-off-by: Sia Jee Heng <[email protected]>
> > Reviewed-by: Ley Foon Tan <[email protected]>
> > Reviewed-by: Mason Huo <[email protected]>
> > Reviewed-by: Conor Dooley <[email protected]>
> > ---
> > arch/riscv/include/asm/suspend.h | 3 +++
> > arch/riscv/kernel/suspend.c | 4 ++--
> > 2 files changed, 5 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > index 8be391c2aecb..75419c5ca272 100644
> > --- a/arch/riscv/include/asm/suspend.h
> > +++ b/arch/riscv/include/asm/suspend.h
> > @@ -33,4 +33,7 @@ int cpu_suspend(unsigned long arg,
> > /* Low-level CPU resume entry function */
> > int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >
> > +/* Used to save and restore the csr */
>
> s/the csr/CSRs/
>
> > +void suspend_save_csrs(struct suspend_context *context);
> > +void suspend_restore_csrs(struct suspend_context *context);
> > #endif
> > diff --git a/arch/riscv/kernel/suspend.c b/arch/riscv/kernel/suspend.c
> > index 9ba24fb8cc93..3c89b8ec69c4 100644
> > --- a/arch/riscv/kernel/suspend.c
> > +++ b/arch/riscv/kernel/suspend.c
> > @@ -8,7 +8,7 @@
> > #include <asm/csr.h>
> > #include <asm/suspend.h>
> >
> > -static void suspend_save_csrs(struct suspend_context *context)
> > +void suspend_save_csrs(struct suspend_context *context)
> > {
> > context->scratch = csr_read(CSR_SCRATCH);
> > context->tvec = csr_read(CSR_TVEC);
> > @@ -29,7 +29,7 @@ static void suspend_save_csrs(struct suspend_context *context)
> > #endif
> > }
> >
> > -static void suspend_restore_csrs(struct suspend_context *context)
> > +void suspend_restore_csrs(struct suspend_context *context)
> > {
> > csr_write(CSR_SCRATCH, context->scratch);
> > csr_write(CSR_TVEC, context->tvec);
> > --
> > 2.34.1
> >
>
> Otherwise,
>
> Reviewed-by: Andrew Jones <[email protected]>
Noted with thanks
>
> Thanks,
> drew

2023-02-24 01:33:52

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 2/4] RISC-V: Factor out common code of __cpu_resume_enter()



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Thursday, 23 February, 2023 2:52 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 2/4] RISC-V: Factor out common code of __cpu_resume_enter()
>
> On Tue, Feb 21, 2023 at 10:35:21AM +0800, Sia Jee Heng wrote:
> > The cpu_resume() function is very similar for the suspend to disk and
> > suspend to ram cases. Factor out the common code into restore_csr macro
> > and restore_reg macro.
> >
> > Signed-off-by: Sia Jee Heng <[email protected]>
> > ---
> > arch/riscv/include/asm/assembler.h | 62 ++++++++++++++++++++++++++++++
> > arch/riscv/kernel/suspend_entry.S | 34 ++--------------
> > 2 files changed, 65 insertions(+), 31 deletions(-)
> > create mode 100644 arch/riscv/include/asm/assembler.h
> >
> > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > new file mode 100644
> > index 000000000000..727a97735493
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/assembler.h
> > @@ -0,0 +1,62 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <[email protected]>
> > + */
> > +
> > +#ifndef __ASSEMBLY__
> > +#error "Only include this from assembly code"
> > +#endif
> > +
> > +#ifndef __ASM_ASSEMBLER_H
> > +#define __ASM_ASSEMBLER_H
> > +
> > +#include <asm/asm.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/csr.h>
> > +
> > +/*
> > + * restore_csr - restore hart's CSR value
> > + */
> > + .macro restore_csr
>
> Since there are more than one, 'restore_csrs' would be more appropriate
> and s/CSR value/CSRs/
>
> > + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> > + csrw CSR_EPC, t0
> > + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> > + csrw CSR_STATUS, t0
> > + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> > + csrw CSR_TVAL, t0
> > + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> > + csrw CSR_CAUSE, t0
> > + .endm
> > +
> > +/*
> > + * restore_reg - Restore registers (except A0 and T0-T6)
> > + */
> > + .macro restore_reg
>
> restore_regs
>
> > + REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> > + REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> > + REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> > + REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> > + REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> > + REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> > + REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> > + REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> > + REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> > + REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> > + REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> > + REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> > + REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> > + REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> > + REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> > + REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> > + REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> > + REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> > + REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> > + REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> > + REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> > + REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> > + REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > + .endm
> > +
> > +#endif /* __ASM_ASSEMBLER_H */
> > diff --git a/arch/riscv/kernel/suspend_entry.S b/arch/riscv/kernel/suspend_entry.S
> > index aafcca58c19d..74a8fab8e0f6 100644
> > --- a/arch/riscv/kernel/suspend_entry.S
> > +++ b/arch/riscv/kernel/suspend_entry.S
> > @@ -7,6 +7,7 @@
> > #include <linux/linkage.h>
> > #include <asm/asm.h>
> > #include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> > #include <asm/csr.h>
> > #include <asm/xip_fixup.h>
> >
> > @@ -83,39 +84,10 @@ ENTRY(__cpu_resume_enter)
> > add a0, a1, zero
> >
> > /* Restore CSRs */
> > - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> > - csrw CSR_EPC, t0
> > - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> > - csrw CSR_STATUS, t0
> > - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> > - csrw CSR_TVAL, t0
> > - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> > - csrw CSR_CAUSE, t0
> > + restore_csr
> >
> > /* Restore registers (except A0 and T0-T6) */
> > - REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> > - REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> > - REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> > - REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> > - REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> > - REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> > - REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> > - REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> > - REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> > - REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> > - REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> > - REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> > - REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> > - REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> > - REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> > - REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> > - REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> > - REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> > - REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> > - REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> > - REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> > - REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> > - REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > + restore_reg
> >
> > /* Return zero value */
> > add a0, zero, zero
> > --
> > 2.34.1
> >
>
> Otherwise,
>
> Reviewed-by: Andrew Jones <[email protected]>
noted with thanks
>
> Thanks,
> drew


2023-02-24 01:34:23

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Thursday, 23 February, 2023 2:57 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function
>
> On Tue, Feb 21, 2023 at 10:35:22AM +0800, Sia Jee Heng wrote:
> > Currently kernel_page_present() function doesn't support huge page
> > detection causes the function to mistakenly return false to the
> > hibernation core.
> >
> > Add huge page detection to the function to solve the problem.
> >
> > Fixes tag: commit 9e953cda5cdf ("riscv:
> > Introduce huge page support for 32/64bit kernel")
>
> This should be formatted as below (no line wrap and no 'tag' in the tag)
>
> Fixes: 9e953cda5cdf ("riscv: Introduce huge page support for 32/64bit kernel")
>
> >
> > Signed-off-by: Sia Jee Heng <[email protected]>
> > Reviewed-by: Ley Foon Tan <[email protected]>
> > Reviewed-by: Mason Huo <[email protected]>
> > ---
> > arch/riscv/mm/pageattr.c | 8 ++++++++
> > 1 file changed, 8 insertions(+)
> >
> > diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
> > index 86c56616e5de..ea3d61de065b 100644
> > --- a/arch/riscv/mm/pageattr.c
> > +++ b/arch/riscv/mm/pageattr.c
> > @@ -217,18 +217,26 @@ bool kernel_page_present(struct page *page)
> > pgd = pgd_offset_k(addr);
> > if (!pgd_present(*pgd))
> > return false;
> > + if (pgd_leaf(*pgd))
> > + return true;
> >
> > p4d = p4d_offset(pgd, addr);
> > if (!p4d_present(*p4d))
> > return false;
> > + if (p4d_leaf(*p4d))
> > + return true;
> >
> > pud = pud_offset(p4d, addr);
> > if (!pud_present(*pud))
> > return false;
> > + if (pud_leaf(*pud))
> > + return true;
> >
> > pmd = pmd_offset(pud, addr);
> > if (!pmd_present(*pmd))
> > return false;
> > + if (pmd_leaf(*pmd))
> > + return true;
> >
> > pte = pte_offset_kernel(pmd, addr);
> > return pte_present(*pte);
> > --
> > 2.34.1
> >
> >
>
> Otherwise,
>
> Reviewed-by: Andrew Jones <[email protected]>
noted with thanks
>
> Thanks,
> drew

2023-02-24 02:05:57

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Friday, 24 February, 2023 2:07 AM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > Low level Arch functions were created to support hibernation.
> > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > cpu state onto the stack, then calling swsusp_save() to save the memory
> > image.
> >
> > Arch specific hibernation header is implemented and is utilized by the
> > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > functions. The arch specific hibernation header consists of satp, hartid,
> > and the cpu_resume address. The kernel built version is also need to be
> > saved into the hibernation image header to making sure only the same
> > kernel is restore when resume.
> >
> > swsusp_arch_resume() creates a temporary page table that covering only
> > the linear map. It copies the restore code to a 'safe' page, then start
> > to restore the memory image. Once completed, it restores the original
> > kernel's page table. It then calls into __hibernate_cpu_resume()
> > to restore the CPU context. Finally, it follows the normal hibernation
> > path back to the hibernation core.
> >
> > To enable hibernation/suspend to disk into RISCV, the below config
> > need to be enabled:
> > - CONFIG_ARCH_HIBERNATION_HEADER
> > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >
> > Signed-off-by: Sia Jee Heng <[email protected]>
> > Reviewed-by: Ley Foon Tan <[email protected]>
> > Reviewed-by: Mason Huo <[email protected]>
> > ---
> > arch/riscv/Kconfig | 7 +
> > arch/riscv/include/asm/assembler.h | 20 ++
> > arch/riscv/include/asm/suspend.h | 19 ++
> > arch/riscv/kernel/Makefile | 1 +
> > arch/riscv/kernel/asm-offsets.c | 5 +
> > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > 7 files changed, 576 insertions(+)
> > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > create mode 100644 arch/riscv/kernel/hibernate.c
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..4555848a817f 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -690,6 +690,13 @@ menu "Power management options"
> >
> > source "kernel/power/Kconfig"
> >
> > +config ARCH_HIBERNATION_POSSIBLE
> > + def_bool y
> > +
> > +config ARCH_HIBERNATION_HEADER
> > + def_bool y
> > + depends on HIBERNATION
>
> nit: I think this can be simplified as def_bool HIBERNATION
Good suggestion, will change it.
>
> > +
> > endmenu # "Power management options"
> >
> > menu "CPU Power Management"
> > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > index 727a97735493..68c46c0e0ea8 100644
> > --- a/arch/riscv/include/asm/assembler.h
> > +++ b/arch/riscv/include/asm/assembler.h
> > @@ -59,4 +59,24 @@
> > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > .endm
> >
> > +/*
> > + * copy_page - copy 1 page (4KB) of data from source to destination
> > + * @a0 - destination
> > + * @a1 - source
> > + */
> > + .macro copy_page a0, a1
> > + lui a2, 0x1
> > + add a2, a2, a0
> > +1 :
> ^ please remove this space
Can't remove it, otherwise checkpatch will throw: ERROR: spaces required around that ':'
>
> > + REG_L t0, 0(a1)
> > + REG_L t1, SZREG(a1)
> > +
> > + REG_S t0, 0(a0)
> > + REG_S t1, SZREG(a0)
> > +
> > + addi a0, a0, 2 * SZREG
> > + addi a1, a1, 2 * SZREG
> > + bne a2, a0, 1b
> > + .endm
> > +
> > #endif /* __ASM_ASSEMBLER_H */
> > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > index 75419c5ca272..3362da56a9d8 100644
> > --- a/arch/riscv/include/asm/suspend.h
> > +++ b/arch/riscv/include/asm/suspend.h
> > @@ -21,6 +21,11 @@ struct suspend_context {
> > #endif
> > };
> >
> > +/*
> > + * Used by hibernation core and cleared during resume sequence
> > + */
> > +extern int in_suspend;
> > +
> > /* Low-level CPU suspend entry function */
> > int __cpu_suspend_enter(struct suspend_context *context);
> >
> > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > /* Used to save and restore the csr */
> > void suspend_save_csrs(struct suspend_context *context);
> > void suspend_restore_csrs(struct suspend_context *context);
> > +
> > +/* Low-level API to support hibernation */
> > +int swsusp_arch_suspend(void);
> > +int swsusp_arch_resume(void);
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > +int arch_hibernation_header_restore(void *addr);
> > +int __hibernate_cpu_resume(void);
> > +
> > +/* Used to resume on the CPU we hibernated on */
> > +int hibernate_resume_nonboot_cpu_disable(void);
> > +
> > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > + unsigned long cpu_resume);
> > +asmlinkage int hibernate_core_restore_code(void);
> > #endif
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 4cf303a779ab..daab341d55e4 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> >
> > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> >
> > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > index df9444397908..d6a75aac1d27 100644
> > --- a/arch/riscv/kernel/asm-offsets.c
> > +++ b/arch/riscv/kernel/asm-offsets.c
> > @@ -9,6 +9,7 @@
> > #include <linux/kbuild.h>
> > #include <linux/mm.h>
> > #include <linux/sched.h>
> > +#include <linux/suspend.h>
> > #include <asm/kvm_host.h>
> > #include <asm/thread_info.h>
> > #include <asm/ptrace.h>
> > @@ -116,6 +117,10 @@ void asm_offsets(void)
> >
> > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >
> > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > +
> > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > new file mode 100644
> > index 000000000000..846affe4dced
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate-asm.S
> > @@ -0,0 +1,77 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Hibernation low level support for RISCV.
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <[email protected]>
> > + */
> > +
> > +#include <asm/asm.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> > +#include <asm/csr.h>
> > +
> > +#include <linux/linkage.h>
> > +
> > +/*
> > + * int __hibernate_cpu_resume(void)
> > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > + * context.
> > + *
> > + * Always returns 0
> > + */
> > +ENTRY(__hibernate_cpu_resume)
> > + /* switch to hibernated image's page table. */
> > + csrw CSR_SATP, s0
> > + sfence.vma
> > +
> > + REG_L a0, hibernate_cpu_context
> > +
> > + restore_csr
> > + restore_reg
> > +
> > + /* Return zero value. */
> > + add a0, zero, zero
>
> nit: mv a0, zero
sure
>
> > +
> > + ret
> > +END(__hibernate_cpu_resume)
> > +
> > +/*
> > + * Prepare to restore the image.
> > + * a0: satp of saved page tables.
> > + * a1: satp of temporary page tables.
> > + * a2: cpu_resume.
> > + */
> > +ENTRY(hibernate_restore_image)
> > + mv s0, a0
> > + mv s1, a1
> > + mv s2, a2
> > + REG_L s4, restore_pblist
> > + REG_L a1, relocated_restore_code
> > +
> > + jalr a1
> > +END(hibernate_restore_image)
> > +
> > +/*
> > + * The below code will be executed from a 'safe' page.
> > + * It first switches to the temporary page table, then starts to copy the pages
> > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > + * to restore the CPU context.
> > + */
> > +ENTRY(hibernate_core_restore_code)
> > + /* switch to temp page table. */
> > + csrw satp, s1
> > + sfence.vma
> > +.Lcopy:
> > + /* The below code will restore the hibernated image. */
> > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > + REG_L a0, HIBERN_PBE_ORIG(s4)
>
> Are we sure restore_pblist will never be NULL?
restore_pblist is a linked list; it will be NULL during initialization or after page clean-up by the hibernation core. During the initial resume process, the hibernation core checks the header and loads the pages. If everything works correctly, the pages are linked into restore_pblist before swsusp_arch_resume() is invoked; otherwise the hibernation core throws an error and fails to resume from the hibernated image.
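A C-level sketch of what the .Lcopy loop in hibernate_core_restore_code does
(using struct pbe from <linux/suspend.h>; the helper name below is made up
purely for illustration):

	#include <linux/suspend.h>
	#include <asm/page.h>

	/* walk restore_pblist and copy each saved page back to its original location */
	static void restore_pblist_pages(struct pbe *pblist)
	{
		struct pbe *pbe;

		for (pbe = pblist; pbe; pbe = pbe->next)
			copy_page(pbe->orig_address, pbe->address);
	}

The loop stops at the list's NULL terminator, which matches the
"bnez s4, .Lcopy" check in the assembly.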
>
> > +
> > + copy_page a0, a1
> > +
> > + REG_L s4, HIBERN_PBE_NEXT(s4)
> > + bnez s4, .Lcopy
> > +
> > + jalr s2
> > +END(hibernate_core_restore_code)
> > diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> > new file mode 100644
> > index 000000000000..46a2f470db6e
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate.c
> > @@ -0,0 +1,447 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Hibernation support for RISCV
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <[email protected]>
> > + */
> > +
> > +#include <asm/barrier.h>
> > +#include <asm/cacheflush.h>
> > +#include <asm/mmu_context.h>
> > +#include <asm/page.h>
> > +#include <asm/pgalloc.h>
> > +#include <asm/pgtable.h>
> > +#include <asm/sections.h>
> > +#include <asm/set_memory.h>
> > +#include <asm/smp.h>
> > +#include <asm/suspend.h>
> > +
> > +#include <linux/cpu.h>
> > +#include <linux/memblock.h>
> > +#include <linux/pm.h>
> > +#include <linux/sched.h>
> > +#include <linux/suspend.h>
> > +#include <linux/utsname.h>
> > +
> > +/* The logical cpu number we should resume on, initialised to a non-cpu number. */
> > +static int sleep_cpu = -EINVAL;
> > +
> > +/* Pointer to the temporary resume page table. */
> > +static pgd_t *resume_pg_dir;
> > +
> > +/* CPU context to be saved. */
> > +struct suspend_context *hibernate_cpu_context;
> > +EXPORT_SYMBOL_GPL(hibernate_cpu_context);
> > +
> > +unsigned long relocated_restore_code;
> > +EXPORT_SYMBOL_GPL(relocated_restore_code);
> > +
> > +/**
> > + * struct arch_hibernate_hdr_invariants - container to store kernel build version.
> > + * @uts_version: to save the build number and date so that the we do not resume with
> > + * a different kernel.
> > + */
> > +struct arch_hibernate_hdr_invariants {
> > + char uts_version[__NEW_UTS_LEN + 1];
> > +};
> > +
> > +/**
> > + * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
> > + * @invariants: container to store kernel build version.
> > + * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
> > + * @saved_satp: original page table used by the hibernated image.
> > + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> > + */
> > +static struct arch_hibernate_hdr {
> > + struct arch_hibernate_hdr_invariants invariants;
> > + unsigned long hartid;
> > + unsigned long saved_satp;
> > + unsigned long restore_cpu_addr;
> > +} resume_hdr;
> > +
> > +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> > +{
> > + memset(i, 0, sizeof(*i));
> > + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> > +}
> > +
> > +/*
> > + * Check if the given pfn is in the 'nosave' section.
> > + */
> > +int pfn_is_nosave(unsigned long pfn)
> > +{
> > + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> > + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> > +
> > + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> > +}
> > +
> > +void notrace save_processor_state(void)
> > +{
> > + WARN_ON(num_online_cpus() != 1);
> > +}
> > +
> > +void notrace restore_processor_state(void)
> > +{
> > +}
> > +
> > +/*
> > + * Helper parameters need to be saved to the hibernation image header.
> > + */
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> > +{
> > + struct arch_hibernate_hdr *hdr = addr;
> > +
> > + if (max_size < sizeof(*hdr))
> > + return -EOVERFLOW;
> > +
> > + arch_hdr_invariants(&hdr->invariants);
> > +
> > + hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> > + hdr->saved_satp = csr_read(CSR_SATP);
> > + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
> > +
> > +/*
> > + * Retrieve the helper parameters from the hibernation image header.
> > + */
> > +int arch_hibernation_header_restore(void *addr)
> > +{
> > + struct arch_hibernate_hdr_invariants invariants;
> > + struct arch_hibernate_hdr *hdr = addr;
> > + int ret = 0;
> > +
> > + arch_hdr_invariants(&invariants);
> > +
> > + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> > + pr_crit("Hibernate image not generated by this kernel!\n");
> > + return -EINVAL;
> > + }
> > +
> > + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> > + if (sleep_cpu < 0) {
> > + pr_crit("Hibernated on a CPU not known to this kernel!\n");
> > + sleep_cpu = -EINVAL;
> > + return -EINVAL;
> > + }
> > +
> > +#ifdef CONFIG_SMP
> > + ret = bringup_hibernate_cpu(sleep_cpu);
> > + if (ret) {
> > + sleep_cpu = -EINVAL;
> > + return ret;
> > + }
> > +#endif
> > + resume_hdr = *hdr;
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
> > +
> > +int swsusp_arch_suspend(void)
> > +{
> > + int ret = 0;
> > +
> > + if (__cpu_suspend_enter(hibernate_cpu_context)) {
> > + sleep_cpu = smp_processor_id();
> > + suspend_save_csrs(hibernate_cpu_context);
> > + ret = swsusp_save();
> > + } else {
> > + suspend_restore_csrs(hibernate_cpu_context);
> > + flush_tlb_all();
> > + flush_icache_all();
> > +
> > + /*
> > + * Tell the hibernation core that we've just restored the memory.
> > + */
> > + in_suspend = 0;
> > + sleep_cpu = -EINVAL;
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
> > + unsigned long addr, pgprot_t prot)
> > +{
> > + pte_t pte = READ_ONCE(*src_ptep);
> > +
> > + if (pte_present(pte))
> > + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
> > + unsigned long start, unsigned long end,
> > + pgprot_t prot)
> > +{
> > + unsigned long addr = start;
> > + pte_t *src_ptep;
> > + pte_t *dst_ptep;
> > +
> > + if (pmd_none(READ_ONCE(*dst_pmdp))) {
> > + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> > + if (!dst_ptep)
> > + return -ENOMEM;
> > +
> > + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
> > + }
> > +
> > + dst_ptep = pte_offset_kernel(dst_pmdp, start);
> > + src_ptep = pte_offset_kernel(src_pmdp, start);
> > +
> > + do {
> > + _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);
>
> I think I'd rather have the body of _temp_pgtable_map_pte() here and drop
> the helper, because the helper does (pte_val(pte) | pgprot_val(prot))
> which looks strange, until seeing here that 'pte' is only the address
> bits, so OR'ing in new prot bits without clearing old prot bits makes
> sense.
We do not need to clear the old bits, since we are going to keep those bits and only add the new bits required for resume. Let's hold this question here; I would like to see how Alex views it.
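In other words, the intent is roughly this (a minimal sketch only; the function
name is made up for illustration):

	/*
	 * Keep the PFN and whatever attribute bits the source PTE already
	 * carries, and only OR in the extra protection bits needed for the
	 * temporary mapping; nothing is cleared.
	 */
	static pte_t temp_linear_pte(pte_t src, pgprot_t prot)
	{
		return __pte(pte_val(src) | pgprot_val(prot));
	}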
>
> > + } while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp,
> > + unsigned long start, unsigned long end,
> > + pgprot_t prot)
> > +{
> > + unsigned long addr = start;
> > + unsigned long next;
> > + unsigned long ret;
> > + pmd_t *src_pmdp;
> > + pmd_t *dst_pmdp;
> > +
> > + if (pud_none(READ_ONCE(*dst_pudp))) {
> > + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> > + if (!dst_pmdp)
> > + return -ENOMEM;
> > +
> > + pud_populate(NULL, dst_pudp, dst_pmdp);
> > + }
> > +
> > + dst_pmdp = pmd_offset(dst_pudp, start);
> > + src_pmdp = pmd_offset(src_pudp, start);
> > +
> > + do {
> > + pmd_t pmd = READ_ONCE(*src_pmdp);
> > +
> > + next = pmd_addr_end(addr, end);
> > +
> > + if (pmd_none(pmd))
> > + continue;
> > +
> > + if (pmd_leaf(pmd)) {
> > + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot)));
> > + } else {
> > + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, addr, next, prot);
> > + if (ret)
> > + return -ENOMEM;
> > + }
> > + } while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp,
> > + unsigned long start,
> > + unsigned long end, pgprot_t prot)
> > +{
> > + unsigned long addr = start;
> > + unsigned long next;
> > + unsigned long ret;
> > + pud_t *dst_pudp;
> > + pud_t *src_pudp;
> > +
> > + if (p4d_none(READ_ONCE(*dst_p4dp))) {
> > + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> > + if (!dst_pudp)
> > + return -ENOMEM;
> > +
> > + p4d_populate(NULL, dst_p4dp, dst_pudp);
> > + }
> > +
> > + dst_pudp = pud_offset(dst_p4dp, start);
> > + src_pudp = pud_offset(src_p4dp, start);
> > +
> > + do {
> > + pud_t pud = READ_ONCE(*src_pudp);
> > +
> > + next = pud_addr_end(addr, end);
> > +
> > + if (pud_none(pud))
> > + continue;
> > +
> > + if (pud_leaf(pud)) {
> > + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot)));
> > + } else {
> > + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, addr, next, prot);
> > + if (ret)
> > + return -ENOMEM;
> > + }
> > + } while (dst_pudp++, src_pudp++, addr = next, addr != end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
> > + unsigned long start, unsigned long end,
> > + pgprot_t prot)
> > +{
> > + unsigned long addr = start;
> > + unsigned long next;
> > + unsigned long ret;
> > + p4d_t *dst_p4dp;
> > + p4d_t *src_p4dp;
> > +
> > + if (pgd_none(READ_ONCE(*dst_pgdp))) {
> > + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC);
> > + if (!dst_p4dp)
> > + return -ENOMEM;
> > +
> > + pgd_populate(NULL, dst_pgdp, dst_p4dp);
> > + }
> > +
> > + dst_p4dp = p4d_offset(dst_pgdp, start);
> > + src_p4dp = p4d_offset(src_pgdp, start);
> > +
> > + do {
> > + p4d_t p4d = READ_ONCE(*src_p4dp);
> > +
> > + next = p4d_addr_end(addr, end);
> > +
> > + if (p4d_none(READ_ONCE(*src_p4dp)))
> > + continue;
> > +
> > + if (p4d_leaf(p4d)) {
> > + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot)));
> > + } else {
> > + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, addr, next, prot);
> > + if (ret)
> > + return -ENOMEM;
> > + }
> > + } while (dst_p4dp++, src_p4dp++, addr = next, addr != end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_mapping(pgd_t *pgdp)
> > +{
> > + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn);
> > + unsigned long addr = PAGE_OFFSET;
> > + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> > + pgd_t *src_pgdp = pgd_offset_k(addr);
> > + unsigned long next;
> > +
> > + do {
> > + next = pgd_addr_end(addr, end);
> > + if (pgd_none(READ_ONCE(*src_pgdp)))
> > + continue;
> > +
> > + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next, PAGE_KERNEL))
> > + return -ENOMEM;
> > + } while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_text_mapping(pgd_t *pgdp, unsigned long addr)
> > +{
> > + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> > + pgd_t *src_pgdp = pgd_offset_k(addr);
> > +
> > + if (pgd_none(READ_ONCE(*src_pgdp)))
> > + return -EFAULT;
> > +
> > + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, addr, PAGE_KERNEL_EXEC))
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long relocate_restore_code(void)
> > +{
> > + unsigned long ret;
> > + void *page = (void *)get_safe_page(GFP_ATOMIC);
> > +
> > + if (!page)
> > + return -ENOMEM;
> > +
> > + copy_page(page, hibernate_core_restore_code);
> > +
> > + /* Make the page containing the relocated code executable. */
> > + set_memory_x((unsigned long)page, 1);
> > +
> > + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)page);
> > + if (ret)
> > + return ret;
> > +
> > + return (unsigned long)page;
> > +}
> > +
> > +int swsusp_arch_resume(void)
> > +{
> > + unsigned long ret;
> > +
> > + /*
> > + * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> > + * we don't need to free it here.
> > + */
> > + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> > + if (!resume_pg_dir)
> > + return -ENOMEM;
> > +
> > + /*
> > + * The pages need to be writable when restoring the image.
> > + * Create a second copy of page table just for the linear map.
> > + * Use this temporary page table to restore the image.
> > + */
> > + ret = temp_pgtable_mapping(resume_pg_dir);
> > + if (ret)
> > + return (int)ret;
> > +
> > + /* Move the restore code to a new page so that it doesn't get overwritten by itself. */
> > + relocated_restore_code = relocate_restore_code();
> > + if (relocated_restore_code == -ENOMEM)
> > + return -ENOMEM;
> > +
> > + /*
> > + * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> > + * restore code can jumps to it after finished restore the image. The next execution
> > + * code doesn't find itself in a different address space after switching over to the
> > + * original page table used by the hibernated image.
> > + */
> > + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr);
> > + if (ret)
> > + return ret;
> > +
> > + hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> > + resume_hdr.restore_cpu_addr);
> > +
> > + return 0;
> > +}
> > +
> > +#ifdef CONFIG_PM_SLEEP_SMP
> > +int hibernate_resume_nonboot_cpu_disable(void)
> > +{
> > + if (sleep_cpu < 0) {
> > + pr_err("Failing to resume from hibernate on an unknown CPU\n");
> > + return -ENODEV;
> > + }
> > +
> > + return freeze_secondary_cpus(sleep_cpu);
> > +}
> > +#endif
> > +
> > +static int __init riscv_hibernate_init(void)
> > +{
> > + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL);
> > +
> > + if (WARN_ON(!hibernate_cpu_context))
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > +early_initcall(riscv_hibernate_init);
> > --
> > 2.34.1
> >
>
> Thanks,
> drew

2023-02-24 02:12:28

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

Hi Alex,

Wondering if you have any comment on the v4 series?

Thanks
Regards
Jee Heng

> -----Original Message-----
> From: JeeHeng Sia <[email protected]>
> Sent: Tuesday, 21 February, 2023 10:35 AM
> To: [email protected]; [email protected]; [email protected]
> Cc: [email protected]; [email protected]; JeeHeng Sia <[email protected]>; Leyfoon Tan
> <[email protected]>; Mason Huo <[email protected]>
> Subject: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> Low level Arch functions were created to support hibernation.
> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> cpu state onto the stack, then calling swsusp_save() to save the memory
> image.
>
> Arch specific hibernation header is implemented and is utilized by the
> arch_hibernation_header_restore() and arch_hibernation_header_save()
> functions. The arch specific hibernation header consists of satp, hartid,
> and the cpu_resume address. The kernel built version is also need to be
> saved into the hibernation image header to making sure only the same
> kernel is restore when resume.
>
> swsusp_arch_resume() creates a temporary page table that covering only
> the linear map. It copies the restore code to a 'safe' page, then start
> to restore the memory image. Once completed, it restores the original
> kernel's page table. It then calls into __hibernate_cpu_resume()
> to restore the CPU context. Finally, it follows the normal hibernation
> path back to the hibernation core.
>
> To enable hibernation/suspend to disk into RISCV, the below config
> need to be enabled:
> - CONFIG_ARCH_HIBERNATION_HEADER
> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>
> Signed-off-by: Sia Jee Heng <[email protected]>
> Reviewed-by: Ley Foon Tan <[email protected]>
> Reviewed-by: Mason Huo <[email protected]>
> ---
> arch/riscv/Kconfig | 7 +
> arch/riscv/include/asm/assembler.h | 20 ++
> arch/riscv/include/asm/suspend.h | 19 ++
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/asm-offsets.c | 5 +
> arch/riscv/kernel/hibernate-asm.S | 77 +++++
> arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> 7 files changed, 576 insertions(+)
> create mode 100644 arch/riscv/kernel/hibernate-asm.S
> create mode 100644 arch/riscv/kernel/hibernate.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..4555848a817f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -690,6 +690,13 @@ menu "Power management options"
>
> source "kernel/power/Kconfig"
>
> +config ARCH_HIBERNATION_POSSIBLE
> + def_bool y
> +
> +config ARCH_HIBERNATION_HEADER
> + def_bool y
> + depends on HIBERNATION
> +
> endmenu # "Power management options"
>
> menu "CPU Power Management"
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> index 727a97735493..68c46c0e0ea8 100644
> --- a/arch/riscv/include/asm/assembler.h
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -59,4 +59,24 @@
> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> .endm
>
> +/*
> + * copy_page - copy 1 page (4KB) of data from source to destination
> + * @a0 - destination
> + * @a1 - source
> + */
> + .macro copy_page a0, a1
> + lui a2, 0x1
> + add a2, a2, a0
> +1 :
> + REG_L t0, 0(a1)
> + REG_L t1, SZREG(a1)
> +
> + REG_S t0, 0(a0)
> + REG_S t1, SZREG(a0)
> +
> + addi a0, a0, 2 * SZREG
> + addi a1, a1, 2 * SZREG
> + bne a2, a0, 1b
> + .endm
> +
> #endif /* __ASM_ASSEMBLER_H */
> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> index 75419c5ca272..3362da56a9d8 100644
> --- a/arch/riscv/include/asm/suspend.h
> +++ b/arch/riscv/include/asm/suspend.h
> @@ -21,6 +21,11 @@ struct suspend_context {
> #endif
> };
>
> +/*
> + * Used by hibernation core and cleared during resume sequence
> + */
> +extern int in_suspend;
> +
> /* Low-level CPU suspend entry function */
> int __cpu_suspend_enter(struct suspend_context *context);
>
> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> /* Used to save and restore the csr */
> void suspend_save_csrs(struct suspend_context *context);
> void suspend_restore_csrs(struct suspend_context *context);
> +
> +/* Low-level API to support hibernation */
> +int swsusp_arch_suspend(void);
> +int swsusp_arch_resume(void);
> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> +int arch_hibernation_header_restore(void *addr);
> +int __hibernate_cpu_resume(void);
> +
> +/* Used to resume on the CPU we hibernated on */
> +int hibernate_resume_nonboot_cpu_disable(void);
> +
> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> + unsigned long cpu_resume);
> +asmlinkage int hibernate_core_restore_code(void);
> #endif
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 4cf303a779ab..daab341d55e4 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
>
> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
>
> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index df9444397908..d6a75aac1d27 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -9,6 +9,7 @@
> #include <linux/kbuild.h>
> #include <linux/mm.h>
> #include <linux/sched.h>
> +#include <linux/suspend.h>
> #include <asm/kvm_host.h>
> #include <asm/thread_info.h>
> #include <asm/ptrace.h>
> @@ -116,6 +117,10 @@ void asm_offsets(void)
>
> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>
> + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> +
> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> new file mode 100644
> index 000000000000..846affe4dced
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate-asm.S
> @@ -0,0 +1,77 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Hibernation low level support for RISCV.
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <[email protected]>
> + */
> +
> +#include <asm/asm.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> +#include <asm/csr.h>
> +
> +#include <linux/linkage.h>
> +
> +/*
> + * int __hibernate_cpu_resume(void)
> + * Switch back to the hibernated image's page table prior to restoring the CPU
> + * context.
> + *
> + * Always returns 0
> + */
> +ENTRY(__hibernate_cpu_resume)
> + /* switch to hibernated image's page table. */
> + csrw CSR_SATP, s0
> + sfence.vma
> +
> + REG_L a0, hibernate_cpu_context
> +
> + restore_csr
> + restore_reg
> +
> + /* Return zero value. */
> + add a0, zero, zero
> +
> + ret
> +END(__hibernate_cpu_resume)
> +
> +/*
> + * Prepare to restore the image.
> + * a0: satp of saved page tables.
> + * a1: satp of temporary page tables.
> + * a2: cpu_resume.
> + */
> +ENTRY(hibernate_restore_image)
> + mv s0, a0
> + mv s1, a1
> + mv s2, a2
> + REG_L s4, restore_pblist
> + REG_L a1, relocated_restore_code
> +
> + jalr a1
> +END(hibernate_restore_image)
> +
> +/*
> + * The below code will be executed from a 'safe' page.
> + * It first switches to the temporary page table, then starts to copy the pages
> + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> + * to restore the CPU context.
> + */
> +ENTRY(hibernate_core_restore_code)
> + /* switch to temp page table. */
> + csrw satp, s1
> + sfence.vma
> +.Lcopy:
> + /* The below code will restore the hibernated image. */
> + REG_L a1, HIBERN_PBE_ADDR(s4)
> + REG_L a0, HIBERN_PBE_ORIG(s4)
> +
> + copy_page a0, a1
> +
> + REG_L s4, HIBERN_PBE_NEXT(s4)
> + bnez s4, .Lcopy
> +
> + jalr s2
> +END(hibernate_core_restore_code)
> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> new file mode 100644
> index 000000000000..46a2f470db6e
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate.c
> @@ -0,0 +1,447 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Hibernation support for RISCV
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <[email protected]>
> + */
> +
> +#include <asm/barrier.h>
> +#include <asm/cacheflush.h>
> +#include <asm/mmu_context.h>
> +#include <asm/page.h>
> +#include <asm/pgalloc.h>
> +#include <asm/pgtable.h>
> +#include <asm/sections.h>
> +#include <asm/set_memory.h>
> +#include <asm/smp.h>
> +#include <asm/suspend.h>
> +
> +#include <linux/cpu.h>
> +#include <linux/memblock.h>
> +#include <linux/pm.h>
> +#include <linux/sched.h>
> +#include <linux/suspend.h>
> +#include <linux/utsname.h>
> +
> +/* The logical cpu number we should resume on, initialised to a non-cpu number. */
> +static int sleep_cpu = -EINVAL;
> +
> +/* Pointer to the temporary resume page table. */
> +static pgd_t *resume_pg_dir;
> +
> +/* CPU context to be saved. */
> +struct suspend_context *hibernate_cpu_context;
> +EXPORT_SYMBOL_GPL(hibernate_cpu_context);
> +
> +unsigned long relocated_restore_code;
> +EXPORT_SYMBOL_GPL(relocated_restore_code);
> +
> +/**
> + * struct arch_hibernate_hdr_invariants - container to store kernel build version.
> + * @uts_version: to save the build number and date so that the we do not resume with
> + * a different kernel.
> + */
> +struct arch_hibernate_hdr_invariants {
> + char uts_version[__NEW_UTS_LEN + 1];
> +};
> +
> +/**
> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
> + * @invariants: container to store kernel build version.
> + * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
> + * @saved_satp: original page table used by the hibernated image.
> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> + */
> +static struct arch_hibernate_hdr {
> + struct arch_hibernate_hdr_invariants invariants;
> + unsigned long hartid;
> + unsigned long saved_satp;
> + unsigned long restore_cpu_addr;
> +} resume_hdr;
> +
> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> +{
> + memset(i, 0, sizeof(*i));
> + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> +}
> +
> +/*
> + * Check if the given pfn is in the 'nosave' section.
> + */
> +int pfn_is_nosave(unsigned long pfn)
> +{
> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> +
> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> +}
> +
> +void notrace save_processor_state(void)
> +{
> + WARN_ON(num_online_cpus() != 1);
> +}
> +
> +void notrace restore_processor_state(void)
> +{
> +}
> +
> +/*
> + * Helper parameters need to be saved to the hibernation image header.
> + */
> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> +{
> + struct arch_hibernate_hdr *hdr = addr;
> +
> + if (max_size < sizeof(*hdr))
> + return -EOVERFLOW;
> +
> + arch_hdr_invariants(&hdr->invariants);
> +
> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> + hdr->saved_satp = csr_read(CSR_SATP);
> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
> +
> +/*
> + * Retrieve the helper parameters from the hibernation image header.
> + */
> +int arch_hibernation_header_restore(void *addr)
> +{
> + struct arch_hibernate_hdr_invariants invariants;
> + struct arch_hibernate_hdr *hdr = addr;
> + int ret = 0;
> +
> + arch_hdr_invariants(&invariants);
> +
> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> + pr_crit("Hibernate image not generated by this kernel!\n");
> + return -EINVAL;
> + }
> +
> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> + if (sleep_cpu < 0) {
> + pr_crit("Hibernated on a CPU not known to this kernel!\n");
> + sleep_cpu = -EINVAL;
> + return -EINVAL;
> + }
> +
> +#ifdef CONFIG_SMP
> + ret = bringup_hibernate_cpu(sleep_cpu);
> + if (ret) {
> + sleep_cpu = -EINVAL;
> + return ret;
> + }
> +#endif
> + resume_hdr = *hdr;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
> +
> +int swsusp_arch_suspend(void)
> +{
> + int ret = 0;
> +
> + if (__cpu_suspend_enter(hibernate_cpu_context)) {
> + sleep_cpu = smp_processor_id();
> + suspend_save_csrs(hibernate_cpu_context);
> + ret = swsusp_save();
> + } else {
> + suspend_restore_csrs(hibernate_cpu_context);
> + flush_tlb_all();
> + flush_icache_all();
> +
> + /*
> + * Tell the hibernation core that we've just restored the memory.
> + */
> + in_suspend = 0;
> + sleep_cpu = -EINVAL;
> + }
> +
> + return ret;
> +}
> +
> +static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
> + unsigned long addr, pgprot_t prot)
> +{
> + pte_t pte = READ_ONCE(*src_ptep);
> +
> + if (pte_present(pte))
> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
> + unsigned long start, unsigned long end,
> + pgprot_t prot)
> +{
> + unsigned long addr = start;
> + pte_t *src_ptep;
> + pte_t *dst_ptep;
> +
> + if (pmd_none(READ_ONCE(*dst_pmdp))) {
> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_ptep)
> + return -ENOMEM;
> +
> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
> + }
> +
> + dst_ptep = pte_offset_kernel(dst_pmdp, start);
> + src_ptep = pte_offset_kernel(src_pmdp, start);
> +
> + do {
> + _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);
> + } while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp,
> + unsigned long start, unsigned long end,
> + pgprot_t prot)
> +{
> + unsigned long addr = start;
> + unsigned long next;
> + unsigned long ret;
> + pmd_t *src_pmdp;
> + pmd_t *dst_pmdp;
> +
> + if (pud_none(READ_ONCE(*dst_pudp))) {
> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_pmdp)
> + return -ENOMEM;
> +
> + pud_populate(NULL, dst_pudp, dst_pmdp);
> + }
> +
> + dst_pmdp = pmd_offset(dst_pudp, start);
> + src_pmdp = pmd_offset(src_pudp, start);
> +
> + do {
> + pmd_t pmd = READ_ONCE(*src_pmdp);
> +
> + next = pmd_addr_end(addr, end);
> +
> + if (pmd_none(pmd))
> + continue;
> +
> + if (pmd_leaf(pmd)) {
> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot)));
> + } else {
> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, addr, next, prot);
> + if (ret)
> + return -ENOMEM;
> + }
> + } while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp,
> + unsigned long start,
> + unsigned long end, pgprot_t prot)
> +{
> + unsigned long addr = start;
> + unsigned long next;
> + unsigned long ret;
> + pud_t *dst_pudp;
> + pud_t *src_pudp;
> +
> + if (p4d_none(READ_ONCE(*dst_p4dp))) {
> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_pudp)
> + return -ENOMEM;
> +
> + p4d_populate(NULL, dst_p4dp, dst_pudp);
> + }
> +
> + dst_pudp = pud_offset(dst_p4dp, start);
> + src_pudp = pud_offset(src_p4dp, start);
> +
> + do {
> + pud_t pud = READ_ONCE(*src_pudp);
> +
> + next = pud_addr_end(addr, end);
> +
> + if (pud_none(pud))
> + continue;
> +
> + if (pud_leaf(pud)) {
> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot)));
> + } else {
> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, addr, next, prot);
> + if (ret)
> + return -ENOMEM;
> + }
> + } while (dst_pudp++, src_pudp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
> + unsigned long start, unsigned long end,
> + pgprot_t prot)
> +{
> + unsigned long addr = start;
> + unsigned long next;
> + unsigned long ret;
> + p4d_t *dst_p4dp;
> + p4d_t *src_p4dp;
> +
> + if (pgd_none(READ_ONCE(*dst_pgdp))) {
> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_p4dp)
> + return -ENOMEM;
> +
> + pgd_populate(NULL, dst_pgdp, dst_p4dp);
> + }
> +
> + dst_p4dp = p4d_offset(dst_pgdp, start);
> + src_p4dp = p4d_offset(src_pgdp, start);
> +
> + do {
> + p4d_t p4d = READ_ONCE(*src_p4dp);
> +
> + next = p4d_addr_end(addr, end);
> +
> + if (p4d_none(READ_ONCE(*src_p4dp)))
> + continue;
> +
> + if (p4d_leaf(p4d)) {
> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot)));
> + } else {
> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, addr, next, prot);
> + if (ret)
> + return -ENOMEM;
> + }
> + } while (dst_p4dp++, src_p4dp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp)
> +{
> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn);
> + unsigned long addr = PAGE_OFFSET;
> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> + pgd_t *src_pgdp = pgd_offset_k(addr);
> + unsigned long next;
> +
> + do {
> + next = pgd_addr_end(addr, end);
> + if (pgd_none(READ_ONCE(*src_pgdp)))
> + continue;
> +
> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next, PAGE_KERNEL))
> + return -ENOMEM;
> + } while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_text_mapping(pgd_t *pgdp, unsigned long addr)
> +{
> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> + pgd_t *src_pgdp = pgd_offset_k(addr);
> +
> + if (pgd_none(READ_ONCE(*src_pgdp)))
> + return -EFAULT;
> +
> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, addr, PAGE_KERNEL_EXEC))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static unsigned long relocate_restore_code(void)
> +{
> + unsigned long ret;
> + void *page = (void *)get_safe_page(GFP_ATOMIC);
> +
> + if (!page)
> + return -ENOMEM;
> +
> + copy_page(page, hibernate_core_restore_code);
> +
> + /* Make the page containing the relocated code executable. */
> + set_memory_x((unsigned long)page, 1);
> +
> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)page);
> + if (ret)
> + return ret;
> +
> + return (unsigned long)page;
> +}
> +
> +int swsusp_arch_resume(void)
> +{
> + unsigned long ret;
> +
> + /*
> + * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> + * we don't need to free it here.
> + */
> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> + if (!resume_pg_dir)
> + return -ENOMEM;
> +
> + /*
> + * The pages need to be writable when restoring the image.
> + * Create a second copy of page table just for the linear map.
> + * Use this temporary page table to restore the image.
> + */
> + ret = temp_pgtable_mapping(resume_pg_dir);
> + if (ret)
> + return (int)ret;
> +
> + /* Move the restore code to a new page so that it doesn't get overwritten by itself. */
> + relocated_restore_code = relocate_restore_code();
> + if (relocated_restore_code == -ENOMEM)
> + return -ENOMEM;
> +
> + /*
> + * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> + * restore code can jumps to it after finished restore the image. The next execution
> + * code doesn't find itself in a different address space after switching over to the
> + * original page table used by the hibernated image.
> + */
> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr);
> + if (ret)
> + return ret;
> +
> + hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> + resume_hdr.restore_cpu_addr);
> +
> + return 0;
> +}
> +
> +#ifdef CONFIG_PM_SLEEP_SMP
> +int hibernate_resume_nonboot_cpu_disable(void)
> +{
> + if (sleep_cpu < 0) {
> + pr_err("Failing to resume from hibernate on an unknown CPU\n");
> + return -ENODEV;
> + }
> +
> + return freeze_secondary_cpus(sleep_cpu);
> +}
> +#endif
> +
> +static int __init riscv_hibernate_init(void)
> +{
> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL);
> +
> + if (WARN_ON(!hibernate_cpu_context))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +early_initcall(riscv_hibernate_init);
> --
> 2.34.1


2023-02-24 09:00:35

by Andrew Jones

Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
>
>
> > -----Original Message-----
> > From: Andrew Jones <[email protected]>
> > Sent: Friday, 24 February, 2023 2:07 AM
> > To: JeeHeng Sia <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >
> > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > Low level Arch functions were created to support hibernation.
> > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > image.
> > >
> > > Arch specific hibernation header is implemented and is utilized by the
> > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > functions. The arch specific hibernation header consists of satp, hartid,
> > > and the cpu_resume address. The kernel built version is also need to be
> > > saved into the hibernation image header to making sure only the same
> > > kernel is restore when resume.
> > >
> > > swsusp_arch_resume() creates a temporary page table that covering only
> > > the linear map. It copies the restore code to a 'safe' page, then start
> > > to restore the memory image. Once completed, it restores the original
> > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > to restore the CPU context. Finally, it follows the normal hibernation
> > > path back to the hibernation core.
> > >
> > > To enable hibernation/suspend to disk into RISCV, the below config
> > > need to be enabled:
> > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > >
> > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > Reviewed-by: Mason Huo <[email protected]>
> > > ---
> > > arch/riscv/Kconfig | 7 +
> > > arch/riscv/include/asm/assembler.h | 20 ++
> > > arch/riscv/include/asm/suspend.h | 19 ++
> > > arch/riscv/kernel/Makefile | 1 +
> > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > 7 files changed, 576 insertions(+)
> > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > create mode 100644 arch/riscv/kernel/hibernate.c
> > >
> > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > index e2b656043abf..4555848a817f 100644
> > > --- a/arch/riscv/Kconfig
> > > +++ b/arch/riscv/Kconfig
> > > @@ -690,6 +690,13 @@ menu "Power management options"
> > >
> > > source "kernel/power/Kconfig"
> > >
> > > +config ARCH_HIBERNATION_POSSIBLE
> > > + def_bool y
> > > +
> > > +config ARCH_HIBERNATION_HEADER
> > > + def_bool y
> > > + depends on HIBERNATION
> >
> > nit: I think this can be simplified as def_bool HIBERNATION
> good suggestion. will change it.
> >
> > > +
> > > endmenu # "Power management options"
> > >
> > > menu "CPU Power Management"
> > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > index 727a97735493..68c46c0e0ea8 100644
> > > --- a/arch/riscv/include/asm/assembler.h
> > > +++ b/arch/riscv/include/asm/assembler.h
> > > @@ -59,4 +59,24 @@
> > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > .endm
> > >
> > > +/*
> > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > + * @a0 - destination
> > > + * @a1 - source
> > > + */
> > > + .macro copy_page a0, a1
> > > + lui a2, 0x1
> > > + add a2, a2, a0
> > > +1 :
> > ^ please remove this space
> can't remove it, otherwise checkpatch will throw ERROR: spaces required around that ':'

Oh, right, labels in macros have this requirement.

> >
> > > + REG_L t0, 0(a1)
> > > + REG_L t1, SZREG(a1)
> > > +
> > > + REG_S t0, 0(a0)
> > > + REG_S t1, SZREG(a0)
> > > +
> > > + addi a0, a0, 2 * SZREG
> > > + addi a1, a1, 2 * SZREG
> > > + bne a2, a0, 1b
> > > + .endm
> > > +
> > > #endif /* __ASM_ASSEMBLER_H */
> > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > index 75419c5ca272..3362da56a9d8 100644
> > > --- a/arch/riscv/include/asm/suspend.h
> > > +++ b/arch/riscv/include/asm/suspend.h
> > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > #endif
> > > };
> > >
> > > +/*
> > > + * Used by hibernation core and cleared during resume sequence
> > > + */
> > > +extern int in_suspend;
> > > +
> > > /* Low-level CPU suspend entry function */
> > > int __cpu_suspend_enter(struct suspend_context *context);
> > >
> > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > /* Used to save and restore the csr */
> > > void suspend_save_csrs(struct suspend_context *context);
> > > void suspend_restore_csrs(struct suspend_context *context);
> > > +
> > > +/* Low-level API to support hibernation */
> > > +int swsusp_arch_suspend(void);
> > > +int swsusp_arch_resume(void);
> > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > +int arch_hibernation_header_restore(void *addr);
> > > +int __hibernate_cpu_resume(void);
> > > +
> > > +/* Used to resume on the CPU we hibernated on */
> > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > +
> > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > + unsigned long cpu_resume);
> > > +asmlinkage int hibernate_core_restore_code(void);
> > > #endif
> > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > index 4cf303a779ab..daab341d55e4 100644
> > > --- a/arch/riscv/kernel/Makefile
> > > +++ b/arch/riscv/kernel/Makefile
> > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > >
> > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > >
> > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > index df9444397908..d6a75aac1d27 100644
> > > --- a/arch/riscv/kernel/asm-offsets.c
> > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > @@ -9,6 +9,7 @@
> > > #include <linux/kbuild.h>
> > > #include <linux/mm.h>
> > > #include <linux/sched.h>
> > > +#include <linux/suspend.h>
> > > #include <asm/kvm_host.h>
> > > #include <asm/thread_info.h>
> > > #include <asm/ptrace.h>
> > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > >
> > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > >
> > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > +
> > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > new file mode 100644
> > > index 000000000000..846affe4dced
> > > --- /dev/null
> > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > @@ -0,0 +1,77 @@
> > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > +/*
> > > + * Hibernation low level support for RISCV.
> > > + *
> > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > + *
> > > + * Author: Jee Heng Sia <[email protected]>
> > > + */
> > > +
> > > +#include <asm/asm.h>
> > > +#include <asm/asm-offsets.h>
> > > +#include <asm/assembler.h>
> > > +#include <asm/csr.h>
> > > +
> > > +#include <linux/linkage.h>
> > > +
> > > +/*
> > > + * int __hibernate_cpu_resume(void)
> > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > + * context.
> > > + *
> > > + * Always returns 0
> > > + */
> > > +ENTRY(__hibernate_cpu_resume)
> > > + /* switch to hibernated image's page table. */
> > > + csrw CSR_SATP, s0
> > > + sfence.vma
> > > +
> > > + REG_L a0, hibernate_cpu_context
> > > +
> > > + restore_csr
> > > + restore_reg
> > > +
> > > + /* Return zero value. */
> > > + add a0, zero, zero
> >
> > nit: mv a0, zero
> sure
> >
> > > +
> > > + ret
> > > +END(__hibernate_cpu_resume)
> > > +
> > > +/*
> > > + * Prepare to restore the image.
> > > + * a0: satp of saved page tables.
> > > + * a1: satp of temporary page tables.
> > > + * a2: cpu_resume.
> > > + */
> > > +ENTRY(hibernate_restore_image)
> > > + mv s0, a0
> > > + mv s1, a1
> > > + mv s2, a2
> > > + REG_L s4, restore_pblist
> > > + REG_L a1, relocated_restore_code
> > > +
> > > + jalr a1
> > > +END(hibernate_restore_image)
> > > +
> > > +/*
> > > + * The below code will be executed from a 'safe' page.
> > > + * It first switches to the temporary page table, then starts to copy the pages
> > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > + * to restore the CPU context.
> > > + */
> > > +ENTRY(hibernate_core_restore_code)
> > > + /* switch to temp page table. */
> > > + csrw satp, s1
> > > + sfence.vma
> > > +.Lcopy:
> > > + /* The below code will restore the hibernated image. */
> > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> >
> > Are we sure restore_pblist will never be NULL?
> restore_pblist is a linked list; it will be NULL during initialization or during page clean-up by the hibernation core. During the initial resume process, the hibernation core will check the header and load the pages. If everything works correctly, the pages will be linked to restore_pblist and swsusp_arch_resume() will then be invoked; otherwise the hibernation core will throw an error and fail to resume from the hibernated image.

I know restore_pblist is a linked-list and this doesn't answer the
question. The comment above restore_pblist says

/*
* List of PBEs needed for restoring the pages that were allocated before
* the suspend and included in the suspend image, but have also been
* allocated by the "resume" kernel, so their contents cannot be written
* directly to their "original" page frames.
*/

which implies the pages that end up on this list are "special". My
question is whether or not we're guaranteed to have at least one
of these special pages. If not, we shouldn't assume s4 is non-null.
If so, then a comment stating why that's guaranteed would be nice.

> >
> > > +
> > > + copy_page a0, a1
> > > +
> > > + REG_L s4, HIBERN_PBE_NEXT(s4)
> > > + bnez s4, .Lcopy
> > > +
> > > + jalr s2
> > > +END(hibernate_core_restore_code)
> > > diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> > > new file mode 100644
> > > index 000000000000..46a2f470db6e
> > > --- /dev/null
> > > +++ b/arch/riscv/kernel/hibernate.c
> > > @@ -0,0 +1,447 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * Hibernation support for RISCV
> > > + *
> > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > + *
> > > + * Author: Jee Heng Sia <[email protected]>
> > > + */
> > > +
> > > +#include <asm/barrier.h>
> > > +#include <asm/cacheflush.h>
> > > +#include <asm/mmu_context.h>
> > > +#include <asm/page.h>
> > > +#include <asm/pgalloc.h>
> > > +#include <asm/pgtable.h>
> > > +#include <asm/sections.h>
> > > +#include <asm/set_memory.h>
> > > +#include <asm/smp.h>
> > > +#include <asm/suspend.h>
> > > +
> > > +#include <linux/cpu.h>
> > > +#include <linux/memblock.h>
> > > +#include <linux/pm.h>
> > > +#include <linux/sched.h>
> > > +#include <linux/suspend.h>
> > > +#include <linux/utsname.h>
> > > +
> > > +/* The logical cpu number we should resume on, initialised to a non-cpu number. */
> > > +static int sleep_cpu = -EINVAL;
> > > +
> > > +/* Pointer to the temporary resume page table. */
> > > +static pgd_t *resume_pg_dir;
> > > +
> > > +/* CPU context to be saved. */
> > > +struct suspend_context *hibernate_cpu_context;
> > > +EXPORT_SYMBOL_GPL(hibernate_cpu_context);
> > > +
> > > +unsigned long relocated_restore_code;
> > > +EXPORT_SYMBOL_GPL(relocated_restore_code);
> > > +
> > > +/**
> > > + * struct arch_hibernate_hdr_invariants - container to store kernel build version.
> > > + * @uts_version: to save the build number and date so that the we do not resume with
> > > + * a different kernel.
> > > + */
> > > +struct arch_hibernate_hdr_invariants {
> > > + char uts_version[__NEW_UTS_LEN + 1];
> > > +};
> > > +
> > > +/**
> > > + * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
> > > + * @invariants: container to store kernel build version.
> > > + * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
> > > + * @saved_satp: original page table used by the hibernated image.
> > > + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> > > + */
> > > +static struct arch_hibernate_hdr {
> > > + struct arch_hibernate_hdr_invariants invariants;
> > > + unsigned long hartid;
> > > + unsigned long saved_satp;
> > > + unsigned long restore_cpu_addr;
> > > +} resume_hdr;
> > > +
> > > +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> > > +{
> > > + memset(i, 0, sizeof(*i));
> > > + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> > > +}
> > > +
> > > +/*
> > > + * Check if the given pfn is in the 'nosave' section.
> > > + */
> > > +int pfn_is_nosave(unsigned long pfn)
> > > +{
> > > + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> > > + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> > > +
> > > + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> > > +}
> > > +
> > > +void notrace save_processor_state(void)
> > > +{
> > > + WARN_ON(num_online_cpus() != 1);
> > > +}
> > > +
> > > +void notrace restore_processor_state(void)
> > > +{
> > > +}
> > > +
> > > +/*
> > > + * Helper parameters need to be saved to the hibernation image header.
> > > + */
> > > +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> > > +{
> > > + struct arch_hibernate_hdr *hdr = addr;
> > > +
> > > + if (max_size < sizeof(*hdr))
> > > + return -EOVERFLOW;
> > > +
> > > + arch_hdr_invariants(&hdr->invariants);
> > > +
> > > + hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> > > + hdr->saved_satp = csr_read(CSR_SATP);
> > > + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> > > +
> > > + return 0;
> > > +}
> > > +EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
> > > +
> > > +/*
> > > + * Retrieve the helper parameters from the hibernation image header.
> > > + */
> > > +int arch_hibernation_header_restore(void *addr)
> > > +{
> > > + struct arch_hibernate_hdr_invariants invariants;
> > > + struct arch_hibernate_hdr *hdr = addr;
> > > + int ret = 0;
> > > +
> > > + arch_hdr_invariants(&invariants);
> > > +
> > > + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> > > + pr_crit("Hibernate image not generated by this kernel!\n");
> > > + return -EINVAL;
> > > + }
> > > +
> > > + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> > > + if (sleep_cpu < 0) {
> > > + pr_crit("Hibernated on a CPU not known to this kernel!\n");
> > > + sleep_cpu = -EINVAL;
> > > + return -EINVAL;
> > > + }
> > > +
> > > +#ifdef CONFIG_SMP
> > > + ret = bringup_hibernate_cpu(sleep_cpu);
> > > + if (ret) {
> > > + sleep_cpu = -EINVAL;
> > > + return ret;
> > > + }
> > > +#endif
> > > + resume_hdr = *hdr;
> > > +
> > > + return ret;
> > > +}
> > > +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
> > > +
> > > +int swsusp_arch_suspend(void)
> > > +{
> > > + int ret = 0;
> > > +
> > > + if (__cpu_suspend_enter(hibernate_cpu_context)) {
> > > + sleep_cpu = smp_processor_id();
> > > + suspend_save_csrs(hibernate_cpu_context);
> > > + ret = swsusp_save();
> > > + } else {
> > > + suspend_restore_csrs(hibernate_cpu_context);
> > > + flush_tlb_all();
> > > + flush_icache_all();
> > > +
> > > + /*
> > > + * Tell the hibernation core that we've just restored the memory.
> > > + */
> > > + in_suspend = 0;
> > > + sleep_cpu = -EINVAL;
> > > + }
> > > +
> > > + return ret;
> > > +}
> > > +
> > > +static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
> > > + unsigned long addr, pgprot_t prot)
> > > +{
> > > + pte_t pte = READ_ONCE(*src_ptep);
> > > +
> > > + if (pte_present(pte))
> > > + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
> > > + unsigned long start, unsigned long end,
> > > + pgprot_t prot)
> > > +{
> > > + unsigned long addr = start;
> > > + pte_t *src_ptep;
> > > + pte_t *dst_ptep;
> > > +
> > > + if (pmd_none(READ_ONCE(*dst_pmdp))) {
> > > + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> > > + if (!dst_ptep)
> > > + return -ENOMEM;
> > > +
> > > + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
> > > + }
> > > +
> > > + dst_ptep = pte_offset_kernel(dst_pmdp, start);
> > > + src_ptep = pte_offset_kernel(src_pmdp, start);
> > > +
> > > + do {
> > > + _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);
> >
> > I think I'd rather have the body of _temp_pgtable_map_pte() here and drop
> > the helper, because the helper does (pte_val(pte) | pgprot_val(prot))
> > which looks strange, until seeing here that 'pte' is only the address
> > bits, so OR'ing in new prot bits without clearing old prot bits makes
> > sense.
> we do not need to clear the old bits, since we are going to keep those bits and add the new bits which are required for resume. Let's hold your question here, but I would like to see how Alex views it.

I confused myself a bit in my first read, so some of what I said isn't
relevant, but I still wonder why we don't want to be more explicit about
what prot bits are present in the end, and I still wonder why we need such
a simple helper function which is used in exactly one place. Indeed, the
pattern of all the other pgtable functions below is to put the set_p*
calls directly in the loop.

Thanks,
drew
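
To illustrate the suggestion, a minimal sketch of the temp_pgtable_map_pte() loop with the helper folded in could look like the fragment below (a sketch only, not the posted patch; it assumes the loop advances addr by PAGE_SIZE as in the rest of the series):

	do {
		pte_t pte = READ_ONCE(*src_ptep);

		/* Keep the source PTE bits and OR in the extra prot bits needed for resume. */
		if (pte_present(pte))
			set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
	} while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr != end);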

2023-02-24 09:33:46

by Sia Jee Heng

Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Friday, 24 February, 2023 5:00 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Jones <[email protected]>
> > > Sent: Friday, 24 February, 2023 2:07 AM
> > > To: JeeHeng Sia <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > >
> > > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > > Low level Arch functions were created to support hibernation.
> > > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > > image.
> > > >
> > > > Arch specific hibernation header is implemented and is utilized by the
> > > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > > functions. The arch specific hibernation header consists of satp, hartid,
> > > > and the cpu_resume address. The kernel built version is also need to be
> > > > saved into the hibernation image header to making sure only the same
> > > > kernel is restore when resume.
> > > >
> > > > swsusp_arch_resume() creates a temporary page table that covering only
> > > > the linear map. It copies the restore code to a 'safe' page, then start
> > > > to restore the memory image. Once completed, it restores the original
> > > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > > to restore the CPU context. Finally, it follows the normal hibernation
> > > > path back to the hibernation core.
> > > >
> > > > To enable hibernation/suspend to disk into RISCV, the below config
> > > > need to be enabled:
> > > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > > >
> > > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > > Reviewed-by: Mason Huo <[email protected]>
> > > > ---
> > > > arch/riscv/Kconfig | 7 +
> > > > arch/riscv/include/asm/assembler.h | 20 ++
> > > > arch/riscv/include/asm/suspend.h | 19 ++
> > > > arch/riscv/kernel/Makefile | 1 +
> > > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > > 7 files changed, 576 insertions(+)
> > > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > > create mode 100644 arch/riscv/kernel/hibernate.c
> > > >
> > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > index e2b656043abf..4555848a817f 100644
> > > > --- a/arch/riscv/Kconfig
> > > > +++ b/arch/riscv/Kconfig
> > > > @@ -690,6 +690,13 @@ menu "Power management options"
> > > >
> > > > source "kernel/power/Kconfig"
> > > >
> > > > +config ARCH_HIBERNATION_POSSIBLE
> > > > + def_bool y
> > > > +
> > > > +config ARCH_HIBERNATION_HEADER
> > > > + def_bool y
> > > > + depends on HIBERNATION
> > >
> > > nit: I think this can be simplified as def_bool HIBERNATION
> > good suggestion. will change it.
> > >
> > > > +
> > > > endmenu # "Power management options"
> > > >
> > > > menu "CPU Power Management"
> > > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > > index 727a97735493..68c46c0e0ea8 100644
> > > > --- a/arch/riscv/include/asm/assembler.h
> > > > +++ b/arch/riscv/include/asm/assembler.h
> > > > @@ -59,4 +59,24 @@
> > > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > > .endm
> > > >
> > > > +/*
> > > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > > + * @a0 - destination
> > > > + * @a1 - source
> > > > + */
> > > > + .macro copy_page a0, a1
> > > > + lui a2, 0x1
> > > > + add a2, a2, a0
> > > > +1 :
> > > ^ please remove this space
> > can't remove it, otherwise checkpatch will throw ERROR: spaces required around that ':'
>
> Oh, right, labels in macros have this requirement.
>
> > >
> > > > + REG_L t0, 0(a1)
> > > > + REG_L t1, SZREG(a1)
> > > > +
> > > > + REG_S t0, 0(a0)
> > > > + REG_S t1, SZREG(a0)
> > > > +
> > > > + addi a0, a0, 2 * SZREG
> > > > + addi a1, a1, 2 * SZREG
> > > > + bne a2, a0, 1b
> > > > + .endm
> > > > +
> > > > #endif /* __ASM_ASSEMBLER_H */
> > > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > > index 75419c5ca272..3362da56a9d8 100644
> > > > --- a/arch/riscv/include/asm/suspend.h
> > > > +++ b/arch/riscv/include/asm/suspend.h
> > > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > > #endif
> > > > };
> > > >
> > > > +/*
> > > > + * Used by hibernation core and cleared during resume sequence
> > > > + */
> > > > +extern int in_suspend;
> > > > +
> > > > /* Low-level CPU suspend entry function */
> > > > int __cpu_suspend_enter(struct suspend_context *context);
> > > >
> > > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > > /* Used to save and restore the csr */
> > > > void suspend_save_csrs(struct suspend_context *context);
> > > > void suspend_restore_csrs(struct suspend_context *context);
> > > > +
> > > > +/* Low-level API to support hibernation */
> > > > +int swsusp_arch_suspend(void);
> > > > +int swsusp_arch_resume(void);
> > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > > +int arch_hibernation_header_restore(void *addr);
> > > > +int __hibernate_cpu_resume(void);
> > > > +
> > > > +/* Used to resume on the CPU we hibernated on */
> > > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > > +
> > > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > > + unsigned long cpu_resume);
> > > > +asmlinkage int hibernate_core_restore_code(void);
> > > > #endif
> > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > index 4cf303a779ab..daab341d55e4 100644
> > > > --- a/arch/riscv/kernel/Makefile
> > > > +++ b/arch/riscv/kernel/Makefile
> > > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > > >
> > > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > > >
> > > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > index df9444397908..d6a75aac1d27 100644
> > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > @@ -9,6 +9,7 @@
> > > > #include <linux/kbuild.h>
> > > > #include <linux/mm.h>
> > > > #include <linux/sched.h>
> > > > +#include <linux/suspend.h>
> > > > #include <asm/kvm_host.h>
> > > > #include <asm/thread_info.h>
> > > > #include <asm/ptrace.h>
> > > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > > >
> > > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > > >
> > > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > > +
> > > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > > new file mode 100644
> > > > index 000000000000..846affe4dced
> > > > --- /dev/null
> > > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > > @@ -0,0 +1,77 @@
> > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > +/*
> > > > + * Hibernation low level support for RISCV.
> > > > + *
> > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > + *
> > > > + * Author: Jee Heng Sia <[email protected]>
> > > > + */
> > > > +
> > > > +#include <asm/asm.h>
> > > > +#include <asm/asm-offsets.h>
> > > > +#include <asm/assembler.h>
> > > > +#include <asm/csr.h>
> > > > +
> > > > +#include <linux/linkage.h>
> > > > +
> > > > +/*
> > > > + * int __hibernate_cpu_resume(void)
> > > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > > + * context.
> > > > + *
> > > > + * Always returns 0
> > > > + */
> > > > +ENTRY(__hibernate_cpu_resume)
> > > > + /* switch to hibernated image's page table. */
> > > > + csrw CSR_SATP, s0
> > > > + sfence.vma
> > > > +
> > > > + REG_L a0, hibernate_cpu_context
> > > > +
> > > > + restore_csr
> > > > + restore_reg
> > > > +
> > > > + /* Return zero value. */
> > > > + add a0, zero, zero
> > >
> > > nit: mv a0, zero
> > sure
> > >
> > > > +
> > > > + ret
> > > > +END(__hibernate_cpu_resume)
> > > > +
> > > > +/*
> > > > + * Prepare to restore the image.
> > > > + * a0: satp of saved page tables.
> > > > + * a1: satp of temporary page tables.
> > > > + * a2: cpu_resume.
> > > > + */
> > > > +ENTRY(hibernate_restore_image)
> > > > + mv s0, a0
> > > > + mv s1, a1
> > > > + mv s2, a2
> > > > + REG_L s4, restore_pblist
> > > > + REG_L a1, relocated_restore_code
> > > > +
> > > > + jalr a1
> > > > +END(hibernate_restore_image)
> > > > +
> > > > +/*
> > > > + * The below code will be executed from a 'safe' page.
> > > > + * It first switches to the temporary page table, then starts to copy the pages
> > > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > > + * to restore the CPU context.
> > > > + */
> > > > +ENTRY(hibernate_core_restore_code)
> > > > + /* switch to temp page table. */
> > > > + csrw satp, s1
> > > > + sfence.vma
> > > > +.Lcopy:
> > > > + /* The below code will restore the hibernated image. */
> > > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > >
> > > Are we sure restore_pblist will never be NULL?
> > restore_pblist is a linked list; it will be NULL during initialization or during page clean-up by the hibernation core. During the initial
> resume process, the hibernation core will check the header and load the pages. If everything works correctly, the pages will be linked
> to restore_pblist and swsusp_arch_resume() will then be invoked; otherwise the hibernation core will throw an error and fail to resume
> from the hibernated image.
>
> I know restore_pblist is a linked-list and this doesn't answer the
> question. The comment above restore_pblist says
>
> /*
> * List of PBEs needed for restoring the pages that were allocated before
> * the suspend and included in the suspend image, but have also been
> * allocated by the "resume" kernel, so their contents cannot be written
> * directly to their "original" page frames.
> */
>
> which implies the pages that end up on this list are "special". My
> question is whether or not we're guaranteed to have at least one
> of these special pages. If not, we shouldn't assume s4 is non-null.
> If so, then a comment stating why that's guaranteed would be nice.
The restore_pblist will not be NULL, otherwise swsusp_arch_resume() wouldn't get invoked. You can find how the linked list is built and how its entries are checked for validity at https://elixir.bootlin.com/linux/v6.2-rc8/source/kernel/power/snapshot.c. "A comment stating why that's guaranteed would be nice"? Hmm, perhaps this is out of my scope, but I do trust the page validity checking in the link I shared.
>
> > >
> > > > +
> > > > + copy_page a0, a1
> > > > +
> > > > + REG_L s4, HIBERN_PBE_NEXT(s4)
> > > > + bnez s4, .Lcopy
> > > > +
> > > > + jalr s2
> > > > +END(hibernate_core_restore_code)
> > > > diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> > > > new file mode 100644
> > > > index 000000000000..46a2f470db6e
> > > > --- /dev/null
> > > > +++ b/arch/riscv/kernel/hibernate.c
> > > > @@ -0,0 +1,447 @@
> > > > +// SPDX-License-Identifier: GPL-2.0-only
> > > > +/*
> > > > + * Hibernation support for RISCV
> > > > + *
> > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > + *
> > > > + * Author: Jee Heng Sia <[email protected]>
> > > > + */
> > > > +
> > > > +#include <asm/barrier.h>
> > > > +#include <asm/cacheflush.h>
> > > > +#include <asm/mmu_context.h>
> > > > +#include <asm/page.h>
> > > > +#include <asm/pgalloc.h>
> > > > +#include <asm/pgtable.h>
> > > > +#include <asm/sections.h>
> > > > +#include <asm/set_memory.h>
> > > > +#include <asm/smp.h>
> > > > +#include <asm/suspend.h>
> > > > +
> > > > +#include <linux/cpu.h>
> > > > +#include <linux/memblock.h>
> > > > +#include <linux/pm.h>
> > > > +#include <linux/sched.h>
> > > > +#include <linux/suspend.h>
> > > > +#include <linux/utsname.h>
> > > > +
> > > > +/* The logical cpu number we should resume on, initialised to a non-cpu number. */
> > > > +static int sleep_cpu = -EINVAL;
> > > > +
> > > > +/* Pointer to the temporary resume page table. */
> > > > +static pgd_t *resume_pg_dir;
> > > > +
> > > > +/* CPU context to be saved. */
> > > > +struct suspend_context *hibernate_cpu_context;
> > > > +EXPORT_SYMBOL_GPL(hibernate_cpu_context);
> > > > +
> > > > +unsigned long relocated_restore_code;
> > > > +EXPORT_SYMBOL_GPL(relocated_restore_code);
> > > > +
> > > > +/**
> > > > + * struct arch_hibernate_hdr_invariants - container to store kernel build version.
> > > > + * @uts_version: to save the build number and date so that the we do not resume with
> > > > + * a different kernel.
> > > > + */
> > > > +struct arch_hibernate_hdr_invariants {
> > > > + char uts_version[__NEW_UTS_LEN + 1];
> > > > +};
> > > > +
> > > > +/**
> > > > + * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
> > > > + * @invariants: container to store kernel build version.
> > > > + * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
> > > > + * @saved_satp: original page table used by the hibernated image.
> > > > + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> > > > + */
> > > > +static struct arch_hibernate_hdr {
> > > > + struct arch_hibernate_hdr_invariants invariants;
> > > > + unsigned long hartid;
> > > > + unsigned long saved_satp;
> > > > + unsigned long restore_cpu_addr;
> > > > +} resume_hdr;
> > > > +
> > > > +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> > > > +{
> > > > + memset(i, 0, sizeof(*i));
> > > > + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> > > > +}
> > > > +
> > > > +/*
> > > > + * Check if the given pfn is in the 'nosave' section.
> > > > + */
> > > > +int pfn_is_nosave(unsigned long pfn)
> > > > +{
> > > > + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> > > > + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> > > > +
> > > > + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> > > > +}
> > > > +
> > > > +void notrace save_processor_state(void)
> > > > +{
> > > > + WARN_ON(num_online_cpus() != 1);
> > > > +}
> > > > +
> > > > +void notrace restore_processor_state(void)
> > > > +{
> > > > +}
> > > > +
> > > > +/*
> > > > + * Helper parameters need to be saved to the hibernation image header.
> > > > + */
> > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> > > > +{
> > > > + struct arch_hibernate_hdr *hdr = addr;
> > > > +
> > > > + if (max_size < sizeof(*hdr))
> > > > + return -EOVERFLOW;
> > > > +
> > > > + arch_hdr_invariants(&hdr->invariants);
> > > > +
> > > > + hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> > > > + hdr->saved_satp = csr_read(CSR_SATP);
> > > > + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> > > > +
> > > > + return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
> > > > +
> > > > +/*
> > > > + * Retrieve the helper parameters from the hibernation image header.
> > > > + */
> > > > +int arch_hibernation_header_restore(void *addr)
> > > > +{
> > > > + struct arch_hibernate_hdr_invariants invariants;
> > > > + struct arch_hibernate_hdr *hdr = addr;
> > > > + int ret = 0;
> > > > +
> > > > + arch_hdr_invariants(&invariants);
> > > > +
> > > > + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> > > > + pr_crit("Hibernate image not generated by this kernel!\n");
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> > > > + if (sleep_cpu < 0) {
> > > > + pr_crit("Hibernated on a CPU not known to this kernel!\n");
> > > > + sleep_cpu = -EINVAL;
> > > > + return -EINVAL;
> > > > + }
> > > > +
> > > > +#ifdef CONFIG_SMP
> > > > + ret = bringup_hibernate_cpu(sleep_cpu);
> > > > + if (ret) {
> > > > + sleep_cpu = -EINVAL;
> > > > + return ret;
> > > > + }
> > > > +#endif
> > > > + resume_hdr = *hdr;
> > > > +
> > > > + return ret;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
> > > > +
> > > > +int swsusp_arch_suspend(void)
> > > > +{
> > > > + int ret = 0;
> > > > +
> > > > + if (__cpu_suspend_enter(hibernate_cpu_context)) {
> > > > + sleep_cpu = smp_processor_id();
> > > > + suspend_save_csrs(hibernate_cpu_context);
> > > > + ret = swsusp_save();
> > > > + } else {
> > > > + suspend_restore_csrs(hibernate_cpu_context);
> > > > + flush_tlb_all();
> > > > + flush_icache_all();
> > > > +
> > > > + /*
> > > > + * Tell the hibernation core that we've just restored the memory.
> > > > + */
> > > > + in_suspend = 0;
> > > > + sleep_cpu = -EINVAL;
> > > > + }
> > > > +
> > > > + return ret;
> > > > +}
> > > > +
> > > > +static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
> > > > + unsigned long addr, pgprot_t prot)
> > > > +{
> > > > + pte_t pte = READ_ONCE(*src_ptep);
> > > > +
> > > > + if (pte_present(pte))
> > > > + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
> > > > +
> > > > + return 0;
> > > > +}
> > > > +
> > > > +static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
> > > > + unsigned long start, unsigned long end,
> > > > + pgprot_t prot)
> > > > +{
> > > > + unsigned long addr = start;
> > > > + pte_t *src_ptep;
> > > > + pte_t *dst_ptep;
> > > > +
> > > > + if (pmd_none(READ_ONCE(*dst_pmdp))) {
> > > > + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> > > > + if (!dst_ptep)
> > > > + return -ENOMEM;
> > > > +
> > > > + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
> > > > + }
> > > > +
> > > > + dst_ptep = pte_offset_kernel(dst_pmdp, start);
> > > > + src_ptep = pte_offset_kernel(src_pmdp, start);
> > > > +
> > > > + do {
> > > > + _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);
> > >
> > > I think I'd rather have the body of _temp_pgtable_map_pte() here and drop
> > > the helper, because the helper does (pte_val(pte) | pgprot_val(prot))
> > > which looks strange, until seeing here that 'pte' is only the address
> > > bits, so OR'ing in new prot bits without clearing old prot bits makes
> > > sense.
> > we do not need to clear the old bits, since we are going to keep those bits and add the new bits which are required for resume. Let's hold
> your question here, but I would like to see how Alex views it.
>
> I confused myself a bit in my first read, so some of what I said isn't
> relevant, but I still wonder why we don't want to be more explicit about
> what prot bits are present in the end, and I still wonder why we need such
> a simple helper function which is used in exactly one place. Indeed, the
> pattern of all the other pgtable functions below is to put the set_p*
> calls directly in the loop.
I am sorry if I confused you, but what I meant is that I would like to consolidate all the comments from other reviewers before providing the best solution. There is no doubt that your comment is valid.
>
> Thanks,
> drew

2023-02-24 09:55:38

by Andrew Jones

Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Fri, Feb 24, 2023 at 09:33:31AM +0000, JeeHeng Sia wrote:
>
>
> > -----Original Message-----
> > From: Andrew Jones <[email protected]>
> > Sent: Friday, 24 February, 2023 5:00 PM
> > To: JeeHeng Sia <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >
> > On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Andrew Jones <[email protected]>
> > > > Sent: Friday, 24 February, 2023 2:07 AM
> > > > To: JeeHeng Sia <[email protected]>
> > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > >
> > > > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > > > Low level Arch functions were created to support hibernation.
> > > > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > > > image.
> > > > >
> > > > > Arch specific hibernation header is implemented and is utilized by the
> > > > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > > > functions. The arch specific hibernation header consists of satp, hartid,
> > > > > and the cpu_resume address. The kernel built version is also need to be
> > > > > saved into the hibernation image header to making sure only the same
> > > > > kernel is restore when resume.
> > > > >
> > > > > swsusp_arch_resume() creates a temporary page table that covering only
> > > > > the linear map. It copies the restore code to a 'safe' page, then start
> > > > > to restore the memory image. Once completed, it restores the original
> > > > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > > > to restore the CPU context. Finally, it follows the normal hibernation
> > > > > path back to the hibernation core.
> > > > >
> > > > > To enable hibernation/suspend to disk into RISCV, the below config
> > > > > need to be enabled:
> > > > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > > > >
> > > > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > > > Reviewed-by: Mason Huo <[email protected]>
> > > > > ---
> > > > > arch/riscv/Kconfig | 7 +
> > > > > arch/riscv/include/asm/assembler.h | 20 ++
> > > > > arch/riscv/include/asm/suspend.h | 19 ++
> > > > > arch/riscv/kernel/Makefile | 1 +
> > > > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > > > 7 files changed, 576 insertions(+)
> > > > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > > > create mode 100644 arch/riscv/kernel/hibernate.c
> > > > >
> > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > index e2b656043abf..4555848a817f 100644
> > > > > --- a/arch/riscv/Kconfig
> > > > > +++ b/arch/riscv/Kconfig
> > > > > @@ -690,6 +690,13 @@ menu "Power management options"
> > > > >
> > > > > source "kernel/power/Kconfig"
> > > > >
> > > > > +config ARCH_HIBERNATION_POSSIBLE
> > > > > + def_bool y
> > > > > +
> > > > > +config ARCH_HIBERNATION_HEADER
> > > > > + def_bool y
> > > > > + depends on HIBERNATION
> > > >
> > > > nit: I think this can be simplified as def_bool HIBERNATION
> > > good suggestion. will change it.
> > > >
> > > > > +
> > > > > endmenu # "Power management options"
> > > > >
> > > > > menu "CPU Power Management"
> > > > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > > > index 727a97735493..68c46c0e0ea8 100644
> > > > > --- a/arch/riscv/include/asm/assembler.h
> > > > > +++ b/arch/riscv/include/asm/assembler.h
> > > > > @@ -59,4 +59,24 @@
> > > > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > > > .endm
> > > > >
> > > > > +/*
> > > > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > > > + * @a0 - destination
> > > > > + * @a1 - source
> > > > > + */
> > > > > + .macro copy_page a0, a1
> > > > > + lui a2, 0x1
> > > > > + add a2, a2, a0
> > > > > +1 :
> > > > ^ please remove this space
> > > can't remove it, otherwise checkpatch will throw ERROR: spaces required around that ':'
> >
> > Oh, right, labels in macros have this requirement.
> >
> > > >
> > > > > + REG_L t0, 0(a1)
> > > > > + REG_L t1, SZREG(a1)
> > > > > +
> > > > > + REG_S t0, 0(a0)
> > > > > + REG_S t1, SZREG(a0)
> > > > > +
> > > > > + addi a0, a0, 2 * SZREG
> > > > > + addi a1, a1, 2 * SZREG
> > > > > + bne a2, a0, 1b
> > > > > + .endm
> > > > > +
> > > > > #endif /* __ASM_ASSEMBLER_H */
> > > > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > > > index 75419c5ca272..3362da56a9d8 100644
> > > > > --- a/arch/riscv/include/asm/suspend.h
> > > > > +++ b/arch/riscv/include/asm/suspend.h
> > > > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > > > #endif
> > > > > };
> > > > >
> > > > > +/*
> > > > > + * Used by hibernation core and cleared during resume sequence
> > > > > + */
> > > > > +extern int in_suspend;
> > > > > +
> > > > > /* Low-level CPU suspend entry function */
> > > > > int __cpu_suspend_enter(struct suspend_context *context);
> > > > >
> > > > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > > > /* Used to save and restore the csr */
> > > > > void suspend_save_csrs(struct suspend_context *context);
> > > > > void suspend_restore_csrs(struct suspend_context *context);
> > > > > +
> > > > > +/* Low-level API to support hibernation */
> > > > > +int swsusp_arch_suspend(void);
> > > > > +int swsusp_arch_resume(void);
> > > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > > > +int arch_hibernation_header_restore(void *addr);
> > > > > +int __hibernate_cpu_resume(void);
> > > > > +
> > > > > +/* Used to resume on the CPU we hibernated on */
> > > > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > > > +
> > > > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > > > + unsigned long cpu_resume);
> > > > > +asmlinkage int hibernate_core_restore_code(void);
> > > > > #endif
> > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > index 4cf303a779ab..daab341d55e4 100644
> > > > > --- a/arch/riscv/kernel/Makefile
> > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > > > >
> > > > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > > > >
> > > > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > > index df9444397908..d6a75aac1d27 100644
> > > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > > @@ -9,6 +9,7 @@
> > > > > #include <linux/kbuild.h>
> > > > > #include <linux/mm.h>
> > > > > #include <linux/sched.h>
> > > > > +#include <linux/suspend.h>
> > > > > #include <asm/kvm_host.h>
> > > > > #include <asm/thread_info.h>
> > > > > #include <asm/ptrace.h>
> > > > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > > > >
> > > > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > > > >
> > > > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > > > +
> > > > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > > > new file mode 100644
> > > > > index 000000000000..846affe4dced
> > > > > --- /dev/null
> > > > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > > > @@ -0,0 +1,77 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > +/*
> > > > > + * Hibernation low level support for RISCV.
> > > > > + *
> > > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > > + *
> > > > > + * Author: Jee Heng Sia <[email protected]>
> > > > > + */
> > > > > +
> > > > > +#include <asm/asm.h>
> > > > > +#include <asm/asm-offsets.h>
> > > > > +#include <asm/assembler.h>
> > > > > +#include <asm/csr.h>
> > > > > +
> > > > > +#include <linux/linkage.h>
> > > > > +
> > > > > +/*
> > > > > + * int __hibernate_cpu_resume(void)
> > > > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > > > + * context.
> > > > > + *
> > > > > + * Always returns 0
> > > > > + */
> > > > > +ENTRY(__hibernate_cpu_resume)
> > > > > + /* switch to hibernated image's page table. */
> > > > > + csrw CSR_SATP, s0
> > > > > + sfence.vma
> > > > > +
> > > > > + REG_L a0, hibernate_cpu_context
> > > > > +
> > > > > + restore_csr
> > > > > + restore_reg
> > > > > +
> > > > > + /* Return zero value. */
> > > > > + add a0, zero, zero
> > > >
> > > > nit: mv a0, zero
> > > sure
> > > >
> > > > > +
> > > > > + ret
> > > > > +END(__hibernate_cpu_resume)
> > > > > +
> > > > > +/*
> > > > > + * Prepare to restore the image.
> > > > > + * a0: satp of saved page tables.
> > > > > + * a1: satp of temporary page tables.
> > > > > + * a2: cpu_resume.
> > > > > + */
> > > > > +ENTRY(hibernate_restore_image)
> > > > > + mv s0, a0
> > > > > + mv s1, a1
> > > > > + mv s2, a2
> > > > > + REG_L s4, restore_pblist
> > > > > + REG_L a1, relocated_restore_code
> > > > > +
> > > > > + jalr a1
> > > > > +END(hibernate_restore_image)
> > > > > +
> > > > > +/*
> > > > > + * The below code will be executed from a 'safe' page.
> > > > > + * It first switches to the temporary page table, then starts to copy the pages
> > > > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > > > + * to restore the CPU context.
> > > > > + */
> > > > > +ENTRY(hibernate_core_restore_code)
> > > > > + /* switch to temp page table. */
> > > > > + csrw satp, s1
> > > > > + sfence.vma
> > > > > +.Lcopy:
> > > > > + /* The below code will restore the hibernated image. */
> > > > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > > >
> > > > Are we sure restore_pblist will never be NULL?
> > > restore_pblist is a linked list; it will be NULL during initialization or during page clean-up by the hibernation core. During the initial
> > resume process, the hibernation core will check the header and load the pages. If everything works correctly, the pages will be linked
> > to restore_pblist and swsusp_arch_resume() will then be invoked; otherwise the hibernation core will throw an error and fail to resume
> > from the hibernated image.
> >
> > I know restore_pblist is a linked-list and this doesn't answer the
> > question. The comment above restore_pblist says
> >
> > /*
> > * List of PBEs needed for restoring the pages that were allocated before
> > * the suspend and included in the suspend image, but have also been
> > * allocated by the "resume" kernel, so their contents cannot be written
> > * directly to their "original" page frames.
> > */
> >
> > which implies the pages that end up on this list are "special". My
> > question is whether or not we're guaranteed to have at least one
> > of these special pages. If not, we shouldn't assume s4 is non-null.
> > If so, then a comment stating why that's guaranteed would be nice.
> The restore_pblist will not be NULL, otherwise swsusp_arch_resume() wouldn't get invoked. You can find how the linked list is built and how its entries are checked for validity at https://elixir.bootlin.com/linux/v6.2-rc8/source/kernel/power/snapshot.c. "A comment stating why that's guaranteed would be nice"? Hmm, perhaps this is out of my scope, but I do trust the page validity checking in the link I shared.

Sorry, but pointing to an entire source file (one that I've obviously
already looked at, since I quoted a comment from it...) is not helpful.
I don't see where restore_pblist is being checked before
swsusp_arch_resume() is issued (from its callsite in hibernate.c).

Thanks,
drew
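
If that guarantee cannot be established, one option would be a defensive guard in hibernate_core_restore_code. A minimal sketch (the beqz/.Ldone guard is the addition; it is not part of the posted patch) would branch straight to the CPU-context restore when the list is empty:

ENTRY(hibernate_core_restore_code)
	/* switch to temp page table. */
	csrw	satp, s1
	sfence.vma

	/* Skip the copy loop entirely if restore_pblist (held in s4) is NULL. */
	beqz	s4, .Ldone
.Lcopy:
	REG_L	a1, HIBERN_PBE_ADDR(s4)
	REG_L	a0, HIBERN_PBE_ORIG(s4)

	copy_page a0, a1

	REG_L	s4, HIBERN_PBE_NEXT(s4)
	bnez	s4, .Lcopy
.Ldone:
	jalr	s2
END(hibernate_core_restore_code)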

2023-02-24 10:19:26

by Alexandre Ghiti

Subject: Re: [PATCH v4 2/4] RISC-V: Factor out common code of __cpu_resume_enter()

Hi Sia,

On 2/21/23 03:35, Sia Jee Heng wrote:
> The cpu_resume() function is very similar for the suspend to disk and
> suspend to ram cases. Factor out the common code into restore_csr macro
> and restore_reg macro.
>
> Signed-off-by: Sia Jee Heng <[email protected]>
> ---
> arch/riscv/include/asm/assembler.h | 62 ++++++++++++++++++++++++++++++
> arch/riscv/kernel/suspend_entry.S | 34 ++--------------
> 2 files changed, 65 insertions(+), 31 deletions(-)
> create mode 100644 arch/riscv/include/asm/assembler.h
>
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> new file mode 100644
> index 000000000000..727a97735493
> --- /dev/null
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -0,0 +1,62 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <[email protected]>
> + */
> +
> +#ifndef __ASSEMBLY__
> +#error "Only include this from assembly code"
> +#endif
> +
> +#ifndef __ASM_ASSEMBLER_H
> +#define __ASM_ASSEMBLER_H
> +
> +#include <asm/asm.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/csr.h>
> +
> +/*
> + * restore_csr - restore hart's CSR value
> + */
> + .macro restore_csr
> + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> + csrw CSR_EPC, t0
> + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> + csrw CSR_STATUS, t0
> + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> + csrw CSR_TVAL, t0
> + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> + csrw CSR_CAUSE, t0
> + .endm
> +
> +/*
> + * restore_reg - Restore registers (except A0 and T0-T6)
> + */
> + .macro restore_reg
> + REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> + REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> + REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> + REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> + REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> + REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> + REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> + REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> + REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> + REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> + REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> + REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> + REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> + REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> + REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> + REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> + REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> + REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> + REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> + REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> + REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> + REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> + REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> + .endm
> +
> +#endif /* __ASM_ASSEMBLER_H */


You introduce assembler.h which in the future may contain other assembly
functions not related to suspend: I'd rename those macros so that we
know they are suspend related, something like
suspend_restore_regs/suspend_restore_csrs.

And instead of (SUSPEND_CONTEXT_REGS + PT_XXX) you could introduce
SUSPEND_CONTEXT_REGS_PT_XXX in asm-offsets.c?
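
As an illustration of that suggestion (a sketch only; the SUSPEND_CONTEXT_REGS_PT_* names are Alex's proposal, not code from this series), the asm-offsets.c side could emit the combined offsets directly:

	/* Combined offsets so the assembly macros can use a single symbol per slot. */
	DEFINE(SUSPEND_CONTEXT_REGS_PT_EPC,
	       offsetof(struct suspend_context, regs) + offsetof(struct pt_regs, epc));
	DEFINE(SUSPEND_CONTEXT_REGS_PT_RA,
	       offsetof(struct suspend_context, regs) + offsetof(struct pt_regs, ra));
	/*
	 * ...and likewise for the remaining registers, so the macro can use e.g.:
	 *	REG_L	ra, SUSPEND_CONTEXT_REGS_PT_RA(a0)
	 */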


> diff --git a/arch/riscv/kernel/suspend_entry.S b/arch/riscv/kernel/suspend_entry.S
> index aafcca58c19d..74a8fab8e0f6 100644
> --- a/arch/riscv/kernel/suspend_entry.S
> +++ b/arch/riscv/kernel/suspend_entry.S
> @@ -7,6 +7,7 @@
> #include <linux/linkage.h>
> #include <asm/asm.h>
> #include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> #include <asm/csr.h>
> #include <asm/xip_fixup.h>
>
> @@ -83,39 +84,10 @@ ENTRY(__cpu_resume_enter)
> add a0, a1, zero
>
> /* Restore CSRs */
> - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> - csrw CSR_EPC, t0
> - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> - csrw CSR_STATUS, t0
> - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> - csrw CSR_TVAL, t0
> - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> - csrw CSR_CAUSE, t0
> + restore_csr
>
> /* Restore registers (except A0 and T0-T6) */
> - REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> - REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> - REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> - REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> - REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> - REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> - REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> - REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> - REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> - REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> - REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> - REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> - REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> - REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> - REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> - REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> - REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> - REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> - REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> - REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> - REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> - REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> - REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> + restore_reg
>
> /* Return zero value */
> add a0, zero, zero

2023-02-24 10:21:58

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH v4 3/4] RISC-V: mm: Enable huge page support to kernel_page_present() function


On 2/21/23 03:35, Sia Jee Heng wrote:
> Currently kernel_page_present() function doesn't support huge page
> detection causes the function to mistakenly return false to the
> hibernation core.
>
> Add huge page detection to the function to solve the problem.
>
> Fixes tag: commit 9e953cda5cdf ("riscv:
> Introduce huge page support for 32/64bit kernel")
>
> Signed-off-by: Sia Jee Heng <[email protected]>
> Reviewed-by: Ley Foon Tan <[email protected]>
> Reviewed-by: Mason Huo <[email protected]>
> ---
> arch/riscv/mm/pageattr.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
> index 86c56616e5de..ea3d61de065b 100644
> --- a/arch/riscv/mm/pageattr.c
> +++ b/arch/riscv/mm/pageattr.c
> @@ -217,18 +217,26 @@ bool kernel_page_present(struct page *page)
> pgd = pgd_offset_k(addr);
> if (!pgd_present(*pgd))
> return false;
> + if (pgd_leaf(*pgd))
> + return true;
>
> p4d = p4d_offset(pgd, addr);
> if (!p4d_present(*p4d))
> return false;
> + if (p4d_leaf(*p4d))
> + return true;
>
> pud = pud_offset(p4d, addr);
> if (!pud_present(*pud))
> return false;
> + if (pud_leaf(*pud))
> + return true;
>
> pmd = pmd_offset(pud, addr);
> if (!pmd_present(*pmd))
> return false;
> + if (pmd_leaf(*pmd))
> + return true;
>
> pte = pte_offset_kernel(pmd, addr);
> return pte_present(*pte);


Reviewed-by: Alexandre Ghiti <[email protected]>

Thanks,

Alex


2023-02-24 10:32:28

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Friday, 24 February, 2023 5:55 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Fri, Feb 24, 2023 at 09:33:31AM +0000, JeeHeng Sia wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Jones <[email protected]>
> > > Sent: Friday, 24 February, 2023 5:00 PM
> > > To: JeeHeng Sia <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > >
> > > On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Andrew Jones <[email protected]>
> > > > > Sent: Friday, 24 February, 2023 2:07 AM
> > > > > To: JeeHeng Sia <[email protected]>
> > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > >
> > > > > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > > > > Low level Arch functions were created to support hibernation.
> > > > > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > > > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > > > > image.
> > > > > >
> > > > > > Arch specific hibernation header is implemented and is utilized by the
> > > > > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > > > > functions. The arch specific hibernation header consists of satp, hartid,
> > > > > > and the cpu_resume address. The kernel built version is also need to be
> > > > > > saved into the hibernation image header to making sure only the same
> > > > > > kernel is restore when resume.
> > > > > >
> > > > > > swsusp_arch_resume() creates a temporary page table that covering only
> > > > > > the linear map. It copies the restore code to a 'safe' page, then start
> > > > > > to restore the memory image. Once completed, it restores the original
> > > > > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > > > > to restore the CPU context. Finally, it follows the normal hibernation
> > > > > > path back to the hibernation core.
> > > > > >
> > > > > > To enable hibernation/suspend to disk into RISCV, the below config
> > > > > > need to be enabled:
> > > > > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > > > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > > > > >
> > > > > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > > > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > > > > Reviewed-by: Mason Huo <[email protected]>
> > > > > > ---
> > > > > > arch/riscv/Kconfig | 7 +
> > > > > > arch/riscv/include/asm/assembler.h | 20 ++
> > > > > > arch/riscv/include/asm/suspend.h | 19 ++
> > > > > > arch/riscv/kernel/Makefile | 1 +
> > > > > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > > > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > > > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > > > > 7 files changed, 576 insertions(+)
> > > > > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > > > > create mode 100644 arch/riscv/kernel/hibernate.c
> > > > > >
> > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > index e2b656043abf..4555848a817f 100644
> > > > > > --- a/arch/riscv/Kconfig
> > > > > > +++ b/arch/riscv/Kconfig
> > > > > > @@ -690,6 +690,13 @@ menu "Power management options"
> > > > > >
> > > > > > source "kernel/power/Kconfig"
> > > > > >
> > > > > > +config ARCH_HIBERNATION_POSSIBLE
> > > > > > + def_bool y
> > > > > > +
> > > > > > +config ARCH_HIBERNATION_HEADER
> > > > > > + def_bool y
> > > > > > + depends on HIBERNATION
> > > > >
> > > > > nit: I think this can be simplified as def_bool HIBERNATION
> > > > good suggestion. will change it.
> > > > >
> > > > > > +
> > > > > > endmenu # "Power management options"
> > > > > >
> > > > > > menu "CPU Power Management"
> > > > > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > > > > index 727a97735493..68c46c0e0ea8 100644
> > > > > > --- a/arch/riscv/include/asm/assembler.h
> > > > > > +++ b/arch/riscv/include/asm/assembler.h
> > > > > > @@ -59,4 +59,24 @@
> > > > > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > > > > .endm
> > > > > >
> > > > > > +/*
> > > > > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > > > > + * @a0 - destination
> > > > > > + * @a1 - source
> > > > > > + */
> > > > > > + .macro copy_page a0, a1
> > > > > > + lui a2, 0x1
> > > > > > + add a2, a2, a0
> > > > > > +1 :
> > > > > ^ please remove this space
> > > > can't remove it otherwise checkpatch will throws ERROR: spaces required around that ':'
> > >
> > > Oh, right, labels in macros have this requirement.
> > >
> > > > >
> > > > > > + REG_L t0, 0(a1)
> > > > > > + REG_L t1, SZREG(a1)
> > > > > > +
> > > > > > + REG_S t0, 0(a0)
> > > > > > + REG_S t1, SZREG(a0)
> > > > > > +
> > > > > > + addi a0, a0, 2 * SZREG
> > > > > > + addi a1, a1, 2 * SZREG
> > > > > > + bne a2, a0, 1b
> > > > > > + .endm
> > > > > > +
> > > > > > #endif /* __ASM_ASSEMBLER_H */
> > > > > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > > > > index 75419c5ca272..3362da56a9d8 100644
> > > > > > --- a/arch/riscv/include/asm/suspend.h
> > > > > > +++ b/arch/riscv/include/asm/suspend.h
> > > > > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > > > > #endif
> > > > > > };
> > > > > >
> > > > > > +/*
> > > > > > + * Used by hibernation core and cleared during resume sequence
> > > > > > + */
> > > > > > +extern int in_suspend;
> > > > > > +
> > > > > > /* Low-level CPU suspend entry function */
> > > > > > int __cpu_suspend_enter(struct suspend_context *context);
> > > > > >
> > > > > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > > > > /* Used to save and restore the csr */
> > > > > > void suspend_save_csrs(struct suspend_context *context);
> > > > > > void suspend_restore_csrs(struct suspend_context *context);
> > > > > > +
> > > > > > +/* Low-level API to support hibernation */
> > > > > > +int swsusp_arch_suspend(void);
> > > > > > +int swsusp_arch_resume(void);
> > > > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > > > > +int arch_hibernation_header_restore(void *addr);
> > > > > > +int __hibernate_cpu_resume(void);
> > > > > > +
> > > > > > +/* Used to resume on the CPU we hibernated on */
> > > > > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > > > > +
> > > > > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > > > > + unsigned long cpu_resume);
> > > > > > +asmlinkage int hibernate_core_restore_code(void);
> > > > > > #endif
> > > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > > index 4cf303a779ab..daab341d55e4 100644
> > > > > > --- a/arch/riscv/kernel/Makefile
> > > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > > > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > > > > >
> > > > > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > > > > >
> > > > > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > > > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > > > index df9444397908..d6a75aac1d27 100644
> > > > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > > > @@ -9,6 +9,7 @@
> > > > > > #include <linux/kbuild.h>
> > > > > > #include <linux/mm.h>
> > > > > > #include <linux/sched.h>
> > > > > > +#include <linux/suspend.h>
> > > > > > #include <asm/kvm_host.h>
> > > > > > #include <asm/thread_info.h>
> > > > > > #include <asm/ptrace.h>
> > > > > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > > > > >
> > > > > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > > > > >
> > > > > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > > > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > > > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > > > > +
> > > > > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > > > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > > > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > > > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > > > > new file mode 100644
> > > > > > index 000000000000..846affe4dced
> > > > > > --- /dev/null
> > > > > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > > > > @@ -0,0 +1,77 @@
> > > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > > +/*
> > > > > > + * Hibernation low level support for RISCV.
> > > > > > + *
> > > > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > > > + *
> > > > > > + * Author: Jee Heng Sia <[email protected]>
> > > > > > + */
> > > > > > +
> > > > > > +#include <asm/asm.h>
> > > > > > +#include <asm/asm-offsets.h>
> > > > > > +#include <asm/assembler.h>
> > > > > > +#include <asm/csr.h>
> > > > > > +
> > > > > > +#include <linux/linkage.h>
> > > > > > +
> > > > > > +/*
> > > > > > + * int __hibernate_cpu_resume(void)
> > > > > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > > > > + * context.
> > > > > > + *
> > > > > > + * Always returns 0
> > > > > > + */
> > > > > > +ENTRY(__hibernate_cpu_resume)
> > > > > > + /* switch to hibernated image's page table. */
> > > > > > + csrw CSR_SATP, s0
> > > > > > + sfence.vma
> > > > > > +
> > > > > > + REG_L a0, hibernate_cpu_context
> > > > > > +
> > > > > > + restore_csr
> > > > > > + restore_reg
> > > > > > +
> > > > > > + /* Return zero value. */
> > > > > > + add a0, zero, zero
> > > > >
> > > > > nit: mv a0, zero
> > > > sure
> > > > >
> > > > > > +
> > > > > > + ret
> > > > > > +END(__hibernate_cpu_resume)
> > > > > > +
> > > > > > +/*
> > > > > > + * Prepare to restore the image.
> > > > > > + * a0: satp of saved page tables.
> > > > > > + * a1: satp of temporary page tables.
> > > > > > + * a2: cpu_resume.
> > > > > > + */
> > > > > > +ENTRY(hibernate_restore_image)
> > > > > > + mv s0, a0
> > > > > > + mv s1, a1
> > > > > > + mv s2, a2
> > > > > > + REG_L s4, restore_pblist
> > > > > > + REG_L a1, relocated_restore_code
> > > > > > +
> > > > > > + jalr a1
> > > > > > +END(hibernate_restore_image)
> > > > > > +
> > > > > > +/*
> > > > > > + * The below code will be executed from a 'safe' page.
> > > > > > + * It first switches to the temporary page table, then starts to copy the pages
> > > > > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > > > > + * to restore the CPU context.
> > > > > > + */
> > > > > > +ENTRY(hibernate_core_restore_code)
> > > > > > + /* switch to temp page table. */
> > > > > > + csrw satp, s1
> > > > > > + sfence.vma
> > > > > > +.Lcopy:
> > > > > > + /* The below code will restore the hibernated image. */
> > > > > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > > > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > > > >
> > > > > Are we sure restore_pblist will never be NULL?
> > > > restore_pblist is a link-list, it will be null during initialization or during page clean up by hibernation core. During the initial
> resume
> > > process, the hibernation core will check the header and load the pages. If everything works correctly, the page will be linked to the
> > > restore_pblist and then invoke swsusp_arch_resume() else hibernation core will throws error and failed to resume from the
> > > hibernated image.
> > >
> > > I know restore_pblist is a linked-list and this doesn't answer the
> > > question. The comment above restore_pblist says
> > >
> > > /*
> > > * List of PBEs needed for restoring the pages that were allocated before
> > > * the suspend and included in the suspend image, but have also been
> > > * allocated by the "resume" kernel, so their contents cannot be written
> > > * directly to their "original" page frames.
> > > */
> > >
> > > which implies the pages that end up on this list are "special". My
> > > question is whether or not we're guaranteed to have at least one
> > > of these special pages. If not, we shouldn't assume s4 is non-null.
> > > If so, then a comment stating why that's guaranteed would be nice.
> > The restore_pblist will not be null otherwise swsusp_arch_resume wouldn't get invoked. you can find how the link-list are link and
> how it checks against validity at https://elixir.bootlin.com/linux/v6.2-rc8/source/kernel/power/snapshot.c . " A comment stating why
> that's guaranteed would be nice" ? Hmm, perhaps this is out of my scope but I do believe in the page validity checking in the link I
> shared.
>
> Sorry, but pointing to an entire source file (one that I've obviously
> already looked at, since I quoted a comment from it...) is not helpful.
> I don't see where restore_pblist is being checked before
> swsusp_arch_resume() is issued (from its callsite in hibernate.c).
Sure, below is the hibernation flow for your reference. The linked-list creation and checking can be found at: https://elixir.bootlin.com/linux/v6.2/source/kernel/power/snapshot.c#L2576
software_resume()
load_image_and_restore()
swsusp_read()
load_image()
snapshot_write_next()
get_buffer() <-- This is the function that checks and links the pages to restore_pblist
hibernation_restore()
resume_target_kernel()
swsusp_arch_resume()
>
> Thanks,
> drew

2023-02-24 12:08:37

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Fri, Feb 24, 2023 at 10:30:19AM +0000, JeeHeng Sia wrote:
>
>
> > -----Original Message-----
> > From: Andrew Jones <[email protected]>
> > Sent: Friday, 24 February, 2023 5:55 PM
> > To: JeeHeng Sia <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >
> > On Fri, Feb 24, 2023 at 09:33:31AM +0000, JeeHeng Sia wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Andrew Jones <[email protected]>
> > > > Sent: Friday, 24 February, 2023 5:00 PM
> > > > To: JeeHeng Sia <[email protected]>
> > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > >
> > > > On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Andrew Jones <[email protected]>
> > > > > > Sent: Friday, 24 February, 2023 2:07 AM
> > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > >
> > > > > > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > > > > > Low level Arch functions were created to support hibernation.
> > > > > > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > > > > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > > > > > image.
> > > > > > >
> > > > > > > Arch specific hibernation header is implemented and is utilized by the
> > > > > > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > > > > > functions. The arch specific hibernation header consists of satp, hartid,
> > > > > > > and the cpu_resume address. The kernel built version is also need to be
> > > > > > > saved into the hibernation image header to making sure only the same
> > > > > > > kernel is restore when resume.
> > > > > > >
> > > > > > > swsusp_arch_resume() creates a temporary page table that covering only
> > > > > > > the linear map. It copies the restore code to a 'safe' page, then start
> > > > > > > to restore the memory image. Once completed, it restores the original
> > > > > > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > > > > > to restore the CPU context. Finally, it follows the normal hibernation
> > > > > > > path back to the hibernation core.
> > > > > > >
> > > > > > > To enable hibernation/suspend to disk into RISCV, the below config
> > > > > > > need to be enabled:
> > > > > > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > > > > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > > > > > >
> > > > > > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > > > > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > > > > > Reviewed-by: Mason Huo <[email protected]>
> > > > > > > ---
> > > > > > > arch/riscv/Kconfig | 7 +
> > > > > > > arch/riscv/include/asm/assembler.h | 20 ++
> > > > > > > arch/riscv/include/asm/suspend.h | 19 ++
> > > > > > > arch/riscv/kernel/Makefile | 1 +
> > > > > > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > > > > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > > > > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > > > > > 7 files changed, 576 insertions(+)
> > > > > > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > > > > > create mode 100644 arch/riscv/kernel/hibernate.c
> > > > > > >
> > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > index e2b656043abf..4555848a817f 100644
> > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > @@ -690,6 +690,13 @@ menu "Power management options"
> > > > > > >
> > > > > > > source "kernel/power/Kconfig"
> > > > > > >
> > > > > > > +config ARCH_HIBERNATION_POSSIBLE
> > > > > > > + def_bool y
> > > > > > > +
> > > > > > > +config ARCH_HIBERNATION_HEADER
> > > > > > > + def_bool y
> > > > > > > + depends on HIBERNATION
> > > > > >
> > > > > > nit: I think this can be simplified as def_bool HIBERNATION
> > > > > good suggestion. will change it.
> > > > > >
> > > > > > > +
> > > > > > > endmenu # "Power management options"
> > > > > > >
> > > > > > > menu "CPU Power Management"
> > > > > > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > > > > > index 727a97735493..68c46c0e0ea8 100644
> > > > > > > --- a/arch/riscv/include/asm/assembler.h
> > > > > > > +++ b/arch/riscv/include/asm/assembler.h
> > > > > > > @@ -59,4 +59,24 @@
> > > > > > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > > > > > .endm
> > > > > > >
> > > > > > > +/*
> > > > > > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > > > > > + * @a0 - destination
> > > > > > > + * @a1 - source
> > > > > > > + */
> > > > > > > + .macro copy_page a0, a1
> > > > > > > + lui a2, 0x1
> > > > > > > + add a2, a2, a0
> > > > > > > +1 :
> > > > > > ^ please remove this space
> > > > > can't remove it otherwise checkpatch will throws ERROR: spaces required around that ':'
> > > >
> > > > Oh, right, labels in macros have this requirement.
> > > >
> > > > > >
> > > > > > > + REG_L t0, 0(a1)
> > > > > > > + REG_L t1, SZREG(a1)
> > > > > > > +
> > > > > > > + REG_S t0, 0(a0)
> > > > > > > + REG_S t1, SZREG(a0)
> > > > > > > +
> > > > > > > + addi a0, a0, 2 * SZREG
> > > > > > > + addi a1, a1, 2 * SZREG
> > > > > > > + bne a2, a0, 1b
> > > > > > > + .endm
> > > > > > > +
> > > > > > > #endif /* __ASM_ASSEMBLER_H */
> > > > > > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > > > > > index 75419c5ca272..3362da56a9d8 100644
> > > > > > > --- a/arch/riscv/include/asm/suspend.h
> > > > > > > +++ b/arch/riscv/include/asm/suspend.h
> > > > > > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > > > > > #endif
> > > > > > > };
> > > > > > >
> > > > > > > +/*
> > > > > > > + * Used by hibernation core and cleared during resume sequence
> > > > > > > + */
> > > > > > > +extern int in_suspend;
> > > > > > > +
> > > > > > > /* Low-level CPU suspend entry function */
> > > > > > > int __cpu_suspend_enter(struct suspend_context *context);
> > > > > > >
> > > > > > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > > > > > /* Used to save and restore the csr */
> > > > > > > void suspend_save_csrs(struct suspend_context *context);
> > > > > > > void suspend_restore_csrs(struct suspend_context *context);
> > > > > > > +
> > > > > > > +/* Low-level API to support hibernation */
> > > > > > > +int swsusp_arch_suspend(void);
> > > > > > > +int swsusp_arch_resume(void);
> > > > > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > > > > > +int arch_hibernation_header_restore(void *addr);
> > > > > > > +int __hibernate_cpu_resume(void);
> > > > > > > +
> > > > > > > +/* Used to resume on the CPU we hibernated on */
> > > > > > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > > > > > +
> > > > > > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > > > > > + unsigned long cpu_resume);
> > > > > > > +asmlinkage int hibernate_core_restore_code(void);
> > > > > > > #endif
> > > > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > > > index 4cf303a779ab..daab341d55e4 100644
> > > > > > > --- a/arch/riscv/kernel/Makefile
> > > > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > > > > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > > > > > >
> > > > > > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > > > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > > > > > >
> > > > > > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > > > > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > > > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > > > > index df9444397908..d6a75aac1d27 100644
> > > > > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > > > > @@ -9,6 +9,7 @@
> > > > > > > #include <linux/kbuild.h>
> > > > > > > #include <linux/mm.h>
> > > > > > > #include <linux/sched.h>
> > > > > > > +#include <linux/suspend.h>
> > > > > > > #include <asm/kvm_host.h>
> > > > > > > #include <asm/thread_info.h>
> > > > > > > #include <asm/ptrace.h>
> > > > > > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > > > > > >
> > > > > > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > > > > > >
> > > > > > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > > > > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > > > > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > > > > > +
> > > > > > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > > > > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > > > > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > > > > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > new file mode 100644
> > > > > > > index 000000000000..846affe4dced
> > > > > > > --- /dev/null
> > > > > > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > @@ -0,0 +1,77 @@
> > > > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > > > +/*
> > > > > > > + * Hibernation low level support for RISCV.
> > > > > > > + *
> > > > > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > > > > + *
> > > > > > > + * Author: Jee Heng Sia <[email protected]>
> > > > > > > + */
> > > > > > > +
> > > > > > > +#include <asm/asm.h>
> > > > > > > +#include <asm/asm-offsets.h>
> > > > > > > +#include <asm/assembler.h>
> > > > > > > +#include <asm/csr.h>
> > > > > > > +
> > > > > > > +#include <linux/linkage.h>
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * int __hibernate_cpu_resume(void)
> > > > > > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > > > > > + * context.
> > > > > > > + *
> > > > > > > + * Always returns 0
> > > > > > > + */
> > > > > > > +ENTRY(__hibernate_cpu_resume)
> > > > > > > + /* switch to hibernated image's page table. */
> > > > > > > + csrw CSR_SATP, s0
> > > > > > > + sfence.vma
> > > > > > > +
> > > > > > > + REG_L a0, hibernate_cpu_context
> > > > > > > +
> > > > > > > + restore_csr
> > > > > > > + restore_reg
> > > > > > > +
> > > > > > > + /* Return zero value. */
> > > > > > > + add a0, zero, zero
> > > > > >
> > > > > > nit: mv a0, zero
> > > > > sure
> > > > > >
> > > > > > > +
> > > > > > > + ret
> > > > > > > +END(__hibernate_cpu_resume)
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * Prepare to restore the image.
> > > > > > > + * a0: satp of saved page tables.
> > > > > > > + * a1: satp of temporary page tables.
> > > > > > > + * a2: cpu_resume.
> > > > > > > + */
> > > > > > > +ENTRY(hibernate_restore_image)
> > > > > > > + mv s0, a0
> > > > > > > + mv s1, a1
> > > > > > > + mv s2, a2
> > > > > > > + REG_L s4, restore_pblist
> > > > > > > + REG_L a1, relocated_restore_code
> > > > > > > +
> > > > > > > + jalr a1
> > > > > > > +END(hibernate_restore_image)
> > > > > > > +
> > > > > > > +/*
> > > > > > > + * The below code will be executed from a 'safe' page.
> > > > > > > + * It first switches to the temporary page table, then starts to copy the pages
> > > > > > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > > > > > + * to restore the CPU context.
> > > > > > > + */
> > > > > > > +ENTRY(hibernate_core_restore_code)
> > > > > > > + /* switch to temp page table. */
> > > > > > > + csrw satp, s1
> > > > > > > + sfence.vma
> > > > > > > +.Lcopy:
> > > > > > > + /* The below code will restore the hibernated image. */
> > > > > > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > > > > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > > > > >
> > > > > > Are we sure restore_pblist will never be NULL?
> > > > > restore_pblist is a link-list, it will be null during initialization or during page clean up by hibernation core. During the initial
> > resume
> > > > process, the hibernation core will check the header and load the pages. If everything works correctly, the page will be linked to the
> > > > restore_pblist and then invoke swsusp_arch_resume() else hibernation core will throws error and failed to resume from the
> > > > hibernated image.
> > > >
> > > > I know restore_pblist is a linked-list and this doesn't answer the
> > > > question. The comment above restore_pblist says
> > > >
> > > > /*
> > > > * List of PBEs needed for restoring the pages that were allocated before
> > > > * the suspend and included in the suspend image, but have also been
> > > > * allocated by the "resume" kernel, so their contents cannot be written
> > > > * directly to their "original" page frames.
> > > > */
> > > >
> > > > which implies the pages that end up on this list are "special". My
> > > > question is whether or not we're guaranteed to have at least one
> > > > of these special pages. If not, we shouldn't assume s4 is non-null.
> > > > If so, then a comment stating why that's guaranteed would be nice.
> > > The restore_pblist will not be null otherwise swsusp_arch_resume wouldn't get invoked. you can find how the link-list are link and
> > how it checks against validity at https://elixir.bootlin.com/linux/v6.2-rc8/source/kernel/power/snapshot.c . " A comment stating why
> > that's guaranteed would be nice" ? Hmm, perhaps this is out of my scope but I do believe in the page validity checking in the link I
> > shared.
> >
> > Sorry, but pointing to an entire source file (one that I've obviously
> > already looked at, since I quoted a comment from it...) is not helpful.
> > I don't see where restore_pblist is being checked before
> > swsusp_arch_resume() is issued (from its callsite in hibernate.c).
> Sure, below shows the hibernation flow for your reference. The link-list creation and checking found at: https://elixir.bootlin.com/linux/v6.2/source/kernel/power/snapshot.c#L2576
> software_resume()
> load_image_and_restore()
> swsusp_read()
> load_image()
> snapshot_write_next()
> get_buffer() <-- This is the function checks and links the pages to the restore_pblist

Yup, I've read this path, including get_buffer(), where I saw that
get_buffer() can return an address without allocating a PBE. Where is the
check that restore_pblist isn't NULL, i.e. that at least one PBE has
been allocated by get_buffer(), before we call swsusp_arch_resume()?

Or, is it known that at least one or more pages match the criteria pointed
out in the comment below (copied from get_buffer())?

/*
* The "original" page frame has not been allocated and we have to
* use a "safe" page frame to store the loaded page.
*/

If so, then which ones? And where does it state that?
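(To illustrate the kind of check I mean -- a purely hypothetical guard, not
something from the posted patch, and only needed if restore_pblist really can
be empty at this point:)

	/* hypothetical: refuse to run the relocation loop in
	 * swsusp_arch_resume() if no PBEs were queued at all
	 */
	if (!restore_pblist)
		return -EINVAL;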

Thanks,
drew


> hibernation_restore()
> resume_target_kernel()
> swsusp_arch_resume()
> >
> > Thanks,
> > drew

2023-02-24 12:29:43

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On 2/21/23 03:35, Sia Jee Heng wrote:
> Low level Arch functions were created to support hibernation.
> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> cpu state onto the stack, then calling swsusp_save() to save the memory
> image.
>
> Arch specific hibernation header is implemented and is utilized by the
> arch_hibernation_header_restore() and arch_hibernation_header_save()
> functions. The arch specific hibernation header consists of satp, hartid,
> and the cpu_resume address. The kernel built version is also need to be
> saved into the hibernation image header to making sure only the same
> kernel is restore when resume.
>
> swsusp_arch_resume() creates a temporary page table that covering only
> the linear map. It copies the restore code to a 'safe' page, then start
> to restore the memory image. Once completed, it restores the original
> kernel's page table. It then calls into __hibernate_cpu_resume()
> to restore the CPU context. Finally, it follows the normal hibernation
> path back to the hibernation core.
>
> To enable hibernation/suspend to disk into RISCV, the below config
> need to be enabled:
> - CONFIG_ARCH_HIBERNATION_HEADER
> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>
> Signed-off-by: Sia Jee Heng <[email protected]>
> Reviewed-by: Ley Foon Tan <[email protected]>
> Reviewed-by: Mason Huo <[email protected]>
> ---
> arch/riscv/Kconfig | 7 +
> arch/riscv/include/asm/assembler.h | 20 ++
> arch/riscv/include/asm/suspend.h | 19 ++
> arch/riscv/kernel/Makefile | 1 +
> arch/riscv/kernel/asm-offsets.c | 5 +
> arch/riscv/kernel/hibernate-asm.S | 77 +++++
> arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> 7 files changed, 576 insertions(+)
> create mode 100644 arch/riscv/kernel/hibernate-asm.S
> create mode 100644 arch/riscv/kernel/hibernate.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index e2b656043abf..4555848a817f 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -690,6 +690,13 @@ menu "Power management options"
>
> source "kernel/power/Kconfig"
>
> +config ARCH_HIBERNATION_POSSIBLE
> + def_bool y
> +
> +config ARCH_HIBERNATION_HEADER
> + def_bool y
> + depends on HIBERNATION
> +
> endmenu # "Power management options"
>
> menu "CPU Power Management"
> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> index 727a97735493..68c46c0e0ea8 100644
> --- a/arch/riscv/include/asm/assembler.h
> +++ b/arch/riscv/include/asm/assembler.h
> @@ -59,4 +59,24 @@
> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> .endm
>
> +/*
> + * copy_page - copy 1 page (4KB) of data from source to destination
> + * @a0 - destination
> + * @a1 - source
> + */
> + .macro copy_page a0, a1
> + lui a2, 0x1
> + add a2, a2, a0
> +1 :
> + REG_L t0, 0(a1)
> + REG_L t1, SZREG(a1)
> +
> + REG_S t0, 0(a0)
> + REG_S t1, SZREG(a0)
> +
> + addi a0, a0, 2 * SZREG
> + addi a1, a1, 2 * SZREG
> + bne a2, a0, 1b
> + .endm
> +
> #endif /* __ASM_ASSEMBLER_H */
> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> index 75419c5ca272..3362da56a9d8 100644
> --- a/arch/riscv/include/asm/suspend.h
> +++ b/arch/riscv/include/asm/suspend.h
> @@ -21,6 +21,11 @@ struct suspend_context {
> #endif
> };
>
> +/*
> + * Used by hibernation core and cleared during resume sequence
> + */
> +extern int in_suspend;
> +
> /* Low-level CPU suspend entry function */
> int __cpu_suspend_enter(struct suspend_context *context);
>
> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> /* Used to save and restore the csr */
> void suspend_save_csrs(struct suspend_context *context);
> void suspend_restore_csrs(struct suspend_context *context);
> +
> +/* Low-level API to support hibernation */
> +int swsusp_arch_suspend(void);
> +int swsusp_arch_resume(void);
> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> +int arch_hibernation_header_restore(void *addr);
> +int __hibernate_cpu_resume(void);
> +
> +/* Used to resume on the CPU we hibernated on */
> +int hibernate_resume_nonboot_cpu_disable(void);
> +
> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> + unsigned long cpu_resume);
> +asmlinkage int hibernate_core_restore_code(void);
> #endif
> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> index 4cf303a779ab..daab341d55e4 100644
> --- a/arch/riscv/kernel/Makefile
> +++ b/arch/riscv/kernel/Makefile
> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
>
> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
>
> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> index df9444397908..d6a75aac1d27 100644
> --- a/arch/riscv/kernel/asm-offsets.c
> +++ b/arch/riscv/kernel/asm-offsets.c
> @@ -9,6 +9,7 @@
> #include <linux/kbuild.h>
> #include <linux/mm.h>
> #include <linux/sched.h>
> +#include <linux/suspend.h>
> #include <asm/kvm_host.h>
> #include <asm/thread_info.h>
> #include <asm/ptrace.h>
> @@ -116,6 +117,10 @@ void asm_offsets(void)
>
> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>
> + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> +
> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> new file mode 100644
> index 000000000000..846affe4dced
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate-asm.S
> @@ -0,0 +1,77 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Hibernation low level support for RISCV.
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <[email protected]>
> + */
> +
> +#include <asm/asm.h>
> +#include <asm/asm-offsets.h>
> +#include <asm/assembler.h>
> +#include <asm/csr.h>
> +
> +#include <linux/linkage.h>
> +
> +/*
> + * int __hibernate_cpu_resume(void)
> + * Switch back to the hibernated image's page table prior to restoring the CPU
> + * context.
> + *
> + * Always returns 0
> + */
> +ENTRY(__hibernate_cpu_resume)
> + /* switch to hibernated image's page table. */
> + csrw CSR_SATP, s0
> + sfence.vma
> +
> + REG_L a0, hibernate_cpu_context
> +
> + restore_csr
> + restore_reg
> +
> + /* Return zero value. */
> + add a0, zero, zero
> +
> + ret
> +END(__hibernate_cpu_resume)
> +
> +/*
> + * Prepare to restore the image.
> + * a0: satp of saved page tables.
> + * a1: satp of temporary page tables.
> + * a2: cpu_resume.
> + */
> +ENTRY(hibernate_restore_image)
> + mv s0, a0
> + mv s1, a1
> + mv s2, a2
> + REG_L s4, restore_pblist
> + REG_L a1, relocated_restore_code
> +
> + jalr a1
> +END(hibernate_restore_image)
> +
> +/*
> + * The below code will be executed from a 'safe' page.
> + * It first switches to the temporary page table, then starts to copy the pages
> + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> + * to restore the CPU context.
> + */
> +ENTRY(hibernate_core_restore_code)
> + /* switch to temp page table. */
> + csrw satp, s1
> + sfence.vma
> +.Lcopy:
> + /* The below code will restore the hibernated image. */
> + REG_L a1, HIBERN_PBE_ADDR(s4)
> + REG_L a0, HIBERN_PBE_ORIG(s4)
> +
> + copy_page a0, a1
> +
> + REG_L s4, HIBERN_PBE_NEXT(s4)
> + bnez s4, .Lcopy
> +
> + jalr s2
> +END(hibernate_core_restore_code)
> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> new file mode 100644
> index 000000000000..46a2f470db6e
> --- /dev/null
> +++ b/arch/riscv/kernel/hibernate.c
> @@ -0,0 +1,447 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Hibernation support for RISCV
> + *
> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> + *
> + * Author: Jee Heng Sia <[email protected]>
> + */
> +
> +#include <asm/barrier.h>
> +#include <asm/cacheflush.h>
> +#include <asm/mmu_context.h>
> +#include <asm/page.h>
> +#include <asm/pgalloc.h>
> +#include <asm/pgtable.h>
> +#include <asm/sections.h>
> +#include <asm/set_memory.h>
> +#include <asm/smp.h>
> +#include <asm/suspend.h>
> +
> +#include <linux/cpu.h>
> +#include <linux/memblock.h>
> +#include <linux/pm.h>
> +#include <linux/sched.h>
> +#include <linux/suspend.h>
> +#include <linux/utsname.h>
> +
> +/* The logical cpu number we should resume on, initialised to a non-cpu number. */
> +static int sleep_cpu = -EINVAL;
> +
> +/* Pointer to the temporary resume page table. */
> +static pgd_t *resume_pg_dir;
> +
> +/* CPU context to be saved. */
> +struct suspend_context *hibernate_cpu_context;
> +EXPORT_SYMBOL_GPL(hibernate_cpu_context);
> +
> +unsigned long relocated_restore_code;
> +EXPORT_SYMBOL_GPL(relocated_restore_code);
> +
> +/**
> + * struct arch_hibernate_hdr_invariants - container to store kernel build version.
> + * @uts_version: to save the build number and date so that the we do not resume with
> + * a different kernel.
> + */
> +struct arch_hibernate_hdr_invariants {
> + char uts_version[__NEW_UTS_LEN + 1];
> +};
> +
> +/**
> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
> + * @invariants: container to store kernel build version.
> + * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
> + * @saved_satp: original page table used by the hibernated image.
> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> + */
> +static struct arch_hibernate_hdr {
> + struct arch_hibernate_hdr_invariants invariants;
> + unsigned long hartid;
> + unsigned long saved_satp;
> + unsigned long restore_cpu_addr;
> +} resume_hdr;
> +
> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> +{
> + memset(i, 0, sizeof(*i));
> + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> +}
> +
> +/*
> + * Check if the given pfn is in the 'nosave' section.
> + */
> +int pfn_is_nosave(unsigned long pfn)
> +{
> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> +
> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> +}
> +
> +void notrace save_processor_state(void)
> +{
> + WARN_ON(num_online_cpus() != 1);
> +}
> +
> +void notrace restore_processor_state(void)
> +{
> +}
> +
> +/*
> + * Helper parameters need to be saved to the hibernation image header.
> + */
> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> +{
> + struct arch_hibernate_hdr *hdr = addr;
> +
> + if (max_size < sizeof(*hdr))
> + return -EOVERFLOW;
> +
> + arch_hdr_invariants(&hdr->invariants);
> +
> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> + hdr->saved_satp = csr_read(CSR_SATP);
> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
> +
> +/*
> + * Retrieve the helper parameters from the hibernation image header.
> + */
> +int arch_hibernation_header_restore(void *addr)
> +{
> + struct arch_hibernate_hdr_invariants invariants;
> + struct arch_hibernate_hdr *hdr = addr;
> + int ret = 0;
> +
> + arch_hdr_invariants(&invariants);
> +
> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> + pr_crit("Hibernate image not generated by this kernel!\n");
> + return -EINVAL;
> + }
> +
> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> + if (sleep_cpu < 0) {
> + pr_crit("Hibernated on a CPU not known to this kernel!\n");
> + sleep_cpu = -EINVAL;
> + return -EINVAL;
> + }
> +
> +#ifdef CONFIG_SMP
> + ret = bringup_hibernate_cpu(sleep_cpu);
> + if (ret) {
> + sleep_cpu = -EINVAL;
> + return ret;
> + }
> +#endif
> + resume_hdr = *hdr;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
> +
> +int swsusp_arch_suspend(void)
> +{
> + int ret = 0;
> +
> + if (__cpu_suspend_enter(hibernate_cpu_context)) {
> + sleep_cpu = smp_processor_id();
> + suspend_save_csrs(hibernate_cpu_context);
> + ret = swsusp_save();
> + } else {
> + suspend_restore_csrs(hibernate_cpu_context);
> + flush_tlb_all();
> + flush_icache_all();
> +
> + /*
> + * Tell the hibernation core that we've just restored the memory.
> + */
> + in_suspend = 0;
> + sleep_cpu = -EINVAL;
> + }
> +
> + return ret;
> +}
> +
> +static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
> + unsigned long addr, pgprot_t prot)
> +{
> + pte_t pte = READ_ONCE(*src_ptep);
> +
> + if (pte_present(pte))
> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
> +
> + return 0;
> +}


I don't see the need for this function
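Its body could simply be folded into the caller's loop, something like
(sketch based on the code above, addr and prot are already in scope):

	do {
		pte_t pte = READ_ONCE(*src_ptep);

		/* copy only present entries, OR-ing in the requested protection */
		if (pte_present(pte))
			set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
	} while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);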


> +
> +static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
> + unsigned long start, unsigned long end,
> + pgprot_t prot)
> +{
> + unsigned long addr = start;
> + pte_t *src_ptep;
> + pte_t *dst_ptep;
> +
> + if (pmd_none(READ_ONCE(*dst_pmdp))) {
> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_ptep)
> + return -ENOMEM;
> +
> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
> + }
> +
> + dst_ptep = pte_offset_kernel(dst_pmdp, start);
> + src_ptep = pte_offset_kernel(src_pmdp, start);
> +
> + do {
> + _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);
> + } while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp,
> + unsigned long start, unsigned long end,
> + pgprot_t prot)
> +{
> + unsigned long addr = start;
> + unsigned long next;
> + unsigned long ret;
> + pmd_t *src_pmdp;
> + pmd_t *dst_pmdp;
> +
> + if (pud_none(READ_ONCE(*dst_pudp))) {
> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_pmdp)
> + return -ENOMEM;
> +
> + pud_populate(NULL, dst_pudp, dst_pmdp);
> + }
> +
> + dst_pmdp = pmd_offset(dst_pudp, start);
> + src_pmdp = pmd_offset(src_pudp, start);
> +
> + do {
> + pmd_t pmd = READ_ONCE(*src_pmdp);
> +
> + next = pmd_addr_end(addr, end);
> +
> + if (pmd_none(pmd))
> + continue;
> +
> + if (pmd_leaf(pmd)) {
> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot)));
> + } else {
> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, addr, next, prot);
> + if (ret)
> + return -ENOMEM;
> + }
> + } while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp,
> + unsigned long start,
> + unsigned long end, pgprot_t prot)
> +{
> + unsigned long addr = start;
> + unsigned long next;
> + unsigned long ret;
> + pud_t *dst_pudp;
> + pud_t *src_pudp;
> +
> + if (p4d_none(READ_ONCE(*dst_p4dp))) {
> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_pudp)
> + return -ENOMEM;
> +
> + p4d_populate(NULL, dst_p4dp, dst_pudp);
> + }
> +
> + dst_pudp = pud_offset(dst_p4dp, start);
> + src_pudp = pud_offset(src_p4dp, start);
> +
> + do {
> + pud_t pud = READ_ONCE(*src_pudp);
> +
> + next = pud_addr_end(addr, end);
> +
> + if (pud_none(pud))
> + continue;
> +
> + if (pud_leaf(pud)) {
> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot)));
> + } else {
> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, addr, next, prot);
> + if (ret)
> + return -ENOMEM;
> + }
> + } while (dst_pudp++, src_pudp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
> + unsigned long start, unsigned long end,
> + pgprot_t prot)
> +{
> + unsigned long addr = start;


Nit: you don't need the addr variable; you can rename start to addr
and work with it directly.


> + unsigned long next;
> + unsigned long ret;
> + p4d_t *dst_p4dp;
> + p4d_t *src_p4dp;
> +
> + if (pgd_none(READ_ONCE(*dst_pgdp))) {
> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC);
> + if (!dst_p4dp)
> + return -ENOMEM;
> +
> + pgd_populate(NULL, dst_pgdp, dst_p4dp);
> + }
> +
> + dst_p4dp = p4d_offset(dst_pgdp, start);
> + src_p4dp = p4d_offset(src_pgdp, start);
> +
> + do {
> + p4d_t p4d = READ_ONCE(*src_p4dp);
> +
> + next = p4d_addr_end(addr, end);
> +
> + if (p4d_none(READ_ONCE(*src_p4dp)))


You should use p4d here: p4d_none(p4d)


> + continue;
> +
> + if (p4d_leaf(p4d)) {
> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot)));


The "| pgprot_val(prot)" happens to work because PAGE_KERNEL will add
the PAGE_WRITE bit: I'd rather make it more clear by explicitly add
PAGE_WRITE.
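Something along these lines (a sketch only, assuming the riscv _PAGE_WRITE
define is the bit you mean):

		if (p4d_leaf(p4d)) {
			/* make the write permission explicit rather than
			 * relying on it being part of prot
			 */
			set_p4d(dst_p4dp,
				__p4d(p4d_val(p4d) | pgprot_val(prot) | _PAGE_WRITE));
		}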


> + } else {
> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, addr, next, prot);
> + if (ret)
> + return -ENOMEM;
> + }
> + } while (dst_p4dp++, src_p4dp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp)
> +{
> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn);
> + unsigned long addr = PAGE_OFFSET;
> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> + pgd_t *src_pgdp = pgd_offset_k(addr);
> + unsigned long next;
> +
> + do {
> + next = pgd_addr_end(addr, end);
> + if (pgd_none(READ_ONCE(*src_pgdp)))
> + continue;
> +


We added the pgd_leaf test in kernel_page_present, let's add it here too.
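For example (untested sketch mirroring the kernel_page_present() change,
assuming set_pgd() is usable here):

		pgd_t pgd = READ_ONCE(*src_pgdp);

		if (pgd_leaf(pgd)) {
			/* copy the huge leaf entry directly instead of descending */
			set_pgd(dst_pgdp,
				__pgd(pgd_val(pgd) | pgprot_val(PAGE_KERNEL)));
			continue;
		}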


> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next, PAGE_KERNEL))
> + return -ENOMEM;
> + } while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
> +
> + return 0;
> +}
> +
> +static unsigned long temp_pgtable_text_mapping(pgd_t *pgdp, unsigned long addr)
> +{
> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> + pgd_t *src_pgdp = pgd_offset_k(addr);
> +
> + if (pgd_none(READ_ONCE(*src_pgdp)))
> + return -EFAULT;
> +
> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, addr, PAGE_KERNEL_EXEC))
> + return -ENOMEM;
> +
> + return 0;
> +}


Ok so if we fall into a huge mapping, you add the exec permission to the
whole range, which could easily be 1GB. I think we can either avoid
this step by mapping the whole linear mapping as executable, or
actually map this page through another pgd entry that is not in the
linear mapping. The latter seems cleaner, what do you think?


> +
> +static unsigned long relocate_restore_code(void)
> +{
> + unsigned long ret;
> + void *page = (void *)get_safe_page(GFP_ATOMIC);
> +
> + if (!page)
> + return -ENOMEM;
> +
> + copy_page(page, hibernate_core_restore_code);
> +
> + /* Make the page containing the relocated code executable. */
> + set_memory_x((unsigned long)page, 1);
> +
> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)page);
> + if (ret)
> + return ret;
> +
> + return (unsigned long)page;
> +}
> +
> +int swsusp_arch_resume(void)
> +{
> + unsigned long ret;
> +
> + /*
> + * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> + * we don't need to free it here.
> + */
> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> + if (!resume_pg_dir)
> + return -ENOMEM;
> +
> + /*
> + * The pages need to be writable when restoring the image.
> + * Create a second copy of page table just for the linear map.
> + * Use this temporary page table to restore the image.
> + */
> + ret = temp_pgtable_mapping(resume_pg_dir);
> + if (ret)
> + return (int)ret;


The temp_pgtable* functions should return an int to avoid this cast.
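i.e. something like (sketch, with the same change applied to the
pud/pmd/pte helpers):

	static int temp_pgtable_mapping(pgd_t *pgdp);
	static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
					unsigned long start, unsigned long end,
					pgprot_t prot);

so swsusp_arch_resume() can use the return value directly.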


> +
> + /* Move the restore code to a new page so that it doesn't get overwritten by itself. */
> + relocated_restore_code = relocate_restore_code();
> + if (relocated_restore_code == -ENOMEM)
> + return -ENOMEM;
> +
> + /*
> + * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> + * restore code can jumps to it after finished restore the image. The next execution
> + * code doesn't find itself in a different address space after switching over to the
> + * original page table used by the hibernated image.
> + */
> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr);
> + if (ret)
> + return ret;
> +
> + hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> + resume_hdr.restore_cpu_addr);
> +
> + return 0;
> +}
> +
> +#ifdef CONFIG_PM_SLEEP_SMP
> +int hibernate_resume_nonboot_cpu_disable(void)
> +{
> + if (sleep_cpu < 0) {
> + pr_err("Failing to resume from hibernate on an unknown CPU\n");
> + return -ENODEV;
> + }
> +
> + return freeze_secondary_cpus(sleep_cpu);
> +}
> +#endif
> +
> +static int __init riscv_hibernate_init(void)
> +{
> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL);
> +
> + if (WARN_ON(!hibernate_cpu_context))
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +early_initcall(riscv_hibernate_init);


Overall, it is now nicer with the proper page table walk: but we can
now see that the code is exactly the same as arm64's, so what prevents
us from merging both somewhere in mm/?



2023-02-27 02:29:47

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Friday, 24 February, 2023 8:07 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Fri, Feb 24, 2023 at 10:30:19AM +0000, JeeHeng Sia wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Jones <[email protected]>
> > > Sent: Friday, 24 February, 2023 5:55 PM
> > > To: JeeHeng Sia <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > >
> > > On Fri, Feb 24, 2023 at 09:33:31AM +0000, JeeHeng Sia wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Andrew Jones <[email protected]>
> > > > > Sent: Friday, 24 February, 2023 5:00 PM
> > > > > To: JeeHeng Sia <[email protected]>
> > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > >
> > > > > On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Andrew Jones <[email protected]>
> > > > > > > Sent: Friday, 24 February, 2023 2:07 AM
> > > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > > >
> > > > > > > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > > > > > > Low level Arch functions were created to support hibernation.
> > > > > > > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > > > > > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > > > > > > image.
> > > > > > > >
> > > > > > > > Arch specific hibernation header is implemented and is utilized by the
> > > > > > > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > > > > > > functions. The arch specific hibernation header consists of satp, hartid,
> > > > > > > > and the cpu_resume address. The kernel built version is also need to be
> > > > > > > > saved into the hibernation image header to making sure only the same
> > > > > > > > kernel is restore when resume.
> > > > > > > >
> > > > > > > > swsusp_arch_resume() creates a temporary page table that covering only
> > > > > > > > the linear map. It copies the restore code to a 'safe' page, then start
> > > > > > > > to restore the memory image. Once completed, it restores the original
> > > > > > > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > > > > > > to restore the CPU context. Finally, it follows the normal hibernation
> > > > > > > > path back to the hibernation core.
> > > > > > > >
> > > > > > > > To enable hibernation/suspend to disk into RISCV, the below config
> > > > > > > > need to be enabled:
> > > > > > > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > > > > > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > > > > > > >
> > > > > > > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > > > > > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > > > > > > Reviewed-by: Mason Huo <[email protected]>
> > > > > > > > ---
> > > > > > > > arch/riscv/Kconfig | 7 +
> > > > > > > > arch/riscv/include/asm/assembler.h | 20 ++
> > > > > > > > arch/riscv/include/asm/suspend.h | 19 ++
> > > > > > > > arch/riscv/kernel/Makefile | 1 +
> > > > > > > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > > > > > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > > > > > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > > > > > > 7 files changed, 576 insertions(+)
> > > > > > > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > > > > > > create mode 100644 arch/riscv/kernel/hibernate.c
> > > > > > > >
> > > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > > index e2b656043abf..4555848a817f 100644
> > > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > > @@ -690,6 +690,13 @@ menu "Power management options"
> > > > > > > >
> > > > > > > > source "kernel/power/Kconfig"
> > > > > > > >
> > > > > > > > +config ARCH_HIBERNATION_POSSIBLE
> > > > > > > > + def_bool y
> > > > > > > > +
> > > > > > > > +config ARCH_HIBERNATION_HEADER
> > > > > > > > + def_bool y
> > > > > > > > + depends on HIBERNATION
> > > > > > >
> > > > > > > nit: I think this can be simplified as def_bool HIBERNATION
> > > > > > good suggestion. will change it.
> > > > > > >
> > > > > > > > +
> > > > > > > > endmenu # "Power management options"
> > > > > > > >
> > > > > > > > menu "CPU Power Management"
> > > > > > > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > > > > > > index 727a97735493..68c46c0e0ea8 100644
> > > > > > > > --- a/arch/riscv/include/asm/assembler.h
> > > > > > > > +++ b/arch/riscv/include/asm/assembler.h
> > > > > > > > @@ -59,4 +59,24 @@
> > > > > > > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > > > > > > .endm
> > > > > > > >
> > > > > > > > +/*
> > > > > > > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > > > > > > + * @a0 - destination
> > > > > > > > + * @a1 - source
> > > > > > > > + */
> > > > > > > > + .macro copy_page a0, a1
> > > > > > > > + lui a2, 0x1
> > > > > > > > + add a2, a2, a0
> > > > > > > > +1 :
> > > > > > > ^ please remove this space
> > > > > > can't remove it otherwise checkpatch will throws ERROR: spaces required around that ':'
> > > > >
> > > > > Oh, right, labels in macros have this requirement.
> > > > >
> > > > > > >
> > > > > > > > + REG_L t0, 0(a1)
> > > > > > > > + REG_L t1, SZREG(a1)
> > > > > > > > +
> > > > > > > > + REG_S t0, 0(a0)
> > > > > > > > + REG_S t1, SZREG(a0)
> > > > > > > > +
> > > > > > > > + addi a0, a0, 2 * SZREG
> > > > > > > > + addi a1, a1, 2 * SZREG
> > > > > > > > + bne a2, a0, 1b
> > > > > > > > + .endm
> > > > > > > > +
> > > > > > > > #endif /* __ASM_ASSEMBLER_H */
> > > > > > > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > > > > > > index 75419c5ca272..3362da56a9d8 100644
> > > > > > > > --- a/arch/riscv/include/asm/suspend.h
> > > > > > > > +++ b/arch/riscv/include/asm/suspend.h
> > > > > > > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > > > > > > #endif
> > > > > > > > };
> > > > > > > >
> > > > > > > > +/*
> > > > > > > > + * Used by hibernation core and cleared during resume sequence
> > > > > > > > + */
> > > > > > > > +extern int in_suspend;
> > > > > > > > +
> > > > > > > > /* Low-level CPU suspend entry function */
> > > > > > > > int __cpu_suspend_enter(struct suspend_context *context);
> > > > > > > >
> > > > > > > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > > > > > > /* Used to save and restore the csr */
> > > > > > > > void suspend_save_csrs(struct suspend_context *context);
> > > > > > > > void suspend_restore_csrs(struct suspend_context *context);
> > > > > > > > +
> > > > > > > > +/* Low-level API to support hibernation */
> > > > > > > > +int swsusp_arch_suspend(void);
> > > > > > > > +int swsusp_arch_resume(void);
> > > > > > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > > > > > > +int arch_hibernation_header_restore(void *addr);
> > > > > > > > +int __hibernate_cpu_resume(void);
> > > > > > > > +
> > > > > > > > +/* Used to resume on the CPU we hibernated on */
> > > > > > > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > > > > > > +
> > > > > > > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > > > > > > + unsigned long cpu_resume);
> > > > > > > > +asmlinkage int hibernate_core_restore_code(void);
> > > > > > > > #endif
> > > > > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > > > > index 4cf303a779ab..daab341d55e4 100644
> > > > > > > > --- a/arch/riscv/kernel/Makefile
> > > > > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > > > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > > > > > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > > > > > > >
> > > > > > > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > > > > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > > > > > > >
> > > > > > > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > > > > > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > > > > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > index df9444397908..d6a75aac1d27 100644
> > > > > > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > > > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > @@ -9,6 +9,7 @@
> > > > > > > > #include <linux/kbuild.h>
> > > > > > > > #include <linux/mm.h>
> > > > > > > > #include <linux/sched.h>
> > > > > > > > +#include <linux/suspend.h>
> > > > > > > > #include <asm/kvm_host.h>
> > > > > > > > #include <asm/thread_info.h>
> > > > > > > > #include <asm/ptrace.h>
> > > > > > > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > > > > > > >
> > > > > > > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > > > > > > >
> > > > > > > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > > > > > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > > > > > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > > > > > > +
> > > > > > > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > > > > > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > > > > > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > > > > > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > new file mode 100644
> > > > > > > > index 000000000000..846affe4dced
> > > > > > > > --- /dev/null
> > > > > > > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > @@ -0,0 +1,77 @@
> > > > > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > > > > +/*
> > > > > > > > + * Hibernation low level support for RISCV.
> > > > > > > > + *
> > > > > > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > > > > > + *
> > > > > > > > + * Author: Jee Heng Sia <[email protected]>
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +#include <asm/asm.h>
> > > > > > > > +#include <asm/asm-offsets.h>
> > > > > > > > +#include <asm/assembler.h>
> > > > > > > > +#include <asm/csr.h>
> > > > > > > > +
> > > > > > > > +#include <linux/linkage.h>
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * int __hibernate_cpu_resume(void)
> > > > > > > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > > > > > > + * context.
> > > > > > > > + *
> > > > > > > > + * Always returns 0
> > > > > > > > + */
> > > > > > > > +ENTRY(__hibernate_cpu_resume)
> > > > > > > > + /* switch to hibernated image's page table. */
> > > > > > > > + csrw CSR_SATP, s0
> > > > > > > > + sfence.vma
> > > > > > > > +
> > > > > > > > + REG_L a0, hibernate_cpu_context
> > > > > > > > +
> > > > > > > > + restore_csr
> > > > > > > > + restore_reg
> > > > > > > > +
> > > > > > > > + /* Return zero value. */
> > > > > > > > + add a0, zero, zero
> > > > > > >
> > > > > > > nit: mv a0, zero
> > > > > > sure
> > > > > > >
> > > > > > > > +
> > > > > > > > + ret
> > > > > > > > +END(__hibernate_cpu_resume)
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * Prepare to restore the image.
> > > > > > > > + * a0: satp of saved page tables.
> > > > > > > > + * a1: satp of temporary page tables.
> > > > > > > > + * a2: cpu_resume.
> > > > > > > > + */
> > > > > > > > +ENTRY(hibernate_restore_image)
> > > > > > > > + mv s0, a0
> > > > > > > > + mv s1, a1
> > > > > > > > + mv s2, a2
> > > > > > > > + REG_L s4, restore_pblist
> > > > > > > > + REG_L a1, relocated_restore_code
> > > > > > > > +
> > > > > > > > + jalr a1
> > > > > > > > +END(hibernate_restore_image)
> > > > > > > > +
> > > > > > > > +/*
> > > > > > > > + * The below code will be executed from a 'safe' page.
> > > > > > > > + * It first switches to the temporary page table, then starts to copy the pages
> > > > > > > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > > > > > > + * to restore the CPU context.
> > > > > > > > + */
> > > > > > > > +ENTRY(hibernate_core_restore_code)
> > > > > > > > + /* switch to temp page table. */
> > > > > > > > + csrw satp, s1
> > > > > > > > + sfence.vma
> > > > > > > > +.Lcopy:
> > > > > > > > + /* The below code will restore the hibernated image. */
> > > > > > > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > > > > > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > > > > > >
> > > > > > > Are we sure restore_pblist will never be NULL?
> > > > > > restore_pblist is a link-list, it will be null during initialization or during page clean up by hibernation core. During the initial
> > > resume
> > > > > process, the hibernation core will check the header and load the pages. If everything works correctly, the page will be linked to
> the
> > > > > restore_pblist and then invoke swsusp_arch_resume() else hibernation core will throws error and failed to resume from the
> > > > > hibernated image.
> > > > >
> > > > > I know restore_pblist is a linked-list and this doesn't answer the
> > > > > question. The comment above restore_pblist says
> > > > >
> > > > > /*
> > > > > * List of PBEs needed for restoring the pages that were allocated before
> > > > > * the suspend and included in the suspend image, but have also been
> > > > > * allocated by the "resume" kernel, so their contents cannot be written
> > > > > * directly to their "original" page frames.
> > > > > */
> > > > >
> > > > > which implies the pages that end up on this list are "special". My
> > > > > question is whether or not we're guaranteed to have at least one
> > > > > of these special pages. If not, we shouldn't assume s4 is non-null.
> > > > > If so, then a comment stating why that's guaranteed would be nice.
> > > > The restore_pblist will not be null otherwise swsusp_arch_resume wouldn't get invoked. you can find how the link-list are link
> and
> > > how it checks against validity at https://elixir.bootlin.com/linux/v6.2-rc8/source/kernel/power/snapshot.c . " A comment stating
> why
> > > that's guaranteed would be nice" ? Hmm, perhaps this is out of my scope but I do believe in the page validity checking in the link I
> > > shared.
> > >
> > > Sorry, but pointing to an entire source file (one that I've obviously
> > > already looked at, since I quoted a comment from it...) is not helpful.
> > > I don't see where restore_pblist is being checked before
> > > swsusp_arch_resume() is issued (from its callsite in hibernate.c).
> > Sure, below shows the hibernation flow for your reference. The link-list creation and checking found at:
> https://elixir.bootlin.com/linux/v6.2/source/kernel/power/snapshot.c#L2576
> > software_resume()
> > load_image_and_restore()
> > swsusp_read()
> > load_image()
> > snapshot_write_next()
> > get_buffer() <-- This is the function checks and links the pages to the restore_pblist
>
> Yup, I've read this path, including get_buffer(), where I saw that
> get_buffer() can return an address without allocating a PBE. Where is the
> check that restore_pblist isn't NULL, i.e. we see that at least one PBE
> has been allocated by get_buffer(), before we call swsusp_arch_resume()?
>
> Or, is known that at least one or more pages match the criteria pointed
> out in the comment below (copied from get_buffer())?
>
> /*
> * The "original" page frame has not been allocated and we have to
> * use a "safe" page frame to store the loaded page.
> */
>
> If so, then which ones? And where does it state that?
Let's look at the pseudocode below; I hope it clears your doubt. restore_pblist depends on safe_pages_list and the pbe allocations, and both pointers are checked. I couldn't find a path where restore_pblist would be NULL.
//Pseudocode to illustrate the image loading
initialize restore_pblist to null;
initialize safe_pages_list to null;
Allocate safe page list, return error if failed;
load image;
loop: Create pbe chain, return error if failed;
assign orig_addr and safe_page to pbe;
link pbe to restore_pblist;
return pbe to handle->buffer;
check handle->buffer;
goto loop if no error else return with error;
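For illustration, the same flow in a minimal C sketch (this is not the actual kernel/power/snapshot.c code; every helper declared extern below is a hypothetical stand-in for the hibernation core, only struct pbe and restore_pblist mirror the real definitions):

/*
 * Minimal sketch of the pseudocode above -- NOT the real
 * kernel/power/snapshot.c implementation.
 */
struct pbe {
	void *address;		/* "safe" page the image data is loaded into */
	void *orig_address;	/* page frame the data must be copied back to */
	struct pbe *next;
};

static struct pbe *restore_pblist;	/* entries are linked below as pages are loaded */

/* hypothetical helpers standing in for the hibernation core */
extern int more_image_pages(void);
extern void *next_original_frame(void);
extern void *alloc_safe_page(void);
extern struct pbe *alloc_pbe(void);
extern int read_page_into(void *safe_page);

/* Returns 0 on success; any error aborts resume before swsusp_arch_resume() runs. */
static int load_image_sketch(void)
{
	while (more_image_pages()) {
		struct pbe *pbe = alloc_pbe();

		if (!pbe)
			return -1;

		pbe->orig_address = next_original_frame();
		pbe->address = alloc_safe_page();
		if (!pbe->address || read_page_into(pbe->address))
			return -1;

		/* link the entry; swsusp_arch_resume() later walks this list */
		pbe->next = restore_pblist;
		restore_pblist = pbe;
	}
	return 0;
}

If any step fails, the hibernation core returns an error and swsusp_arch_resume() is never reached.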
>
> Thanks,
> drew
>
>
> > hibernation_restore()
> > resume_target_kernel()
> > swsusp_arch_resume()
> > >
> > > Thanks,
> > > drew

2023-02-27 02:32:25

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 2/4] RISC-V: Factor out common code of __cpu_resume_enter()



> -----Original Message-----
> From: Alexandre Ghiti <[email protected]>
> Sent: Friday, 24 February, 2023 6:19 PM
> To: JeeHeng Sia <[email protected]>; [email protected]; [email protected]; [email protected]
> Cc: [email protected]; [email protected]; Leyfoon Tan <[email protected]>; Mason Huo
> <[email protected]>
> Subject: Re: [PATCH v4 2/4] RISC-V: Factor out common code of __cpu_resume_enter()
>
> Hi Sia,
>
> On 2/21/23 03:35, Sia Jee Heng wrote:
> > The cpu_resume() function is very similar for the suspend to disk and
> > suspend to ram cases. Factor out the common code into restore_csr macro
> > and restore_reg macro.
> >
> > Signed-off-by: Sia Jee Heng <[email protected]>
> > ---
> > arch/riscv/include/asm/assembler.h | 62 ++++++++++++++++++++++++++++++
> > arch/riscv/kernel/suspend_entry.S | 34 ++--------------
> > 2 files changed, 65 insertions(+), 31 deletions(-)
> > create mode 100644 arch/riscv/include/asm/assembler.h
> >
> > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > new file mode 100644
> > index 000000000000..727a97735493
> > --- /dev/null
> > +++ b/arch/riscv/include/asm/assembler.h
> > @@ -0,0 +1,62 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <[email protected]>
> > + */
> > +
> > +#ifndef __ASSEMBLY__
> > +#error "Only include this from assembly code"
> > +#endif
> > +
> > +#ifndef __ASM_ASSEMBLER_H
> > +#define __ASM_ASSEMBLER_H
> > +
> > +#include <asm/asm.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/csr.h>
> > +
> > +/*
> > + * restore_csr - restore hart's CSR value
> > + */
> > + .macro restore_csr
> > + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> > + csrw CSR_EPC, t0
> > + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> > + csrw CSR_STATUS, t0
> > + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> > + csrw CSR_TVAL, t0
> > + REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> > + csrw CSR_CAUSE, t0
> > + .endm
> > +
> > +/*
> > + * restore_reg - Restore registers (except A0 and T0-T6)
> > + */
> > + .macro restore_reg
> > + REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> > + REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> > + REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> > + REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> > + REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> > + REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> > + REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> > + REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> > + REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> > + REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> > + REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> > + REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> > + REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> > + REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> > + REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> > + REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> > + REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> > + REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> > + REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> > + REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> > + REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> > + REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> > + REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > + .endm
> > +
> > +#endif /* __ASM_ASSEMBLER_H */
>
>
> You introduce assembler.h which in the future may contain other assembly
> functions not related to suspend: I'd rename those macros so that we
> know they are suspend related, something like
> suspend_restore_regs/suspend_restore_csrs.
Sure, these can be done.
>
> And instead of (SUSPEND_CONTEXT_REGS + PT_XXX) you could introduce
> SUSPEND_CONTEXT_REGS_PT_XXX in asm-offsets.c?
There are already PT_XXX offsets defined in asm-offsets.c; we should not create another set of SUSPEND_CONTEXT_REGS_PT_XXX constants, because we can just reuse the existing definitions instead of duplicating another set of offsets that do the same thing. So I would rather stick with the current definition.
DEFINE(PT_SIZE, sizeof(struct pt_regs));
OFFSET(PT_EPC, pt_regs, epc);
OFFSET(PT_RA, pt_regs, ra);
OFFSET(PT_FP, pt_regs, s0);
OFFSET(PT_S0, pt_regs, s0);
OFFSET(PT_S1, pt_regs, s1);
OFFSET(PT_S2, pt_regs, s2);
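For illustration only (not part of the patch), the existing constants compose because the register frame is embedded in struct suspend_context; a build-time check could express this as:

#include <linux/build_bug.h>
#include <linux/stddef.h>
#include <asm/ptrace.h>
#include <asm/suspend.h>

/* (SUSPEND_CONTEXT_REGS + PT_EPC) addresses regs.epc, so no new offsets are needed. */
static_assert(offsetof(struct suspend_context, regs.epc) ==
	      offsetof(struct suspend_context, regs) +
	      offsetof(struct pt_regs, epc));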
>
>
> > diff --git a/arch/riscv/kernel/suspend_entry.S b/arch/riscv/kernel/suspend_entry.S
> > index aafcca58c19d..74a8fab8e0f6 100644
> > --- a/arch/riscv/kernel/suspend_entry.S
> > +++ b/arch/riscv/kernel/suspend_entry.S
> > @@ -7,6 +7,7 @@
> > #include <linux/linkage.h>
> > #include <asm/asm.h>
> > #include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> > #include <asm/csr.h>
> > #include <asm/xip_fixup.h>
> >
> > @@ -83,39 +84,10 @@ ENTRY(__cpu_resume_enter)
> > add a0, a1, zero
> >
> > /* Restore CSRs */
> > - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_EPC)(a0)
> > - csrw CSR_EPC, t0
> > - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_STATUS)(a0)
> > - csrw CSR_STATUS, t0
> > - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_BADADDR)(a0)
> > - csrw CSR_TVAL, t0
> > - REG_L t0, (SUSPEND_CONTEXT_REGS + PT_CAUSE)(a0)
> > - csrw CSR_CAUSE, t0
> > + restore_csr
> >
> > /* Restore registers (except A0 and T0-T6) */
> > - REG_L ra, (SUSPEND_CONTEXT_REGS + PT_RA)(a0)
> > - REG_L sp, (SUSPEND_CONTEXT_REGS + PT_SP)(a0)
> > - REG_L gp, (SUSPEND_CONTEXT_REGS + PT_GP)(a0)
> > - REG_L tp, (SUSPEND_CONTEXT_REGS + PT_TP)(a0)
> > - REG_L s0, (SUSPEND_CONTEXT_REGS + PT_S0)(a0)
> > - REG_L s1, (SUSPEND_CONTEXT_REGS + PT_S1)(a0)
> > - REG_L a1, (SUSPEND_CONTEXT_REGS + PT_A1)(a0)
> > - REG_L a2, (SUSPEND_CONTEXT_REGS + PT_A2)(a0)
> > - REG_L a3, (SUSPEND_CONTEXT_REGS + PT_A3)(a0)
> > - REG_L a4, (SUSPEND_CONTEXT_REGS + PT_A4)(a0)
> > - REG_L a5, (SUSPEND_CONTEXT_REGS + PT_A5)(a0)
> > - REG_L a6, (SUSPEND_CONTEXT_REGS + PT_A6)(a0)
> > - REG_L a7, (SUSPEND_CONTEXT_REGS + PT_A7)(a0)
> > - REG_L s2, (SUSPEND_CONTEXT_REGS + PT_S2)(a0)
> > - REG_L s3, (SUSPEND_CONTEXT_REGS + PT_S3)(a0)
> > - REG_L s4, (SUSPEND_CONTEXT_REGS + PT_S4)(a0)
> > - REG_L s5, (SUSPEND_CONTEXT_REGS + PT_S5)(a0)
> > - REG_L s6, (SUSPEND_CONTEXT_REGS + PT_S6)(a0)
> > - REG_L s7, (SUSPEND_CONTEXT_REGS + PT_S7)(a0)
> > - REG_L s8, (SUSPEND_CONTEXT_REGS + PT_S8)(a0)
> > - REG_L s9, (SUSPEND_CONTEXT_REGS + PT_S9)(a0)
> > - REG_L s10, (SUSPEND_CONTEXT_REGS + PT_S10)(a0)
> > - REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > + restore_reg
> >
> > /* Return zero value */
> > add a0, zero, zero

2023-02-27 03:11:52

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Alexandre Ghiti <[email protected]>
> Sent: Friday, 24 February, 2023 8:29 PM
> To: JeeHeng Sia <[email protected]>; [email protected]; [email protected]; [email protected]
> Cc: [email protected]; [email protected]; Leyfoon Tan <[email protected]>; Mason Huo
> <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On 2/21/23 03:35, Sia Jee Heng wrote:
> > Low level Arch functions were created to support hibernation.
> > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > cpu state onto the stack, then calling swsusp_save() to save the memory
> > image.
> >
> > Arch specific hibernation header is implemented and is utilized by the
> > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > functions. The arch specific hibernation header consists of satp, hartid,
> > and the cpu_resume address. The kernel built version is also need to be
> > saved into the hibernation image header to making sure only the same
> > kernel is restore when resume.
> >
> > swsusp_arch_resume() creates a temporary page table that covering only
> > the linear map. It copies the restore code to a 'safe' page, then start
> > to restore the memory image. Once completed, it restores the original
> > kernel's page table. It then calls into __hibernate_cpu_resume()
> > to restore the CPU context. Finally, it follows the normal hibernation
> > path back to the hibernation core.
> >
> > To enable hibernation/suspend to disk into RISCV, the below config
> > need to be enabled:
> > - CONFIG_ARCH_HIBERNATION_HEADER
> > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >
> > Signed-off-by: Sia Jee Heng <[email protected]>
> > Reviewed-by: Ley Foon Tan <[email protected]>
> > Reviewed-by: Mason Huo <[email protected]>
> > ---
> > arch/riscv/Kconfig | 7 +
> > arch/riscv/include/asm/assembler.h | 20 ++
> > arch/riscv/include/asm/suspend.h | 19 ++
> > arch/riscv/kernel/Makefile | 1 +
> > arch/riscv/kernel/asm-offsets.c | 5 +
> > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > 7 files changed, 576 insertions(+)
> > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > create mode 100644 arch/riscv/kernel/hibernate.c
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index e2b656043abf..4555848a817f 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -690,6 +690,13 @@ menu "Power management options"
> >
> > source "kernel/power/Kconfig"
> >
> > +config ARCH_HIBERNATION_POSSIBLE
> > + def_bool y
> > +
> > +config ARCH_HIBERNATION_HEADER
> > + def_bool y
> > + depends on HIBERNATION
> > +
> > endmenu # "Power management options"
> >
> > menu "CPU Power Management"
> > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > index 727a97735493..68c46c0e0ea8 100644
> > --- a/arch/riscv/include/asm/assembler.h
> > +++ b/arch/riscv/include/asm/assembler.h
> > @@ -59,4 +59,24 @@
> > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > .endm
> >
> > +/*
> > + * copy_page - copy 1 page (4KB) of data from source to destination
> > + * @a0 - destination
> > + * @a1 - source
> > + */
> > + .macro copy_page a0, a1
> > + lui a2, 0x1
> > + add a2, a2, a0
> > +1 :
> > + REG_L t0, 0(a1)
> > + REG_L t1, SZREG(a1)
> > +
> > + REG_S t0, 0(a0)
> > + REG_S t1, SZREG(a0)
> > +
> > + addi a0, a0, 2 * SZREG
> > + addi a1, a1, 2 * SZREG
> > + bne a2, a0, 1b
> > + .endm
> > +
> > #endif /* __ASM_ASSEMBLER_H */
> > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > index 75419c5ca272..3362da56a9d8 100644
> > --- a/arch/riscv/include/asm/suspend.h
> > +++ b/arch/riscv/include/asm/suspend.h
> > @@ -21,6 +21,11 @@ struct suspend_context {
> > #endif
> > };
> >
> > +/*
> > + * Used by hibernation core and cleared during resume sequence
> > + */
> > +extern int in_suspend;
> > +
> > /* Low-level CPU suspend entry function */
> > int __cpu_suspend_enter(struct suspend_context *context);
> >
> > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > /* Used to save and restore the csr */
> > void suspend_save_csrs(struct suspend_context *context);
> > void suspend_restore_csrs(struct suspend_context *context);
> > +
> > +/* Low-level API to support hibernation */
> > +int swsusp_arch_suspend(void);
> > +int swsusp_arch_resume(void);
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > +int arch_hibernation_header_restore(void *addr);
> > +int __hibernate_cpu_resume(void);
> > +
> > +/* Used to resume on the CPU we hibernated on */
> > +int hibernate_resume_nonboot_cpu_disable(void);
> > +
> > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > + unsigned long cpu_resume);
> > +asmlinkage int hibernate_core_restore_code(void);
> > #endif
> > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > index 4cf303a779ab..daab341d55e4 100644
> > --- a/arch/riscv/kernel/Makefile
> > +++ b/arch/riscv/kernel/Makefile
> > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> >
> > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> >
> > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > index df9444397908..d6a75aac1d27 100644
> > --- a/arch/riscv/kernel/asm-offsets.c
> > +++ b/arch/riscv/kernel/asm-offsets.c
> > @@ -9,6 +9,7 @@
> > #include <linux/kbuild.h>
> > #include <linux/mm.h>
> > #include <linux/sched.h>
> > +#include <linux/suspend.h>
> > #include <asm/kvm_host.h>
> > #include <asm/thread_info.h>
> > #include <asm/ptrace.h>
> > @@ -116,6 +117,10 @@ void asm_offsets(void)
> >
> > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >
> > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > +
> > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > new file mode 100644
> > index 000000000000..846affe4dced
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate-asm.S
> > @@ -0,0 +1,77 @@
> > +/* SPDX-License-Identifier: GPL-2.0-only */
> > +/*
> > + * Hibernation low level support for RISCV.
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <[email protected]>
> > + */
> > +
> > +#include <asm/asm.h>
> > +#include <asm/asm-offsets.h>
> > +#include <asm/assembler.h>
> > +#include <asm/csr.h>
> > +
> > +#include <linux/linkage.h>
> > +
> > +/*
> > + * int __hibernate_cpu_resume(void)
> > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > + * context.
> > + *
> > + * Always returns 0
> > + */
> > +ENTRY(__hibernate_cpu_resume)
> > + /* switch to hibernated image's page table. */
> > + csrw CSR_SATP, s0
> > + sfence.vma
> > +
> > + REG_L a0, hibernate_cpu_context
> > +
> > + restore_csr
> > + restore_reg
> > +
> > + /* Return zero value. */
> > + add a0, zero, zero
> > +
> > + ret
> > +END(__hibernate_cpu_resume)
> > +
> > +/*
> > + * Prepare to restore the image.
> > + * a0: satp of saved page tables.
> > + * a1: satp of temporary page tables.
> > + * a2: cpu_resume.
> > + */
> > +ENTRY(hibernate_restore_image)
> > + mv s0, a0
> > + mv s1, a1
> > + mv s2, a2
> > + REG_L s4, restore_pblist
> > + REG_L a1, relocated_restore_code
> > +
> > + jalr a1
> > +END(hibernate_restore_image)
> > +
> > +/*
> > + * The below code will be executed from a 'safe' page.
> > + * It first switches to the temporary page table, then starts to copy the pages
> > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > + * to restore the CPU context.
> > + */
> > +ENTRY(hibernate_core_restore_code)
> > + /* switch to temp page table. */
> > + csrw satp, s1
> > + sfence.vma
> > +.Lcopy:
> > + /* The below code will restore the hibernated image. */
> > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > +
> > + copy_page a0, a1
> > +
> > + REG_L s4, HIBERN_PBE_NEXT(s4)
> > + bnez s4, .Lcopy
> > +
> > + jalr s2
> > +END(hibernate_core_restore_code)
> > diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> > new file mode 100644
> > index 000000000000..46a2f470db6e
> > --- /dev/null
> > +++ b/arch/riscv/kernel/hibernate.c
> > @@ -0,0 +1,447 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Hibernation support for RISCV
> > + *
> > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > + *
> > + * Author: Jee Heng Sia <[email protected]>
> > + */
> > +
> > +#include <asm/barrier.h>
> > +#include <asm/cacheflush.h>
> > +#include <asm/mmu_context.h>
> > +#include <asm/page.h>
> > +#include <asm/pgalloc.h>
> > +#include <asm/pgtable.h>
> > +#include <asm/sections.h>
> > +#include <asm/set_memory.h>
> > +#include <asm/smp.h>
> > +#include <asm/suspend.h>
> > +
> > +#include <linux/cpu.h>
> > +#include <linux/memblock.h>
> > +#include <linux/pm.h>
> > +#include <linux/sched.h>
> > +#include <linux/suspend.h>
> > +#include <linux/utsname.h>
> > +
> > +/* The logical cpu number we should resume on, initialised to a non-cpu number. */
> > +static int sleep_cpu = -EINVAL;
> > +
> > +/* Pointer to the temporary resume page table. */
> > +static pgd_t *resume_pg_dir;
> > +
> > +/* CPU context to be saved. */
> > +struct suspend_context *hibernate_cpu_context;
> > +EXPORT_SYMBOL_GPL(hibernate_cpu_context);
> > +
> > +unsigned long relocated_restore_code;
> > +EXPORT_SYMBOL_GPL(relocated_restore_code);
> > +
> > +/**
> > + * struct arch_hibernate_hdr_invariants - container to store kernel build version.
> > + * @uts_version: to save the build number and date so that the we do not resume with
> > + * a different kernel.
> > + */
> > +struct arch_hibernate_hdr_invariants {
> > + char uts_version[__NEW_UTS_LEN + 1];
> > +};
> > +
> > +/**
> > + * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
> > + * @invariants: container to store kernel build version.
> > + * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
> > + * @saved_satp: original page table used by the hibernated image.
> > + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> > + */
> > +static struct arch_hibernate_hdr {
> > + struct arch_hibernate_hdr_invariants invariants;
> > + unsigned long hartid;
> > + unsigned long saved_satp;
> > + unsigned long restore_cpu_addr;
> > +} resume_hdr;
> > +
> > +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> > +{
> > + memset(i, 0, sizeof(*i));
> > + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> > +}
> > +
> > +/*
> > + * Check if the given pfn is in the 'nosave' section.
> > + */
> > +int pfn_is_nosave(unsigned long pfn)
> > +{
> > + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> > + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> > +
> > + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> > +}
> > +
> > +void notrace save_processor_state(void)
> > +{
> > + WARN_ON(num_online_cpus() != 1);
> > +}
> > +
> > +void notrace restore_processor_state(void)
> > +{
> > +}
> > +
> > +/*
> > + * Helper parameters need to be saved to the hibernation image header.
> > + */
> > +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> > +{
> > + struct arch_hibernate_hdr *hdr = addr;
> > +
> > + if (max_size < sizeof(*hdr))
> > + return -EOVERFLOW;
> > +
> > + arch_hdr_invariants(&hdr->invariants);
> > +
> > + hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> > + hdr->saved_satp = csr_read(CSR_SATP);
> > + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> > +
> > + return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
> > +
> > +/*
> > + * Retrieve the helper parameters from the hibernation image header.
> > + */
> > +int arch_hibernation_header_restore(void *addr)
> > +{
> > + struct arch_hibernate_hdr_invariants invariants;
> > + struct arch_hibernate_hdr *hdr = addr;
> > + int ret = 0;
> > +
> > + arch_hdr_invariants(&invariants);
> > +
> > + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> > + pr_crit("Hibernate image not generated by this kernel!\n");
> > + return -EINVAL;
> > + }
> > +
> > + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> > + if (sleep_cpu < 0) {
> > + pr_crit("Hibernated on a CPU not known to this kernel!\n");
> > + sleep_cpu = -EINVAL;
> > + return -EINVAL;
> > + }
> > +
> > +#ifdef CONFIG_SMP
> > + ret = bringup_hibernate_cpu(sleep_cpu);
> > + if (ret) {
> > + sleep_cpu = -EINVAL;
> > + return ret;
> > + }
> > +#endif
> > + resume_hdr = *hdr;
> > +
> > + return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
> > +
> > +int swsusp_arch_suspend(void)
> > +{
> > + int ret = 0;
> > +
> > + if (__cpu_suspend_enter(hibernate_cpu_context)) {
> > + sleep_cpu = smp_processor_id();
> > + suspend_save_csrs(hibernate_cpu_context);
> > + ret = swsusp_save();
> > + } else {
> > + suspend_restore_csrs(hibernate_cpu_context);
> > + flush_tlb_all();
> > + flush_icache_all();
> > +
> > + /*
> > + * Tell the hibernation core that we've just restored the memory.
> > + */
> > + in_suspend = 0;
> > + sleep_cpu = -EINVAL;
> > + }
> > +
> > + return ret;
> > +}
> > +
> > +static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
> > + unsigned long addr, pgprot_t prot)
> > +{
> > + pte_t pte = READ_ONCE(*src_ptep);
> > +
> > + if (pte_present(pte))
> > + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
> > +
> > + return 0;
> > +}
>
>
> I don't see the need for this function
Sure, can remove it.
>
>
> > +
> > +static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
> > + unsigned long start, unsigned long end,
> > + pgprot_t prot)
> > +{
> > + unsigned long addr = start;
> > + pte_t *src_ptep;
> > + pte_t *dst_ptep;
> > +
> > + if (pmd_none(READ_ONCE(*dst_pmdp))) {
> > + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> > + if (!dst_ptep)
> > + return -ENOMEM;
> > +
> > + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
> > + }
> > +
> > + dst_ptep = pte_offset_kernel(dst_pmdp, start);
> > + src_ptep = pte_offset_kernel(src_pmdp, start);
> > +
> > + do {
> > + _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);
> > + } while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp,
> > + unsigned long start, unsigned long end,
> > + pgprot_t prot)
> > +{
> > + unsigned long addr = start;
> > + unsigned long next;
> > + unsigned long ret;
> > + pmd_t *src_pmdp;
> > + pmd_t *dst_pmdp;
> > +
> > + if (pud_none(READ_ONCE(*dst_pudp))) {
> > + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> > + if (!dst_pmdp)
> > + return -ENOMEM;
> > +
> > + pud_populate(NULL, dst_pudp, dst_pmdp);
> > + }
> > +
> > + dst_pmdp = pmd_offset(dst_pudp, start);
> > + src_pmdp = pmd_offset(src_pudp, start);
> > +
> > + do {
> > + pmd_t pmd = READ_ONCE(*src_pmdp);
> > +
> > + next = pmd_addr_end(addr, end);
> > +
> > + if (pmd_none(pmd))
> > + continue;
> > +
> > + if (pmd_leaf(pmd)) {
> > + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot)));
> > + } else {
> > + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, addr, next, prot);
> > + if (ret)
> > + return -ENOMEM;
> > + }
> > + } while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp,
> > + unsigned long start,
> > + unsigned long end, pgprot_t prot)
> > +{
> > + unsigned long addr = start;
> > + unsigned long next;
> > + unsigned long ret;
> > + pud_t *dst_pudp;
> > + pud_t *src_pudp;
> > +
> > + if (p4d_none(READ_ONCE(*dst_p4dp))) {
> > + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> > + if (!dst_pudp)
> > + return -ENOMEM;
> > +
> > + p4d_populate(NULL, dst_p4dp, dst_pudp);
> > + }
> > +
> > + dst_pudp = pud_offset(dst_p4dp, start);
> > + src_pudp = pud_offset(src_p4dp, start);
> > +
> > + do {
> > + pud_t pud = READ_ONCE(*src_pudp);
> > +
> > + next = pud_addr_end(addr, end);
> > +
> > + if (pud_none(pud))
> > + continue;
> > +
> > + if (pud_leaf(pud)) {
> > + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot)));
> > + } else {
> > + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, addr, next, prot);
> > + if (ret)
> > + return -ENOMEM;
> > + }
> > + } while (dst_pudp++, src_pudp++, addr = next, addr != end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
> > + unsigned long start, unsigned long end,
> > + pgprot_t prot)
> > +{
> > + unsigned long addr = start;
>
>
> Nit: you don't need the addr variable, you can rename start into addr
> and directly work with it.
sure.
>
>
> > + unsigned long next;
> > + unsigned long ret;
> > + p4d_t *dst_p4dp;
> > + p4d_t *src_p4dp;
> > +
> > + if (pgd_none(READ_ONCE(*dst_pgdp))) {
> > + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC);
> > + if (!dst_p4dp)
> > + return -ENOMEM;
> > +
> > + pgd_populate(NULL, dst_pgdp, dst_p4dp);
> > + }
> > +
> > + dst_p4dp = p4d_offset(dst_pgdp, start);
> > + src_p4dp = p4d_offset(src_pgdp, start);
> > +
> > + do {
> > + p4d_t p4d = READ_ONCE(*src_p4dp);
> > +
> > + next = p4d_addr_end(addr, end);
> > +
> > + if (p4d_none(READ_ONCE(*src_p4dp)))
>
>
> You should use p4d here: p4d_none(p4d)
sure
>
>
> > + continue;
> > +
> > + if (p4d_leaf(p4d)) {
> > + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot)));
>
>
> The "| pgprot_val(prot)" happens to work because PAGE_KERNEL will add
> the PAGE_WRITE bit: I'd rather make it more clear by explicitly add
> PAGE_WRITE.
sure, this can be done.
>
>
> > + } else {
> > + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, addr, next, prot);
> > + if (ret)
> > + return -ENOMEM;
> > + }
> > + } while (dst_p4dp++, src_p4dp++, addr = next, addr != end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_mapping(pgd_t *pgdp)
> > +{
> > + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn);
> > + unsigned long addr = PAGE_OFFSET;
> > + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> > + pgd_t *src_pgdp = pgd_offset_k(addr);
> > + unsigned long next;
> > +
> > + do {
> > + next = pgd_addr_end(addr, end);
> > + if (pgd_none(READ_ONCE(*src_pgdp)))
> > + continue;
> > +
>
>
> We added the pgd_leaf test in kernel_page_present, let's add it here too.
sure.
>
>
> > + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next, PAGE_KERNEL))
> > + return -ENOMEM;
> > + } while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
> > +
> > + return 0;
> > +}
> > +
> > +static unsigned long temp_pgtable_text_mapping(pgd_t *pgdp, unsigned long addr)
> > +{
> > + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> > + pgd_t *src_pgdp = pgd_offset_k(addr);
> > +
> > + if (pgd_none(READ_ONCE(*src_pgdp)))
> > + return -EFAULT;
> > +
> > + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, addr, PAGE_KERNEL_EXEC))
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
>
>
> Ok so if we fall into a huge mapping, you add the exec permission to the
> whole range, which could easily be 1GB. I think that either we can avoid
> this step by mapping the whole linear mapping as executable, or we
> actually use another pgd entry for this page that is not in the linear
> mapping. The latter seems cleaner, what do you think?
We can map the whole linear map as writable and executable; by doing this, we avoid remapping the relocated page within the linear map again.
We would still use the same pgd entry for the non-linear mapping, just like swapper_pg_dir does (the linear and non-linear addresses are within the range covered by the pgd). A sketch of the first option follows below.
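As a minimal sketch of the first option (illustrative only, reusing the helpers already in this patch), temp_pgtable_mapping() would simply build the temporary linear map with PAGE_KERNEL_EXEC so the relocated restore code is already executable:

static unsigned long temp_pgtable_mapping(pgd_t *pgdp)
{
	unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn);
	unsigned long addr = PAGE_OFFSET;
	pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
	pgd_t *src_pgdp = pgd_offset_k(addr);
	unsigned long next;

	do {
		next = pgd_addr_end(addr, end);
		if (pgd_none(READ_ONCE(*src_pgdp)))
			continue;

		/* W+X, but only in the temporary table used during restore */
		if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next,
					 PAGE_KERNEL_EXEC))
			return -ENOMEM;
	} while (dst_pgdp++, src_pgdp++, addr = next, addr != end);

	return 0;
}

With that, the temp_pgtable_text_mapping() call for the relocated page in swsusp_arch_resume() would no longer be needed.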
>
>
> > +
> > +static unsigned long relocate_restore_code(void)
> > +{
> > + unsigned long ret;
> > + void *page = (void *)get_safe_page(GFP_ATOMIC);
> > +
> > + if (!page)
> > + return -ENOMEM;
> > +
> > + copy_page(page, hibernate_core_restore_code);
> > +
> > + /* Make the page containing the relocated code executable. */
> > + set_memory_x((unsigned long)page, 1);
> > +
> > + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)page);
> > + if (ret)
> > + return ret;
> > +
> > + return (unsigned long)page;
> > +}
> > +
> > +int swsusp_arch_resume(void)
> > +{
> > + unsigned long ret;
> > +
> > + /*
> > + * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> > + * we don't need to free it here.
> > + */
> > + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> > + if (!resume_pg_dir)
> > + return -ENOMEM;
> > +
> > + /*
> > + * The pages need to be writable when restoring the image.
> > + * Create a second copy of page table just for the linear map.
> > + * Use this temporary page table to restore the image.
> > + */
> > + ret = temp_pgtable_mapping(resume_pg_dir);
> > + if (ret)
> > + return (int)ret;
>
>
> The temp_pgtable* functions should return an int to avoid this cast.
>
>
> > +
> > + /* Move the restore code to a new page so that it doesn't get overwritten by itself. */
> > + relocated_restore_code = relocate_restore_code();
> > + if (relocated_restore_code == -ENOMEM)
> > + return -ENOMEM;
> > +
> > + /*
> > + * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> > + * restore code can jumps to it after finished restore the image. The next execution
> > + * code doesn't find itself in a different address space after switching over to the
> > + * original page table used by the hibernated image.
> > + */
> > + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr);
> > + if (ret)
> > + return ret;
> > +
> > + hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> > + resume_hdr.restore_cpu_addr);
> > +
> > + return 0;
> > +}
> > +
> > +#ifdef CONFIG_PM_SLEEP_SMP
> > +int hibernate_resume_nonboot_cpu_disable(void)
> > +{
> > + if (sleep_cpu < 0) {
> > + pr_err("Failing to resume from hibernate on an unknown CPU\n");
> > + return -ENODEV;
> > + }
> > +
> > + return freeze_secondary_cpus(sleep_cpu);
> > +}
> > +#endif
> > +
> > +static int __init riscv_hibernate_init(void)
> > +{
> > + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL);
> > +
> > + if (WARN_ON(!hibernate_cpu_context))
> > + return -ENOMEM;
> > +
> > + return 0;
> > +}
> > +
> > +early_initcall(riscv_hibernate_init);
>
>
> Overall, it is now nicer with the the proper page table walk: but we can
> now see that the code is exactly the same as arm64, what prevents us
> from merging both somewhere in mm/?
1. The low-level page table bit definitions are not the same.
2. The code would need to be refactored for both riscv and arm64.
3. The solution would need to be verified on both riscv and arm64 platforms (this needs someone with arm64 expertise).
4. The function might need to be extended to support other architectures.
5. Overall, it is doable, but the effort to address the above is huge.

>

2023-02-27 07:59:57

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Mon, Feb 27, 2023 at 02:14:27AM +0000, JeeHeng Sia wrote:
>
>
> > -----Original Message-----
> > From: Andrew Jones <[email protected]>
> > Sent: Friday, 24 February, 2023 8:07 PM
> > To: JeeHeng Sia <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >
> > On Fri, Feb 24, 2023 at 10:30:19AM +0000, JeeHeng Sia wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Andrew Jones <[email protected]>
> > > > Sent: Friday, 24 February, 2023 5:55 PM
> > > > To: JeeHeng Sia <[email protected]>
> > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > >
> > > > On Fri, Feb 24, 2023 at 09:33:31AM +0000, JeeHeng Sia wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Andrew Jones <[email protected]>
> > > > > > Sent: Friday, 24 February, 2023 5:00 PM
> > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > >
> > > > > > On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Andrew Jones <[email protected]>
> > > > > > > > Sent: Friday, 24 February, 2023 2:07 AM
> > > > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > > > >
> > > > > > > > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > > > > > > > Low level Arch functions were created to support hibernation.
> > > > > > > > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > > > > > > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > > > > > > > image.
> > > > > > > > >
> > > > > > > > > Arch specific hibernation header is implemented and is utilized by the
> > > > > > > > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > > > > > > > functions. The arch specific hibernation header consists of satp, hartid,
> > > > > > > > > and the cpu_resume address. The kernel built version is also need to be
> > > > > > > > > saved into the hibernation image header to making sure only the same
> > > > > > > > > kernel is restore when resume.
> > > > > > > > >
> > > > > > > > > swsusp_arch_resume() creates a temporary page table that covering only
> > > > > > > > > the linear map. It copies the restore code to a 'safe' page, then start
> > > > > > > > > to restore the memory image. Once completed, it restores the original
> > > > > > > > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > > > > > > > to restore the CPU context. Finally, it follows the normal hibernation
> > > > > > > > > path back to the hibernation core.
> > > > > > > > >
> > > > > > > > > To enable hibernation/suspend to disk into RISCV, the below config
> > > > > > > > > need to be enabled:
> > > > > > > > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > > > > > > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > > > > > > > >
> > > > > > > > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > > > > > > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > > > > > > > Reviewed-by: Mason Huo <[email protected]>
> > > > > > > > > ---
> > > > > > > > > arch/riscv/Kconfig | 7 +
> > > > > > > > > arch/riscv/include/asm/assembler.h | 20 ++
> > > > > > > > > arch/riscv/include/asm/suspend.h | 19 ++
> > > > > > > > > arch/riscv/kernel/Makefile | 1 +
> > > > > > > > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > > > > > > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > > > > > > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > > > > > > > 7 files changed, 576 insertions(+)
> > > > > > > > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > create mode 100644 arch/riscv/kernel/hibernate.c
> > > > > > > > >
> > > > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > > > index e2b656043abf..4555848a817f 100644
> > > > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > > > @@ -690,6 +690,13 @@ menu "Power management options"
> > > > > > > > >
> > > > > > > > > source "kernel/power/Kconfig"
> > > > > > > > >
> > > > > > > > > +config ARCH_HIBERNATION_POSSIBLE
> > > > > > > > > + def_bool y
> > > > > > > > > +
> > > > > > > > > +config ARCH_HIBERNATION_HEADER
> > > > > > > > > + def_bool y
> > > > > > > > > + depends on HIBERNATION
> > > > > > > >
> > > > > > > > nit: I think this can be simplified as def_bool HIBERNATION
> > > > > > > good suggestion. will change it.
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > endmenu # "Power management options"
> > > > > > > > >
> > > > > > > > > menu "CPU Power Management"
> > > > > > > > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > > > > > > > index 727a97735493..68c46c0e0ea8 100644
> > > > > > > > > --- a/arch/riscv/include/asm/assembler.h
> > > > > > > > > +++ b/arch/riscv/include/asm/assembler.h
> > > > > > > > > @@ -59,4 +59,24 @@
> > > > > > > > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > > > > > > > .endm
> > > > > > > > >
> > > > > > > > > +/*
> > > > > > > > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > > > > > > > + * @a0 - destination
> > > > > > > > > + * @a1 - source
> > > > > > > > > + */
> > > > > > > > > + .macro copy_page a0, a1
> > > > > > > > > + lui a2, 0x1
> > > > > > > > > + add a2, a2, a0
> > > > > > > > > +1 :
> > > > > > > > ^ please remove this space
> > > > > > > can't remove it otherwise checkpatch will throws ERROR: spaces required around that ':'
> > > > > >
> > > > > > Oh, right, labels in macros have this requirement.
> > > > > >
> > > > > > > >
> > > > > > > > > + REG_L t0, 0(a1)
> > > > > > > > > + REG_L t1, SZREG(a1)
> > > > > > > > > +
> > > > > > > > > + REG_S t0, 0(a0)
> > > > > > > > > + REG_S t1, SZREG(a0)
> > > > > > > > > +
> > > > > > > > > + addi a0, a0, 2 * SZREG
> > > > > > > > > + addi a1, a1, 2 * SZREG
> > > > > > > > > + bne a2, a0, 1b
> > > > > > > > > + .endm
> > > > > > > > > +
> > > > > > > > > #endif /* __ASM_ASSEMBLER_H */
> > > > > > > > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > > > > > > > index 75419c5ca272..3362da56a9d8 100644
> > > > > > > > > --- a/arch/riscv/include/asm/suspend.h
> > > > > > > > > +++ b/arch/riscv/include/asm/suspend.h
> > > > > > > > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > > > > > > > #endif
> > > > > > > > > };
> > > > > > > > >
> > > > > > > > > +/*
> > > > > > > > > + * Used by hibernation core and cleared during resume sequence
> > > > > > > > > + */
> > > > > > > > > +extern int in_suspend;
> > > > > > > > > +
> > > > > > > > > /* Low-level CPU suspend entry function */
> > > > > > > > > int __cpu_suspend_enter(struct suspend_context *context);
> > > > > > > > >
> > > > > > > > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > > > > > > > /* Used to save and restore the csr */
> > > > > > > > > void suspend_save_csrs(struct suspend_context *context);
> > > > > > > > > void suspend_restore_csrs(struct suspend_context *context);
> > > > > > > > > +
> > > > > > > > > +/* Low-level API to support hibernation */
> > > > > > > > > +int swsusp_arch_suspend(void);
> > > > > > > > > +int swsusp_arch_resume(void);
> > > > > > > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > > > > > > > +int arch_hibernation_header_restore(void *addr);
> > > > > > > > > +int __hibernate_cpu_resume(void);
> > > > > > > > > +
> > > > > > > > > +/* Used to resume on the CPU we hibernated on */
> > > > > > > > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > > > > > > > +
> > > > > > > > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > > > > > > > + unsigned long cpu_resume);
> > > > > > > > > +asmlinkage int hibernate_core_restore_code(void);
> > > > > > > > > #endif
> > > > > > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > > > > > index 4cf303a779ab..daab341d55e4 100644
> > > > > > > > > --- a/arch/riscv/kernel/Makefile
> > > > > > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > > > > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > > > > > > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > > > > > > > >
> > > > > > > > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > > > > > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > > > > > > > >
> > > > > > > > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > > > > > > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > > > > > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > index df9444397908..d6a75aac1d27 100644
> > > > > > > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > @@ -9,6 +9,7 @@
> > > > > > > > > #include <linux/kbuild.h>
> > > > > > > > > #include <linux/mm.h>
> > > > > > > > > #include <linux/sched.h>
> > > > > > > > > +#include <linux/suspend.h>
> > > > > > > > > #include <asm/kvm_host.h>
> > > > > > > > > #include <asm/thread_info.h>
> > > > > > > > > #include <asm/ptrace.h>
> > > > > > > > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > > > > > > > >
> > > > > > > > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > > > > > > > >
> > > > > > > > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > > > > > > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > > > > > > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > > > > > > > +
> > > > > > > > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > > > > > > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > > > > > > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > > > > > > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > new file mode 100644
> > > > > > > > > index 000000000000..846affe4dced
> > > > > > > > > --- /dev/null
> > > > > > > > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > @@ -0,0 +1,77 @@
> > > > > > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > > > > > +/*
> > > > > > > > > + * Hibernation low level support for RISCV.
> > > > > > > > > + *
> > > > > > > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > > > > > > + *
> > > > > > > > > + * Author: Jee Heng Sia <[email protected]>
> > > > > > > > > + */
> > > > > > > > > +
> > > > > > > > > +#include <asm/asm.h>
> > > > > > > > > +#include <asm/asm-offsets.h>
> > > > > > > > > +#include <asm/assembler.h>
> > > > > > > > > +#include <asm/csr.h>
> > > > > > > > > +
> > > > > > > > > +#include <linux/linkage.h>
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * int __hibernate_cpu_resume(void)
> > > > > > > > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > > > > > > > + * context.
> > > > > > > > > + *
> > > > > > > > > + * Always returns 0
> > > > > > > > > + */
> > > > > > > > > +ENTRY(__hibernate_cpu_resume)
> > > > > > > > > + /* switch to hibernated image's page table. */
> > > > > > > > > + csrw CSR_SATP, s0
> > > > > > > > > + sfence.vma
> > > > > > > > > +
> > > > > > > > > + REG_L a0, hibernate_cpu_context
> > > > > > > > > +
> > > > > > > > > + restore_csr
> > > > > > > > > + restore_reg
> > > > > > > > > +
> > > > > > > > > + /* Return zero value. */
> > > > > > > > > + add a0, zero, zero
> > > > > > > >
> > > > > > > > nit: mv a0, zero
> > > > > > > sure
> > > > > > > >
> > > > > > > > > +
> > > > > > > > > + ret
> > > > > > > > > +END(__hibernate_cpu_resume)
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * Prepare to restore the image.
> > > > > > > > > + * a0: satp of saved page tables.
> > > > > > > > > + * a1: satp of temporary page tables.
> > > > > > > > > + * a2: cpu_resume.
> > > > > > > > > + */
> > > > > > > > > +ENTRY(hibernate_restore_image)
> > > > > > > > > + mv s0, a0
> > > > > > > > > + mv s1, a1
> > > > > > > > > + mv s2, a2
> > > > > > > > > + REG_L s4, restore_pblist
> > > > > > > > > + REG_L a1, relocated_restore_code
> > > > > > > > > +
> > > > > > > > > + jalr a1
> > > > > > > > > +END(hibernate_restore_image)
> > > > > > > > > +
> > > > > > > > > +/*
> > > > > > > > > + * The below code will be executed from a 'safe' page.
> > > > > > > > > + * It first switches to the temporary page table, then starts to copy the pages
> > > > > > > > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > > > > > > > + * to restore the CPU context.
> > > > > > > > > + */
> > > > > > > > > +ENTRY(hibernate_core_restore_code)
> > > > > > > > > + /* switch to temp page table. */
> > > > > > > > > + csrw satp, s1
> > > > > > > > > + sfence.vma
> > > > > > > > > +.Lcopy:
> > > > > > > > > + /* The below code will restore the hibernated image. */
> > > > > > > > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > > > > > > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > > > > > > >
> > > > > > > > Are we sure restore_pblist will never be NULL?
> > > > > > > restore_pblist is a linked list; it will be NULL during initialization or during page clean-up by the hibernation core. During the initial resume
> > > > > > > process, the hibernation core checks the header and loads the pages. If everything works correctly, the pages are linked to the
> > > > > > > restore_pblist and swsusp_arch_resume() is then invoked; otherwise the hibernation core throws an error and fails to resume from the
> > > > > > > hibernated image.
> > > > > >
> > > > > > I know restore_pblist is a linked-list and this doesn't answer the
> > > > > > question. The comment above restore_pblist says
> > > > > >
> > > > > > /*
> > > > > > * List of PBEs needed for restoring the pages that were allocated before
> > > > > > * the suspend and included in the suspend image, but have also been
> > > > > > * allocated by the "resume" kernel, so their contents cannot be written
> > > > > > * directly to their "original" page frames.
> > > > > > */
> > > > > >
> > > > > > which implies the pages that end up on this list are "special". My
> > > > > > question is whether or not we're guaranteed to have at least one
> > > > > > of these special pages. If not, we shouldn't assume s4 is non-null.
> > > > > > If so, then a comment stating why that's guaranteed would be nice.
> > > > > The restore_pblist will not be NULL, otherwise swsusp_arch_resume() wouldn't get invoked. You can find how the linked list is built and
> > > > > how it is checked for validity at https://elixir.bootlin.com/linux/v6.2-rc8/source/kernel/power/snapshot.c . "A comment stating why
> > > > > that's guaranteed would be nice"? Hmm, perhaps this is out of my scope, but I do believe in the page validity checking in the link I
> > > > > shared.
> > > >
> > > > Sorry, but pointing to an entire source file (one that I've obviously
> > > > already looked at, since I quoted a comment from it...) is not helpful.
> > > > I don't see where restore_pblist is being checked before
> > > > swsusp_arch_resume() is issued (from its callsite in hibernate.c).
> > > Sure, below is the hibernation flow for your reference. The linked-list creation and checking can be found at:
> > > https://elixir.bootlin.com/linux/v6.2/source/kernel/power/snapshot.c#L2576
> > > software_resume()
> > > load_image_and_restore()
> > > swsusp_read()
> > > load_image()
> > > snapshot_write_next()
> > > get_buffer() <-- This is the function checks and links the pages to the restore_pblist
> >
> > Yup, I've read this path, including get_buffer(), where I saw that
> > get_buffer() can return an address without allocating a PBE. Where is the
> > check that restore_pblist isn't NULL, i.e. we see that at least one PBE
> > has been allocated by get_buffer(), before we call swsusp_arch_resume()?
> >
> > Or, is known that at least one or more pages match the criteria pointed
> > out in the comment below (copied from get_buffer())?
> >
> > /*
> > * The "original" page frame has not been allocated and we have to
> > * use a "safe" page frame to store the loaded page.
> > */
> >
> > If so, then which ones? And where does it state that?
> Let's look at the pseudocode below; I hope it clears your doubt. restore_pblist depends on safe_pages_list and the pbe allocations, and both pointers are checked. I couldn't find a path where restore_pblist would be NULL.
> //Pseudocode to illustrate the image loading
> initialize restore_pblist to null;
> initialize safe_pages_list to null;
> Allocate safe page list, return error if failed;
> load image;
> loop: Create pbe chain, return error if failed;

This loop pseudocode is incomplete. It's

loop:
if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
return page_address(page);
Create pbe chain, return error if failed;
...

which I pointed out explicitly in my last reply. Also, as I asked in my
last reply (and have been asking four times now, albeit less explicitly
the first two times), how do we know at least one PBE will be linked?
Or, even more specifically this time, where is the proof that for each
hibernation resume, there exists some page such that
!swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
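
For anyone following along, here is a rough C sketch of the decision get_buffer()
makes in kernel/power/snapshot.c (v6.2). It is a paraphrase, not the exact code:
the BM_END_OF_MAP check, the highmem path and the swsusp_free() error clean-up are
omitted, and the internal helper names (memory_bm_next_pfn(), chain_alloc(),
__get_safe_page()) should be treated as approximations of the real source.

static void *get_buffer(struct memory_bitmap *bm, struct chain_allocator *ca)
{
        struct page *page;
        struct pbe *pbe;

        page = pfn_to_page(memory_bm_next_pfn(bm));

        if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
                /*
                 * The "original" page frame is not in use by the resume
                 * kernel, so the data can be loaded straight into place
                 * and no PBE is created.
                 */
                return page_address(page);

        /* Otherwise load into a "safe" page and record the pair in a PBE. */
        pbe = chain_alloc(ca, sizeof(struct pbe));
        if (!pbe)
                return ERR_PTR(-ENOMEM);

        pbe->orig_address = page_address(page);
        pbe->address = __get_safe_page(ca->gfp_mask);  /* pops safe_pages_list */
        if (!pbe->address)
                return ERR_PTR(-ENOMEM);

        pbe->next = restore_pblist;
        restore_pblist = pbe;

        return pbe->address;
}

So a PBE is only created for pages whose original frame is already claimed by the
resume kernel; the question above is whether at least one such page is guaranteed
to exist on every resume.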

Thanks,
drew

> assign orig_addr and safe_page to pbe;
> link pbe to restore_pblist;
> return pbe to handle->buffer;
> check handle->buffer;
> goto loop if no error else return with error;
> >
> > Thanks,
> > drew
> >
> >
> > > hibernation_restore()
> > > resume_target_kernel()
> > > swsusp_arch_resume()
> > > >
> > > > Thanks,
> > > > drew

2023-02-27 10:52:45

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Monday, 27 February, 2023 4:00 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Mon, Feb 27, 2023 at 02:14:27AM +0000, JeeHeng Sia wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Jones <[email protected]>
> > > Sent: Friday, 24 February, 2023 8:07 PM
> > > To: JeeHeng Sia <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > >
> > > On Fri, Feb 24, 2023 at 10:30:19AM +0000, JeeHeng Sia wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Andrew Jones <[email protected]>
> > > > > Sent: Friday, 24 February, 2023 5:55 PM
> > > > > To: JeeHeng Sia <[email protected]>
> > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > >
> > > > > On Fri, Feb 24, 2023 at 09:33:31AM +0000, JeeHeng Sia wrote:
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Andrew Jones <[email protected]>
> > > > > > > Sent: Friday, 24 February, 2023 5:00 PM
> > > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > > >
> > > > > > > On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Andrew Jones <[email protected]>
> > > > > > > > > Sent: Friday, 24 February, 2023 2:07 AM
> > > > > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > > > > >
> > > > > > > > > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > > > > > > > > Low level Arch functions were created to support hibernation.
> > > > > > > > > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > > > > > > > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > > > > > > > > image.
> > > > > > > > > >
> > > > > > > > > > Arch specific hibernation header is implemented and is utilized by the
> > > > > > > > > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > > > > > > > > functions. The arch specific hibernation header consists of satp, hartid,
> > > > > > > > > > and the cpu_resume address. The kernel built version is also need to be
> > > > > > > > > > saved into the hibernation image header to making sure only the same
> > > > > > > > > > kernel is restore when resume.
> > > > > > > > > >
> > > > > > > > > > swsusp_arch_resume() creates a temporary page table that covering only
> > > > > > > > > > the linear map. It copies the restore code to a 'safe' page, then start
> > > > > > > > > > to restore the memory image. Once completed, it restores the original
> > > > > > > > > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > > > > > > > > to restore the CPU context. Finally, it follows the normal hibernation
> > > > > > > > > > path back to the hibernation core.
> > > > > > > > > >
> > > > > > > > > > To enable hibernation/suspend to disk into RISCV, the below config
> > > > > > > > > > need to be enabled:
> > > > > > > > > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > > > > > > > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > > > > > > > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > > > > > > > > Reviewed-by: Mason Huo <[email protected]>
> > > > > > > > > > ---
> > > > > > > > > > arch/riscv/Kconfig | 7 +
> > > > > > > > > > arch/riscv/include/asm/assembler.h | 20 ++
> > > > > > > > > > arch/riscv/include/asm/suspend.h | 19 ++
> > > > > > > > > > arch/riscv/kernel/Makefile | 1 +
> > > > > > > > > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > > > > > > > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > > > > > > > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > > > > > > > > 7 files changed, 576 insertions(+)
> > > > > > > > > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > > create mode 100644 arch/riscv/kernel/hibernate.c
> > > > > > > > > >
> > > > > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > > > > index e2b656043abf..4555848a817f 100644
> > > > > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > > > > @@ -690,6 +690,13 @@ menu "Power management options"
> > > > > > > > > >
> > > > > > > > > > source "kernel/power/Kconfig"
> > > > > > > > > >
> > > > > > > > > > +config ARCH_HIBERNATION_POSSIBLE
> > > > > > > > > > + def_bool y
> > > > > > > > > > +
> > > > > > > > > > +config ARCH_HIBERNATION_HEADER
> > > > > > > > > > + def_bool y
> > > > > > > > > > + depends on HIBERNATION
> > > > > > > > >
> > > > > > > > > nit: I think this can be simplified as def_bool HIBERNATION
> > > > > > > > good suggestion. will change it.
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > endmenu # "Power management options"
> > > > > > > > > >
> > > > > > > > > > menu "CPU Power Management"
> > > > > > > > > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > > > > > > > > index 727a97735493..68c46c0e0ea8 100644
> > > > > > > > > > --- a/arch/riscv/include/asm/assembler.h
> > > > > > > > > > +++ b/arch/riscv/include/asm/assembler.h
> > > > > > > > > > @@ -59,4 +59,24 @@
> > > > > > > > > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > > > > > > > > .endm
> > > > > > > > > >
> > > > > > > > > > +/*
> > > > > > > > > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > > > > > > > > + * @a0 - destination
> > > > > > > > > > + * @a1 - source
> > > > > > > > > > + */
> > > > > > > > > > + .macro copy_page a0, a1
> > > > > > > > > > + lui a2, 0x1
> > > > > > > > > > + add a2, a2, a0
> > > > > > > > > > +1 :
> > > > > > > > > ^ please remove this space
> > > > > > > > can't remove it otherwise checkpatch will throws ERROR: spaces required around that ':'
> > > > > > >
> > > > > > > Oh, right, labels in macros have this requirement.
> > > > > > >
> > > > > > > > >
> > > > > > > > > > + REG_L t0, 0(a1)
> > > > > > > > > > + REG_L t1, SZREG(a1)
> > > > > > > > > > +
> > > > > > > > > > + REG_S t0, 0(a0)
> > > > > > > > > > + REG_S t1, SZREG(a0)
> > > > > > > > > > +
> > > > > > > > > > + addi a0, a0, 2 * SZREG
> > > > > > > > > > + addi a1, a1, 2 * SZREG
> > > > > > > > > > + bne a2, a0, 1b
> > > > > > > > > > + .endm
> > > > > > > > > > +
> > > > > > > > > > #endif /* __ASM_ASSEMBLER_H */
> > > > > > > > > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > > > > > > > > index 75419c5ca272..3362da56a9d8 100644
> > > > > > > > > > --- a/arch/riscv/include/asm/suspend.h
> > > > > > > > > > +++ b/arch/riscv/include/asm/suspend.h
> > > > > > > > > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > > > > > > > > #endif
> > > > > > > > > > };
> > > > > > > > > >
> > > > > > > > > > +/*
> > > > > > > > > > + * Used by hibernation core and cleared during resume sequence
> > > > > > > > > > + */
> > > > > > > > > > +extern int in_suspend;
> > > > > > > > > > +
> > > > > > > > > > /* Low-level CPU suspend entry function */
> > > > > > > > > > int __cpu_suspend_enter(struct suspend_context *context);
> > > > > > > > > >
> > > > > > > > > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > > > > > > > > /* Used to save and restore the csr */
> > > > > > > > > > void suspend_save_csrs(struct suspend_context *context);
> > > > > > > > > > void suspend_restore_csrs(struct suspend_context *context);
> > > > > > > > > > +
> > > > > > > > > > +/* Low-level API to support hibernation */
> > > > > > > > > > +int swsusp_arch_suspend(void);
> > > > > > > > > > +int swsusp_arch_resume(void);
> > > > > > > > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > > > > > > > > +int arch_hibernation_header_restore(void *addr);
> > > > > > > > > > +int __hibernate_cpu_resume(void);
> > > > > > > > > > +
> > > > > > > > > > +/* Used to resume on the CPU we hibernated on */
> > > > > > > > > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > > > > > > > > +
> > > > > > > > > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > > > > > > > > + unsigned long cpu_resume);
> > > > > > > > > > +asmlinkage int hibernate_core_restore_code(void);
> > > > > > > > > > #endif
> > > > > > > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > > > > > > index 4cf303a779ab..daab341d55e4 100644
> > > > > > > > > > --- a/arch/riscv/kernel/Makefile
> > > > > > > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > > > > > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > > > > > > > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > > > > > > > > >
> > > > > > > > > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > > > > > > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > > > > > > > > >
> > > > > > > > > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > > > > > > > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > > > > > > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > > index df9444397908..d6a75aac1d27 100644
> > > > > > > > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > > @@ -9,6 +9,7 @@
> > > > > > > > > > #include <linux/kbuild.h>
> > > > > > > > > > #include <linux/mm.h>
> > > > > > > > > > #include <linux/sched.h>
> > > > > > > > > > +#include <linux/suspend.h>
> > > > > > > > > > #include <asm/kvm_host.h>
> > > > > > > > > > #include <asm/thread_info.h>
> > > > > > > > > > #include <asm/ptrace.h>
> > > > > > > > > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > > > > > > > > >
> > > > > > > > > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > > > > > > > > >
> > > > > > > > > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > > > > > > > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > > > > > > > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > > > > > > > > +
> > > > > > > > > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > > > > > > > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > > > > > > > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > > > > > > > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > > new file mode 100644
> > > > > > > > > > index 000000000000..846affe4dced
> > > > > > > > > > --- /dev/null
> > > > > > > > > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > > @@ -0,0 +1,77 @@
> > > > > > > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > > > > > > +/*
> > > > > > > > > > + * Hibernation low level support for RISCV.
> > > > > > > > > > + *
> > > > > > > > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > > > > > > > + *
> > > > > > > > > > + * Author: Jee Heng Sia <[email protected]>
> > > > > > > > > > + */
> > > > > > > > > > +
> > > > > > > > > > +#include <asm/asm.h>
> > > > > > > > > > +#include <asm/asm-offsets.h>
> > > > > > > > > > +#include <asm/assembler.h>
> > > > > > > > > > +#include <asm/csr.h>
> > > > > > > > > > +
> > > > > > > > > > +#include <linux/linkage.h>
> > > > > > > > > > +
> > > > > > > > > > +/*
> > > > > > > > > > + * int __hibernate_cpu_resume(void)
> > > > > > > > > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > > > > > > > > + * context.
> > > > > > > > > > + *
> > > > > > > > > > + * Always returns 0
> > > > > > > > > > + */
> > > > > > > > > > +ENTRY(__hibernate_cpu_resume)
> > > > > > > > > > + /* switch to hibernated image's page table. */
> > > > > > > > > > + csrw CSR_SATP, s0
> > > > > > > > > > + sfence.vma
> > > > > > > > > > +
> > > > > > > > > > + REG_L a0, hibernate_cpu_context
> > > > > > > > > > +
> > > > > > > > > > + restore_csr
> > > > > > > > > > + restore_reg
> > > > > > > > > > +
> > > > > > > > > > + /* Return zero value. */
> > > > > > > > > > + add a0, zero, zero
> > > > > > > > >
> > > > > > > > > nit: mv a0, zero
> > > > > > > > sure
> > > > > > > > >
> > > > > > > > > > +
> > > > > > > > > > + ret
> > > > > > > > > > +END(__hibernate_cpu_resume)
> > > > > > > > > > +
> > > > > > > > > > +/*
> > > > > > > > > > + * Prepare to restore the image.
> > > > > > > > > > + * a0: satp of saved page tables.
> > > > > > > > > > + * a1: satp of temporary page tables.
> > > > > > > > > > + * a2: cpu_resume.
> > > > > > > > > > + */
> > > > > > > > > > +ENTRY(hibernate_restore_image)
> > > > > > > > > > + mv s0, a0
> > > > > > > > > > + mv s1, a1
> > > > > > > > > > + mv s2, a2
> > > > > > > > > > + REG_L s4, restore_pblist
> > > > > > > > > > + REG_L a1, relocated_restore_code
> > > > > > > > > > +
> > > > > > > > > > + jalr a1
> > > > > > > > > > +END(hibernate_restore_image)
> > > > > > > > > > +
> > > > > > > > > > +/*
> > > > > > > > > > + * The below code will be executed from a 'safe' page.
> > > > > > > > > > + * It first switches to the temporary page table, then starts to copy the pages
> > > > > > > > > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > > > > > > > > + * to restore the CPU context.
> > > > > > > > > > + */
> > > > > > > > > > +ENTRY(hibernate_core_restore_code)
> > > > > > > > > > + /* switch to temp page table. */
> > > > > > > > > > + csrw satp, s1
> > > > > > > > > > + sfence.vma
> > > > > > > > > > +.Lcopy:
> > > > > > > > > > + /* The below code will restore the hibernated image. */
> > > > > > > > > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > > > > > > > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > > > > > > > >
> > > > > > > > > Are we sure restore_pblist will never be NULL?
> > > > > > > > restore_pblist is a link-list, it will be null during initialization or during page clean up by hibernation core. During the initial
> > > > > resume
> > > > > > > process, the hibernation core will check the header and load the pages. If everything works correctly, the page will be linked
> to
> > > the
> > > > > > > restore_pblist and then invoke swsusp_arch_resume() else hibernation core will throws error and failed to resume from
> the
> > > > > > > hibernated image.
> > > > > > >
> > > > > > > I know restore_pblist is a linked-list and this doesn't answer the
> > > > > > > question. The comment above restore_pblist says
> > > > > > >
> > > > > > > /*
> > > > > > > * List of PBEs needed for restoring the pages that were allocated before
> > > > > > > * the suspend and included in the suspend image, but have also been
> > > > > > > * allocated by the "resume" kernel, so their contents cannot be written
> > > > > > > * directly to their "original" page frames.
> > > > > > > */
> > > > > > >
> > > > > > > which implies the pages that end up on this list are "special". My
> > > > > > > question is whether or not we're guaranteed to have at least one
> > > > > > > of these special pages. If not, we shouldn't assume s4 is non-null.
> > > > > > > If so, then a comment stating why that's guaranteed would be nice.
> > > > > > The restore_pblist will not be null otherwise swsusp_arch_resume wouldn't get invoked. you can find how the link-list are
> link
> > > and
> > > > > how it checks against validity at https://elixir.bootlin.com/linux/v6.2-rc8/source/kernel/power/snapshot.c . " A comment
> stating
> > > why
> > > > > that's guaranteed would be nice" ? Hmm, perhaps this is out of my scope but I do believe in the page validity checking in the
> link I
> > > > > shared.
> > > > >
> > > > > Sorry, but pointing to an entire source file (one that I've obviously
> > > > > already looked at, since I quoted a comment from it...) is not helpful.
> > > > > I don't see where restore_pblist is being checked before
> > > > > swsusp_arch_resume() is issued (from its callsite in hibernate.c).
> > > > Sure, below shows the hibernation flow for your reference. The link-list creation and checking found at:
> > > https://elixir.bootlin.com/linux/v6.2/source/kernel/power/snapshot.c#L2576
> > > > software_resume()
> > > > load_image_and_restore()
> > > > swsusp_read()
> > > > load_image()
> > > > snapshot_write_next()
> > > > get_buffer() <-- This is the function checks and links the pages to the restore_pblist
> > >
> > > Yup, I've read this path, including get_buffer(), where I saw that
> > > get_buffer() can return an address without allocating a PBE. Where is the
> > > check that restore_pblist isn't NULL, i.e. we see that at least one PBE
> > > has been allocated by get_buffer(), before we call swsusp_arch_resume()?
> > >
> > > Or, is known that at least one or more pages match the criteria pointed
> > > out in the comment below (copied from get_buffer())?
> > >
> > > /*
> > > * The "original" page frame has not been allocated and we have to
> > > * use a "safe" page frame to store the loaded page.
> > > */
> > >
> > > If so, then which ones? And where does it state that?
> > Let's look at the pseudocode below; I hope it clears your doubt. restore_pblist depends on safe_pages_list and the pbe allocations, and both
> > pointers are checked. I couldn't find a path where restore_pblist would be NULL.
> > //Pseudocode to illustrate the image loading
> > initialize restore_pblist to null;
> > initialize safe_pages_list to null;
> > Allocate safe page list, return error if failed;
> > load image;
> > loop: Create pbe chain, return error if failed;
>
> This loop pseudocode is incomplete. It's
>
> loop:
> if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> return page_address(page);
> Create pbe chain, return error if failed;
> ...
>
> which I pointed out explicitly in my last reply. Also, as I asked in my
> last reply (and have been asking four times now, albeit less explicitly
> the first two times), how do we know at least one PBE will be linked?
One PBE corresponds to one page, and you shouldn't expect only one page to be saved. The hibernation core does the accounting. If the PBEs (restore_pblist) are linked successfully, the hibernated image will be restored; otherwise a normal boot takes place.
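
(For reference, the structure behind the HIBERN_PBE_* offsets is struct pbe from
include/linux/suspend.h; as of v6.2 it looks roughly like the sketch below, and
hibernate_core_restore_code simply walks the ->next chain until it hits NULL.)

/* One entry per page that must be copied back from its "safe" location. */
struct pbe {
        void *address;          /* address of the copy ("safe" page) */
        void *orig_address;     /* original address of the page */
        struct pbe *next;
};
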
> Or, even more specifically this time, where is the proof that for each
> hibernation resume, there exists some page such that
> !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
Forbidden pages and free pages do not contribute to restore_pblist (as you are already aware from the code). In fact, forbidden pages and free pages are not saved to disk at all.
>
> Thanks,
> drew
>
> > assign orig_addr and safe_page to pbe;
> > link pbe to restore_pblist;
> > return pbe to handle->buffer;
> > check handle->buffer;
> > goto loop if no error else return with error;
> > >
> > > Thanks,
> > > drew
> > >
> > >
> > > > hibernation_restore()
> > > > resume_target_kernel()
> > > > swsusp_arch_resume()
> > > > >
> > > > > Thanks,
> > > > > drew

2023-02-27 11:44:50

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Mon, Feb 27, 2023 at 10:52:32AM +0000, JeeHeng Sia wrote:
>
>
> > -----Original Message-----
> > From: Andrew Jones <[email protected]>
> > Sent: Monday, 27 February, 2023 4:00 PM
> > To: JeeHeng Sia <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >
> > On Mon, Feb 27, 2023 at 02:14:27AM +0000, JeeHeng Sia wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Andrew Jones <[email protected]>
> > > > Sent: Friday, 24 February, 2023 8:07 PM
> > > > To: JeeHeng Sia <[email protected]>
> > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > >
> > > > On Fri, Feb 24, 2023 at 10:30:19AM +0000, JeeHeng Sia wrote:
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Andrew Jones <[email protected]>
> > > > > > Sent: Friday, 24 February, 2023 5:55 PM
> > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > >
> > > > > > On Fri, Feb 24, 2023 at 09:33:31AM +0000, JeeHeng Sia wrote:
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Andrew Jones <[email protected]>
> > > > > > > > Sent: Friday, 24 February, 2023 5:00 PM
> > > > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > > > >
> > > > > > > > On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Andrew Jones <[email protected]>
> > > > > > > > > > Sent: Friday, 24 February, 2023 2:07 AM
> > > > > > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > > > > > >
> > > > > > > > > > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > > > > > > > > > Low level Arch functions were created to support hibernation.
> > > > > > > > > > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > > > > > > > > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > > > > > > > > > image.
> > > > > > > > > > >
> > > > > > > > > > > Arch specific hibernation header is implemented and is utilized by the
> > > > > > > > > > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > > > > > > > > > functions. The arch specific hibernation header consists of satp, hartid,
> > > > > > > > > > > and the cpu_resume address. The kernel built version is also need to be
> > > > > > > > > > > saved into the hibernation image header to making sure only the same
> > > > > > > > > > > kernel is restore when resume.
> > > > > > > > > > >
> > > > > > > > > > > swsusp_arch_resume() creates a temporary page table that covering only
> > > > > > > > > > > the linear map. It copies the restore code to a 'safe' page, then start
> > > > > > > > > > > to restore the memory image. Once completed, it restores the original
> > > > > > > > > > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > > > > > > > > > to restore the CPU context. Finally, it follows the normal hibernation
> > > > > > > > > > > path back to the hibernation core.
> > > > > > > > > > >
> > > > > > > > > > > To enable hibernation/suspend to disk into RISCV, the below config
> > > > > > > > > > > need to be enabled:
> > > > > > > > > > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > > > > > > > > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > > > > > > > > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > > > > > > > > > Reviewed-by: Mason Huo <[email protected]>
> > > > > > > > > > > ---
> > > > > > > > > > > arch/riscv/Kconfig | 7 +
> > > > > > > > > > > arch/riscv/include/asm/assembler.h | 20 ++
> > > > > > > > > > > arch/riscv/include/asm/suspend.h | 19 ++
> > > > > > > > > > > arch/riscv/kernel/Makefile | 1 +
> > > > > > > > > > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > > > > > > > > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > > > > > > > > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > > > > > > > > > 7 files changed, 576 insertions(+)
> > > > > > > > > > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > > > create mode 100644 arch/riscv/kernel/hibernate.c
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > > > > > index e2b656043abf..4555848a817f 100644
> > > > > > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > > > > > @@ -690,6 +690,13 @@ menu "Power management options"
> > > > > > > > > > >
> > > > > > > > > > > source "kernel/power/Kconfig"
> > > > > > > > > > >
> > > > > > > > > > > +config ARCH_HIBERNATION_POSSIBLE
> > > > > > > > > > > + def_bool y
> > > > > > > > > > > +
> > > > > > > > > > > +config ARCH_HIBERNATION_HEADER
> > > > > > > > > > > + def_bool y
> > > > > > > > > > > + depends on HIBERNATION
> > > > > > > > > >
> > > > > > > > > > nit: I think this can be simplified as def_bool HIBERNATION
> > > > > > > > > good suggestion. will change it.
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > endmenu # "Power management options"
> > > > > > > > > > >
> > > > > > > > > > > menu "CPU Power Management"
> > > > > > > > > > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > > > > > > > > > index 727a97735493..68c46c0e0ea8 100644
> > > > > > > > > > > --- a/arch/riscv/include/asm/assembler.h
> > > > > > > > > > > +++ b/arch/riscv/include/asm/assembler.h
> > > > > > > > > > > @@ -59,4 +59,24 @@
> > > > > > > > > > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > > > > > > > > > .endm
> > > > > > > > > > >
> > > > > > > > > > > +/*
> > > > > > > > > > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > > > > > > > > > + * @a0 - destination
> > > > > > > > > > > + * @a1 - source
> > > > > > > > > > > + */
> > > > > > > > > > > + .macro copy_page a0, a1
> > > > > > > > > > > + lui a2, 0x1
> > > > > > > > > > > + add a2, a2, a0
> > > > > > > > > > > +1 :
> > > > > > > > > > ^ please remove this space
> > > > > > > > > can't remove it otherwise checkpatch will throws ERROR: spaces required around that ':'
> > > > > > > >
> > > > > > > > Oh, right, labels in macros have this requirement.
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > + REG_L t0, 0(a1)
> > > > > > > > > > > + REG_L t1, SZREG(a1)
> > > > > > > > > > > +
> > > > > > > > > > > + REG_S t0, 0(a0)
> > > > > > > > > > > + REG_S t1, SZREG(a0)
> > > > > > > > > > > +
> > > > > > > > > > > + addi a0, a0, 2 * SZREG
> > > > > > > > > > > + addi a1, a1, 2 * SZREG
> > > > > > > > > > > + bne a2, a0, 1b
> > > > > > > > > > > + .endm
> > > > > > > > > > > +
> > > > > > > > > > > #endif /* __ASM_ASSEMBLER_H */
> > > > > > > > > > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > > > > > > > > > index 75419c5ca272..3362da56a9d8 100644
> > > > > > > > > > > --- a/arch/riscv/include/asm/suspend.h
> > > > > > > > > > > +++ b/arch/riscv/include/asm/suspend.h
> > > > > > > > > > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > > > > > > > > > #endif
> > > > > > > > > > > };
> > > > > > > > > > >
> > > > > > > > > > > +/*
> > > > > > > > > > > + * Used by hibernation core and cleared during resume sequence
> > > > > > > > > > > + */
> > > > > > > > > > > +extern int in_suspend;
> > > > > > > > > > > +
> > > > > > > > > > > /* Low-level CPU suspend entry function */
> > > > > > > > > > > int __cpu_suspend_enter(struct suspend_context *context);
> > > > > > > > > > >
> > > > > > > > > > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > > > > > > > > > /* Used to save and restore the csr */
> > > > > > > > > > > void suspend_save_csrs(struct suspend_context *context);
> > > > > > > > > > > void suspend_restore_csrs(struct suspend_context *context);
> > > > > > > > > > > +
> > > > > > > > > > > +/* Low-level API to support hibernation */
> > > > > > > > > > > +int swsusp_arch_suspend(void);
> > > > > > > > > > > +int swsusp_arch_resume(void);
> > > > > > > > > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > > > > > > > > > +int arch_hibernation_header_restore(void *addr);
> > > > > > > > > > > +int __hibernate_cpu_resume(void);
> > > > > > > > > > > +
> > > > > > > > > > > +/* Used to resume on the CPU we hibernated on */
> > > > > > > > > > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > > > > > > > > > +
> > > > > > > > > > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > > > > > > > > > + unsigned long cpu_resume);
> > > > > > > > > > > +asmlinkage int hibernate_core_restore_code(void);
> > > > > > > > > > > #endif
> > > > > > > > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > > > > > > > index 4cf303a779ab..daab341d55e4 100644
> > > > > > > > > > > --- a/arch/riscv/kernel/Makefile
> > > > > > > > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > > > > > > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > > > > > > > > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > > > > > > > > > >
> > > > > > > > > > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > > > > > > > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > > > > > > > > > >
> > > > > > > > > > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > > > > > > > > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > > > > > > > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > > > index df9444397908..d6a75aac1d27 100644
> > > > > > > > > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > > > @@ -9,6 +9,7 @@
> > > > > > > > > > > #include <linux/kbuild.h>
> > > > > > > > > > > #include <linux/mm.h>
> > > > > > > > > > > #include <linux/sched.h>
> > > > > > > > > > > +#include <linux/suspend.h>
> > > > > > > > > > > #include <asm/kvm_host.h>
> > > > > > > > > > > #include <asm/thread_info.h>
> > > > > > > > > > > #include <asm/ptrace.h>
> > > > > > > > > > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > > > > > > > > > >
> > > > > > > > > > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > > > > > > > > > >
> > > > > > > > > > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > > > > > > > > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > > > > > > > > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > > > > > > > > > +
> > > > > > > > > > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > > > > > > > > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > > > > > > > > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > > > > > > > > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > > > new file mode 100644
> > > > > > > > > > > index 000000000000..846affe4dced
> > > > > > > > > > > --- /dev/null
> > > > > > > > > > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > > > @@ -0,0 +1,77 @@
> > > > > > > > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > > > > > > > +/*
> > > > > > > > > > > + * Hibernation low level support for RISCV.
> > > > > > > > > > > + *
> > > > > > > > > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > > > > > > > > + *
> > > > > > > > > > > + * Author: Jee Heng Sia <[email protected]>
> > > > > > > > > > > + */
> > > > > > > > > > > +
> > > > > > > > > > > +#include <asm/asm.h>
> > > > > > > > > > > +#include <asm/asm-offsets.h>
> > > > > > > > > > > +#include <asm/assembler.h>
> > > > > > > > > > > +#include <asm/csr.h>
> > > > > > > > > > > +
> > > > > > > > > > > +#include <linux/linkage.h>
> > > > > > > > > > > +
> > > > > > > > > > > +/*
> > > > > > > > > > > + * int __hibernate_cpu_resume(void)
> > > > > > > > > > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > > > > > > > > > + * context.
> > > > > > > > > > > + *
> > > > > > > > > > > + * Always returns 0
> > > > > > > > > > > + */
> > > > > > > > > > > +ENTRY(__hibernate_cpu_resume)
> > > > > > > > > > > + /* switch to hibernated image's page table. */
> > > > > > > > > > > + csrw CSR_SATP, s0
> > > > > > > > > > > + sfence.vma
> > > > > > > > > > > +
> > > > > > > > > > > + REG_L a0, hibernate_cpu_context
> > > > > > > > > > > +
> > > > > > > > > > > + restore_csr
> > > > > > > > > > > + restore_reg
> > > > > > > > > > > +
> > > > > > > > > > > + /* Return zero value. */
> > > > > > > > > > > + add a0, zero, zero
> > > > > > > > > >
> > > > > > > > > > nit: mv a0, zero
> > > > > > > > > sure
> > > > > > > > > >
> > > > > > > > > > > +
> > > > > > > > > > > + ret
> > > > > > > > > > > +END(__hibernate_cpu_resume)
> > > > > > > > > > > +
> > > > > > > > > > > +/*
> > > > > > > > > > > + * Prepare to restore the image.
> > > > > > > > > > > + * a0: satp of saved page tables.
> > > > > > > > > > > + * a1: satp of temporary page tables.
> > > > > > > > > > > + * a2: cpu_resume.
> > > > > > > > > > > + */
> > > > > > > > > > > +ENTRY(hibernate_restore_image)
> > > > > > > > > > > + mv s0, a0
> > > > > > > > > > > + mv s1, a1
> > > > > > > > > > > + mv s2, a2
> > > > > > > > > > > + REG_L s4, restore_pblist
> > > > > > > > > > > + REG_L a1, relocated_restore_code
> > > > > > > > > > > +
> > > > > > > > > > > + jalr a1
> > > > > > > > > > > +END(hibernate_restore_image)
> > > > > > > > > > > +
> > > > > > > > > > > +/*
> > > > > > > > > > > + * The below code will be executed from a 'safe' page.
> > > > > > > > > > > + * It first switches to the temporary page table, then starts to copy the pages
> > > > > > > > > > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > > > > > > > > > + * to restore the CPU context.
> > > > > > > > > > > + */
> > > > > > > > > > > +ENTRY(hibernate_core_restore_code)
> > > > > > > > > > > + /* switch to temp page table. */
> > > > > > > > > > > + csrw satp, s1
> > > > > > > > > > > + sfence.vma
> > > > > > > > > > > +.Lcopy:
> > > > > > > > > > > + /* The below code will restore the hibernated image. */
> > > > > > > > > > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > > > > > > > > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > > > > > > > > >
> > > > > > > > > > Are we sure restore_pblist will never be NULL?
> > > > > > > > > restore_pblist is a link-list, it will be null during initialization or during page clean up by hibernation core. During the initial
> > > > > > resume
> > > > > > > > process, the hibernation core will check the header and load the pages. If everything works correctly, the page will be linked
> > to
> > > > the
> > > > > > > > restore_pblist and then invoke swsusp_arch_resume() else hibernation core will throws error and failed to resume from
> > the
> > > > > > > > hibernated image.
> > > > > > > >
> > > > > > > > I know restore_pblist is a linked-list and this doesn't answer the
> > > > > > > > question. The comment above restore_pblist says
> > > > > > > >
> > > > > > > > /*
> > > > > > > > * List of PBEs needed for restoring the pages that were allocated before
> > > > > > > > * the suspend and included in the suspend image, but have also been
> > > > > > > > * allocated by the "resume" kernel, so their contents cannot be written
> > > > > > > > * directly to their "original" page frames.
> > > > > > > > */
> > > > > > > >
> > > > > > > > which implies the pages that end up on this list are "special". My
> > > > > > > > question is whether or not we're guaranteed to have at least one
> > > > > > > > of these special pages. If not, we shouldn't assume s4 is non-null.
> > > > > > > > If so, then a comment stating why that's guaranteed would be nice.
> > > > > > > The restore_pblist will not be null otherwise swsusp_arch_resume wouldn't get invoked. you can find how the link-list are
> > link
> > > > and
> > > > > > how it checks against validity at https://elixir.bootlin.com/linux/v6.2-rc8/source/kernel/power/snapshot.c . " A comment
> > stating
> > > > why
> > > > > > that's guaranteed would be nice" ? Hmm, perhaps this is out of my scope but I do believe in the page validity checking in the
> > link I
> > > > > > shared.
> > > > > >
> > > > > > Sorry, but pointing to an entire source file (one that I've obviously
> > > > > > already looked at, since I quoted a comment from it...) is not helpful.
> > > > > > I don't see where restore_pblist is being checked before
> > > > > > swsusp_arch_resume() is issued (from its callsite in hibernate.c).
> > > > > Sure, below shows the hibernation flow for your reference. The link-list creation and checking found at:
> > > > https://elixir.bootlin.com/linux/v6.2/source/kernel/power/snapshot.c#L2576
> > > > > software_resume()
> > > > > load_image_and_restore()
> > > > > swsusp_read()
> > > > > load_image()
> > > > > snapshot_write_next()
> > > > > get_buffer() <-- This is the function checks and links the pages to the restore_pblist
> > > >
> > > > Yup, I've read this path, including get_buffer(), where I saw that
> > > > get_buffer() can return an address without allocating a PBE. Where is the
> > > > check that restore_pblist isn't NULL, i.e. we see that at least one PBE
> > > > has been allocated by get_buffer(), before we call swsusp_arch_resume()?
> > > >
> > > > Or, is known that at least one or more pages match the criteria pointed
> > > > out in the comment below (copied from get_buffer())?
> > > >
> > > > /*
> > > > * The "original" page frame has not been allocated and we have to
> > > > * use a "safe" page frame to store the loaded page.
> > > > */
> > > >
> > > > If so, then which ones? And where does it state that?
> > > Let's look at the pseudocode below; I hope it clears your doubt. restore_pblist depends on safe_pages_list and the pbe allocations, and both
> > > pointers are checked. I couldn't find a path where restore_pblist would be NULL.
> > > //Pseudocode to illustrate the image loading
> > > initialize restore_pblist to null;
> > > initialize safe_pages_list to null;
> > > Allocate safe page list, return error if failed;
> > > load image;
> > > loop: Create pbe chain, return error if failed;
> >
> > This loop pseudocode is incomplete. It's
> >
> > loop:
> > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > return page_address(page);
> > Create pbe chain, return error if failed;
> > ...
> >
> > which I pointed out explicitly in my last reply. Also, as I asked in my
> > last reply (and have been asking four times now, albeit less explicitly
> > the first two times), how do we know at least one PBE will be linked?
> One PBE corresponds to one page, and you shouldn't expect only one page to be saved.

I know PBEs correspond to pages. *Why* should I not expect only one page
is saved? Or, more importantly, why should I expect more than zero pages
are saved?

Convincing answers might be because we *always* put the restore code in
pages which get added to the PBE list or that the original page tables
*always* get put in pages which get added to the PBE list. It's not very
convincing to simply *assume* that at least one random page will always
meet the PBE list criteria.

> The hibernation core does the accounting. If the PBEs (restore_pblist) are linked successfully, the hibernated image will be restored; otherwise a normal boot takes place.
> > Or, even more specifically this time, where is the proof that for each
> > hibernation resume, there exists some page such that
> > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> Forbidden pages and free pages do not contribute to restore_pblist (as you are already aware from the code). In fact, forbidden pages and free pages are not saved to disk at all.

Exactly, so those pages are *not* going to contribute to the count of
PBE-restored pages. What I've been asking for, from the beginning, is to know
which page(s) are known to *always* contribute to the list. Or, IOW, how
do you know the PBE list isn't empty, i.e. that restore_pblist isn't NULL?
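
FWIW, a C rendition of the .Lcopy walk makes the concern concrete. This is an
illustrative sketch only, not code from the patch; struct pbe and restore_pblist
come from include/linux/suspend.h. An empty list is simply zero iterations here,
whereas the posted assembly loads from s4 before ever testing it, so something
like a beqz on s4 ahead of the first REG_L would be the assembly counterpart of
this check.

#include <linux/string.h>       /* memcpy() */
#include <linux/suspend.h>      /* struct pbe, restore_pblist */
#include <asm/page.h>           /* PAGE_SIZE */

/* Copy every saved page back to its original frame, one PBE at a time. */
static void restore_pbe_list(void)
{
        struct pbe *pbe;

        for (pbe = restore_pblist; pbe; pbe = pbe->next)
                memcpy(pbe->orig_address, pbe->address, PAGE_SIZE);
}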

Thanks,
drew

> >
> > Thanks,
> > drew
> >
> > > assign orig_addr and safe_page to pbe;
> > > link pbe to restore_pblist;
> > > return pbe to handle->buffer;
> > > check handle->buffer;
> > > goto loop if no error else return with error;
> > > >
> > > > Thanks,
> > > > drew
> > > >
> > > >
> > > > > hibernation_restore()
> > > > > resume_target_kernel()
> > > > > swsusp_arch_resume()
> > > > > >
> > > > > > Thanks,
> > > > > > drew

2023-02-27 20:31:47

by Alexandre Ghiti

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk


On 2/27/23 04:11, JeeHeng Sia wrote:
>
>> -----Original Message-----
>> From: Alexandre Ghiti <[email protected]>
>> Sent: Friday, 24 February, 2023 8:29 PM
>> To: JeeHeng Sia <[email protected]>; [email protected]; [email protected]; [email protected]
>> Cc: [email protected]; [email protected]; Leyfoon Tan <[email protected]>; Mason Huo
>> <[email protected]>
>> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>>
>> On 2/21/23 03:35, Sia Jee Heng wrote:
>>> Low level Arch functions were created to support hibernation.
>>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
>>> cpu state onto the stack, then calling swsusp_save() to save the memory
>>> image.
>>>
>>> Arch specific hibernation header is implemented and is utilized by the
>>> arch_hibernation_header_restore() and arch_hibernation_header_save()
>>> functions. The arch specific hibernation header consists of satp, hartid,
>>> and the cpu_resume address. The kernel built version is also need to be
>>> saved into the hibernation image header to making sure only the same
>>> kernel is restore when resume.
>>>
>>> swsusp_arch_resume() creates a temporary page table that covering only
>>> the linear map. It copies the restore code to a 'safe' page, then start
>>> to restore the memory image. Once completed, it restores the original
>>> kernel's page table. It then calls into __hibernate_cpu_resume()
>>> to restore the CPU context. Finally, it follows the normal hibernation
>>> path back to the hibernation core.
>>>
>>> To enable hibernation/suspend to disk into RISCV, the below config
>>> need to be enabled:
>>> - CONFIG_ARCH_HIBERNATION_HEADER
>>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
>>>
>>> Signed-off-by: Sia Jee Heng <[email protected]>
>>> Reviewed-by: Ley Foon Tan <[email protected]>
>>> Reviewed-by: Mason Huo <[email protected]>
>>> ---
>>> arch/riscv/Kconfig | 7 +
>>> arch/riscv/include/asm/assembler.h | 20 ++
>>> arch/riscv/include/asm/suspend.h | 19 ++
>>> arch/riscv/kernel/Makefile | 1 +
>>> arch/riscv/kernel/asm-offsets.c | 5 +
>>> arch/riscv/kernel/hibernate-asm.S | 77 +++++
>>> arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
>>> 7 files changed, 576 insertions(+)
>>> create mode 100644 arch/riscv/kernel/hibernate-asm.S
>>> create mode 100644 arch/riscv/kernel/hibernate.c
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index e2b656043abf..4555848a817f 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -690,6 +690,13 @@ menu "Power management options"
>>>
>>> source "kernel/power/Kconfig"
>>>
>>> +config ARCH_HIBERNATION_POSSIBLE
>>> + def_bool y
>>> +
>>> +config ARCH_HIBERNATION_HEADER
>>> + def_bool y
>>> + depends on HIBERNATION
>>> +
>>> endmenu # "Power management options"
>>>
>>> menu "CPU Power Management"
>>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
>>> index 727a97735493..68c46c0e0ea8 100644
>>> --- a/arch/riscv/include/asm/assembler.h
>>> +++ b/arch/riscv/include/asm/assembler.h
>>> @@ -59,4 +59,24 @@
>>> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
>>> .endm
>>>
>>> +/*
>>> + * copy_page - copy 1 page (4KB) of data from source to destination
>>> + * @a0 - destination
>>> + * @a1 - source
>>> + */
>>> + .macro copy_page a0, a1
>>> + lui a2, 0x1
>>> + add a2, a2, a0
>>> +1 :
>>> + REG_L t0, 0(a1)
>>> + REG_L t1, SZREG(a1)
>>> +
>>> + REG_S t0, 0(a0)
>>> + REG_S t1, SZREG(a0)
>>> +
>>> + addi a0, a0, 2 * SZREG
>>> + addi a1, a1, 2 * SZREG
>>> + bne a2, a0, 1b
>>> + .endm
>>> +
>>> #endif /* __ASM_ASSEMBLER_H */
>>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
>>> index 75419c5ca272..3362da56a9d8 100644
>>> --- a/arch/riscv/include/asm/suspend.h
>>> +++ b/arch/riscv/include/asm/suspend.h
>>> @@ -21,6 +21,11 @@ struct suspend_context {
>>> #endif
>>> };
>>>
>>> +/*
>>> + * Used by hibernation core and cleared during resume sequence
>>> + */
>>> +extern int in_suspend;
>>> +
>>> /* Low-level CPU suspend entry function */
>>> int __cpu_suspend_enter(struct suspend_context *context);
>>>
>>> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
>>> /* Used to save and restore the csr */
>>> void suspend_save_csrs(struct suspend_context *context);
>>> void suspend_restore_csrs(struct suspend_context *context);
>>> +
>>> +/* Low-level API to support hibernation */
>>> +int swsusp_arch_suspend(void);
>>> +int swsusp_arch_resume(void);
>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
>>> +int arch_hibernation_header_restore(void *addr);
>>> +int __hibernate_cpu_resume(void);
>>> +
>>> +/* Used to resume on the CPU we hibernated on */
>>> +int hibernate_resume_nonboot_cpu_disable(void);
>>> +
>>> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
>>> + unsigned long cpu_resume);
>>> +asmlinkage int hibernate_core_restore_code(void);
>>> #endif
>>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
>>> index 4cf303a779ab..daab341d55e4 100644
>>> --- a/arch/riscv/kernel/Makefile
>>> +++ b/arch/riscv/kernel/Makefile
>>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
>>> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
>>>
>>> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
>>> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
>>>
>>> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
>>> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
>>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
>>> index df9444397908..d6a75aac1d27 100644
>>> --- a/arch/riscv/kernel/asm-offsets.c
>>> +++ b/arch/riscv/kernel/asm-offsets.c
>>> @@ -9,6 +9,7 @@
>>> #include <linux/kbuild.h>
>>> #include <linux/mm.h>
>>> #include <linux/sched.h>
>>> +#include <linux/suspend.h>
>>> #include <asm/kvm_host.h>
>>> #include <asm/thread_info.h>
>>> #include <asm/ptrace.h>
>>> @@ -116,6 +117,10 @@ void asm_offsets(void)
>>>
>>> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
>>>
>>> + OFFSET(HIBERN_PBE_ADDR, pbe, address);
>>> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
>>> + OFFSET(HIBERN_PBE_NEXT, pbe, next);
>>> +
>>> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
>>> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
>>> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
>>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
>>> new file mode 100644
>>> index 000000000000..846affe4dced
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/hibernate-asm.S
>>> @@ -0,0 +1,77 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +/*
>>> + * Hibernation low level support for RISCV.
>>> + *
>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>> + *
>>> + * Author: Jee Heng Sia <[email protected]>
>>> + */
>>> +
>>> +#include <asm/asm.h>
>>> +#include <asm/asm-offsets.h>
>>> +#include <asm/assembler.h>
>>> +#include <asm/csr.h>
>>> +
>>> +#include <linux/linkage.h>
>>> +
>>> +/*
>>> + * int __hibernate_cpu_resume(void)
>>> + * Switch back to the hibernated image's page table prior to restoring the CPU
>>> + * context.
>>> + *
>>> + * Always returns 0
>>> + */
>>> +ENTRY(__hibernate_cpu_resume)
>>> + /* switch to hibernated image's page table. */
>>> + csrw CSR_SATP, s0
>>> + sfence.vma
>>> +
>>> + REG_L a0, hibernate_cpu_context
>>> +
>>> + restore_csr
>>> + restore_reg
>>> +
>>> + /* Return zero value. */
>>> + add a0, zero, zero
>>> +
>>> + ret
>>> +END(__hibernate_cpu_resume)
>>> +
>>> +/*
>>> + * Prepare to restore the image.
>>> + * a0: satp of saved page tables.
>>> + * a1: satp of temporary page tables.
>>> + * a2: cpu_resume.
>>> + */
>>> +ENTRY(hibernate_restore_image)
>>> + mv s0, a0
>>> + mv s1, a1
>>> + mv s2, a2
>>> + REG_L s4, restore_pblist
>>> + REG_L a1, relocated_restore_code
>>> +
>>> + jalr a1
>>> +END(hibernate_restore_image)
>>> +
>>> +/*
>>> + * The below code will be executed from a 'safe' page.
>>> + * It first switches to the temporary page table, then starts to copy the pages
>>> + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
>>> + * to restore the CPU context.
>>> + */
>>> +ENTRY(hibernate_core_restore_code)
>>> + /* switch to temp page table. */
>>> + csrw satp, s1
>>> + sfence.vma
>>> +.Lcopy:
>>> + /* The below code will restore the hibernated image. */
>>> + REG_L a1, HIBERN_PBE_ADDR(s4)
>>> + REG_L a0, HIBERN_PBE_ORIG(s4)
>>> +
>>> + copy_page a0, a1
>>> +
>>> + REG_L s4, HIBERN_PBE_NEXT(s4)
>>> + bnez s4, .Lcopy
>>> +
>>> + jalr s2
>>> +END(hibernate_core_restore_code)
>>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
>>> new file mode 100644
>>> index 000000000000..46a2f470db6e
>>> --- /dev/null
>>> +++ b/arch/riscv/kernel/hibernate.c
>>> @@ -0,0 +1,447 @@
>>> +// SPDX-License-Identifier: GPL-2.0-only
>>> +/*
>>> + * Hibernation support for RISCV
>>> + *
>>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
>>> + *
>>> + * Author: Jee Heng Sia <[email protected]>
>>> + */
>>> +
>>> +#include <asm/barrier.h>
>>> +#include <asm/cacheflush.h>
>>> +#include <asm/mmu_context.h>
>>> +#include <asm/page.h>
>>> +#include <asm/pgalloc.h>
>>> +#include <asm/pgtable.h>
>>> +#include <asm/sections.h>
>>> +#include <asm/set_memory.h>
>>> +#include <asm/smp.h>
>>> +#include <asm/suspend.h>
>>> +
>>> +#include <linux/cpu.h>
>>> +#include <linux/memblock.h>
>>> +#include <linux/pm.h>
>>> +#include <linux/sched.h>
>>> +#include <linux/suspend.h>
>>> +#include <linux/utsname.h>
>>> +
>>> +/* The logical cpu number we should resume on, initialised to a non-cpu number. */
>>> +static int sleep_cpu = -EINVAL;
>>> +
>>> +/* Pointer to the temporary resume page table. */
>>> +static pgd_t *resume_pg_dir;
>>> +
>>> +/* CPU context to be saved. */
>>> +struct suspend_context *hibernate_cpu_context;
>>> +EXPORT_SYMBOL_GPL(hibernate_cpu_context);
>>> +
>>> +unsigned long relocated_restore_code;
>>> +EXPORT_SYMBOL_GPL(relocated_restore_code);
>>> +
>>> +/**
>>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version.
>>> + * @uts_version: to save the build number and date so that the we do not resume with
>>> + * a different kernel.
>>> + */
>>> +struct arch_hibernate_hdr_invariants {
>>> + char uts_version[__NEW_UTS_LEN + 1];
>>> +};
>>> +
>>> +/**
>>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
>>> + * @invariants: container to store kernel build version.
>>> + * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
>>> + * @saved_satp: original page table used by the hibernated image.
>>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
>>> + */
>>> +static struct arch_hibernate_hdr {
>>> + struct arch_hibernate_hdr_invariants invariants;
>>> + unsigned long hartid;
>>> + unsigned long saved_satp;
>>> + unsigned long restore_cpu_addr;
>>> +} resume_hdr;
>>> +
>>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
>>> +{
>>> + memset(i, 0, sizeof(*i));
>>> + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
>>> +}
>>> +
>>> +/*
>>> + * Check if the given pfn is in the 'nosave' section.
>>> + */
>>> +int pfn_is_nosave(unsigned long pfn)
>>> +{
>>> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
>>> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
>>> +
>>> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
>>> +}
>>> +
>>> +void notrace save_processor_state(void)
>>> +{
>>> + WARN_ON(num_online_cpus() != 1);
>>> +}
>>> +
>>> +void notrace restore_processor_state(void)
>>> +{
>>> +}
>>> +
>>> +/*
>>> + * Helper parameters need to be saved to the hibernation image header.
>>> + */
>>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
>>> +{
>>> + struct arch_hibernate_hdr *hdr = addr;
>>> +
>>> + if (max_size < sizeof(*hdr))
>>> + return -EOVERFLOW;
>>> +
>>> + arch_hdr_invariants(&hdr->invariants);
>>> +
>>> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
>>> + hdr->saved_satp = csr_read(CSR_SATP);
>>> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
>>> +
>>> + return 0;
>>> +}
>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
>>> +
>>> +/*
>>> + * Retrieve the helper parameters from the hibernation image header.
>>> + */
>>> +int arch_hibernation_header_restore(void *addr)
>>> +{
>>> + struct arch_hibernate_hdr_invariants invariants;
>>> + struct arch_hibernate_hdr *hdr = addr;
>>> + int ret = 0;
>>> +
>>> + arch_hdr_invariants(&invariants);
>>> +
>>> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
>>> + pr_crit("Hibernate image not generated by this kernel!\n");
>>> + return -EINVAL;
>>> + }
>>> +
>>> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
>>> + if (sleep_cpu < 0) {
>>> + pr_crit("Hibernated on a CPU not known to this kernel!\n");
>>> + sleep_cpu = -EINVAL;
>>> + return -EINVAL;
>>> + }
>>> +
>>> +#ifdef CONFIG_SMP
>>> + ret = bringup_hibernate_cpu(sleep_cpu);
>>> + if (ret) {
>>> + sleep_cpu = -EINVAL;
>>> + return ret;
>>> + }
>>> +#endif
>>> + resume_hdr = *hdr;
>>> +
>>> + return ret;
>>> +}
>>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
>>> +
>>> +int swsusp_arch_suspend(void)
>>> +{
>>> + int ret = 0;
>>> +
>>> + if (__cpu_suspend_enter(hibernate_cpu_context)) {
>>> + sleep_cpu = smp_processor_id();
>>> + suspend_save_csrs(hibernate_cpu_context);
>>> + ret = swsusp_save();
>>> + } else {
>>> + suspend_restore_csrs(hibernate_cpu_context);
>>> + flush_tlb_all();
>>> + flush_icache_all();
>>> +
>>> + /*
>>> + * Tell the hibernation core that we've just restored the memory.
>>> + */
>>> + in_suspend = 0;
>>> + sleep_cpu = -EINVAL;
>>> + }
>>> +
>>> + return ret;
>>> +}
>>> +
>>> +static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
>>> + unsigned long addr, pgprot_t prot)
>>> +{
>>> + pte_t pte = READ_ONCE(*src_ptep);
>>> +
>>> + if (pte_present(pte))
>>> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
>>> +
>>> + return 0;
>>> +}
>>
>> I don't see the need for this function
> Sure, can remove it.
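
As a rough sketch of what dropping the helper could look like (not taken from any
posted version of the patch), the present-check would simply be folded into the
pte walk loop of temp_pgtable_map_pte() quoted just below:

	do {
		pte_t pte = READ_ONCE(*src_ptep);

		/* copy only present entries, applying the requested protection */
		if (pte_present(pte))
			set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
	} while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);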
>>
>>> +
>>> +static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
>>> + unsigned long start, unsigned long end,
>>> + pgprot_t prot)
>>> +{
>>> + unsigned long addr = start;
>>> + pte_t *src_ptep;
>>> + pte_t *dst_ptep;
>>> +
>>> + if (pmd_none(READ_ONCE(*dst_pmdp))) {
>>> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
>>> + if (!dst_ptep)
>>> + return -ENOMEM;
>>> +
>>> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
>>> + }
>>> +
>>> + dst_ptep = pte_offset_kernel(dst_pmdp, start);
>>> + src_ptep = pte_offset_kernel(src_pmdp, start);
>>> +
>>> + do {
>>> + _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);
>>> + } while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp,
>>> + unsigned long start, unsigned long end,
>>> + pgprot_t prot)
>>> +{
>>> + unsigned long addr = start;
>>> + unsigned long next;
>>> + unsigned long ret;
>>> + pmd_t *src_pmdp;
>>> + pmd_t *dst_pmdp;
>>> +
>>> + if (pud_none(READ_ONCE(*dst_pudp))) {
>>> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
>>> + if (!dst_pmdp)
>>> + return -ENOMEM;
>>> +
>>> + pud_populate(NULL, dst_pudp, dst_pmdp);
>>> + }
>>> +
>>> + dst_pmdp = pmd_offset(dst_pudp, start);
>>> + src_pmdp = pmd_offset(src_pudp, start);
>>> +
>>> + do {
>>> + pmd_t pmd = READ_ONCE(*src_pmdp);
>>> +
>>> + next = pmd_addr_end(addr, end);
>>> +
>>> + if (pmd_none(pmd))
>>> + continue;
>>> +
>>> + if (pmd_leaf(pmd)) {
>>> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot)));
>>> + } else {
>>> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, addr, next, prot);
>>> + if (ret)
>>> + return -ENOMEM;
>>> + }
>>> + } while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp,
>>> + unsigned long start,
>>> + unsigned long end, pgprot_t prot)
>>> +{
>>> + unsigned long addr = start;
>>> + unsigned long next;
>>> + unsigned long ret;
>>> + pud_t *dst_pudp;
>>> + pud_t *src_pudp;
>>> +
>>> + if (p4d_none(READ_ONCE(*dst_p4dp))) {
>>> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
>>> + if (!dst_pudp)
>>> + return -ENOMEM;
>>> +
>>> + p4d_populate(NULL, dst_p4dp, dst_pudp);
>>> + }
>>> +
>>> + dst_pudp = pud_offset(dst_p4dp, start);
>>> + src_pudp = pud_offset(src_p4dp, start);
>>> +
>>> + do {
>>> + pud_t pud = READ_ONCE(*src_pudp);
>>> +
>>> + next = pud_addr_end(addr, end);
>>> +
>>> + if (pud_none(pud))
>>> + continue;
>>> +
>>> + if (pud_leaf(pud)) {
>>> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot)));
>>> + } else {
>>> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, addr, next, prot);
>>> + if (ret)
>>> + return -ENOMEM;
>>> + }
>>> + } while (dst_pudp++, src_pudp++, addr = next, addr != end);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
>>> + unsigned long start, unsigned long end,
>>> + pgprot_t prot)
>>> +{
>>> + unsigned long addr = start;
>>
>> Nit: you don't need the addr variable, you can rename start into addr
>> and directly work with it.
> sure.
>>
>>> + unsigned long next;
>>> + unsigned long ret;
>>> + p4d_t *dst_p4dp;
>>> + p4d_t *src_p4dp;
>>> +
>>> + if (pgd_none(READ_ONCE(*dst_pgdp))) {
>>> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC);
>>> + if (!dst_p4dp)
>>> + return -ENOMEM;
>>> +
>>> + pgd_populate(NULL, dst_pgdp, dst_p4dp);
>>> + }
>>> +
>>> + dst_p4dp = p4d_offset(dst_pgdp, start);
>>> + src_p4dp = p4d_offset(src_pgdp, start);
>>> +
>>> + do {
>>> + p4d_t p4d = READ_ONCE(*src_p4dp);
>>> +
>>> + next = p4d_addr_end(addr, end);
>>> +
>>> + if (p4d_none(READ_ONCE(*src_p4dp)))
>>
>> You should use p4d here: p4d_none(p4d)
> sure
>>
>>> + continue;
>>> +
>>> + if (p4d_leaf(p4d)) {
>>> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot)));
>>
>> The "| pgprot_val(prot)" happens to work because PAGE_KERNEL will add
>> the PAGE_WRITE bit: I'd rather make it clearer by explicitly adding
>> PAGE_WRITE.
> sure, this can be done.
>>
>>> + } else {
>>> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, addr, next, prot);
>>> + if (ret)
>>> + return -ENOMEM;
>>> + }
>>> + } while (dst_p4dp++, src_p4dp++, addr = next, addr != end);
>>> +
>>> + return 0;
>>> +}
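
For illustration only (an assumption about how the two review points above might be
applied, not the author's actual v5 code), the loop in temp_pgtable_map_p4d() could
end up looking roughly like this, testing the local p4d value and making the write
permission explicit with _PAGE_WRITE:

	do {
		p4d_t p4d = READ_ONCE(*src_p4dp);

		next = p4d_addr_end(addr, end);

		if (p4d_none(p4d))
			continue;

		if (p4d_leaf(p4d)) {
			/* make the copied leaf mapping explicitly writable */
			set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot) | _PAGE_WRITE));
		} else {
			ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, addr, next, prot);
			if (ret)
				return -ENOMEM;
		}
	} while (dst_p4dp++, src_p4dp++, addr = next, addr != end);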
>>> +
>>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp)
>>> +{
>>> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn);
>>> + unsigned long addr = PAGE_OFFSET;
>>> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
>>> + pgd_t *src_pgdp = pgd_offset_k(addr);
>>> + unsigned long next;
>>> +
>>> + do {
>>> + next = pgd_addr_end(addr, end);
>>> + if (pgd_none(READ_ONCE(*src_pgdp)))
>>> + continue;
>>> +
>>
>> We added the pgd_leaf test in kernel_page_present, let's add it here too.
> sure.
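
Purely as a sketch of what that could mean here (the set_pgd()/PAGE_KERNEL details
are an assumption, not taken from the patch), the pgd walk quoted around this comment
might gain a leaf case along these lines:

	do {
		pgd_t pgd = READ_ONCE(*src_pgdp);

		next = pgd_addr_end(addr, end);
		if (pgd_none(pgd))
			continue;

		if (pgd_leaf(pgd)) {
			/* copy a huge leaf entry straight into the temporary table */
			set_pgd(dst_pgdp, __pgd(pgd_val(pgd) | pgprot_val(PAGE_KERNEL)));
		} else if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next, PAGE_KERNEL)) {
			return -ENOMEM;
		}
	} while (dst_pgdp++, src_pgdp++, addr = next, addr != end);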
>>
>>> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next, PAGE_KERNEL))
>>> + return -ENOMEM;
>>> + } while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +static unsigned long temp_pgtable_text_mapping(pgd_t *pgdp, unsigned long addr)
>>> +{
>>> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
>>> + pgd_t *src_pgdp = pgd_offset_k(addr);
>>> +
>>> + if (pgd_none(READ_ONCE(*src_pgdp)))
>>> + return -EFAULT;
>>> +
>>> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, addr, PAGE_KERNEL_EXEC))
>>> + return -ENOMEM;
>>> +
>>> + return 0;
>>> +}
>>
>> Ok so if we fall into a huge mapping, you add the exec permission to the
>> whole range, which could easily be 1GB. I think that either we can avoid
>> this step by mapping the whole linear mapping as executable, or we
>> actually use another pgd entry for this page that is not in the linear
>> mapping. The latter seems cleaner, what do you think?
> we can map the whole linear address to writable & executable, by doing this, we can avoid the remapping at the linear map again.
> we still need to use the same pgd entry for the non-linear mapping just like how it did at the swapper_pg_dir (linear and non-linear addr are within the range supported by the pgd).


Not sure I understand your last sentence


>>
>>> +
>>> +static unsigned long relocate_restore_code(void)
>>> +{
>>> + unsigned long ret;
>>> + void *page = (void *)get_safe_page(GFP_ATOMIC);
>>> +
>>> + if (!page)
>>> + return -ENOMEM;
>>> +
>>> + copy_page(page, hibernate_core_restore_code);
>>> +
>>> + /* Make the page containing the relocated code executable. */
>>> + set_memory_x((unsigned long)page, 1);
>>> +
>>> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)page);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + return (unsigned long)page;
>>> +}
>>> +
>>> +int swsusp_arch_resume(void)
>>> +{
>>> + unsigned long ret;
>>> +
>>> + /*
>>> + * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
>>> + * we don't need to free it here.
>>> + */
>>> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
>>> + if (!resume_pg_dir)
>>> + return -ENOMEM;
>>> +
>>> + /*
>>> + * The pages need to be writable when restoring the image.
>>> + * Create a second copy of page table just for the linear map.
>>> + * Use this temporary page table to restore the image.
>>> + */
>>> + ret = temp_pgtable_mapping(resume_pg_dir);
>>> + if (ret)
>>> + return (int)ret;
>>
>> The temp_pgtable* functions should return an int to avoid this cast.


Did you note this comment too?


>>
>>
>>> +
>>> + /* Move the restore code to a new page so that it doesn't get overwritten by itself. */
>>> + relocated_restore_code = relocate_restore_code();
>>> + if (relocated_restore_code == -ENOMEM)
>>> + return -ENOMEM;
>>> +
>>> + /*
>>> + * Map the __hibernate_cpu_resume() address to the temporary page table so that the
>>> + * restore code can jumps to it after finished restore the image. The next execution
>>> + * code doesn't find itself in a different address space after switching over to the
>>> + * original page table used by the hibernated image.
>>> + */
>>> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr);
>>> + if (ret)
>>> + return ret;
>>> +
>>> + hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
>>> + resume_hdr.restore_cpu_addr);
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +#ifdef CONFIG_PM_SLEEP_SMP
>>> +int hibernate_resume_nonboot_cpu_disable(void)
>>> +{
>>> + if (sleep_cpu < 0) {
>>> + pr_err("Failing to resume from hibernate on an unknown CPU\n");
>>> + return -ENODEV;
>>> + }
>>> +
>>> + return freeze_secondary_cpus(sleep_cpu);
>>> +}
>>> +#endif
>>> +
>>> +static int __init riscv_hibernate_init(void)
>>> +{
>>> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL);
>>> +
>>> + if (WARN_ON(!hibernate_cpu_context))
>>> + return -ENOMEM;
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +early_initcall(riscv_hibernate_init);
>>
>> Overall, it is now nicer with the proper page table walk, but we can
>> now see that the code is exactly the same as arm64's: what prevents us
>> from merging both somewhere in mm/?
> 1. low level page table bit definition not the same
> 2. Need to refactor code for both riscv and arm64
> 3. Need to verify the solution for both riscv and arm64 platforms (need someone who expertise on arm64)
> 4. Might need to extend the function to support other arch
> 5. Overall, it is do-able but the effort to support the above matters are huge.


Too bad, I really see benefits of avoiding code duplication, but that's
up to you.

Thanks,

Alex


>

2023-02-28 01:20:37

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Alexandre Ghiti <[email protected]>
> Sent: Tuesday, 28 February, 2023 4:32 AM
> To: JeeHeng Sia <[email protected]>; [email protected]; [email protected]; [email protected]
> Cc: [email protected]; [email protected]; Leyfoon Tan <[email protected]>; Mason Huo
> <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
>
> On 2/27/23 04:11, JeeHeng Sia wrote:
> >
> >> -----Original Message-----
> >> From: Alexandre Ghiti <[email protected]>
> >> Sent: Friday, 24 February, 2023 8:29 PM
> >> To: JeeHeng Sia <[email protected]>; [email protected]; [email protected]; [email protected]
> >> Cc: [email protected]; [email protected]; Leyfoon Tan <[email protected]>; Mason Huo
> >> <[email protected]>
> >> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >>
> >> On 2/21/23 03:35, Sia Jee Heng wrote:
> >>> Low level Arch functions were created to support hibernation.
> >>> swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> >>> cpu state onto the stack, then calling swsusp_save() to save the memory
> >>> image.
> >>>
> >>> Arch specific hibernation header is implemented and is utilized by the
> >>> arch_hibernation_header_restore() and arch_hibernation_header_save()
> >>> functions. The arch specific hibernation header consists of satp, hartid,
> >>> and the cpu_resume address. The kernel built version is also need to be
> >>> saved into the hibernation image header to making sure only the same
> >>> kernel is restore when resume.
> >>>
> >>> swsusp_arch_resume() creates a temporary page table that covering only
> >>> the linear map. It copies the restore code to a 'safe' page, then start
> >>> to restore the memory image. Once completed, it restores the original
> >>> kernel's page table. It then calls into __hibernate_cpu_resume()
> >>> to restore the CPU context. Finally, it follows the normal hibernation
> >>> path back to the hibernation core.
> >>>
> >>> To enable hibernation/suspend to disk into RISCV, the below config
> >>> need to be enabled:
> >>> - CONFIG_ARCH_HIBERNATION_HEADER
> >>> - CONFIG_ARCH_HIBERNATION_POSSIBLE
> >>>
> >>> Signed-off-by: Sia Jee Heng <[email protected]>
> >>> Reviewed-by: Ley Foon Tan <[email protected]>
> >>> Reviewed-by: Mason Huo <[email protected]>
> >>> ---
> >>> arch/riscv/Kconfig | 7 +
> >>> arch/riscv/include/asm/assembler.h | 20 ++
> >>> arch/riscv/include/asm/suspend.h | 19 ++
> >>> arch/riscv/kernel/Makefile | 1 +
> >>> arch/riscv/kernel/asm-offsets.c | 5 +
> >>> arch/riscv/kernel/hibernate-asm.S | 77 +++++
> >>> arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> >>> 7 files changed, 576 insertions(+)
> >>> create mode 100644 arch/riscv/kernel/hibernate-asm.S
> >>> create mode 100644 arch/riscv/kernel/hibernate.c
> >>>
> >>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> >>> index e2b656043abf..4555848a817f 100644
> >>> --- a/arch/riscv/Kconfig
> >>> +++ b/arch/riscv/Kconfig
> >>> @@ -690,6 +690,13 @@ menu "Power management options"
> >>>
> >>> source "kernel/power/Kconfig"
> >>>
> >>> +config ARCH_HIBERNATION_POSSIBLE
> >>> + def_bool y
> >>> +
> >>> +config ARCH_HIBERNATION_HEADER
> >>> + def_bool y
> >>> + depends on HIBERNATION
> >>> +
> >>> endmenu # "Power management options"
> >>>
> >>> menu "CPU Power Management"
> >>> diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> >>> index 727a97735493..68c46c0e0ea8 100644
> >>> --- a/arch/riscv/include/asm/assembler.h
> >>> +++ b/arch/riscv/include/asm/assembler.h
> >>> @@ -59,4 +59,24 @@
> >>> REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> >>> .endm
> >>>
> >>> +/*
> >>> + * copy_page - copy 1 page (4KB) of data from source to destination
> >>> + * @a0 - destination
> >>> + * @a1 - source
> >>> + */
> >>> + .macro copy_page a0, a1
> >>> + lui a2, 0x1
> >>> + add a2, a2, a0
> >>> +1 :
> >>> + REG_L t0, 0(a1)
> >>> + REG_L t1, SZREG(a1)
> >>> +
> >>> + REG_S t0, 0(a0)
> >>> + REG_S t1, SZREG(a0)
> >>> +
> >>> + addi a0, a0, 2 * SZREG
> >>> + addi a1, a1, 2 * SZREG
> >>> + bne a2, a0, 1b
> >>> + .endm
> >>> +
> >>> #endif /* __ASM_ASSEMBLER_H */
> >>> diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> >>> index 75419c5ca272..3362da56a9d8 100644
> >>> --- a/arch/riscv/include/asm/suspend.h
> >>> +++ b/arch/riscv/include/asm/suspend.h
> >>> @@ -21,6 +21,11 @@ struct suspend_context {
> >>> #endif
> >>> };
> >>>
> >>> +/*
> >>> + * Used by hibernation core and cleared during resume sequence
> >>> + */
> >>> +extern int in_suspend;
> >>> +
> >>> /* Low-level CPU suspend entry function */
> >>> int __cpu_suspend_enter(struct suspend_context *context);
> >>>
> >>> @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> >>> /* Used to save and restore the csr */
> >>> void suspend_save_csrs(struct suspend_context *context);
> >>> void suspend_restore_csrs(struct suspend_context *context);
> >>> +
> >>> +/* Low-level API to support hibernation */
> >>> +int swsusp_arch_suspend(void);
> >>> +int swsusp_arch_resume(void);
> >>> +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> >>> +int arch_hibernation_header_restore(void *addr);
> >>> +int __hibernate_cpu_resume(void);
> >>> +
> >>> +/* Used to resume on the CPU we hibernated on */
> >>> +int hibernate_resume_nonboot_cpu_disable(void);
> >>> +
> >>> +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> >>> + unsigned long cpu_resume);
> >>> +asmlinkage int hibernate_core_restore_code(void);
> >>> #endif
> >>> diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> >>> index 4cf303a779ab..daab341d55e4 100644
> >>> --- a/arch/riscv/kernel/Makefile
> >>> +++ b/arch/riscv/kernel/Makefile
> >>> @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> >>> obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> >>>
> >>> obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> >>> +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> >>>
> >>> obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> >>> obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> >>> diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> >>> index df9444397908..d6a75aac1d27 100644
> >>> --- a/arch/riscv/kernel/asm-offsets.c
> >>> +++ b/arch/riscv/kernel/asm-offsets.c
> >>> @@ -9,6 +9,7 @@
> >>> #include <linux/kbuild.h>
> >>> #include <linux/mm.h>
> >>> #include <linux/sched.h>
> >>> +#include <linux/suspend.h>
> >>> #include <asm/kvm_host.h>
> >>> #include <asm/thread_info.h>
> >>> #include <asm/ptrace.h>
> >>> @@ -116,6 +117,10 @@ void asm_offsets(void)
> >>>
> >>> OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> >>>
> >>> + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> >>> + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> >>> + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> >>> +
> >>> OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> >>> OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> >>> OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> >>> diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> >>> new file mode 100644
> >>> index 000000000000..846affe4dced
> >>> --- /dev/null
> >>> +++ b/arch/riscv/kernel/hibernate-asm.S
> >>> @@ -0,0 +1,77 @@
> >>> +/* SPDX-License-Identifier: GPL-2.0-only */
> >>> +/*
> >>> + * Hibernation low level support for RISCV.
> >>> + *
> >>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>> + *
> >>> + * Author: Jee Heng Sia <[email protected]>
> >>> + */
> >>> +
> >>> +#include <asm/asm.h>
> >>> +#include <asm/asm-offsets.h>
> >>> +#include <asm/assembler.h>
> >>> +#include <asm/csr.h>
> >>> +
> >>> +#include <linux/linkage.h>
> >>> +
> >>> +/*
> >>> + * int __hibernate_cpu_resume(void)
> >>> + * Switch back to the hibernated image's page table prior to restoring the CPU
> >>> + * context.
> >>> + *
> >>> + * Always returns 0
> >>> + */
> >>> +ENTRY(__hibernate_cpu_resume)
> >>> + /* switch to hibernated image's page table. */
> >>> + csrw CSR_SATP, s0
> >>> + sfence.vma
> >>> +
> >>> + REG_L a0, hibernate_cpu_context
> >>> +
> >>> + restore_csr
> >>> + restore_reg
> >>> +
> >>> + /* Return zero value. */
> >>> + add a0, zero, zero
> >>> +
> >>> + ret
> >>> +END(__hibernate_cpu_resume)
> >>> +
> >>> +/*
> >>> + * Prepare to restore the image.
> >>> + * a0: satp of saved page tables.
> >>> + * a1: satp of temporary page tables.
> >>> + * a2: cpu_resume.
> >>> + */
> >>> +ENTRY(hibernate_restore_image)
> >>> + mv s0, a0
> >>> + mv s1, a1
> >>> + mv s2, a2
> >>> + REG_L s4, restore_pblist
> >>> + REG_L a1, relocated_restore_code
> >>> +
> >>> + jalr a1
> >>> +END(hibernate_restore_image)
> >>> +
> >>> +/*
> >>> + * The below code will be executed from a 'safe' page.
> >>> + * It first switches to the temporary page table, then starts to copy the pages
> >>> + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> >>> + * to restore the CPU context.
> >>> + */
> >>> +ENTRY(hibernate_core_restore_code)
> >>> + /* switch to temp page table. */
> >>> + csrw satp, s1
> >>> + sfence.vma
> >>> +.Lcopy:
> >>> + /* The below code will restore the hibernated image. */
> >>> + REG_L a1, HIBERN_PBE_ADDR(s4)
> >>> + REG_L a0, HIBERN_PBE_ORIG(s4)
> >>> +
> >>> + copy_page a0, a1
> >>> +
> >>> + REG_L s4, HIBERN_PBE_NEXT(s4)
> >>> + bnez s4, .Lcopy
> >>> +
> >>> + jalr s2
> >>> +END(hibernate_core_restore_code)
> >>> diff --git a/arch/riscv/kernel/hibernate.c b/arch/riscv/kernel/hibernate.c
> >>> new file mode 100644
> >>> index 000000000000..46a2f470db6e
> >>> --- /dev/null
> >>> +++ b/arch/riscv/kernel/hibernate.c
> >>> @@ -0,0 +1,447 @@
> >>> +// SPDX-License-Identifier: GPL-2.0-only
> >>> +/*
> >>> + * Hibernation support for RISCV
> >>> + *
> >>> + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> >>> + *
> >>> + * Author: Jee Heng Sia <[email protected]>
> >>> + */
> >>> +
> >>> +#include <asm/barrier.h>
> >>> +#include <asm/cacheflush.h>
> >>> +#include <asm/mmu_context.h>
> >>> +#include <asm/page.h>
> >>> +#include <asm/pgalloc.h>
> >>> +#include <asm/pgtable.h>
> >>> +#include <asm/sections.h>
> >>> +#include <asm/set_memory.h>
> >>> +#include <asm/smp.h>
> >>> +#include <asm/suspend.h>
> >>> +
> >>> +#include <linux/cpu.h>
> >>> +#include <linux/memblock.h>
> >>> +#include <linux/pm.h>
> >>> +#include <linux/sched.h>
> >>> +#include <linux/suspend.h>
> >>> +#include <linux/utsname.h>
> >>> +
> >>> +/* The logical cpu number we should resume on, initialised to a non-cpu number. */
> >>> +static int sleep_cpu = -EINVAL;
> >>> +
> >>> +/* Pointer to the temporary resume page table. */
> >>> +static pgd_t *resume_pg_dir;
> >>> +
> >>> +/* CPU context to be saved. */
> >>> +struct suspend_context *hibernate_cpu_context;
> >>> +EXPORT_SYMBOL_GPL(hibernate_cpu_context);
> >>> +
> >>> +unsigned long relocated_restore_code;
> >>> +EXPORT_SYMBOL_GPL(relocated_restore_code);
> >>> +
> >>> +/**
> >>> + * struct arch_hibernate_hdr_invariants - container to store kernel build version.
> >>> + * @uts_version: to save the build number and date so that the we do not resume with
> >>> + * a different kernel.
> >>> + */
> >>> +struct arch_hibernate_hdr_invariants {
> >>> + char uts_version[__NEW_UTS_LEN + 1];
> >>> +};
> >>> +
> >>> +/**
> >>> + * struct arch_hibernate_hdr - helper parameters that help us to restore the image.
> >>> + * @invariants: container to store kernel build version.
> >>> + * @hartid: to make sure same boot_cpu executes the hibernate/restore code.
> >>> + * @saved_satp: original page table used by the hibernated image.
> >>> + * @restore_cpu_addr: the kernel's image address to restore the CPU context.
> >>> + */
> >>> +static struct arch_hibernate_hdr {
> >>> + struct arch_hibernate_hdr_invariants invariants;
> >>> + unsigned long hartid;
> >>> + unsigned long saved_satp;
> >>> + unsigned long restore_cpu_addr;
> >>> +} resume_hdr;
> >>> +
> >>> +static inline void arch_hdr_invariants(struct arch_hibernate_hdr_invariants *i)
> >>> +{
> >>> + memset(i, 0, sizeof(*i));
> >>> + memcpy(i->uts_version, init_utsname()->version, sizeof(i->uts_version));
> >>> +}
> >>> +
> >>> +/*
> >>> + * Check if the given pfn is in the 'nosave' section.
> >>> + */
> >>> +int pfn_is_nosave(unsigned long pfn)
> >>> +{
> >>> + unsigned long nosave_begin_pfn = sym_to_pfn(&__nosave_begin);
> >>> + unsigned long nosave_end_pfn = sym_to_pfn(&__nosave_end - 1);
> >>> +
> >>> + return ((pfn >= nosave_begin_pfn) && (pfn <= nosave_end_pfn));
> >>> +}
> >>> +
> >>> +void notrace save_processor_state(void)
> >>> +{
> >>> + WARN_ON(num_online_cpus() != 1);
> >>> +}
> >>> +
> >>> +void notrace restore_processor_state(void)
> >>> +{
> >>> +}
> >>> +
> >>> +/*
> >>> + * Helper parameters need to be saved to the hibernation image header.
> >>> + */
> >>> +int arch_hibernation_header_save(void *addr, unsigned int max_size)
> >>> +{
> >>> + struct arch_hibernate_hdr *hdr = addr;
> >>> +
> >>> + if (max_size < sizeof(*hdr))
> >>> + return -EOVERFLOW;
> >>> +
> >>> + arch_hdr_invariants(&hdr->invariants);
> >>> +
> >>> + hdr->hartid = cpuid_to_hartid_map(sleep_cpu);
> >>> + hdr->saved_satp = csr_read(CSR_SATP);
> >>> + hdr->restore_cpu_addr = (unsigned long)__hibernate_cpu_resume;
> >>> +
> >>> + return 0;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_save);
> >>> +
> >>> +/*
> >>> + * Retrieve the helper parameters from the hibernation image header.
> >>> + */
> >>> +int arch_hibernation_header_restore(void *addr)
> >>> +{
> >>> + struct arch_hibernate_hdr_invariants invariants;
> >>> + struct arch_hibernate_hdr *hdr = addr;
> >>> + int ret = 0;
> >>> +
> >>> + arch_hdr_invariants(&invariants);
> >>> +
> >>> + if (memcmp(&hdr->invariants, &invariants, sizeof(invariants))) {
> >>> + pr_crit("Hibernate image not generated by this kernel!\n");
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> + sleep_cpu = riscv_hartid_to_cpuid(hdr->hartid);
> >>> + if (sleep_cpu < 0) {
> >>> + pr_crit("Hibernated on a CPU not known to this kernel!\n");
> >>> + sleep_cpu = -EINVAL;
> >>> + return -EINVAL;
> >>> + }
> >>> +
> >>> +#ifdef CONFIG_SMP
> >>> + ret = bringup_hibernate_cpu(sleep_cpu);
> >>> + if (ret) {
> >>> + sleep_cpu = -EINVAL;
> >>> + return ret;
> >>> + }
> >>> +#endif
> >>> + resume_hdr = *hdr;
> >>> +
> >>> + return ret;
> >>> +}
> >>> +EXPORT_SYMBOL_GPL(arch_hibernation_header_restore);
> >>> +
> >>> +int swsusp_arch_suspend(void)
> >>> +{
> >>> + int ret = 0;
> >>> +
> >>> + if (__cpu_suspend_enter(hibernate_cpu_context)) {
> >>> + sleep_cpu = smp_processor_id();
> >>> + suspend_save_csrs(hibernate_cpu_context);
> >>> + ret = swsusp_save();
> >>> + } else {
> >>> + suspend_restore_csrs(hibernate_cpu_context);
> >>> + flush_tlb_all();
> >>> + flush_icache_all();
> >>> +
> >>> + /*
> >>> + * Tell the hibernation core that we've just restored the memory.
> >>> + */
> >>> + in_suspend = 0;
> >>> + sleep_cpu = -EINVAL;
> >>> + }
> >>> +
> >>> + return ret;
> >>> +}
> >>> +
> >>> +static unsigned long _temp_pgtable_map_pte(pte_t *dst_ptep, pte_t *src_ptep,
> >>> + unsigned long addr, pgprot_t prot)
> >>> +{
> >>> + pte_t pte = READ_ONCE(*src_ptep);
> >>> +
> >>> + if (pte_present(pte))
> >>> + set_pte(dst_ptep, __pte(pte_val(pte) | pgprot_val(prot)));
> >>> +
> >>> + return 0;
> >>> +}
> >>
> >> I don't see the need for this function
> > Sure, can remove it.
> >>
> >>> +
> >>> +static unsigned long temp_pgtable_map_pte(pmd_t *dst_pmdp, pmd_t *src_pmdp,
> >>> + unsigned long start, unsigned long end,
> >>> + pgprot_t prot)
> >>> +{
> >>> + unsigned long addr = start;
> >>> + pte_t *src_ptep;
> >>> + pte_t *dst_ptep;
> >>> +
> >>> + if (pmd_none(READ_ONCE(*dst_pmdp))) {
> >>> + dst_ptep = (pte_t *)get_safe_page(GFP_ATOMIC);
> >>> + if (!dst_ptep)
> >>> + return -ENOMEM;
> >>> +
> >>> + pmd_populate_kernel(NULL, dst_pmdp, dst_ptep);
> >>> + }
> >>> +
> >>> + dst_ptep = pte_offset_kernel(dst_pmdp, start);
> >>> + src_ptep = pte_offset_kernel(src_pmdp, start);
> >>> +
> >>> + do {
> >>> + _temp_pgtable_map_pte(dst_ptep, src_ptep, addr, prot);
> >>> + } while (dst_ptep++, src_ptep++, addr += PAGE_SIZE, addr < end);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_map_pmd(pud_t *dst_pudp, pud_t *src_pudp,
> >>> + unsigned long start, unsigned long end,
> >>> + pgprot_t prot)
> >>> +{
> >>> + unsigned long addr = start;
> >>> + unsigned long next;
> >>> + unsigned long ret;
> >>> + pmd_t *src_pmdp;
> >>> + pmd_t *dst_pmdp;
> >>> +
> >>> + if (pud_none(READ_ONCE(*dst_pudp))) {
> >>> + dst_pmdp = (pmd_t *)get_safe_page(GFP_ATOMIC);
> >>> + if (!dst_pmdp)
> >>> + return -ENOMEM;
> >>> +
> >>> + pud_populate(NULL, dst_pudp, dst_pmdp);
> >>> + }
> >>> +
> >>> + dst_pmdp = pmd_offset(dst_pudp, start);
> >>> + src_pmdp = pmd_offset(src_pudp, start);
> >>> +
> >>> + do {
> >>> + pmd_t pmd = READ_ONCE(*src_pmdp);
> >>> +
> >>> + next = pmd_addr_end(addr, end);
> >>> +
> >>> + if (pmd_none(pmd))
> >>> + continue;
> >>> +
> >>> + if (pmd_leaf(pmd)) {
> >>> + set_pmd(dst_pmdp, __pmd(pmd_val(pmd) | pgprot_val(prot)));
> >>> + } else {
> >>> + ret = temp_pgtable_map_pte(dst_pmdp, src_pmdp, addr, next, prot);
> >>> + if (ret)
> >>> + return -ENOMEM;
> >>> + }
> >>> + } while (dst_pmdp++, src_pmdp++, addr = next, addr != end);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_map_pud(p4d_t *dst_p4dp, p4d_t *src_p4dp,
> >>> + unsigned long start,
> >>> + unsigned long end, pgprot_t prot)
> >>> +{
> >>> + unsigned long addr = start;
> >>> + unsigned long next;
> >>> + unsigned long ret;
> >>> + pud_t *dst_pudp;
> >>> + pud_t *src_pudp;
> >>> +
> >>> + if (p4d_none(READ_ONCE(*dst_p4dp))) {
> >>> + dst_pudp = (pud_t *)get_safe_page(GFP_ATOMIC);
> >>> + if (!dst_pudp)
> >>> + return -ENOMEM;
> >>> +
> >>> + p4d_populate(NULL, dst_p4dp, dst_pudp);
> >>> + }
> >>> +
> >>> + dst_pudp = pud_offset(dst_p4dp, start);
> >>> + src_pudp = pud_offset(src_p4dp, start);
> >>> +
> >>> + do {
> >>> + pud_t pud = READ_ONCE(*src_pudp);
> >>> +
> >>> + next = pud_addr_end(addr, end);
> >>> +
> >>> + if (pud_none(pud))
> >>> + continue;
> >>> +
> >>> + if (pud_leaf(pud)) {
> >>> + set_pud(dst_pudp, __pud(pud_val(pud) | pgprot_val(prot)));
> >>> + } else {
> >>> + ret = temp_pgtable_map_pmd(dst_pudp, src_pudp, addr, next, prot);
> >>> + if (ret)
> >>> + return -ENOMEM;
> >>> + }
> >>> + } while (dst_pudp++, src_pudp++, addr = next, addr != end);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
> >>> + unsigned long start, unsigned long end,
> >>> + pgprot_t prot)
> >>> +{
> >>> + unsigned long addr = start;
> >>
> >> Nit: you don't need the addr variable, you can rename start into addr
> >> and directly work with it.
> > sure.
> >>
> >>> + unsigned long next;
> >>> + unsigned long ret;
> >>> + p4d_t *dst_p4dp;
> >>> + p4d_t *src_p4dp;
> >>> +
> >>> + if (pgd_none(READ_ONCE(*dst_pgdp))) {
> >>> + dst_p4dp = (p4d_t *)get_safe_page(GFP_ATOMIC);
> >>> + if (!dst_p4dp)
> >>> + return -ENOMEM;
> >>> +
> >>> + pgd_populate(NULL, dst_pgdp, dst_p4dp);
> >>> + }
> >>> +
> >>> + dst_p4dp = p4d_offset(dst_pgdp, start);
> >>> + src_p4dp = p4d_offset(src_pgdp, start);
> >>> +
> >>> + do {
> >>> + p4d_t p4d = READ_ONCE(*src_p4dp);
> >>> +
> >>> + next = p4d_addr_end(addr, end);
> >>> +
> >>> + if (p4d_none(READ_ONCE(*src_p4dp)))
> >>
> >> You should use p4d here: p4d_none(p4d)
> > sure
> >>
> >>> + continue;
> >>> +
> >>> + if (p4d_leaf(p4d)) {
> >>> + set_p4d(dst_p4dp, __p4d(p4d_val(p4d) | pgprot_val(prot)));
> >>
> >> The "| pgprot_val(prot)" happens to work because PAGE_KERNEL will add
> >> the PAGE_WRITE bit: I'd rather make it clearer by explicitly adding
> >> PAGE_WRITE.
> > sure, this can be done.
> >>
> >>> + } else {
> >>> + ret = temp_pgtable_map_pud(dst_p4dp, src_p4dp, addr, next, prot);
> >>> + if (ret)
> >>> + return -ENOMEM;
> >>> + }
> >>> + } while (dst_p4dp++, src_p4dp++, addr = next, addr != end);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_mapping(pgd_t *pgdp)
> >>> +{
> >>> + unsigned long end = (unsigned long)pfn_to_virt(max_low_pfn);
> >>> + unsigned long addr = PAGE_OFFSET;
> >>> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> >>> + pgd_t *src_pgdp = pgd_offset_k(addr);
> >>> + unsigned long next;
> >>> +
> >>> + do {
> >>> + next = pgd_addr_end(addr, end);
> >>> + if (pgd_none(READ_ONCE(*src_pgdp)))
> >>> + continue;
> >>> +
> >>
> >> We added the pgd_leaf test in kernel_page_present, let's add it here too.
> > sure.
> >>
> >>> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, next, PAGE_KERNEL))
> >>> + return -ENOMEM;
> >>> + } while (dst_pgdp++, src_pgdp++, addr = next, addr != end);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +static unsigned long temp_pgtable_text_mapping(pgd_t *pgdp, unsigned long addr)
> >>> +{
> >>> + pgd_t *dst_pgdp = pgd_offset_pgd(pgdp, addr);
> >>> + pgd_t *src_pgdp = pgd_offset_k(addr);
> >>> +
> >>> + if (pgd_none(READ_ONCE(*src_pgdp)))
> >>> + return -EFAULT;
> >>> +
> >>> + if (temp_pgtable_map_p4d(dst_pgdp, src_pgdp, addr, addr, PAGE_KERNEL_EXEC))
> >>> + return -ENOMEM;
> >>> +
> >>> + return 0;
> >>> +}
> >>
> >> Ok so if we fall into a huge mapping, you add the exec permission to the
> >> whole range, which could easily be 1GB. I think that either we can avoid
> >> this step by mapping the whole linear mapping as executable, or we
> >> actually use another pgd entry for this page that is not in the linear
> >> mapping. The latter seems cleaner, what do you think?
> > we can map the whole linear address to writable & executable, by doing this, we can avoid the remapping at the linear map again.
> > we still need to use the same pgd entry for the non-linear mapping just like how it did at the swapper_pg_dir (linear and non-linear
> > addr are within the range supported by the pgd).
>
>
> Not sure I understand your last sentence
I mean the same pgd entry can be used for both the linear and non-linear addresses; we don't have to create another pgd entry the way it is done for process handling.
>
>
> >>
> >>> +
> >>> +static unsigned long relocate_restore_code(void)
> >>> +{
> >>> + unsigned long ret;
> >>> + void *page = (void *)get_safe_page(GFP_ATOMIC);
> >>> +
> >>> + if (!page)
> >>> + return -ENOMEM;
> >>> +
> >>> + copy_page(page, hibernate_core_restore_code);
> >>> +
> >>> + /* Make the page containing the relocated code executable. */
> >>> + set_memory_x((unsigned long)page, 1);
> >>> +
> >>> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)page);
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>> + return (unsigned long)page;
> >>> +}
> >>> +
> >>> +int swsusp_arch_resume(void)
> >>> +{
> >>> + unsigned long ret;
> >>> +
> >>> + /*
> >>> + * Memory allocated by get_safe_page() will be dealt with by the hibernation core,
> >>> + * we don't need to free it here.
> >>> + */
> >>> + resume_pg_dir = (pgd_t *)get_safe_page(GFP_ATOMIC);
> >>> + if (!resume_pg_dir)
> >>> + return -ENOMEM;
> >>> +
> >>> + /*
> >>> + * The pages need to be writable when restoring the image.
> >>> + * Create a second copy of page table just for the linear map.
> >>> + * Use this temporary page table to restore the image.
> >>> + */
> >>> + ret = temp_pgtable_mapping(resume_pg_dir);
> >>> + if (ret)
> >>> + return (int)ret;
> >>
> >> The temp_pgtable* functions should return an int to avoid this cast.
>
>
> Did you note this comment too?
Oops, I missed this comment. Thanks for pointing it out. Sure, it can be done.
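
For illustration, the change is mostly mechanical (a sketch, not the actual follow-up
patch): the temp_pgtable_* helpers would be declared to return int, e.g.

	static int temp_pgtable_map_p4d(pgd_t *dst_pgdp, pgd_t *src_pgdp,
					unsigned long start, unsigned long end,
					pgprot_t prot);
	static int temp_pgtable_mapping(pgd_t *pgdp);
	static int temp_pgtable_text_mapping(pgd_t *pgdp, unsigned long addr);

and swsusp_arch_resume() could then keep a plain int for their results and drop the
(int) cast, leaving unsigned long only for the address returned by
relocate_restore_code().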
>
>
> >>
> >>
> >>> +
> >>> + /* Move the restore code to a new page so that it doesn't get overwritten by itself. */
> >>> + relocated_restore_code = relocate_restore_code();
> >>> + if (relocated_restore_code == -ENOMEM)
> >>> + return -ENOMEM;
> >>> +
> >>> + /*
> >>> + * Map the __hibernate_cpu_resume() address to the temporary page table so that the
> >>> + * restore code can jumps to it after finished restore the image. The next execution
> >>> + * code doesn't find itself in a different address space after switching over to the
> >>> + * original page table used by the hibernated image.
> >>> + */
> >>> + ret = temp_pgtable_text_mapping(resume_pg_dir, (unsigned long)resume_hdr.restore_cpu_addr);
> >>> + if (ret)
> >>> + return ret;
> >>> +
> >>> + hibernate_restore_image(resume_hdr.saved_satp, (PFN_DOWN(__pa(resume_pg_dir)) | satp_mode),
> >>> + resume_hdr.restore_cpu_addr);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +#ifdef CONFIG_PM_SLEEP_SMP
> >>> +int hibernate_resume_nonboot_cpu_disable(void)
> >>> +{
> >>> + if (sleep_cpu < 0) {
> >>> + pr_err("Failing to resume from hibernate on an unknown CPU\n");
> >>> + return -ENODEV;
> >>> + }
> >>> +
> >>> + return freeze_secondary_cpus(sleep_cpu);
> >>> +}
> >>> +#endif
> >>> +
> >>> +static int __init riscv_hibernate_init(void)
> >>> +{
> >>> + hibernate_cpu_context = kzalloc(sizeof(*hibernate_cpu_context), GFP_KERNEL);
> >>> +
> >>> + if (WARN_ON(!hibernate_cpu_context))
> >>> + return -ENOMEM;
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>> +early_initcall(riscv_hibernate_init);
> >>
> >> Overall, it is now nicer with the proper page table walk, but we can
> >> now see that the code is exactly the same as arm64's: what prevents us
> >> from merging both somewhere in mm/?
> > 1. low level page table bit definition not the same
> > 2. Need to refactor code for both riscv and arm64
> > 3. Need to verify the solution for both riscv and arm64 platforms (need someone who expertise on arm64)
> > 4. Might need to extend the function to support other arch
> > 5. Overall, it is do-able but the effort to support the above matters are huge.
>
>
> Too bad, I really see benefits of avoiding code duplication, but that's
> up to you.
Sure, I do see the benefit, but I also see the effort needed. Perhaps I can put it on my todo list and work it out with you in the near future, but certainly not in this patch series.
>
> Thanks,
>
> Alex
>
>
> >

2023-02-28 01:33:04

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Monday, 27 February, 2023 7:45 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Mon, Feb 27, 2023 at 10:52:32AM +0000, JeeHeng Sia wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Jones <[email protected]>
> > > Sent: Monday, 27 February, 2023 4:00 PM
> > > To: JeeHeng Sia <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > >
> > > On Mon, Feb 27, 2023 at 02:14:27AM +0000, JeeHeng Sia wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Andrew Jones <[email protected]>
> > > > > Sent: Friday, 24 February, 2023 8:07 PM
> > > > > To: JeeHeng Sia <[email protected]>
> > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > >
> > > > > On Fri, Feb 24, 2023 at 10:30:19AM +0000, JeeHeng Sia wrote:
> > > > > >
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Andrew Jones <[email protected]>
> > > > > > > Sent: Friday, 24 February, 2023 5:55 PM
> > > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > > >
> > > > > > > On Fri, Feb 24, 2023 at 09:33:31AM +0000, JeeHeng Sia wrote:
> > > > > > > >
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Andrew Jones <[email protected]>
> > > > > > > > > Sent: Friday, 24 February, 2023 5:00 PM
> > > > > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > > > > >
> > > > > > > > > On Fri, Feb 24, 2023 at 02:05:43AM +0000, JeeHeng Sia wrote:
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Andrew Jones <[email protected]>
> > > > > > > > > > > Sent: Friday, 24 February, 2023 2:07 AM
> > > > > > > > > > > To: JeeHeng Sia <[email protected]>
> > > > > > > > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected];
> linux-
> > > > > > > > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > > > > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Feb 21, 2023 at 10:35:23AM +0800, Sia Jee Heng wrote:
> > > > > > > > > > > > Low level Arch functions were created to support hibernation.
> > > > > > > > > > > > swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
> > > > > > > > > > > > cpu state onto the stack, then calling swsusp_save() to save the memory
> > > > > > > > > > > > image.
> > > > > > > > > > > >
> > > > > > > > > > > > Arch specific hibernation header is implemented and is utilized by the
> > > > > > > > > > > > arch_hibernation_header_restore() and arch_hibernation_header_save()
> > > > > > > > > > > > functions. The arch specific hibernation header consists of satp, hartid,
> > > > > > > > > > > > and the cpu_resume address. The kernel built version is also need to be
> > > > > > > > > > > > saved into the hibernation image header to making sure only the same
> > > > > > > > > > > > kernel is restore when resume.
> > > > > > > > > > > >
> > > > > > > > > > > > swsusp_arch_resume() creates a temporary page table that covering only
> > > > > > > > > > > > the linear map. It copies the restore code to a 'safe' page, then start
> > > > > > > > > > > > to restore the memory image. Once completed, it restores the original
> > > > > > > > > > > > kernel's page table. It then calls into __hibernate_cpu_resume()
> > > > > > > > > > > > to restore the CPU context. Finally, it follows the normal hibernation
> > > > > > > > > > > > path back to the hibernation core.
> > > > > > > > > > > >
> > > > > > > > > > > > To enable hibernation/suspend to disk into RISCV, the below config
> > > > > > > > > > > > need to be enabled:
> > > > > > > > > > > > - CONFIG_ARCH_HIBERNATION_HEADER
> > > > > > > > > > > > - CONFIG_ARCH_HIBERNATION_POSSIBLE
> > > > > > > > > > > >
> > > > > > > > > > > > Signed-off-by: Sia Jee Heng <[email protected]>
> > > > > > > > > > > > Reviewed-by: Ley Foon Tan <[email protected]>
> > > > > > > > > > > > Reviewed-by: Mason Huo <[email protected]>
> > > > > > > > > > > > ---
> > > > > > > > > > > > arch/riscv/Kconfig | 7 +
> > > > > > > > > > > > arch/riscv/include/asm/assembler.h | 20 ++
> > > > > > > > > > > > arch/riscv/include/asm/suspend.h | 19 ++
> > > > > > > > > > > > arch/riscv/kernel/Makefile | 1 +
> > > > > > > > > > > > arch/riscv/kernel/asm-offsets.c | 5 +
> > > > > > > > > > > > arch/riscv/kernel/hibernate-asm.S | 77 +++++
> > > > > > > > > > > > arch/riscv/kernel/hibernate.c | 447 +++++++++++++++++++++++++++++
> > > > > > > > > > > > 7 files changed, 576 insertions(+)
> > > > > > > > > > > > create mode 100644 arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > > > > create mode 100644 arch/riscv/kernel/hibernate.c
> > > > > > > > > > > >
> > > > > > > > > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > > > > > > > > > > > index e2b656043abf..4555848a817f 100644
> > > > > > > > > > > > --- a/arch/riscv/Kconfig
> > > > > > > > > > > > +++ b/arch/riscv/Kconfig
> > > > > > > > > > > > @@ -690,6 +690,13 @@ menu "Power management options"
> > > > > > > > > > > >
> > > > > > > > > > > > source "kernel/power/Kconfig"
> > > > > > > > > > > >
> > > > > > > > > > > > +config ARCH_HIBERNATION_POSSIBLE
> > > > > > > > > > > > + def_bool y
> > > > > > > > > > > > +
> > > > > > > > > > > > +config ARCH_HIBERNATION_HEADER
> > > > > > > > > > > > + def_bool y
> > > > > > > > > > > > + depends on HIBERNATION
> > > > > > > > > > >
> > > > > > > > > > > nit: I think this can be simplified as def_bool HIBERNATION
> > > > > > > > > > good suggestion. will change it.
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > endmenu # "Power management options"
> > > > > > > > > > > >
> > > > > > > > > > > > menu "CPU Power Management"
> > > > > > > > > > > > diff --git a/arch/riscv/include/asm/assembler.h b/arch/riscv/include/asm/assembler.h
> > > > > > > > > > > > index 727a97735493..68c46c0e0ea8 100644
> > > > > > > > > > > > --- a/arch/riscv/include/asm/assembler.h
> > > > > > > > > > > > +++ b/arch/riscv/include/asm/assembler.h
> > > > > > > > > > > > @@ -59,4 +59,24 @@
> > > > > > > > > > > > REG_L s11, (SUSPEND_CONTEXT_REGS + PT_S11)(a0)
> > > > > > > > > > > > .endm
> > > > > > > > > > > >
> > > > > > > > > > > > +/*
> > > > > > > > > > > > + * copy_page - copy 1 page (4KB) of data from source to destination
> > > > > > > > > > > > + * @a0 - destination
> > > > > > > > > > > > + * @a1 - source
> > > > > > > > > > > > + */
> > > > > > > > > > > > + .macro copy_page a0, a1
> > > > > > > > > > > > + lui a2, 0x1
> > > > > > > > > > > > + add a2, a2, a0
> > > > > > > > > > > > +1 :
> > > > > > > > > > > ^ please remove this space
> > > > > > > > > > can't remove it, otherwise checkpatch throws ERROR: spaces required around that ':'
> > > > > > > > >
> > > > > > > > > Oh, right, labels in macros have this requirement.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > + REG_L t0, 0(a1)
> > > > > > > > > > > > + REG_L t1, SZREG(a1)
> > > > > > > > > > > > +
> > > > > > > > > > > > + REG_S t0, 0(a0)
> > > > > > > > > > > > + REG_S t1, SZREG(a0)
> > > > > > > > > > > > +
> > > > > > > > > > > > + addi a0, a0, 2 * SZREG
> > > > > > > > > > > > + addi a1, a1, 2 * SZREG
> > > > > > > > > > > > + bne a2, a0, 1b
> > > > > > > > > > > > + .endm
> > > > > > > > > > > > +
> > > > > > > > > > > > #endif /* __ASM_ASSEMBLER_H */
> > > > > > > > > > > > diff --git a/arch/riscv/include/asm/suspend.h b/arch/riscv/include/asm/suspend.h
> > > > > > > > > > > > index 75419c5ca272..3362da56a9d8 100644
> > > > > > > > > > > > --- a/arch/riscv/include/asm/suspend.h
> > > > > > > > > > > > +++ b/arch/riscv/include/asm/suspend.h
> > > > > > > > > > > > @@ -21,6 +21,11 @@ struct suspend_context {
> > > > > > > > > > > > #endif
> > > > > > > > > > > > };
> > > > > > > > > > > >
> > > > > > > > > > > > +/*
> > > > > > > > > > > > + * Used by hibernation core and cleared during resume sequence
> > > > > > > > > > > > + */
> > > > > > > > > > > > +extern int in_suspend;
> > > > > > > > > > > > +
> > > > > > > > > > > > /* Low-level CPU suspend entry function */
> > > > > > > > > > > > int __cpu_suspend_enter(struct suspend_context *context);
> > > > > > > > > > > >
> > > > > > > > > > > > @@ -36,4 +41,18 @@ int __cpu_resume_enter(unsigned long hartid, unsigned long context);
> > > > > > > > > > > > /* Used to save and restore the csr */
> > > > > > > > > > > > void suspend_save_csrs(struct suspend_context *context);
> > > > > > > > > > > > void suspend_restore_csrs(struct suspend_context *context);
> > > > > > > > > > > > +
> > > > > > > > > > > > +/* Low-level API to support hibernation */
> > > > > > > > > > > > +int swsusp_arch_suspend(void);
> > > > > > > > > > > > +int swsusp_arch_resume(void);
> > > > > > > > > > > > +int arch_hibernation_header_save(void *addr, unsigned int max_size);
> > > > > > > > > > > > +int arch_hibernation_header_restore(void *addr);
> > > > > > > > > > > > +int __hibernate_cpu_resume(void);
> > > > > > > > > > > > +
> > > > > > > > > > > > +/* Used to resume on the CPU we hibernated on */
> > > > > > > > > > > > +int hibernate_resume_nonboot_cpu_disable(void);
> > > > > > > > > > > > +
> > > > > > > > > > > > +asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp,
> > > > > > > > > > > > + unsigned long cpu_resume);
> > > > > > > > > > > > +asmlinkage int hibernate_core_restore_code(void);
> > > > > > > > > > > > #endif
> > > > > > > > > > > > diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile
> > > > > > > > > > > > index 4cf303a779ab..daab341d55e4 100644
> > > > > > > > > > > > --- a/arch/riscv/kernel/Makefile
> > > > > > > > > > > > +++ b/arch/riscv/kernel/Makefile
> > > > > > > > > > > > @@ -64,6 +64,7 @@ obj-$(CONFIG_MODULES) += module.o
> > > > > > > > > > > > obj-$(CONFIG_MODULE_SECTIONS) += module-sections.o
> > > > > > > > > > > >
> > > > > > > > > > > > obj-$(CONFIG_CPU_PM) += suspend_entry.o suspend.o
> > > > > > > > > > > > +obj-$(CONFIG_HIBERNATION) += hibernate.o hibernate-asm.o
> > > > > > > > > > > >
> > > > > > > > > > > > obj-$(CONFIG_FUNCTION_TRACER) += mcount.o ftrace.o
> > > > > > > > > > > > obj-$(CONFIG_DYNAMIC_FTRACE) += mcount-dyn.o
> > > > > > > > > > > > diff --git a/arch/riscv/kernel/asm-offsets.c b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > > > > index df9444397908..d6a75aac1d27 100644
> > > > > > > > > > > > --- a/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > > > > +++ b/arch/riscv/kernel/asm-offsets.c
> > > > > > > > > > > > @@ -9,6 +9,7 @@
> > > > > > > > > > > > #include <linux/kbuild.h>
> > > > > > > > > > > > #include <linux/mm.h>
> > > > > > > > > > > > #include <linux/sched.h>
> > > > > > > > > > > > +#include <linux/suspend.h>
> > > > > > > > > > > > #include <asm/kvm_host.h>
> > > > > > > > > > > > #include <asm/thread_info.h>
> > > > > > > > > > > > #include <asm/ptrace.h>
> > > > > > > > > > > > @@ -116,6 +117,10 @@ void asm_offsets(void)
> > > > > > > > > > > >
> > > > > > > > > > > > OFFSET(SUSPEND_CONTEXT_REGS, suspend_context, regs);
> > > > > > > > > > > >
> > > > > > > > > > > > + OFFSET(HIBERN_PBE_ADDR, pbe, address);
> > > > > > > > > > > > + OFFSET(HIBERN_PBE_ORIG, pbe, orig_address);
> > > > > > > > > > > > + OFFSET(HIBERN_PBE_NEXT, pbe, next);
> > > > > > > > > > > > +
> > > > > > > > > > > > OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
> > > > > > > > > > > > OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
> > > > > > > > > > > > OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
> > > > > > > > > > > > diff --git a/arch/riscv/kernel/hibernate-asm.S b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > > > > new file mode 100644
> > > > > > > > > > > > index 000000000000..846affe4dced
> > > > > > > > > > > > --- /dev/null
> > > > > > > > > > > > +++ b/arch/riscv/kernel/hibernate-asm.S
> > > > > > > > > > > > @@ -0,0 +1,77 @@
> > > > > > > > > > > > +/* SPDX-License-Identifier: GPL-2.0-only */
> > > > > > > > > > > > +/*
> > > > > > > > > > > > + * Hibernation low level support for RISCV.
> > > > > > > > > > > > + *
> > > > > > > > > > > > + * Copyright (C) 2023 StarFive Technology Co., Ltd.
> > > > > > > > > > > > + *
> > > > > > > > > > > > + * Author: Jee Heng Sia <[email protected]>
> > > > > > > > > > > > + */
> > > > > > > > > > > > +
> > > > > > > > > > > > +#include <asm/asm.h>
> > > > > > > > > > > > +#include <asm/asm-offsets.h>
> > > > > > > > > > > > +#include <asm/assembler.h>
> > > > > > > > > > > > +#include <asm/csr.h>
> > > > > > > > > > > > +
> > > > > > > > > > > > +#include <linux/linkage.h>
> > > > > > > > > > > > +
> > > > > > > > > > > > +/*
> > > > > > > > > > > > + * int __hibernate_cpu_resume(void)
> > > > > > > > > > > > + * Switch back to the hibernated image's page table prior to restoring the CPU
> > > > > > > > > > > > + * context.
> > > > > > > > > > > > + *
> > > > > > > > > > > > + * Always returns 0
> > > > > > > > > > > > + */
> > > > > > > > > > > > +ENTRY(__hibernate_cpu_resume)
> > > > > > > > > > > > + /* switch to hibernated image's page table. */
> > > > > > > > > > > > + csrw CSR_SATP, s0
> > > > > > > > > > > > + sfence.vma
> > > > > > > > > > > > +
> > > > > > > > > > > > + REG_L a0, hibernate_cpu_context
> > > > > > > > > > > > +
> > > > > > > > > > > > + restore_csr
> > > > > > > > > > > > + restore_reg
> > > > > > > > > > > > +
> > > > > > > > > > > > + /* Return zero value. */
> > > > > > > > > > > > + add a0, zero, zero
> > > > > > > > > > >
> > > > > > > > > > > nit: mv a0, zero
> > > > > > > > > > sure
> > > > > > > > > > >
> > > > > > > > > > > > +
> > > > > > > > > > > > + ret
> > > > > > > > > > > > +END(__hibernate_cpu_resume)
> > > > > > > > > > > > +
> > > > > > > > > > > > +/*
> > > > > > > > > > > > + * Prepare to restore the image.
> > > > > > > > > > > > + * a0: satp of saved page tables.
> > > > > > > > > > > > + * a1: satp of temporary page tables.
> > > > > > > > > > > > + * a2: cpu_resume.
> > > > > > > > > > > > + */
> > > > > > > > > > > > +ENTRY(hibernate_restore_image)
> > > > > > > > > > > > + mv s0, a0
> > > > > > > > > > > > + mv s1, a1
> > > > > > > > > > > > + mv s2, a2
> > > > > > > > > > > > + REG_L s4, restore_pblist
> > > > > > > > > > > > + REG_L a1, relocated_restore_code
> > > > > > > > > > > > +
> > > > > > > > > > > > + jalr a1
> > > > > > > > > > > > +END(hibernate_restore_image)
> > > > > > > > > > > > +
> > > > > > > > > > > > +/*
> > > > > > > > > > > > + * The below code will be executed from a 'safe' page.
> > > > > > > > > > > > + * It first switches to the temporary page table, then starts to copy the pages
> > > > > > > > > > > > + * back to the original memory location. Finally, it jumps to __hibernate_cpu_resume()
> > > > > > > > > > > > + * to restore the CPU context.
> > > > > > > > > > > > + */
> > > > > > > > > > > > +ENTRY(hibernate_core_restore_code)
> > > > > > > > > > > > + /* switch to temp page table. */
> > > > > > > > > > > > + csrw satp, s1
> > > > > > > > > > > > + sfence.vma
> > > > > > > > > > > > +.Lcopy:
> > > > > > > > > > > > + /* The below code will restore the hibernated image. */
> > > > > > > > > > > > + REG_L a1, HIBERN_PBE_ADDR(s4)
> > > > > > > > > > > > + REG_L a0, HIBERN_PBE_ORIG(s4)
> > > > > > > > > > >
> > > > > > > > > > > Are we sure restore_pblist will never be NULL?
> > > > > > > > > > restore_pblist is a link-list, it will be null during initialization or during page clean up by hibernation core. During the initial resume process, the hibernation core will check the header and load the pages. If everything works correctly, the page will be linked to the restore_pblist and then invoke swsusp_arch_resume() else hibernation core will throws error and failed to resume from the hibernated image.
> > > > > > > > >
> > > > > > > > > I know restore_pblist is a linked-list and this doesn't answer the
> > > > > > > > > question. The comment above restore_pblist says
> > > > > > > > >
> > > > > > > > > /*
> > > > > > > > > * List of PBEs needed for restoring the pages that were allocated before
> > > > > > > > > * the suspend and included in the suspend image, but have also been
> > > > > > > > > * allocated by the "resume" kernel, so their contents cannot be written
> > > > > > > > > * directly to their "original" page frames.
> > > > > > > > > */
> > > > > > > > >
> > > > > > > > > which implies the pages that end up on this list are "special". My
> > > > > > > > > question is whether or not we're guaranteed to have at least one
> > > > > > > > > of these special pages. If not, we shouldn't assume s4 is non-null.
> > > > > > > > > If so, then a comment stating why that's guaranteed would be nice.
> > > > > > > > The restore_pblist will not be null otherwise swsusp_arch_resume wouldn't get invoked. you can find how the link-list are link and how it checks against validity at https://elixir.bootlin.com/linux/v6.2-rc8/source/kernel/power/snapshot.c . " A comment stating why that's guaranteed would be nice" ? Hmm, perhaps this is out of my scope but I do believe in the page validity checking in the link I shared.
> > > > > > >
> > > > > > > Sorry, but pointing to an entire source file (one that I've obviously
> > > > > > > already looked at, since I quoted a comment from it...) is not helpful.
> > > > > > > I don't see where restore_pblist is being checked before
> > > > > > > swsusp_arch_resume() is issued (from its callsite in hibernate.c).
> > > > > > Sure, below shows the hibernation flow for your reference. The link-list creation and checking found at: https://elixir.bootlin.com/linux/v6.2/source/kernel/power/snapshot.c#L2576
> > > > > > software_resume()
> > > > > > load_image_and_restore()
> > > > > > swsusp_read()
> > > > > > load_image()
> > > > > > snapshot_write_next()
> > > > > > get_buffer() <-- This is the function checks and links the pages to the restore_pblist
> > > > >
> > > > > Yup, I've read this path, including get_buffer(), where I saw that
> > > > > get_buffer() can return an address without allocating a PBE. Where is the
> > > > > check that restore_pblist isn't NULL, i.e. we see that at least one PBE
> > > > > has been allocated by get_buffer(), before we call swsusp_arch_resume()?
> > > > >
> > > > > Or, is known that at least one or more pages match the criteria pointed
> > > > > out in the comment below (copied from get_buffer())?
> > > > >
> > > > > /*
> > > > > * The "original" page frame has not been allocated and we have to
> > > > > * use a "safe" page frame to store the loaded page.
> > > > > */
> > > > >
> > > > > If so, then which ones? And where does it state that?
> > > > Let's look at the below pseudocode and hope it clear your doubt. restore_pblist depends on safe_page_list and pbe and both pointers are checked. I couldn't find from where the restore_pblist will be null..
> > > > //Pseudocode to illustrate the image loading
> > > > initialize restore_pblist to null;
> > > > initialize safe_pages_list to null;
> > > > Allocate safe page list, return error if failed;
> > > > load image;
> > > > loop: Create pbe chain, return error if failed;
> > >
> > > This loop pseudocode is incomplete. It's
> > >
> > > loop:
> > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > return page_address(page);
> > > Create pbe chain, return error if failed;
> > > ...
> > >
> > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > last reply (and have been asking four times now, albeit less explicitly
> > > the first two times), how do we know at least one PBE will be linked?
> > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
>
> I know PBEs correspond to pages. *Why* should I not expect only one page
> is saved? Or, more importantly, why should I expect more than zero pages
> are saved?
>
> Convincing answers might be because we *always* put the restore code in
> pages which get added to the PBE list or that the original page tables
> *always* get put in pages which get added to the PBE list. It's not very
> convincing to simply *assume* that at least one random page will always
> meet the PBE list criteria.
>
> > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be restore else normal boot will take place.
> > > Or, even more specifically this time, where is the proof that for each
> > > hibernation resume, there exists some page such that
> > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact, the forbidden_pages and free_pages are not save into the disk.
>
> Exactly, so those pages are *not* going to contribute to the greater than
> zero pages. What I've been asking for, from the beginning, is to know
> which page(s) are known to *always* contribute to the list. Or, IOW, how
> do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
Well, this keeps going around in circles; I thought the answer was already in the hibernation code. restore_pblist gets its pointer from the PBEs, and each PBE has already been checked for validity.
May I suggest that you submit a patch to the hibernation core?
>
> Thanks,
> drew
>
> > >
> > > Thanks,
> > > drew
> > >
> > > > assign orig_addr and safe_page to pbe;
> > > > link pbe to restore_pblist;
> > > > return pbe to handle->buffer;
> > > > check handle->buffer;
> > > > goto loop if no error else return with error;
> > > > >
> > > > > Thanks,
> > > > > drew
> > > > >
> > > > >
> > > > > > hibernation_restore()
> > > > > > resume_target_kernel()
> > > > > > swsusp_arch_resume()
> > > > > > >
> > > > > > > Thanks,
> > > > > > > drew

2023-02-28 05:05:06

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Tue, Feb 28, 2023 at 01:32:53AM +0000, JeeHeng Sia wrote:
> > > > > load image;
> > > > > loop: Create pbe chain, return error if failed;
> > > >
> > > > This loop pseudocode is incomplete. It's
> > > >
> > > > loop:
> > > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > > return page_address(page);
> > > > Create pbe chain, return error if failed;
> > > > ...
> > > >
> > > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > > last reply (and have been asking four times now, albeit less explicitly
> > > > the first two times), how do we know at least one PBE will be linked?
> > > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
> >
> > I know PBEs correspond to pages. *Why* should I not expect only one page
> > is saved? Or, more importantly, why should I expect more than zero pages
> > are saved?
> >
> > Convincing answers might be because we *always* put the restore code in
> > pages which get added to the PBE list or that the original page tables
> > *always* get put in pages which get added to the PBE list. It's not very
> > convincing to simply *assume* that at least one random page will always
> > meet the PBE list criteria.
> >
> > > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be restore else
> > normal boot will take place.
> > > > Or, even more specifically this time, where is the proof that for each
> > > > hibernation resume, there exists some page such that
> > > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact, the
> > forbidden_pages and free_pages are not save into the disk.
> >
> > Exactly, so those pages are *not* going to contribute to the greater than
> > zero pages. What I've been asking for, from the beginning, is to know
> > which page(s) are known to *always* contribute to the list. Or, IOW, how
> > do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
> Well, this is keep going around in a circle, thought the answer is in the hibernation code. restore_pblist get the pointer from the PBE, and the PBE already checked for validity.

It keeps going around in circles because you keep avoiding my question by
pointing out trivial linked list code. I'm not worried about the linked
list code being correct. My concern is that you're using a linked list
with an assumption that it is not empty. My question has been all along,
how do you know it's not empty?

I'll change the way I ask this time. Please take a look at your PBE list
and let me know if there are PBEs on it that must be there on each
hibernation resume, e.g. the resume code page is there or whatever.

> Can I suggest you to submit a patch to the hibernation core?

Why? What's wrong with it?

Thanks,
drew

2023-02-28 05:35:45

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Tuesday, 28 February, 2023 1:05 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Tue, Feb 28, 2023 at 01:32:53AM +0000, JeeHeng Sia wrote:
> > > > > > load image;
> > > > > > loop: Create pbe chain, return error if failed;
> > > > >
> > > > > This loop pseudocode is incomplete. It's
> > > > >
> > > > > loop:
> > > > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > > > return page_address(page);
> > > > > Create pbe chain, return error if failed;
> > > > > ...
> > > > >
> > > > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > > > last reply (and have been asking four times now, albeit less explicitly
> > > > > the first two times), how do we know at least one PBE will be linked?
> > > > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
> > >
> > > I know PBEs correspond to pages. *Why* should I not expect only one page
> > > is saved? Or, more importantly, why should I expect more than zero pages
> > > are saved?
> > >
> > > Convincing answers might be because we *always* put the restore code in
> > > pages which get added to the PBE list or that the original page tables
> > > *always* get put in pages which get added to the PBE list. It's not very
> > > convincing to simply *assume* that at least one random page will always
> > > meet the PBE list criteria.
> > >
> > > > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be restore else
> > > normal boot will take place.
> > > > > Or, even more specifically this time, where is the proof that for each
> > > > > hibernation resume, there exists some page such that
> > > > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > > > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact, the
> > > forbidden_pages and free_pages are not save into the disk.
> > >
> > > Exactly, so those pages are *not* going to contribute to the greater than
> > > zero pages. What I've been asking for, from the beginning, is to know
> > > which page(s) are known to *always* contribute to the list. Or, IOW, how
> > > do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
> > Well, this is keep going around in a circle, thought the answer is in the hibernation code. restore_pblist get the pointer from the PBE,
> and the PBE already checked for validity.
>
> It keeps going around in circles because you keep avoiding my question by
> pointing out trivial linked list code. I'm not worried about the linked
> list code being correct. My concern is that you're using a linked list
> with an assumption that it is not empty. My question has been all along,
> how do you know it's not empty?
>
> I'll change the way I ask this time. Please take a look at your PBE list
> and let me know if there are PBEs on it that must be there on each
> hibernation resume, e.g. the resume code page is there or whatever.
>
> > Can I suggest you to submit a patch to the hibernation core?
>
> Why? What's wrong with it?
Kindly let me draw two scenarios for you. Option 1 adds the restore_pblist check to the hibernation core, and option 2 adds the restore_pblist check to the arch code (a rough C sketch of option 2 follows the pseudocode below).
Although I really don't think it is needed, if you really want to add the check, I would suggest going with option 1. Again, I really think it is not needed!

//Option 1
//Pseudocode to illustrate the image loading
initialize restore_pblist to null;
initialize safe_pages_list to null;
Allocate safe page list, return error if failed;
load image;
loop: Create pbe chain, return error if failed;
assign orig_addr and safe_page to pbe;
link pbe to restore_pblist;
/* Add checking here */
return error if restore_pblist equal to null;
return pbe to handle->buffer;
check handle->buffer;
goto loop if no error else return with error;

//option 2
//Pseudocode to illustrate the image loading
initialize restore_pblist to null;
initialize safe_pages_list to null;
Allocate safe page list, return error if failed;
load image;
loop: Create pbe chain, return error if failed;
assign orig_addr and safe_page to pbe;
link pbe to restore_pblist;
return pbe to handle->buffer;
check handle->buffer;
goto loop if no error else return with error;
everything works correctly, continue the rest of the operation
invoke swsusp_arch_resume

//@swsusp_arch_resume()
loop2: return error if restore_pblist is null
increment restore_pblist and goto loop2
create temp_pg_table
continue the rest of the resume operation
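
For illustration only, the option 2 check could be sketched in C roughly as below. This is a hypothetical simplification, not the actual implementation in this series: the real swsusp_arch_resume() builds a temporary page table and jumps to relocated restore code instead of copying pages directly, and only the standard struct pbe fields from <linux/suspend.h> are assumed.

/* Hypothetical sketch of option 2 (not part of this series): guard against
 * an empty restore_pblist in the arch code before walking the list.
 */
#include <linux/errno.h>
#include <linux/suspend.h>
#include <asm/page.h>

int swsusp_arch_resume(void)
{
        struct pbe *pbe;

        /* Option 2: treat an empty PBE list as an error and bail out. */
        if (!restore_pblist)
                return -EINVAL;

        /*
         * Copy each loaded page back to its original page frame. The real
         * code in this series does this from a relocated "safe" page under
         * a temporary page table; that detail is omitted here.
         */
        for (pbe = restore_pblist; pbe; pbe = pbe->next)
                copy_page(pbe->orig_address, pbe->address);

        /* create temp_pg_table and continue the rest of the resume operation */
        return 0;
}
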
>
> Thanks,
> drew

2023-02-28 06:34:14

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Tuesday, 28 February, 2023 1:05 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Tue, Feb 28, 2023 at 01:32:53AM +0000, JeeHeng Sia wrote:
> > > > > > load image;
> > > > > > loop: Create pbe chain, return error if failed;
> > > > >
> > > > > This loop pseudocode is incomplete. It's
> > > > >
> > > > > loop:
> > > > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > > > return page_address(page);
> > > > > Create pbe chain, return error if failed;
> > > > > ...
> > > > >
> > > > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > > > last reply (and have been asking four times now, albeit less explicitly
> > > > > the first two times), how do we know at least one PBE will be linked?
> > > > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
> > >
> > > I know PBEs correspond to pages. *Why* should I not expect only one page
> > > is saved? Or, more importantly, why should I expect more than zero pages
> > > are saved?
> > >
> > > Convincing answers might be because we *always* put the restore code in
> > > pages which get added to the PBE list or that the original page tables
> > > *always* get put in pages which get added to the PBE list. It's not very
> > > convincing to simply *assume* that at least one random page will always
> > > meet the PBE list criteria.
> > >
> > > > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be restore else
> > > normal boot will take place.
> > > > > Or, even more specifically this time, where is the proof that for each
> > > > > hibernation resume, there exists some page such that
> > > > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > > > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact, the
> > > forbidden_pages and free_pages are not save into the disk.
> > >
> > > Exactly, so those pages are *not* going to contribute to the greater than
> > > zero pages. What I've been asking for, from the beginning, is to know
> > > which page(s) are known to *always* contribute to the list. Or, IOW, how
> > > do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
> > Well, this is keep going around in a circle, thought the answer is in the hibernation code. restore_pblist get the pointer from the PBE,
> and the PBE already checked for validity.
>
> It keeps going around in circles because you keep avoiding my question by
> pointing out trivial linked list code. I'm not worried about the linked
> list code being correct. My concern is that you're using a linked list
> with an assumption that it is not empty. My question has been all along,
> how do you know it's not empty?
>
> I'll change the way I ask this time. Please take a look at your PBE list
> and let me know if there are PBEs on it that must be there on each
> hibernation resume, e.g. the resume code page is there or whatever.
Just to add on, it is not "my" PBE list; the list comes from the hibernation core. As I have already drawn out in the scenarios for you, the check should be done at the initialization phase.
>
> > Can I suggest you to submit a patch to the hibernation core?
>
> Why? What's wrong with it?
>
> Thanks,
> drew

2023-02-28 07:18:54

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Tue, Feb 28, 2023 at 05:33:32AM +0000, JeeHeng Sia wrote:
>
>
> > -----Original Message-----
> > From: Andrew Jones <[email protected]>
> > Sent: Tuesday, 28 February, 2023 1:05 PM
> > To: JeeHeng Sia <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >
> > On Tue, Feb 28, 2023 at 01:32:53AM +0000, JeeHeng Sia wrote:
> > > > > > > load image;
> > > > > > > loop: Create pbe chain, return error if failed;
> > > > > >
> > > > > > This loop pseudocode is incomplete. It's
> > > > > >
> > > > > > loop:
> > > > > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > > > > return page_address(page);
> > > > > > Create pbe chain, return error if failed;
> > > > > > ...
> > > > > >
> > > > > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > > > > last reply (and have been asking four times now, albeit less explicitly
> > > > > > the first two times), how do we know at least one PBE will be linked?
> > > > > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
> > > >
> > > > I know PBEs correspond to pages. *Why* should I not expect only one page
> > > > is saved? Or, more importantly, why should I expect more than zero pages
> > > > are saved?
> > > >
> > > > Convincing answers might be because we *always* put the restore code in
> > > > pages which get added to the PBE list or that the original page tables
> > > > *always* get put in pages which get added to the PBE list. It's not very
> > > > convincing to simply *assume* that at least one random page will always
> > > > meet the PBE list criteria.
> > > >
> > > > > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be restore else
> > > > normal boot will take place.
> > > > > > Or, even more specifically this time, where is the proof that for each
> > > > > > hibernation resume, there exists some page such that
> > > > > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > > > > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact, the
> > > > forbidden_pages and free_pages are not save into the disk.
> > > >
> > > > Exactly, so those pages are *not* going to contribute to the greater than
> > > > zero pages. What I've been asking for, from the beginning, is to know
> > > > which page(s) are known to *always* contribute to the list. Or, IOW, how
> > > > do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
> > > Well, this is keep going around in a circle, thought the answer is in the hibernation code. restore_pblist get the pointer from the PBE,
> > and the PBE already checked for validity.
> >
> > It keeps going around in circles because you keep avoiding my question by
> > pointing out trivial linked list code. I'm not worried about the linked
> > list code being correct. My concern is that you're using a linked list
> > with an assumption that it is not empty. My question has been all along,
> > how do you know it's not empty?
> >
> > I'll change the way I ask this time. Please take a look at your PBE list
> > and let me know if there are PBEs on it that must be there on each
> > hibernation resume, e.g. the resume code page is there or whatever.
> >
> > > Can I suggest you to submit a patch to the hibernation core?
> >
> > Why? What's wrong with it?
> Kindly let me draw 2 scenarios for you. Option 1 is to add the restore_pblist checking to the hibernation core and option 2 is to add restore_pblist checking to the arch solution
> Although I really don't think it is needed. But if you really wanted to add the checking, I would suggest to go with option 1. again, I really think that it is not needed!

This entire email thread is because you've first coded, and now stated,
that you don't think the PBE list will ever be empty. And now, below, I
see you're proposing to return an error when the PBE list is empty, why?
If there's nothing in the PBE list, then there's nothing to do for it.
Why is that an error condition?

Please explain to me why you think the PBE list *must* not be empty
(which is what I've been asking for over and over). Or, IOW, are there
any pages you have in mind which the resume kernel always uses and
are also always going to end up in the suspend image? I don't know,
but I assume clean, file-backed pages do not get added to the suspend
image, which would rule out most kernel code pages. Also, many pages
written during boot (which is where the resume kernel is at resume time)
were no longer resident at hibernate time, so they won't be in the
suspend image either. While it's quite likely I'm missing something
obvious, I'd rather be told what that is than to assume the PBE list
will never be empty. Which is why I keep asking about it...

Thanks,
drew

>
> //Option 1
> //Pseudocode to illustrate the image loading
> initialize restore_pblist to null;
> initialize safe_pages_list to null;
> Allocate safe page list, return error if failed;
> load image;
> loop: Create pbe chain, return error if failed;
> assign orig_addr and safe_page to pbe;
> link pbe to restore_pblist;
> /* Add checking here */
> return error if restore_pblist equal to null;
> return pbe to handle->buffer;
> check handle->buffer;
> goto loop if no error else return with error;
>
> //option 2
> //Pseudocode to illustrate the image loading
> initialize restore_pblist to null;
> initialize safe_pages_list to null;
> Allocate safe page list, return error if failed;
> load image;
> loop: Create pbe chain, return error if failed;
> assign orig_addr and safe_page to pbe;
> link pbe to restore_pblist;
> return pbe to handle->buffer;
> check handle->buffer;
> goto loop if no error else return with error;
> everything works correctly, continue the rest of the operation
> invoke swsusp_arch_resume
>
> //@swsusp_arch_resume()
> loop2: return error if restore_pblist is null
> increment restore_pblist and goto loop2
> create temp_pg_table
> continue the rest of the resume operation
> >
> > Thanks,
> > drew

2023-02-28 07:29:43

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Tue, Feb 28, 2023 at 06:33:59AM +0000, JeeHeng Sia wrote:
>
>
> > -----Original Message-----
> > From: Andrew Jones <[email protected]>
> > Sent: Tuesday, 28 February, 2023 1:05 PM
> > To: JeeHeng Sia <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >
> > On Tue, Feb 28, 2023 at 01:32:53AM +0000, JeeHeng Sia wrote:
> > > > > > > load image;
> > > > > > > loop: Create pbe chain, return error if failed;
> > > > > >
> > > > > > This loop pseudocode is incomplete. It's
> > > > > >
> > > > > > loop:
> > > > > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > > > > return page_address(page);
> > > > > > Create pbe chain, return error if failed;
> > > > > > ...
> > > > > >
> > > > > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > > > > last reply (and have been asking four times now, albeit less explicitly
> > > > > > the first two times), how do we know at least one PBE will be linked?
> > > > > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
> > > >
> > > > I know PBEs correspond to pages. *Why* should I not expect only one page
> > > > is saved? Or, more importantly, why should I expect more than zero pages
> > > > are saved?
> > > >
> > > > Convincing answers might be because we *always* put the restore code in
> > > > pages which get added to the PBE list or that the original page tables
> > > > *always* get put in pages which get added to the PBE list. It's not very
> > > > convincing to simply *assume* that at least one random page will always
> > > > meet the PBE list criteria.
> > > >
> > > > > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be restore else
> > > > normal boot will take place.
> > > > > > Or, even more specifically this time, where is the proof that for each
> > > > > > hibernation resume, there exists some page such that
> > > > > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > > > > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact, the
> > > > forbidden_pages and free_pages are not save into the disk.
> > > >
> > > > Exactly, so those pages are *not* going to contribute to the greater than
> > > > zero pages. What I've been asking for, from the beginning, is to know
> > > > which page(s) are known to *always* contribute to the list. Or, IOW, how
> > > > do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
> > > Well, this is keep going around in a circle, thought the answer is in the hibernation code. restore_pblist get the pointer from the PBE,
> > and the PBE already checked for validity.
> >
> > It keeps going around in circles because you keep avoiding my question by
> > pointing out trivial linked list code. I'm not worried about the linked
> > list code being correct. My concern is that you're using a linked list
> > with an assumption that it is not empty. My question has been all along,
> > how do you know it's not empty?
> >
> > I'll change the way I ask this time. Please take a look at your PBE list
> > and let me know if there are PBEs on it that must be there on each
> > hibernation resume, e.g. the resume code page is there or whatever.
> Just to add on, it is not "my" PBE list but the list is from the hibernation core. As already draw out the scenarios for you, checking should be done at the initialization phase.

Your PBE list is your instance of the PBE list when you resume your
hibernation test. I'm simply asking you to dump the PBE list while
you resume a hibernation, and then tell me what's there.

Please stop thinking about the trivial details of the code, like which
file a variable is in, and start thinking about how the code is being
used. A PBE list is a concept, your PBE list is an instance of that
concept, the code, which is the least interesting part, is just an
implementation of that concept. First, I want to understand the concept,
then we can worry about the code.

drew

> >
> > > Can I suggest you to submit a patch to the hibernation core?
> >
> > Why? What's wrong with it?
> >
> > Thanks,
> > drew

2023-02-28 07:29:56

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Tuesday, 28 February, 2023 3:19 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Tue, Feb 28, 2023 at 05:33:32AM +0000, JeeHeng Sia wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Jones <[email protected]>
> > > Sent: Tuesday, 28 February, 2023 1:05 PM
> > > To: JeeHeng Sia <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > >
> > > On Tue, Feb 28, 2023 at 01:32:53AM +0000, JeeHeng Sia wrote:
> > > > > > > > load image;
> > > > > > > > loop: Create pbe chain, return error if failed;
> > > > > > >
> > > > > > > This loop pseudocode is incomplete. It's
> > > > > > >
> > > > > > > loop:
> > > > > > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > > > > > return page_address(page);
> > > > > > > Create pbe chain, return error if failed;
> > > > > > > ...
> > > > > > >
> > > > > > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > > > > > last reply (and have been asking four times now, albeit less explicitly
> > > > > > > the first two times), how do we know at least one PBE will be linked?
> > > > > > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
> > > > >
> > > > > I know PBEs correspond to pages. *Why* should I not expect only one page
> > > > > is saved? Or, more importantly, why should I expect more than zero pages
> > > > > are saved?
> > > > >
> > > > > Convincing answers might be because we *always* put the restore code in
> > > > > pages which get added to the PBE list or that the original page tables
> > > > > *always* get put in pages which get added to the PBE list. It's not very
> > > > > convincing to simply *assume* that at least one random page will always
> > > > > meet the PBE list criteria.
> > > > >
> > > > > > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be restore
> else
> > > > > normal boot will take place.
> > > > > > > Or, even more specifically this time, where is the proof that for each
> > > > > > > hibernation resume, there exists some page such that
> > > > > > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > > > > > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact, the
> > > > > forbidden_pages and free_pages are not save into the disk.
> > > > >
> > > > > Exactly, so those pages are *not* going to contribute to the greater than
> > > > > zero pages. What I've been asking for, from the beginning, is to know
> > > > > which page(s) are known to *always* contribute to the list. Or, IOW, how
> > > > > do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
> > > > Well, this is keep going around in a circle, thought the answer is in the hibernation code. restore_pblist get the pointer from the
> PBE,
> > > and the PBE already checked for validity.
> > >
> > > It keeps going around in circles because you keep avoiding my question by
> > > pointing out trivial linked list code. I'm not worried about the linked
> > > list code being correct. My concern is that you're using a linked list
> > > with an assumption that it is not empty. My question has been all along,
> > > how do you know it's not empty?
> > >
> > > I'll change the way I ask this time. Please take a look at your PBE list
> > > and let me know if there are PBEs on it that must be there on each
> > > hibernation resume, e.g. the resume code page is there or whatever.
> > >
> > > > Can I suggest you to submit a patch to the hibernation core?
> > >
> > > Why? What's wrong with it?
> > Kindly let me draw 2 scenarios for you. Option 1 is to add the restore_pblist checking to the hibernation core and option 2 is to add
> restore_pblist checking to the arch solution
> > Although I really don't think it is needed. But if you really wanted to add the checking, I would suggest to go with option 1. again, I
> really think that it is not needed!
>
> This entire email thread is because you've first coded, and now stated,
> that you don't think the PBE list will ever be empty. And now, below, I
> see you're proposing to return an error when the PBE list is empty, why?
> If there's nothing in the PBE list, then there's nothing to do for it.
> Why is that an error condition?
>
> Please explain to me why you think the PBE list *must* not be empty
> (which is what I've been asking for over and over). OIOW, are there
> any pages you have in mind which the resume kernel always uses and
> are also always going to end up in the suspend image? I don't know,
> but I assume clean, file-backed pages do not get added to the suspend
> image, which would rule out most kernel code pages. Also, many pages
> written during boot (which is where the resume kernel is at resume time)
> were no longer resident at hibernate time, so they won't be in the
> suspend image either. While it's quite likely I'm missing something
> obvious, I'd rather be told what that is than to assume the PBE list
> will never be empty. Which is why I keep asking about it...
The answer is already in the Linux kernel hibernation core. Do you need me to write a white paper to explain it in detail, or would you prefer a conference call?
>
> Thanks,
> drew
>
> >
> > //Option 1
> > //Pseudocode to illustrate the image loading
> > initialize restore_pblist to null;
> > initialize safe_pages_list to null;
> > Allocate safe page list, return error if failed;
> > load image;
> > loop: Create pbe chain, return error if failed;
> > assign orig_addr and safe_page to pbe;
> > link pbe to restore_pblist;
> > /* Add checking here */
> > return error if restore_pblist equal to null;
> > return pbe to handle->buffer;
> > check handle->buffer;
> > goto loop if no error else return with error;
> >
> > //option 2
> > //Pseudocode to illustrate the image loading
> > initialize restore_pblist to null;
> > initialize safe_pages_list to null;
> > Allocate safe page list, return error if failed;
> > load image;
> > loop: Create pbe chain, return error if failed;
> > assign orig_addr and safe_page to pbe;
> > link pbe to restore_pblist;
> > return pbe to handle->buffer;
> > check handle->buffer;
> > goto loop if no error else return with error;
> > everything works correctly, continue the rest of the operation
> > invoke swsusp_arch_resume
> >
> > //@swsusp_arch_resume()
> > loop2: return error if restore_pblist is null
> > increment restore_pblist and goto loop2
> > create temp_pg_table
> > continue the rest of the resume operation
> > >
> > > Thanks,
> > > drew

2023-02-28 07:36:00

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk



> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Tuesday, 28 February, 2023 3:30 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Tue, Feb 28, 2023 at 06:33:59AM +0000, JeeHeng Sia wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Jones <[email protected]>
> > > Sent: Tuesday, 28 February, 2023 1:05 PM
> > > To: JeeHeng Sia <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > >
> > > On Tue, Feb 28, 2023 at 01:32:53AM +0000, JeeHeng Sia wrote:
> > > > > > > > load image;
> > > > > > > > loop: Create pbe chain, return error if failed;
> > > > > > >
> > > > > > > This loop pseudocode is incomplete. It's
> > > > > > >
> > > > > > > loop:
> > > > > > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > > > > > return page_address(page);
> > > > > > > Create pbe chain, return error if failed;
> > > > > > > ...
> > > > > > >
> > > > > > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > > > > > last reply (and have been asking four times now, albeit less explicitly
> > > > > > > the first two times), how do we know at least one PBE will be linked?
> > > > > > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
> > > > >
> > > > > I know PBEs correspond to pages. *Why* should I not expect only one page
> > > > > is saved? Or, more importantly, why should I expect more than zero pages
> > > > > are saved?
> > > > >
> > > > > Convincing answers might be because we *always* put the restore code in
> > > > > pages which get added to the PBE list or that the original page tables
> > > > > *always* get put in pages which get added to the PBE list. It's not very
> > > > > convincing to simply *assume* that at least one random page will always
> > > > > meet the PBE list criteria.
> > > > >
> > > > > > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be restore
> else
> > > > > normal boot will take place.
> > > > > > > Or, even more specifically this time, where is the proof that for each
> > > > > > > hibernation resume, there exists some page such that
> > > > > > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > > > > > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact, the
> > > > > forbidden_pages and free_pages are not save into the disk.
> > > > >
> > > > > Exactly, so those pages are *not* going to contribute to the greater than
> > > > > zero pages. What I've been asking for, from the beginning, is to know
> > > > > which page(s) are known to *always* contribute to the list. Or, IOW, how
> > > > > do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
> > > > Well, this is keep going around in a circle, thought the answer is in the hibernation code. restore_pblist get the pointer from the
> PBE,
> > > and the PBE already checked for validity.
> > >
> > > It keeps going around in circles because you keep avoiding my question by
> > > pointing out trivial linked list code. I'm not worried about the linked
> > > list code being correct. My concern is that you're using a linked list
> > > with an assumption that it is not empty. My question has been all along,
> > > how do you know it's not empty?
> > >
> > > I'll change the way I ask this time. Please take a look at your PBE list
> > > and let me know if there are PBEs on it that must be there on each
> > > hibernation resume, e.g. the resume code page is there or whatever.
> > Just to add on, it is not "my" PBE list but the list is from the hibernation core. As already draw out the scenarios for you, checking
> should be done at the initialization phase.
>
> Your PBE list is your instance of the PBE list when you resume your
> hibernation test. I'm simply asking you to dump the PBE list while
> you resume a hibernation, and then tell me what's there.
>
> Please stop thinking about the trivial details of the code, like which
> file a variable is in, and start thinking about how the code is being
> used. A PBE list is a concept, your PBE list is an instance of that
> concept, the code, which is the least interesting part, is just an
> implementation of that concept. First, I want to understand the concept,
> then we can worry about the code.
>
Dear Andrew, perhaps a conference call would be better? Otherwise we are going to waste time typing. Let me know how to set up the call with you. Thank you.
> drew
>
> > >
> > > > Can I suggest you to submit a patch to the hibernation core?
> > >
> > > Why? What's wrong with it?
> > >
> > > Thanks,
> > > drew

2023-02-28 07:37:38

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Tue, Feb 28, 2023 at 07:29:40AM +0000, JeeHeng Sia wrote:
>
>
> > -----Original Message-----
> > From: Andrew Jones <[email protected]>
> > Sent: Tuesday, 28 February, 2023 3:19 PM
> > To: JeeHeng Sia <[email protected]>
> > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> >
> > On Tue, Feb 28, 2023 at 05:33:32AM +0000, JeeHeng Sia wrote:
> > >
> > >
> > > > -----Original Message-----
> > > > From: Andrew Jones <[email protected]>
> > > > Sent: Tuesday, 28 February, 2023 1:05 PM
> > > > To: JeeHeng Sia <[email protected]>
> > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > >
> > > > On Tue, Feb 28, 2023 at 01:32:53AM +0000, JeeHeng Sia wrote:
> > > > > > > > > load image;
> > > > > > > > > loop: Create pbe chain, return error if failed;
> > > > > > > >
> > > > > > > > This loop pseudocode is incomplete. It's
> > > > > > > >
> > > > > > > > loop:
> > > > > > > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > > > > > > return page_address(page);
> > > > > > > > Create pbe chain, return error if failed;
> > > > > > > > ...
> > > > > > > >
> > > > > > > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > > > > > > last reply (and have been asking four times now, albeit less explicitly
> > > > > > > > the first two times), how do we know at least one PBE will be linked?
> > > > > > > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
> > > > > >
> > > > > > I know PBEs correspond to pages. *Why* should I not expect only one page
> > > > > > is saved? Or, more importantly, why should I expect more than zero pages
> > > > > > are saved?
> > > > > >
> > > > > > Convincing answers might be because we *always* put the restore code in
> > > > > > pages which get added to the PBE list or that the original page tables
> > > > > > *always* get put in pages which get added to the PBE list. It's not very
> > > > > > convincing to simply *assume* that at least one random page will always
> > > > > > meet the PBE list criteria.
> > > > > >
> > > > > > > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be restore
> > else
> > > > > > normal boot will take place.
> > > > > > > > Or, even more specifically this time, where is the proof that for each
> > > > > > > > hibernation resume, there exists some page such that
> > > > > > > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > > > > > > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact, the
> > > > > > forbidden_pages and free_pages are not save into the disk.
> > > > > >
> > > > > > Exactly, so those pages are *not* going to contribute to the greater than
> > > > > > zero pages. What I've been asking for, from the beginning, is to know
> > > > > > which page(s) are known to *always* contribute to the list. Or, IOW, how
> > > > > > do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
> > > > > Well, this is keep going around in a circle, thought the answer is in the hibernation code. restore_pblist get the pointer from the
> > PBE,
> > > > and the PBE already checked for validity.
> > > >
> > > > It keeps going around in circles because you keep avoiding my question by
> > > > pointing out trivial linked list code. I'm not worried about the linked
> > > > list code being correct. My concern is that you're using a linked list
> > > > with an assumption that it is not empty. My question has been all along,
> > > > how do you know it's not empty?
> > > >
> > > > I'll change the way I ask this time. Please take a look at your PBE list
> > > > and let me know if there are PBEs on it that must be there on each
> > > > hibernation resume, e.g. the resume code page is there or whatever.
> > > >
> > > > > Can I suggest you to submit a patch to the hibernation core?
> > > >
> > > > Why? What's wrong with it?
> > > Kindly let me draw 2 scenarios for you. Option 1 is to add the restore_pblist checking to the hibernation core and option 2 is to add
> > restore_pblist checking to the arch solution
> > > Although I really don't think it is needed. But if you really wanted to add the checking, I would suggest to go with option 1. again, I
> > really think that it is not needed!
> >
> > This entire email thread is because you've first coded, and now stated,
> > that you don't think the PBE list will ever be empty. And now, below, I
> > see you're proposing to return an error when the PBE list is empty, why?
> > If there's nothing in the PBE list, then there's nothing to do for it.
> > Why is that an error condition?
> >
> > Please explain to me why you think the PBE list *must* not be empty
> > (which is what I've been asking for over and over). OIOW, are there
> > any pages you have in mind which the resume kernel always uses and
> > are also always going to end up in the suspend image? I don't know,
> > but I assume clean, file-backed pages do not get added to the suspend
> > image, which would rule out most kernel code pages. Also, many pages
> > written during boot (which is where the resume kernel is at resume time)
> > were no longer resident at hibernate time, so they won't be in the
> > suspend image either. While it's quite likely I'm missing something
> > obvious, I'd rather be told what that is than to assume the PBE list
> > will never be empty. Which is why I keep asking about it...
> The answer already in the Linux kernel hibernation core, do you need me to write a white paper to explain in detail or you need a conference call?

I'm not sure why you don't just write a paragraph or two here in this
email thread explaining what "the answer" is. Anyway, feel free to
invite me to a call if you think it'd be easier to hash out that way.

Thanks,
drew

> >
> > Thanks,
> > drew
> >
> > >
> > > //Option 1
> > > //Pseudocode to illustrate the image loading
> > > initialize restore_pblist to null;
> > > initialize safe_pages_list to null;
> > > Allocate safe page list, return error if failed;
> > > load image;
> > > loop: Create pbe chain, return error if failed;
> > > assign orig_addr and safe_page to pbe;
> > > link pbe to restore_pblist;
> > > /* Add checking here */
> > > return error if restore_pblist equal to null;
> > > return pbe to handle->buffer;
> > > check handle->buffer;
> > > goto loop if no error else return with error;
> > >
> > > //option 2
> > > //Pseudocode to illustrate the image loading
> > > initialize restore_pblist to null;
> > > initialize safe_pages_list to null;
> > > Allocate safe page list, return error if failed;
> > > load image;
> > > loop: Create pbe chain, return error if failed;
> > > assign orig_addr and safe_page to pbe;
> > > link pbe to restore_pblist;
> > > return pbe to handle->buffer;
> > > check handle->buffer;
> > > goto loop if no error else return with error;
> > > everything works correctly, continue the rest of the operation
> > > invoke swsusp_arch_resume
> > >
> > > //@swsusp_arch_resume()
> > > loop2: return error if restore_pblist is null
> > > increment restore_pblist and goto loop2
> > > create temp_pg_table
> > > continue the rest of the resume operation
> > > >
> > > > Thanks,
> > > > drew

2023-03-03 01:55:14

by Sia Jee Heng

[permalink] [raw]
Subject: RE: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

Hi Andrew,


> -----Original Message-----
> From: Andrew Jones <[email protected]>
> Sent: Tuesday, February 28, 2023 3:37 PM
> To: JeeHeng Sia <[email protected]>
> Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
>
> On Tue, Feb 28, 2023 at 07:29:40AM +0000, JeeHeng Sia wrote:
> >
> >
> > > -----Original Message-----
> > > From: Andrew Jones <[email protected]>
> > > Sent: Tuesday, 28 February, 2023 3:19 PM
> > > To: JeeHeng Sia <[email protected]>
> > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > >
> > > On Tue, Feb 28, 2023 at 05:33:32AM +0000, JeeHeng Sia wrote:
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Andrew Jones <[email protected]>
> > > > > Sent: Tuesday, 28 February, 2023 1:05 PM
> > > > > To: JeeHeng Sia <[email protected]>
> > > > > Cc: [email protected]; [email protected]; [email protected]; [email protected]; linux-
> > > > > [email protected]; Leyfoon Tan <[email protected]>; Mason Huo <[email protected]>
> > > > > Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk
> > > > >
> > > > > On Tue, Feb 28, 2023 at 01:32:53AM +0000, JeeHeng Sia wrote:
> > > > > > > > > > load image;
> > > > > > > > > > loop: Create pbe chain, return error if failed;
> > > > > > > > >
> > > > > > > > > This loop pseudocode is incomplete. It's
> > > > > > > > >
> > > > > > > > > loop:
> > > > > > > > > if (swsusp_page_is_forbidden(page) && swsusp_page_is_free(page))
> > > > > > > > > return page_address(page);
> > > > > > > > > Create pbe chain, return error if failed;
> > > > > > > > > ...
> > > > > > > > >
> > > > > > > > > which I pointed out explicitly in my last reply. Also, as I asked in my
> > > > > > > > > last reply (and have been asking four times now, albeit less explicitly
> > > > > > > > > the first two times), how do we know at least one PBE will be linked?
> > > > > > > > 1 PBE correspond to 1 page, you shouldn't expect only 1 page is saved.
> > > > > > >
> > > > > > > I know PBEs correspond to pages. *Why* should I not expect only one page
> > > > > > > is saved? Or, more importantly, why should I expect more than zero pages
> > > > > > > are saved?
> > > > > > >
> > > > > > > Convincing answers might be because we *always* put the restore code in
> > > > > > > pages which get added to the PBE list or that the original page tables
> > > > > > > *always* get put in pages which get added to the PBE list. It's not very
> > > > > > > convincing to simply *assume* that at least one random page will always
> > > > > > > meet the PBE list criteria.
> > > > > > >
> > > > > > > > Hibernation core will do the calculation. If the PBEs (restore_pblist) linked successfully, the hibernated image will be
> restore
> > > else
> > > > > > > normal boot will take place.
> > > > > > > > > Or, even more specifically this time, where is the proof that for each
> > > > > > > > > hibernation resume, there exists some page such that
> > > > > > > > > !swsusp_page_is_forbidden(page) or !swsusp_page_is_free(page) is true?
> > > > > > > > forbidden_pages and free_pages are not contributed to the restore_pblist (as you already aware from the code). Infact,
> the
> > > > > > > forbidden_pages and free_pages are not save into the disk.
> > > > > > >
> > > > > > > Exactly, so those pages are *not* going to contribute to the greater than
> > > > > > > zero pages. What I've been asking for, from the beginning, is to know
> > > > > > > which page(s) are known to *always* contribute to the list. Or, IOW, how
> > > > > > > do you know the PBE list isn't empty, a.k.a restore_pblist isn't NULL?
> > > > > > Well, this is keep going around in a circle, thought the answer is in the hibernation code. restore_pblist get the pointer from
> the
> > > PBE,
> > > > > and the PBE already checked for validity.
> > > > >
> > > > > It keeps going around in circles because you keep avoiding my question by
> > > > > pointing out trivial linked list code. I'm not worried about the linked
> > > > > list code being correct. My concern is that you're using a linked list
> > > > > with an assumption that it is not empty. My question has been all along,
> > > > > how do you know it's not empty?
> > > > >
> > > > > I'll change the way I ask this time. Please take a look at your PBE list
> > > > > and let me know if there are PBEs on it that must be there on each
> > > > > hibernation resume, e.g. the resume code page is there or whatever.
> > > > >
> > > > > > Can I suggest you to submit a patch to the hibernation core?
> > > > >
> > > > > Why? What's wrong with it?
> > > > Kindly let me draw 2 scenarios for you. Option 1 is to add the restore_pblist checking to the hibernation core and option 2 is to add
> > > > restore_pblist checking to the arch solution
> > > > Although I really don't think it is needed. But if you really wanted to add the checking, I would suggest to go with option 1. again, I
> > > > really think that it is not needed!
> > >
> > > This entire email thread is because you've first coded, and now stated,
> > > that you don't think the PBE list will ever be empty. And now, below, I
> > > see you're proposing to return an error when the PBE list is empty, why?
> > > If there's nothing in the PBE list, then there's nothing to do for it.
> > > Why is that an error condition?
> > >
> > > Please explain to me why you think the PBE list *must* not be empty
> > > (which is what I've been asking for over and over). OIOW, are there
> > > any pages you have in mind which the resume kernel always uses and
> > > are also always going to end up in the suspend image? I don't know,
> > > but I assume clean, file-backed pages do not get added to the suspend
> > > image, which would rule out most kernel code pages. Also, many pages
> > > written during boot (which is where the resume kernel is at resume time)
> > > were no longer resident at hibernate time, so they won't be in the
> > > suspend image either. While it's quite likely I'm missing something
> > > obvious, I'd rather be told what that is than to assume the PBE list
> > > will never be empty. Which is why I keep asking about it...
> > The answer already in the Linux kernel hibernation core, do you need me to write a white paper to explain in detail or you need a
> > conference call?
>
> I'm not sure why you don't just write a paragraph or two here in this
> email thread explaining what "the answer" is. Anyway, feel free to
> invite me to a call if you think it'd be easier to hash out that way.
Thank you very much for freeing up time to join the call. It was very nice to talk to you over the conference call, and I learned a lot from you.
Below is a summary of the experiment, for everyone's benefit:
To avoid inspecting a huge log, the experiment was carried out on QEMU with 512MB of memory (128000 pages).
During hibernation, 22770 pages (out of 128000 pages) were identified as needing to be stored to disk. Those pages consist of the kernel text, rodata, page tables, stack/heap/kmalloc/vmalloc memory, user space applications, the rootfs, etc. The number of pages that need to be stored to disk depends on the "workload" on the system.
On resume, only 21651 pages were assigned to the restore_pblist. The rest of the pages consist of metadata pages and forbidden pages, which are handled by the "resume kernel". The arch code handles the pages assigned to the restore_pblist.
From the experiment, we also know that a game that was running before hibernation is still "alive" after resuming from hibernation and can continue to be played without problems.
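To make the last point concrete, the arch-side handling of restore_pblist amounts to copying every staged page back to its original location. The sketch below is illustrative only: the function name is made up, only the generic struct pbe layout and the restore_pblist pointer from include/linux/suspend.h are assumed, and in the actual series this walk is performed from relocated code under a temporary page table so the copies cannot clobber the code doing the copying:

#include <linux/suspend.h>
#include <asm/page.h>

/* Illustrative only: walk restore_pblist and put each page back in place. */
static void sketch_restore_pblist(void)
{
	struct pbe *pbe;

	for (pbe = restore_pblist; pbe; pbe = pbe->next)
		copy_page(pbe->orig_address, pbe->address);
}

If restore_pblist were NULL, the loop would simply do nothing, which is consistent with the earlier point that an empty list is not by itself an error.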

Thanks
Jee Heng

2023-03-03 08:09:49

by Andrew Jones

[permalink] [raw]
Subject: Re: [PATCH v4 4/4] RISC-V: Add arch functions to support hibernation/suspend-to-disk

On Fri, Mar 03, 2023 at 01:53:19AM +0000, JeeHeng Sia wrote:
> Hi Andrew,
>
>
> > -----Original Message-----
> > From: Andrew Jones <[email protected]>
...
> > I'm not sure why you don't just write a paragraph or two here in this
> > email thread explaining what "the answer" is. Anyway, feel free to
> > invite me to a call if you think it'd be easier to hash out that way.
> Thank you very much for freeing up time to join the call. It was very nice to talk to you over the conference call, and I learned a lot from you.
> Below is a summary of the experiment, for everyone's benefit:
> To avoid inspecting a huge log, the experiment was carried out on QEMU with 512MB of memory (128000 pages).
> During hibernation, 22770 pages (out of 128000 pages) were identified as needing to be stored to disk. Those pages consist of the kernel text, rodata, page tables, stack/heap/kmalloc/vmalloc memory, user space applications, the rootfs, etc. The number of pages that need to be stored to disk depends on the "workload" on the system.
> On resume, only 21651 pages were assigned to the restore_pblist. The rest of the pages consist of metadata pages and forbidden pages, which are handled by the "resume kernel". The arch code handles the pages assigned to the restore_pblist.
> From the experiment, we also know that a game that was running before hibernation is still "alive" after resuming from hibernation and can continue to be played without problems.
>

Thank you, Jee Heng. Indeed it looks like the majority of the pages that
are selected for the suspend image end up on the PBE list. While we don't
have a definitive "this page must be on the PBE list" type of result, I
agree that we shouldn't need to worry about the PBE list ever being empty.

Thanks,
drew