From: Yulei Zhang <[email protected]>

x86 PAT assumes that system RAM is always backed by a 'struct page':
it accesses the page's 'struct page' after only checking that the
range is RAM. This does not hold when dmem is used, because dmem memory
is RAM but has no 'struct page'. Teach PAT to recognize this case: the
memory is RAM but !pfn_valid().

We always use WB for dmem; any attempt to change this is rejected and
triggers a WARN_ON.

Signed-off-by: Xiao Guangrong <[email protected]>
Signed-off-by: Yulei Zhang <[email protected]>
---
arch/x86/mm/pat/memtype.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 8f665c352bf0..fd8a298fc30b 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -511,6 +511,13 @@ static int reserve_ram_pages_type(u64 start, u64 end,
 	for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
 		enum page_cache_mode type;
 
+		/*
+		 * It's dmem if it's RAM but not backed by 'struct page';
+		 * dmem always uses WB.
+		 */
+		if (WARN_ON(!pfn_valid(pfn)))
+			return -EBUSY;
+
 		page = pfn_to_page(pfn);
 		type = get_page_memtype(page);
 		if (type != _PAGE_CACHE_MODE_WB) {
@@ -539,6 +546,13 @@ static int free_ram_pages_type(u64 start, u64 end)
 	u64 pfn;
 
 	for (pfn = (start >> PAGE_SHIFT); pfn < (end >> PAGE_SHIFT); ++pfn) {
+		/*
+		 * It's dmem; see the comments in
+		 * reserve_ram_pages_type().
+		 */
+		if (WARN_ON(!pfn_valid(pfn)))
+			continue;
+
 		page = pfn_to_page(pfn);
 		set_page_memtype(page, _PAGE_CACHE_MODE_WB);
 	}
@@ -714,6 +728,13 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
 	if (pat_pagerange_is_ram(paddr, paddr + PAGE_SIZE)) {
 		struct page *page;
 
+		/*
+		 * dmem always uses WB; see the comments in
+		 * reserve_ram_pages_type().
+		 */
+		if (!pfn_valid(paddr >> PAGE_SHIFT))
+			return rettype;
+
 		page = pfn_to_page(paddr >> PAGE_SHIFT);
 		return get_page_memtype(page);
 	}
--
2.28.0
On 08/10/20 09:53, [email protected] wrote:
> From: Yulei Zhang <[email protected]>
>
> x86 PAT assumes that system RAM is always backed by a 'struct page':
> it accesses the page's 'struct page' after only checking that the
> range is RAM. This does not hold when dmem is used, because dmem memory
> is RAM but has no 'struct page'. Teach PAT to recognize this case: the
> memory is RAM but !pfn_valid().
>
> We always use WB for dmem; any attempt to change this is rejected and
> triggers a WARN_ON.
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> Signed-off-by: Yulei Zhang <[email protected]>
Hooks like these will make it very hard to merge this series.
I like the idea of struct page-backed memory, but this is a lot of code
and I wonder if it's worth adding all these complications.
One can already use mem= to remove the "struct page" cost for most of
the host memory, and manage the allocation of the remaining memory in
userspace with /dev/mem. What is the advantage of doing this in the kernel?
Paolo
On Tue, Oct 13, 2020 at 3:27 PM Paolo Bonzini <[email protected]> wrote:
> Hooks like these will make it very hard to merge this series.
>
> I like the idea of struct page-backed memory, but this is a lot of code
> and I wonder if it's worth adding all these complications.
>
> One can already use mem= to remove the "struct page" cost for most of
> the host memory, and manage the allocation of the remaining memory in
> userspace with /dev/mem. What is the advantage of doing this in the kernel?
>
> Paolo
>
Hi Paolo, as far as I know, there are a few limitations to using
/dev/mem in this case:
1. Access to /dev/mem is restricted for security reasons, but our
virtual machines usually run as unprivileged processes.
2. /dev/mem exposes the memory as one whole block. As VMs are
dynamically created and destroyed on top of it, the memory becomes
fragmented, so extra logic is needed to manage allocation and
reclamation and avoid wasting memory. dmemfs supports this and can also
leverage the kernel's TLB management.
3. Huge pages of different page-size granularities need to be supported.
4. MCE (machine check exception) recovery capability is also required.