2019-03-27 21:39:38

by Logan Gunthorpe

Subject: [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P

Hi,

This patchset enables P2P on the RISC-V architecture. To do this on the
current kernel, we only need to be able to back IO memory with struct
pages using devm_memremap_pages(). This requires ARCH_HAS_ZONE_DEVICE,
ARCH_ENABLE_MEMORY_HOTPLUG, and ARCH_ENABLE_MEMORY_HOTREMOVE; which in
turn requires ARCH_SPARSEMEM_ENABLE. We also need to ensure that the
IO memory regions in hardware can be covered by the linear region
so that there is a linear relationship between the virtual address and
the struct page address in the vmemmap region.
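
As a rough sketch of what this enables (illustrative only, not from this
series; the percpu-ref/kill setup that devm_memremap_pages() requires is
omitted and the helper is hypothetical), a driver would back a PCI BAR
with struct pages roughly like so:

	/* Hypothetical helper: hotplug a BAR as ZONE_DEVICE memory */
	static void *bar_to_pages(struct pci_dev *pdev, int bar)
	{
		struct dev_pagemap *pgmap;

		pgmap = devm_kzalloc(&pdev->dev, sizeof(*pgmap), GFP_KERNEL);
		if (!pgmap)
			return ERR_PTR(-ENOMEM);

		pgmap->res.start = pci_resource_start(pdev, bar);
		pgmap->res.end = pci_resource_end(pdev, bar);
		pgmap->res.flags = pci_resource_flags(pdev, bar);

		/* The IO memory gains struct pages in the vmemmap here */
		return devm_memremap_pages(&pdev->dev, pgmap);
	}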

While our reason to do this work is for P2P, these features are all
useful, more generally, and also enable other kernel features.

The first patch in the series implements sparsemem. It was submitted
and reviewed last cycle but was forgotten; it has been rebased
onto v5.1-rc2.

Patches 2 through 4 rework the architecture's virtual address space
mapping, trying to get as many of the IO regions as possible covered
by the linear mapping. With Sv39, we do not have enough address space
to cover all the typical hardware regions, but we can get the majority
of them.

Patches 5 and 6 implement memory hotplug and remove. These are
relatively straightforward additions similar to other arches.

Patch 7 implements pte_devmap which allows us to set
ARCH_HAS_ZONE_DEVICE.

The patchset was tested in QEMU and on a HiFive Unleashed board.
However, we were unable to actually test P2P transactions with this
exact set because we have been unable to get PCI working with v5.1-rc2.
We were able to get it running on a 4.19 era kernel (with a bunch of
out-of-tree patches for PCI on a Microsemi PolarFire board).

This series is based on v5.1-rc2 and a git tree is available here:

https://github.com/sbates130272/linux-p2pmem riscv-p2p-v1

Thanks,

Logan

--

Logan Gunthorpe (7):
RISC-V: Implement sparsemem
RISC-V: doc: Add file describing the virtual memory map
RISC-V: Rework kernel's virtual address space mapping
RISC-V: Update page tables to cover the whole linear mapping
RISC-V: Implement memory hotplug
RISC-V: Implement memory hot remove
RISC-V: Implement pte_devmap()

Documentation/riscv/mm.txt | 24 +++
arch/riscv/Kconfig | 32 +++-
arch/riscv/include/asm/page.h | 2 -
arch/riscv/include/asm/pgtable-64.h | 2 +
arch/riscv/include/asm/pgtable-bits.h | 8 +-
arch/riscv/include/asm/pgtable.h | 45 ++++-
arch/riscv/include/asm/sparsemem.h | 11 ++
arch/riscv/kernel/setup.c | 1 -
arch/riscv/mm/init.c | 251 ++++++++++++++++++++++++--
9 files changed, 354 insertions(+), 22 deletions(-)
create mode 100644 Documentation/riscv/mm.txt
create mode 100644 arch/riscv/include/asm/sparsemem.h

--
2.20.1


2019-03-27 21:37:42

by Logan Gunthorpe

Subject: [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping

With the new virtual address changes in an earlier patch, we want the
page tables to cover more of the linear mapping region. Instead of
only mapping from PAGE_OFFSET and up, we map starting
from an aligned version of va_pa_offset such that all of the physical
address space will be mapped.
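
As an illustrative example (the numbers are assumptions, not taken from
this patch): with PAGE_OFFSET at 0xffffffd200000000 and the kernel
loaded at physical address 0x80000000, va_pa_offset is
0xffffffd180000000, which happens to already be PGDIR-aligned. Then
linear_start equals it, off is 0, physical address 0 is mapped at
virtual address 0xffffffd180000000, and the RAM at 0x80000000 still
lands at PAGE_OFFSET.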

Signed-off-by: Logan Gunthorpe <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Zong Li <[email protected]>
Cc: Mike Rapoport <[email protected]>
---
arch/riscv/kernel/setup.c | 1 -
arch/riscv/mm/init.c | 27 +++++++++++++++------------
2 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index ecb654f6a79e..8286df8be31a 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -59,7 +59,6 @@ EXPORT_SYMBOL(empty_zero_page);
/* The lucky hart to first increment this variable will boot the other cores */
atomic_t hart_lottery;
unsigned long boot_cpu_hartid;
-
void __init parse_dtb(unsigned int hartid, void *dtb)
{
if (early_init_dt_scan(__va(dtb)))
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index b9d50031e78f..315194557c3d 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -150,8 +150,8 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
pgd_t trampoline_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);

#ifndef __PAGETABLE_PMD_FOLDED
-#define NUM_SWAPPER_PMDS ((uintptr_t)-PAGE_OFFSET >> PGDIR_SHIFT)
-pmd_t swapper_pmd[PTRS_PER_PMD*((-PAGE_OFFSET)/PGDIR_SIZE)] __page_aligned_bss;
+#define NUM_SWAPPER_PMDS ((uintptr_t)-VMALLOC_END >> PGDIR_SHIFT)
+pmd_t swapper_pmd[PTRS_PER_PMD*((-VMALLOC_END)/PGDIR_SIZE)] __page_aligned_bss;
pmd_t trampoline_pmd[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
pmd_t fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss;
#endif
@@ -180,13 +180,18 @@ asmlinkage void __init setup_vm(void)
extern char _start;
uintptr_t i;
uintptr_t pa = (uintptr_t) &_start;
+ uintptr_t linear_start;
+ uintptr_t off;
pgprot_t prot = __pgprot(pgprot_val(PAGE_KERNEL) | _PAGE_EXEC);

va_pa_offset = PAGE_OFFSET - pa;
pfn_base = PFN_DOWN(pa);

+ linear_start = ALIGN_DOWN(va_pa_offset, PGDIR_SIZE);
+ off = linear_start - va_pa_offset;
+
/* Sanity check alignment and size */
- BUG_ON((PAGE_OFFSET % PGDIR_SIZE) != 0);
+ BUG_ON(linear_start <= VMALLOC_END);
BUG_ON((pa % (PAGE_SIZE * PTRS_PER_PTE)) != 0);

#ifndef __PAGETABLE_PMD_FOLDED
@@ -195,15 +200,14 @@ asmlinkage void __init setup_vm(void)
__pgprot(_PAGE_TABLE));
trampoline_pmd[0] = pfn_pmd(PFN_DOWN(pa), prot);

- for (i = 0; i < (-PAGE_OFFSET)/PGDIR_SIZE; ++i) {
- size_t o = (PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
-
+ for (i = 0; i < (-linear_start)/PGDIR_SIZE; ++i) {
+ size_t o = (linear_start >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
swapper_pg_dir[o] =
pfn_pgd(PFN_DOWN((uintptr_t)swapper_pmd) + i,
__pgprot(_PAGE_TABLE));
}
for (i = 0; i < ARRAY_SIZE(swapper_pmd); i++)
- swapper_pmd[i] = pfn_pmd(PFN_DOWN(pa + i * PMD_SIZE), prot);
+ swapper_pmd[i] = pfn_pmd(PFN_DOWN(off + i * PMD_SIZE), prot);

swapper_pg_dir[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] =
pfn_pgd(PFN_DOWN((uintptr_t)fixmap_pmd),
@@ -215,11 +219,10 @@ asmlinkage void __init setup_vm(void)
trampoline_pg_dir[(PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD] =
pfn_pgd(PFN_DOWN(pa), prot);

- for (i = 0; i < (-PAGE_OFFSET)/PGDIR_SIZE; ++i) {
- size_t o = (PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
-
- swapper_pg_dir[o] =
- pfn_pgd(PFN_DOWN(pa + i * PGDIR_SIZE), prot);
+ for (i = 0; i < (-linear_start)/PGDIR_SIZE; ++i) {
+ size_t o = (linear_start >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
+ swapper_pg_dir[o] = pfn_pgd(PFN_DOWN(off + i * PGDIR_SIZE),
+ prot);
}

swapper_pg_dir[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] =
--
2.20.1


2019-03-27 21:37:54

by Logan Gunthorpe

Subject: [PATCH 7/7] RISC-V: Implement pte_devmap()

Use the 2nd software bit in the PTE as the devmap bit and add the
appropriate accessors.

This also allows us to set ARCH_HAS_ZONE_DEVICE.
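
For context (not part of this patch), the generic get_user_pages() fast
path consults this bit before taking references on ZONE_DEVICE pages,
roughly:

	if (pte_devmap(pte))
		pgmap = get_dev_pagemap(pte_pfn(pte), pgmap);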

Signed-off-by: Logan Gunthorpe <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Laurent Dufour <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Zong Li <[email protected]>
Cc: Guo Ren <[email protected]>
Cc: "Stefan O'Rear" <[email protected]>
---
arch/riscv/Kconfig | 1 +
arch/riscv/include/asm/pgtable-bits.h | 8 ++++++--
arch/riscv/include/asm/pgtable.h | 10 ++++++++++
3 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 2cb39b4d6d6b..d365d7e17ed2 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -49,6 +49,7 @@ config RISCV
select GENERIC_IRQ_MULTI_HANDLER
select ARCH_HAS_PTE_SPECIAL
select HAVE_EBPF_JIT if 64BIT
+ select ARCH_HAS_ZONE_DEVICE

config MMU
def_bool y
diff --git a/arch/riscv/include/asm/pgtable-bits.h b/arch/riscv/include/asm/pgtable-bits.h
index 470755cb7558..9555d419a46f 100644
--- a/arch/riscv/include/asm/pgtable-bits.h
+++ b/arch/riscv/include/asm/pgtable-bits.h
@@ -30,9 +30,11 @@
#define _PAGE_GLOBAL (1 << 5) /* Global */
#define _PAGE_ACCESSED (1 << 6) /* Set by hardware on any access */
#define _PAGE_DIRTY (1 << 7) /* Set by hardware on any write */
-#define _PAGE_SOFT (1 << 8) /* Reserved for software */
+#define _PAGE_SOFT1 (1 << 8) /* Reserved for software */
+#define _PAGE_SOFT2 (1 << 9) /* Reserved for software */

-#define _PAGE_SPECIAL _PAGE_SOFT
+#define _PAGE_SPECIAL _PAGE_SOFT1
+#define _PAGE_DEVMAP _PAGE_SOFT2
#define _PAGE_TABLE _PAGE_PRESENT

/*
@@ -41,6 +43,8 @@
*/
#define _PAGE_PROT_NONE _PAGE_READ

+#define __HAVE_ARCH_PTE_DEVMAP
+
#define _PAGE_PFN_SHIFT 10

/* Set of bits to preserve across pte_modify() */
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index e071e2be3a6c..a0e6a5f8bbb5 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -248,6 +248,11 @@ static inline int pte_special(pte_t pte)
return pte_val(pte) & _PAGE_SPECIAL;
}

+static inline int pte_devmap(pte_t pte)
+{
+ return pte_val(pte) & _PAGE_DEVMAP;
+}
+
/* static inline pte_t pte_rdprotect(pte_t pte) */

static inline pte_t pte_wrprotect(pte_t pte)
@@ -289,6 +294,11 @@ static inline pte_t pte_mkspecial(pte_t pte)
return __pte(pte_val(pte) | _PAGE_SPECIAL);
}

+static inline pte_t pte_mkdevmap(pte_t pte)
+{
+ return __pte(pte_val(pte) | _PAGE_SPECIAL | _PAGE_DEVMAP);
+}
+
/* Modify page protection bits */
static inline pte_t pte_modify(pte_t pte, pgprot_t newprot)
{
--
2.20.1


2019-03-27 21:37:58

by Logan Gunthorpe

Subject: [PATCH 5/7] RISC-V: Implement memory hotplug

In order to define ARCH_ENABLE_MEMORY_HOTPLUG we need to implement
arch_add_memory() and vmemmap_free().

arch_add_memory() is very similar to the x86 version except we
don't need to fuss with the mapping, as we've already mapped the
entire linear region on riscv.

For now, vmemmap_free() is empty, which is similar to other arches that
don't implement hot remove.
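
The range check in arch_add_memory() falls out of the linear mapping:
physical address p lives at virtual address p + va_pa_offset, so the
end of a hotplugged range must not exceed -va_pa_offset or it would
wrap past the top of the address space.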

Signed-off-by: Logan Gunthorpe <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Atish Patra <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Zong Li <[email protected]>
Cc: Guo Ren <[email protected]>
---
arch/riscv/Kconfig | 3 +++
arch/riscv/mm/init.c | 27 +++++++++++++++++++++++++++
2 files changed, 30 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index d21e6a12e8b6..9477214a00e7 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -83,6 +83,9 @@ config ARCH_SPARSEMEM_ENABLE
config ARCH_SELECT_MEMORY_MODEL
def_bool ARCH_SPARSEMEM_ENABLE

+config ARCH_ENABLE_MEMORY_HOTPLUG
+ def_bool y
+
config STACKTRACE_SUPPORT
def_bool y

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 315194557c3d..0a54c3adf0ac 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -238,3 +238,30 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
return vmemmap_populate_basepages(start, end, node);
}
#endif
+
+#ifdef CONFIG_MEMORY_HOTPLUG
+void vmemmap_free(unsigned long start, unsigned long end,
+ struct vmem_altmap *altmap)
+{
+}
+
+int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
+ bool want_memblock)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ int ret;
+
+ if ((start + size) > -va_pa_offset) {
+ pr_err("Cannot hotplug memory from %08llx to %08llx as it doesn't fall within the linear mapping\n",
+ start, start + size);
+ return -EFAULT;
+ }
+
+ ret = __add_pages(nid, start_pfn, nr_pages, altmap, want_memblock);
+ WARN_ON_ONCE(ret);
+
+ return ret;
+}
+
+#endif
--
2.20.1


2019-03-27 21:38:04

by Logan Gunthorpe

Subject: [PATCH 6/7] RISC-V: Implement memory hot remove

Implementing arch_remove_memory() and filling in vmemmap_free() allows
us to declare ARCH_ENABLE_MEMORY_HOTREMOVE.

arch_remove_memory() is very similar to x86's, and we roughly copy the
remove_pagetable() function from x86 but with a bunch of the unnecessary
features stripped out.
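
In the copied code, remove_pte_table() frees the mapped pages and
clears the PTEs, and free_pte_table()/free_pmd_table() then free a
page-table page only once every entry in it is none, clearing the
upper-level entry under init_mm.page_table_lock.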

Signed-off-by: Logan Gunthorpe <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "Stefan O'Rear" <[email protected]>
Cc: Anup Patel <[email protected]>
Cc: Zong Li <[email protected]>
Cc: Guo Ren <[email protected]>
---
arch/riscv/Kconfig | 3 +
arch/riscv/include/asm/pgtable-64.h | 2 +
arch/riscv/include/asm/pgtable.h | 5 +
arch/riscv/mm/init.c | 186 ++++++++++++++++++++++++++++
4 files changed, 196 insertions(+)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 9477214a00e7..2cb39b4d6d6b 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -86,6 +86,9 @@ config ARCH_SELECT_MEMORY_MODEL
config ARCH_ENABLE_MEMORY_HOTPLUG
def_bool y

+config ARCH_ENABLE_MEMORY_HOTREMOVE
+ def_bool y
+
config STACKTRACE_SUPPORT
def_bool y

diff --git a/arch/riscv/include/asm/pgtable-64.h b/arch/riscv/include/asm/pgtable-64.h
index 7aa0ea9bd8bb..d369be5467cf 100644
--- a/arch/riscv/include/asm/pgtable-64.h
+++ b/arch/riscv/include/asm/pgtable-64.h
@@ -67,6 +67,8 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
}

#define pmd_index(addr) (((addr) >> PMD_SHIFT) & (PTRS_PER_PMD - 1))
+#define pud_index(addr) (((addr) >> PUD_SHIFT) & (PTRS_PER_PUD - 1))
+#define p4d_index(addr) (((addr) >> P4D_SHIFT) & (PTRS_PER_P4D - 1))

static inline pmd_t *pmd_offset(pud_t *pud, unsigned long addr)
{
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 2a5070540996..e071e2be3a6c 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -173,6 +173,11 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
return (unsigned long)pfn_to_virt(pmd_val(pmd) >> _PAGE_PFN_SHIFT);
}

+static inline struct page *pud_page(pud_t pud)
+{
+ return pfn_to_page(pud_val(pud) >> _PAGE_PFN_SHIFT);
+}
+
/* Yields the page frame number (PFN) of a page table entry */
static inline unsigned long pte_pfn(pte_t pte)
{
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 0a54c3adf0ac..fffe1238434e 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -240,9 +240,175 @@ int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
#endif

#ifdef CONFIG_MEMORY_HOTPLUG
+static void __meminit free_pagetable(struct page *page, int order)
+{
+ unsigned long magic;
+ unsigned int nr_pages = 1 << order;
+
+ /* bootmem page has reserved flag */
+ if (PageReserved(page)) {
+ __ClearPageReserved(page);
+
+ magic = (unsigned long)page->freelist;
+ if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
+ while (nr_pages--)
+ put_page_bootmem(page++);
+ } else {
+ while (nr_pages--)
+ free_reserved_page(page++);
+ }
+ } else {
+ free_pages((unsigned long)page_address(page), order);
+ }
+}
+
+static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
+{
+ pte_t *pte;
+ int i;
+
+ for (i = 0; i < PTRS_PER_PTE; i++) {
+ pte = pte_start + i;
+ if (!pte_none(*pte))
+ return;
+ }
+
+ /* free a pte table */
+ free_pagetable(pmd_page(*pmd), 0);
+ spin_lock(&init_mm.page_table_lock);
+ pmd_clear(pmd);
+ spin_unlock(&init_mm.page_table_lock);
+}
+
+static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
+{
+ pmd_t *pmd;
+ int i;
+
+ for (i = 0; i < PTRS_PER_PMD; i++) {
+ pmd = pmd_start + i;
+ if (!pmd_none(*pmd))
+ return;
+ }
+
+ /* free a pmd table */
+ free_pagetable(pud_page(*pud), 0);
+ spin_lock(&init_mm.page_table_lock);
+ pud_clear(pud);
+ spin_unlock(&init_mm.page_table_lock);
+}
+
+static void __meminit
+remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end)
+{
+ unsigned long next;
+ pte_t *pte;
+
+ pte = pte_start + pte_index(addr);
+ for (; addr < end; addr = next, pte++) {
+ next = (addr + PAGE_SIZE) & PAGE_MASK;
+ if (next > end)
+ next = end;
+
+ if (!pte_present(*pte))
+ continue;
+
+ free_pagetable(pte_page(*pte), 0);
+
+ spin_lock(&init_mm.page_table_lock);
+ pte_clear(&init_mm, addr, pte);
+ spin_unlock(&init_mm.page_table_lock);
+ }
+
+ flush_tlb_all();
+}
+
+static void __meminit
+remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end)
+{
+ unsigned long next;
+ pte_t *pte_base;
+ pmd_t *pmd;
+
+ pmd = pmd_start + pmd_index(addr);
+ for (; addr < end; addr = next, pmd++) {
+ next = pmd_addr_end(addr, end);
+
+ if (!pmd_present(*pmd))
+ continue;
+
+ pte_base = (pte_t *)pmd_page_vaddr(*pmd);
+ remove_pte_table(pte_base, addr, next);
+ free_pte_table(pte_base, pmd);
+ }
+}
+
+static void __meminit
+remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end)
+{
+ unsigned long next;
+ pmd_t *pmd_base;
+ pud_t *pud;
+
+ pud = pud_start + pud_index(addr);
+ for (; addr < end; addr = next, pud++) {
+ next = pud_addr_end(addr, end);
+
+ if (!pud_present(*pud))
+ continue;
+
+ pmd_base = pmd_offset(pud, 0);
+ remove_pmd_table(pmd_base, addr, next);
+ free_pmd_table(pmd_base, pud);
+ }
+}
+
+static void __meminit
+remove_p4d_table(p4d_t *p4d_start, unsigned long addr, unsigned long end)
+{
+ unsigned long next;
+ pud_t *pud_base;
+ p4d_t *p4d;
+
+ p4d = p4d_start + p4d_index(addr);
+ for (; addr < end; addr = next, p4d++) {
+ next = p4d_addr_end(addr, end);
+
+ if (!p4d_present(*p4d))
+ continue;
+
+ pud_base = pud_offset(p4d, 0);
+ remove_pud_table(pud_base, addr, next);
+ }
+}
+
+/* start and end are both virtual address. */
+static void __meminit
+remove_pagetable(unsigned long start, unsigned long end)
+{
+ unsigned long next;
+ unsigned long addr;
+ pgd_t *pgd;
+ p4d_t *p4d;
+
+ for (addr = start; addr < end; addr = next) {
+ next = pgd_addr_end(addr, end);
+
+ pgd = pgd_offset_k(addr);
+ if (!pgd_present(*pgd))
+ continue;
+
+ p4d = p4d_offset(pgd, 0);
+ remove_p4d_table(p4d, addr, next);
+ }
+
+ flush_tlb_all();
+}
+
void vmemmap_free(unsigned long start, unsigned long end,
struct vmem_altmap *altmap)
{
+ remove_pagetable(start, end);
}

int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
@@ -264,4 +430,24 @@ int arch_add_memory(int nid, u64 start, u64 size, struct vmem_altmap *altmap,
return ret;
}

+#ifdef CONFIG_MEMORY_HOTREMOVE
+int __ref arch_remove_memory(int nid, u64 start, u64 size,
+ struct vmem_altmap *altmap)
+{
+ unsigned long start_pfn = start >> PAGE_SHIFT;
+ unsigned long nr_pages = size >> PAGE_SHIFT;
+ struct page *page = pfn_to_page(start_pfn);
+ struct zone *zone;
+ int ret;
+
+ if (altmap)
+ page += vmem_altmap_offset(altmap);
+ zone = page_zone(page);
+ ret = __remove_pages(zone, start_pfn, nr_pages, altmap);
+ WARN_ON_ONCE(ret);
+
+ return ret;
+}
+
+#endif
#endif
--
2.20.1


2019-03-27 21:38:08

by Logan Gunthorpe

Subject: [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping

The motivation for this is to support P2P transactions. P2P requires
having struct pages for IO memory which means the linear mapping must
be able to cover all of the IO regions. Unfortunately with Sv39 we are
not able to cover all the IO regions available on existing hardware,
but we can do better than what we currently do (which only covers
physical memory).

To this end, we restructure the kernel's virtual address space region.
We position the vmemmap at the beginning of the region (based on how
many virtual address bits we have) and the VMALLOC region comes
immediately after. The linear mapping then takes up the remaining space.
PAGE_OFFSET will need to be within the linear mapping but may not be
the start of the mapping, since many machines don't have RAM at address
zero and we may still want to access lower addresses through the
linear mapping.

With these changes, the virtual memory map (with sparsemem enabled)
will be:

32-bit:

00000000 - 7fffffff user space, different per mm (2G)
80000000 - 81ffffff virtual memory map (32MB)
82000000 - bfffffff vmalloc/ioremap space (1GB - 32MB)
c0000000 - ffffffff direct mapping of all phys. memory (1GB)

64-bit, Sv39:

0000000000000000 - 0000003fffffffff user space, different per mm (256GB)
hole caused by [38:63] sign extension
ffffffc000000000 - ffffffc0ffffffff virtual memory map (4GB)
ffffffc100000000 - ffffffd0ffffffff vmalloc/ioremap space (64GB)
ffffffd100000000 - ffffffffffffffff linear mapping of phys. space (188GB)

On the SiFive hardware this allows us to provide struct pages for
the lower I/O TileLink address ranges, the 32-bit and 34-bit DRAM areas,
and 172GB of the 240GB high I/O TileLink region. Once we progress to
Sv48 we should be able to cover all the available memory regions.

For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
2GB of address space, so we cannot cover any of the I/O regions
that are higher than it but we do cover the lower I/O TileLink range.
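
Working the Sv39 constants through (assuming PAGE_SHIFT = 12 and
STRUCT_PAGE_MAX_SHIFT = 6) reproduces the table above: KERN_SPACE_START
= -1UL << 38 = ffffffc000000000; VMEMMAP_SIZE = 1UL << (39 - 12 - 1 + 6)
= 4GB, so the vmemmap ends at ffffffc0ffffffff; the 64GB VMALLOC region
then spans ffffffc100000000 - ffffffd0ffffffff; and the linear mapping
gets the remaining 188GB starting at ffffffd100000000.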

Signed-off-by: Logan Gunthorpe <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Antony Pavlov <[email protected]>
Cc: "Stefan O'Rear" <[email protected]>
Cc: Anup Patel <[email protected]>
---
arch/riscv/Kconfig | 2 +-
arch/riscv/include/asm/page.h | 2 --
arch/riscv/include/asm/pgtable.h | 27 ++++++++++++++++++---------
3 files changed, 19 insertions(+), 12 deletions(-)

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 76fc340ae38e..d21e6a12e8b6 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -71,7 +71,7 @@ config PAGE_OFFSET
hex
default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
- default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
+ default 0xffffffd200000000 if 64BIT && MAXPHYSMEM_128GB

config ARCH_FLATMEM_ENABLE
def_bool y
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2a546a52f02a..fa0b8058a246 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -31,8 +31,6 @@
*/
#define PAGE_OFFSET _AC(CONFIG_PAGE_OFFSET, UL)

-#define KERN_VIRT_SIZE (-PAGE_OFFSET)
-
#ifndef __ASSEMBLY__

#define PAGE_UP(addr) (((addr)+((PAGE_SIZE)-1))&(~((PAGE_SIZE)-1)))
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 5a9fea00ba09..2a5070540996 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -89,22 +89,31 @@ extern pgd_t swapper_pg_dir[];
#define __S110 PAGE_SHARED_EXEC
#define __S111 PAGE_SHARED_EXEC

-#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
-#define VMALLOC_END (PAGE_OFFSET - 1)
-#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
+#define KERN_SPACE_START (-1UL << (CONFIG_VA_BITS - 1))

/*
* Roughly size the vmemmap space to be large enough to fit enough
* struct pages to map half the virtual address space. Then
* position vmemmap directly below the VMALLOC region.
*/
-#define VMEMMAP_SHIFT \
- (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
-#define VMEMMAP_SIZE (1UL << VMEMMAP_SHIFT)
-#define VMEMMAP_END (VMALLOC_START - 1)
-#define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)
-
+#ifdef CONFIG_SPARSEMEM
+#define VMEMMAP_SIZE (UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
+ STRUCT_PAGE_MAX_SHIFT))
+#define VMEMMAP_START (KERN_SPACE_START)
+#define VMEMMAP_END (VMEMMAP_START + VMEMMAP_SIZE - 1)
#define vmemmap ((struct page *)VMEMMAP_START)
+#else
+#define VMEMMAP_END KERN_SPACE_START
+#endif
+
+#ifdef CONFIG_32BIT
+#define VMALLOC_SIZE ((1UL << 30) - VMEMMAP_SIZE)
+#else
+#define VMALLOC_SIZE (64UL << 30)
+#endif
+
+#define VMALLOC_START (VMEMMAP_END + 1)
+#define VMALLOC_END (VMALLOC_START + VMALLOC_SIZE - 1)

/*
* ZERO_PAGE is a global shared page that is always zero,
--
2.20.1


2019-03-27 21:38:21

by Logan Gunthorpe

Subject: [PATCH 1/7] RISC-V: Implement sparsemem

This patch implements sparsemem support for RISC-V, which helps pave the
way for memory hotplug and eventually P2P support.

We introduce Kconfig options for virtual and physical address bits which
are used to calculate the size of the vmemmap and set the
MAX_PHYSMEM_BITS.

The vmemmap is located directly before the VMALLOC region and sized
such that we can allocate enough pages to populate all the virtual
address space in the system (similar to the way it's done in arm64).
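
As a worked check (assuming PAGE_SHIFT = 12 and STRUCT_PAGE_MAX_SHIFT =
6): with CONFIG_VA_BITS = 39 the vmemmap is sized at
1UL << (39 - 12 - 1 + 6) = 4GB, i.e. 2^26 struct pages of at most 64
bytes each, enough to cover 256GB, half of the Sv39 virtual address
space.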

During initialization, call memblocks_present() and sparse_init(),
and provide a stub for vmemmap_populate() (all of which is similar to
arm64).

Signed-off-by: Logan Gunthorpe <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Andrew Waterman <[email protected]>
Cc: Olof Johansson <[email protected]>
Cc: Michael Clark <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Zong Li <[email protected]>
---
arch/riscv/Kconfig | 23 +++++++++++++++++++++++
arch/riscv/include/asm/pgtable.h | 21 +++++++++++++++++----
arch/riscv/include/asm/sparsemem.h | 11 +++++++++++
arch/riscv/mm/init.c | 11 +++++++++++
4 files changed, 62 insertions(+), 4 deletions(-)
create mode 100644 arch/riscv/include/asm/sparsemem.h

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index eb56c82d8aa1..76fc340ae38e 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -57,12 +57,32 @@ config ZONE_DMA32
bool
default y if 64BIT

+config VA_BITS
+ int
+ default 32 if 32BIT
+ default 39 if 64BIT
+
+config PA_BITS
+ int
+ default 34 if 32BIT
+ default 56 if 64BIT
+
config PAGE_OFFSET
hex
default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB

+config ARCH_FLATMEM_ENABLE
+ def_bool y
+
+config ARCH_SPARSEMEM_ENABLE
+ def_bool y
+ select SPARSEMEM_VMEMMAP_ENABLE
+
+config ARCH_SELECT_MEMORY_MODEL
+ def_bool ARCH_SPARSEMEM_ENABLE
+
config STACKTRACE_SUPPORT
def_bool y

@@ -97,6 +117,9 @@ config PGTABLE_LEVELS
default 3 if 64BIT
default 2

+config HAVE_ARCH_PFN_VALID
+ def_bool y
+
menu "Platform type"

choice
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 1141364d990e..5a9fea00ba09 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -89,6 +89,23 @@ extern pgd_t swapper_pg_dir[];
#define __S110 PAGE_SHARED_EXEC
#define __S111 PAGE_SHARED_EXEC

+#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
+#define VMALLOC_END (PAGE_OFFSET - 1)
+#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
+
+/*
+ * Roughly size the vmemmap space to be large enough to fit enough
+ * struct pages to map half the virtual address space. Then
+ * position vmemmap directly below the VMALLOC region.
+ */
+#define VMEMMAP_SHIFT \
+ (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
+#define VMEMMAP_SIZE (1UL << VMEMMAP_SHIFT)
+#define VMEMMAP_END (VMALLOC_START - 1)
+#define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)
+
+#define vmemmap ((struct page *)VMEMMAP_START)
+
/*
* ZERO_PAGE is a global shared page that is always zero,
* used for zero-mapped memory areas, etc.
@@ -412,10 +429,6 @@ static inline void pgtable_cache_init(void)
/* No page table caches to initialize */
}

-#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
-#define VMALLOC_END (PAGE_OFFSET - 1)
-#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
-
/*
* Task size is 0x40000000000 for RV64 or 0xb800000 for RV32.
* Note that PGDIR_SIZE must evenly divide TASK_SIZE.
diff --git a/arch/riscv/include/asm/sparsemem.h b/arch/riscv/include/asm/sparsemem.h
new file mode 100644
index 000000000000..b58ba2d9ed6e
--- /dev/null
+++ b/arch/riscv/include/asm/sparsemem.h
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef __ASM_SPARSEMEM_H
+#define __ASM_SPARSEMEM_H
+
+#ifdef CONFIG_SPARSEMEM
+#define MAX_PHYSMEM_BITS CONFIG_PA_BITS
+#define SECTION_SIZE_BITS 27
+#endif /* CONFIG_SPARSEMEM */
+
+#endif /* __ASM_SPARSEMEM_H */
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index b379a75ac6a6..b9d50031e78f 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -141,6 +141,9 @@ void __init setup_bootmem(void)
PFN_PHYS(end_pfn - start_pfn),
&memblock.memory, 0);
}
+
+ memblocks_present();
+ sparse_init();
}

pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
@@ -224,3 +227,11 @@ asmlinkage void __init setup_vm(void)
__pgprot(_PAGE_TABLE));
#endif
}
+
+#ifdef CONFIG_SPARSEMEM
+int __meminit vmemmap_populate(unsigned long start, unsigned long end, int node,
+ struct vmem_altmap *altmap)
+{
+ return vmemmap_populate_basepages(start, end, node);
+}
+#endif
--
2.20.1


2019-03-27 21:39:10

by Logan Gunthorpe

Subject: [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map

This file is similar to the x86_64 equivalent (in
Documentation/x86/x86_64/mm.txt) and describes the virtual address space
usage for RISC-V.

Signed-off-by: Logan Gunthorpe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
---
Documentation/riscv/mm.txt | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)
create mode 100644 Documentation/riscv/mm.txt

diff --git a/Documentation/riscv/mm.txt b/Documentation/riscv/mm.txt
new file mode 100644
index 000000000000..725dc85f2c65
--- /dev/null
+++ b/Documentation/riscv/mm.txt
@@ -0,0 +1,24 @@
+Sv32:
+
+00000000 - 7fffffff user space, different per mm (2G)
+80000000 - 81ffffff virtual memory map (32MB)
+82000000 - bfffffff vmalloc/ioremap space (1GB - 32MB)
+c0000000 - ffffffff direct mapping of lower phys. memory (1GB)
+
+Sv39:
+
+0000000000000000 - 0000003fffffffff user space, different per mm (256GB)
+hole caused by [38:63] sign extension
+ffffffc000000000 - ffffffc0ffffffff virtual memory map (4GB)
+ffffffc100000000 - ffffffd0ffffffff vmalloc/ioremap space (64GB)
+ffffffd100000000 - ffffffffffffffff linear mapping of physical space (188GB)
+ ffffffd200000000 - fffffff200000000 linear mapping of all physical memory
+
+The RISC-V architecture defines virtual address bits in multiples of nine
+starting from 39. These are referred to as Sv39, Sv48, Sv57 and Sv64.
+Currently only Sv39 is supported. Bits 63 through to the most-significant
+implemented bit are sign extended. This causes a hole between user space
+and kernel addresses if you interpret them as unsigned.
+
+The direct mapping covers as much of the physical memory space as
+possible so that it may cover some IO memory.
--
2.20.1


2019-03-28 05:40:30

by Palmer Dabbelt

Subject: Re: [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping

On Wed, 27 Mar 2019 14:36:39 PDT (-0700), [email protected] wrote:
> The motivation for this is to support P2P transactions. P2P requires
> having struct pages for IO memory which means the linear mapping must
> be able to cover all of the IO regions. Unfortunately with Sv39 we are
> not able to cover all the IO regions available on existing hardware,
> but we can do better than what we currently do (which only covers
> physical memory).
>
> To this end, we restructure the kernel's virtual address space region.
> We position the vmemmap at the beginning of the region (based on how
> many virtual address bits we have) and the VMALLOC region comes
> immediately after. The linear mapping then takes up the remaining space.
> PAGE_OFFSET will need to be within the linear mapping but may not be
> the start of the mapping, since many machines don't have RAM at address
> zero and we may still want to access lower addresses through the
> linear mapping.
>
> With these changes, the virtual memory map (with sparsemem enabled)
> will be:
>
> 32-bit:
>
> 00000000 - 7fffffff user space, different per mm (2G)
> 80000000 - 81ffffff virtual memory map (32MB)
> 82000000 - bfffffff vmalloc/ioremap space (1GB - 32MB)
> c0000000 - ffffffff direct mapping of all phys. memory (1GB)
>
> 64-bit, Sv39:
>
> 0000000000000000 - 0000003fffffffff user space, different per mm (256GB)
> hole caused by [38:63] sign extension
> ffffffc000000000 - ffffffc0ffffffff virtual memory map (4GB)
> ffffffc100000000 - ffffffd0ffffffff vmalloc/ioremap space (64GB)
> ffffffd100000000 - ffffffffffffffff linear mapping of phys. space (188GB)
>
> On the SiFive hardware this allows us to provide struct pages for
> the lower I/O TileLink address ranges, the 32-bit and 34-bit DRAM areas,
> and 172GB of the 240GB high I/O TileLink region. Once we progress to
> Sv48 we should be able to cover all the available memory regions.
>
> For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
> 2GB of address space, so we cannot cover any of the I/O regions
> that are higher than it but we do cover the lower I/O TileLink range.

IIRC there was another patch floating around to fix an issue with overlapping
regions in the 32-bit port, did you also fix that issue? It's somewhere in my
email queue...

> Signed-off-by: Logan Gunthorpe <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Albert Ou <[email protected]>
> Cc: Antony Pavlov <[email protected]>
> Cc: "Stefan O'Rear" <[email protected]>
> Cc: Anup Patel <[email protected]>
> ---
> arch/riscv/Kconfig | 2 +-
> arch/riscv/include/asm/page.h | 2 --
> arch/riscv/include/asm/pgtable.h | 27 ++++++++++++++++++---------
> 3 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 76fc340ae38e..d21e6a12e8b6 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -71,7 +71,7 @@ config PAGE_OFFSET
> hex
> default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
> default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
> - default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
> + default 0xffffffd200000000 if 64BIT && MAXPHYSMEM_128GB
>
> config ARCH_FLATMEM_ENABLE
> def_bool y
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 2a546a52f02a..fa0b8058a246 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -31,8 +31,6 @@
> */
> #define PAGE_OFFSET _AC(CONFIG_PAGE_OFFSET, UL)
>
> -#define KERN_VIRT_SIZE (-PAGE_OFFSET)
> -
> #ifndef __ASSEMBLY__
>
> #define PAGE_UP(addr) (((addr)+((PAGE_SIZE)-1))&(~((PAGE_SIZE)-1)))
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 5a9fea00ba09..2a5070540996 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -89,22 +89,31 @@ extern pgd_t swapper_pg_dir[];
> #define __S110 PAGE_SHARED_EXEC
> #define __S111 PAGE_SHARED_EXEC
>
> -#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
> -#define VMALLOC_END (PAGE_OFFSET - 1)
> -#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
> +#define KERN_SPACE_START (-1UL << (CONFIG_VA_BITS - 1))
>
> /*
> * Roughly size the vmemmap space to be large enough to fit enough
> * struct pages to map half the virtual address space. Then
> * position vmemmap directly below the VMALLOC region.
> */
> -#define VMEMMAP_SHIFT \
> - (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
> -#define VMEMMAP_SIZE (1UL << VMEMMAP_SHIFT)
> -#define VMEMMAP_END (VMALLOC_START - 1)
> -#define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)
> -
> +#ifdef CONFIG_SPARSEMEM
> +#define VMEMMAP_SIZE (UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
> + STRUCT_PAGE_MAX_SHIFT))
> +#define VMEMMAP_START (KERN_SPACE_START)
> +#define VMEMMAP_END (VMEMMAP_START + VMEMMAP_SIZE - 1)
> #define vmemmap ((struct page *)VMEMMAP_START)
> +#else
> +#define VMEMMAP_END KERN_SPACE_START
> +#endif
> +
> +#ifdef CONFIG_32BIT
> +#define VMALLOC_SIZE ((1UL << 30) - VMEMMAP_SIZE)
> +#else
> +#define VMALLOC_SIZE (64UL << 30)
> +#endif
> +
> +#define VMALLOC_START (VMEMMAP_END + 1)
> +#define VMALLOC_END (VMALLOC_START + VMALLOC_SIZE - 1)
>
> /*
> * ZERO_PAGE is a global shared page that is always zero,

2019-03-28 06:30:43

by Anup Patel

Subject: Re: [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping

On Thu, Mar 28, 2019 at 11:09 AM Palmer Dabbelt <[email protected]> wrote:
>
> On Wed, 27 Mar 2019 14:36:39 PDT (-0700), [email protected] wrote:
> > The motivation for this is to support P2P transactions. P2P requires
> > having struct pages for IO memory which means the linear mapping must
> > be able to cover all of the IO regions. Unfortunately with Sv39 we are
> > not able to cover all the IO regions available on existing hardware,
> > but we can do better than what we currently do (which only covers
> > physical memory).
> >
> > To this end, we restructure the kernel's virtual address space region.
> > We position the vmemmap at the beginning of the region (based on how
> > many virtual address bits we have) and the VMALLOC region comes
> > immediately after. The linear mapping then takes up the remaining space.
> > PAGE_OFFSET will need to be within the linear mapping but may not be
> > the start of the mapping, since many machines don't have RAM at address
> > zero and we may still want to access lower addresses through the
> > linear mapping.
> >
> > With these changes, the virtual memory map (with sparsemem enabled)
> > will be:
> >
> > 32-bit:
> >
> > 00000000 - 7fffffff user space, different per mm (2G)
> > 80000000 - 81ffffff virtual memory map (32MB)
> > 82000000 - bfffffff vmalloc/ioremap space (1GB - 32MB)
> > c0000000 - ffffffff direct mapping of all phys. memory (1GB)
> >
> > 64-bit, Sv39:
> >
> > 0000000000000000 - 0000003fffffffff user space, different per mm (256GB)
> > hole caused by [38:63] sign extension
> > ffffffc000000000 - ffffffc0ffffffff virtual memory map (4GB)
> > ffffffc100000000 - ffffffd0ffffffff vmalloc/ioremap space (64GB)
> > ffffffd100000000 - ffffffffffffffff linear mapping of phys. space (188GB)
> >
> > On the SiFive hardware this allows us to provide struct pages for
> > the lower I/O TileLink address ranges, the 32-bit and 34-bit DRAM areas,
> > and 172GB of the 240GB high I/O TileLink region. Once we progress to
> > Sv48 we should be able to cover all the available memory regions.
> >
> > For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
> > 2GB of address space, so we cannot cover any of the I/O regions
> > that are higher than it but we do cover the lower I/O TileLink range.
>
> IIRC there was another patch floating around to fix an issue with overlapping
> regions in the 32-bit port, did you also fix that issue? It's somewhere in my
> email queue...

That was a patch I submitted to fix overlapping FIXMAP and VMALLOC
regions.

This patch does not consider FIXMAP region.

I suggest we introduce asm/memory.h where we have all critical defines
related to virtual memory layout. Also, this header should have detailed
comments about virtual memory layout.

>
> > Signed-off-by: Logan Gunthorpe <[email protected]>
> > Cc: Palmer Dabbelt <[email protected]>
> > Cc: Albert Ou <[email protected]>
> > Cc: Antony Pavlov <[email protected]>
> > Cc: "Stefan O'Rear" <[email protected]>
> > Cc: Anup Patel <[email protected]>
> > ---
> > arch/riscv/Kconfig | 2 +-
> > arch/riscv/include/asm/page.h | 2 --
> > arch/riscv/include/asm/pgtable.h | 27 ++++++++++++++++++---------
> > 3 files changed, 19 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 76fc340ae38e..d21e6a12e8b6 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -71,7 +71,7 @@ config PAGE_OFFSET
> > hex
> > default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
> > default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
> > - default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
> > + default 0xffffffd200000000 if 64BIT && MAXPHYSMEM_128GB
> >
> > config ARCH_FLATMEM_ENABLE
> > def_bool y
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index 2a546a52f02a..fa0b8058a246 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -31,8 +31,6 @@
> > */
> > #define PAGE_OFFSET _AC(CONFIG_PAGE_OFFSET, UL)
> >
> > -#define KERN_VIRT_SIZE (-PAGE_OFFSET)
> > -
> > #ifndef __ASSEMBLY__
> >
> > #define PAGE_UP(addr) (((addr)+((PAGE_SIZE)-1))&(~((PAGE_SIZE)-1)))
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 5a9fea00ba09..2a5070540996 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -89,22 +89,31 @@ extern pgd_t swapper_pg_dir[];
> > #define __S110 PAGE_SHARED_EXEC
> > #define __S111 PAGE_SHARED_EXEC
> >
> > -#define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
> > -#define VMALLOC_END (PAGE_OFFSET - 1)
> > -#define VMALLOC_START (PAGE_OFFSET - VMALLOC_SIZE)
> > +#define KERN_SPACE_START (-1UL << (CONFIG_VA_BITS - 1))
> >
> > /*
> > * Roughly size the vmemmap space to be large enough to fit enough
> > * struct pages to map half the virtual address space. Then
> > * position vmemmap directly below the VMALLOC region.
> > */
> > -#define VMEMMAP_SHIFT \
> > - (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
> > -#define VMEMMAP_SIZE (1UL << VMEMMAP_SHIFT)
> > -#define VMEMMAP_END (VMALLOC_START - 1)
> > -#define VMEMMAP_START (VMALLOC_START - VMEMMAP_SIZE)
> > -
> > +#ifdef CONFIG_SPARSEMEM
> > +#define VMEMMAP_SIZE (UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
> > + STRUCT_PAGE_MAX_SHIFT))
> > +#define VMEMMAP_START (KERN_SPACE_START)
> > +#define VMEMMAP_END (VMEMMAP_START + VMEMMAP_SIZE - 1)
> > #define vmemmap ((struct page *)VMEMMAP_START)
> > +#else
> > +#define VMEMMAP_END KERN_SPACE_START
> > +#endif
> > +
> > +#ifdef CONFIG_32BIT
> > +#define VMALLOC_SIZE ((1UL << 30) - VMEMMAP_SIZE)
> > +#else
> > +#define VMALLOC_SIZE (64UL << 30)
> > +#endif
> > +
> > +#define VMALLOC_START (VMEMMAP_END + 1)
> > +#define VMALLOC_END (VMALLOC_START + VMALLOC_SIZE - 1)
> >
> > /*
> > * ZERO_PAGE is a global shared page that is always zero,

Regards,
Anup

2019-03-28 10:05:17

by Anup Patel

Subject: Re: [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping

On Thu, Mar 28, 2019 at 3:06 AM Logan Gunthorpe <[email protected]> wrote:
>
> With the new virtual address changes in an earlier patch, we want the
> page tables to cover more of the linear mapping region. Instead of
> only mapping from PAGE_OFFSET and up, we map starting
> from an aligned version of va_pa_offset such that all of the physical
> address space will be mapped.
>
> Signed-off-by: Logan Gunthorpe <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Albert Ou <[email protected]>
> Cc: Anup Patel <[email protected]>
> Cc: Atish Patra <[email protected]>
> Cc: Paul Walmsley <[email protected]>
> Cc: Zong Li <[email protected]>
> Cc: Mike Rapoport <[email protected]>
> ---
> arch/riscv/kernel/setup.c | 1 -
> arch/riscv/mm/init.c | 27 +++++++++++++++------------
> 2 files changed, 15 insertions(+), 13 deletions(-)
>
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index ecb654f6a79e..8286df8be31a 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -59,7 +59,6 @@ EXPORT_SYMBOL(empty_zero_page);
> /* The lucky hart to first increment this variable will boot the other cores */
> atomic_t hart_lottery;
> unsigned long boot_cpu_hartid;
> -
> void __init parse_dtb(unsigned int hartid, void *dtb)
> {
> if (early_init_dt_scan(__va(dtb)))
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index b9d50031e78f..315194557c3d 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -150,8 +150,8 @@ pgd_t swapper_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
> pgd_t trampoline_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
>
> #ifndef __PAGETABLE_PMD_FOLDED
> -#define NUM_SWAPPER_PMDS ((uintptr_t)-PAGE_OFFSET >> PGDIR_SHIFT)
> -pmd_t swapper_pmd[PTRS_PER_PMD*((-PAGE_OFFSET)/PGDIR_SIZE)] __page_aligned_bss;
> +#define NUM_SWAPPER_PMDS ((uintptr_t)-VMALLOC_END >> PGDIR_SHIFT)
> +pmd_t swapper_pmd[PTRS_PER_PMD*((-VMALLOC_END)/PGDIR_SIZE)] __page_aligned_bss;
> pmd_t trampoline_pmd[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> pmd_t fixmap_pmd[PTRS_PER_PMD] __page_aligned_bss;
> #endif
> @@ -180,13 +180,18 @@ asmlinkage void __init setup_vm(void)
> extern char _start;
> uintptr_t i;
> uintptr_t pa = (uintptr_t) &_start;
> + uintptr_t linear_start;
> + uintptr_t off;
> pgprot_t prot = __pgprot(pgprot_val(PAGE_KERNEL) | _PAGE_EXEC);
>
> va_pa_offset = PAGE_OFFSET - pa;
> pfn_base = PFN_DOWN(pa);
>
> + linear_start = ALIGN_DOWN(va_pa_offset, PGDIR_SIZE);
> + off = linear_start - va_pa_offset;
> +
> /* Sanity check alignment and size */
> - BUG_ON((PAGE_OFFSET % PGDIR_SIZE) != 0);
> + BUG_ON(linear_start <= VMALLOC_END);
> BUG_ON((pa % (PAGE_SIZE * PTRS_PER_PTE)) != 0);
>
> #ifndef __PAGETABLE_PMD_FOLDED
> @@ -195,15 +200,14 @@ asmlinkage void __init setup_vm(void)
> __pgprot(_PAGE_TABLE));
> trampoline_pmd[0] = pfn_pmd(PFN_DOWN(pa), prot);
>
> - for (i = 0; i < (-PAGE_OFFSET)/PGDIR_SIZE; ++i) {
> - size_t o = (PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
> -
> + for (i = 0; i < (-linear_start)/PGDIR_SIZE; ++i) {
> + size_t o = (linear_start >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
> swapper_pg_dir[o] =
> pfn_pgd(PFN_DOWN((uintptr_t)swapper_pmd) + i,
> __pgprot(_PAGE_TABLE));
> }
> for (i = 0; i < ARRAY_SIZE(swapper_pmd); i++)
> - swapper_pmd[i] = pfn_pmd(PFN_DOWN(pa + i * PMD_SIZE), prot);
> + swapper_pmd[i] = pfn_pmd(PFN_DOWN(off + i * PMD_SIZE), prot);
>
> swapper_pg_dir[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] =
> pfn_pgd(PFN_DOWN((uintptr_t)fixmap_pmd),
> @@ -215,11 +219,10 @@ asmlinkage void __init setup_vm(void)
> trampoline_pg_dir[(PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD] =
> pfn_pgd(PFN_DOWN(pa), prot);
>
> - for (i = 0; i < (-PAGE_OFFSET)/PGDIR_SIZE; ++i) {
> - size_t o = (PAGE_OFFSET >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
> -
> - swapper_pg_dir[o] =
> - pfn_pgd(PFN_DOWN(pa + i * PGDIR_SIZE), prot);
> + for (i = 0; i < (-linear_start)/PGDIR_SIZE; ++i) {
> + size_t o = (linear_start >> PGDIR_SHIFT) % PTRS_PER_PGD + i;
> + swapper_pg_dir[o] = pfn_pgd(PFN_DOWN(off + i * PGDIR_SIZE),
> + prot);
> }
>
> swapper_pg_dir[(FIXADDR_START >> PGDIR_SHIFT) % PTRS_PER_PGD] =
> --
> 2.20.1
>

I understand that this patch is in line with your virtual memory layout
cleanup, but the way we map virtual memory in swapper_pg_dir is bound
to change.

We should not be mapping the complete virtual memory in swapper_pg_dir;
rather, we should only map based on the amount of RAM available.

Refer, https://www.lkml.org/lkml/2019/3/24/3

The setup_vm() should only map vmlinux_start to vmlinux_end plus the
FDT. The complete virtual memory mapping should be done in
setup_vm_final() (called from paging_init()), after early parsing of
the FDT tells us the available memory banks.

Regards,
Anup

2019-03-28 11:51:53

by Mike Rapoport

Subject: Re: [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map

Hi,

On Wed, Mar 27, 2019 at 03:36:38PM -0600, Logan Gunthorpe wrote:
> This file is similar to the x86_64 equivalent (in
> Documentation/x86/x86_64/mm.txt) and describes the virtual address space
> usage for RISC-V.
>
> Signed-off-by: Logan Gunthorpe <[email protected]>
> Cc: Jonathan Corbet <[email protected]>
> Cc: Palmer Dabbelt <[email protected]>
> Cc: Albert Ou <[email protected]>
> ---
> Documentation/riscv/mm.txt | 24 ++++++++++++++++++++++++
> 1 file changed, 24 insertions(+)
> create mode 100644 Documentation/riscv/mm.txt
>
> diff --git a/Documentation/riscv/mm.txt b/Documentation/riscv/mm.txt
> new file mode 100644
> index 000000000000..725dc85f2c65
> --- /dev/null
> +++ b/Documentation/riscv/mm.txt
> @@ -0,0 +1,24 @@
> +Sv32:
> +
> +00000000 - 7fffffff user space, different per mm (2G)
> +80000000 - 81ffffff virtual memory map (32MB)
> +82000000 - bfffffff vmalloc/ioremap space (1GB - 32MB)
> +c0000000 - ffffffff direct mapping of lower phys. memory (1GB)
> +
> +Sv39:
> +
> +0000000000000000 - 0000003fffffffff user space, different per mm (256GB)
> +hole caused by [38:63] sign extension
> +ffffffc000000000 - ffffffc0ffffffff virtual memory map (4GB)
> +ffffffc100000000 - ffffffd0ffffffff vmalloc/ioremap space (64GB)
> +ffffffd100000000 - ffffffffffffffff linear mapping of physical space (188GB)
> + ffffffd200000000 - fffffff200000000 linear mapping of all physical memory
> +
> +The RISC-V architecture defines virtual address bits in multiples of nine
> +starting from 39. These are referred to as Sv39, Sv48, Sv57 and Sv64.
> +Currently only Sv39 is supported. Bits 63 through to the most-significant
> +implemented bit are sign extended. This causes a hole between user space
> +and kernel addresses if you interpret them as unsigned.
> +
> +The direct mapping covers as much of the physical memory space as
> +possible so that it may cover some IO memory.

Please move the text before the tables, so that the meaning of Sv32 and
Sv39 is clear.

> --
> 2.20.1
>
>
> _______________________________________________
> linux-riscv mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/linux-riscv
>

--
Sincerely yours,
Mike.


2019-03-28 15:52:13

by Logan Gunthorpe

Subject: Re: [PATCH 2/7] RISC-V: doc: Add file describing the virtual memory map



On 2019-03-28 5:49 a.m., Mike Rapoport wrote:
>> +
>> +The direct mapping covers as much of the physical memory space as
>> +possible so that it may cover some IO memory.
>
> Please move the text before the tables, so that the meaning of Sv32 and
> Sv39 is clear.
>

Ok, thanks. I've queued up this change for a v2.

Logan

2019-03-28 15:55:15

by Logan Gunthorpe

Subject: Re: [PATCH 3/7] RISC-V: Rework kernel's virtual address space mapping



On 2019-03-28 12:28 a.m., Anup Patel wrote:
>>> For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
>>> 2GB of address space, so we cannot cover any of the I/O regions
>>> that are higher than it but we do cover the lower I/O TileLink range.
>>
>> IIRC there was another patch floating around to fix an issue with overlapping
>> regions in the 32-bit port, did you also fix that issue? It's somewhere in my
>> email queue...
>
> That was a patch I submitted to fix overlapping FIXMAP and VMALLOC
> regions.
>
> This patch does not consider FIXMAP region.

Correct.

> I suggest we introduce asm/memory.h where we have all critical defines
> related to virtual memory layout. Also, this header should have detailed
> comments about virtual memory layout.

Seems like a sensible cleanup; the defines for this did seem to be all
over the place. I'm not really clear on everything that would belong in
asm/memory.h, so I think I'll leave such a cleanup to you.

The second patch in this series added documentation to describe the
virtual memory layout which matches how it was done in x86.

Logan

2019-03-28 18:25:33

by Logan Gunthorpe

Subject: Re: [PATCH 4/7] RISC-V: Update page tables to cover the whole linear mapping



On 2019-03-28 4:03 a.m., Anup Patel wrote:
> I understand that this patch is inline with your virtual memory layout cleanup
> but the way we map virtual memory in swapper_pg_dir is bound to change.
>
> We should not be mapping complete virtual memory in swapper_pd_dir()
> rather we should only map based on amount of RAM available.
>
> Refer, https://www.lkml.org/lkml/2019/3/24/3
>
> The setup_vm() should only map vmlinux_start to vmlinux_end plus the
> FDT. Complete virtual memory mapping should be done after we have
> done early parsing of FDT when we know available memory banks in
> setup_vm_final() (called from paging_init())

That makes sense, but I think a lot of it sounds out of the scope of
what I'm doing in this patch set.

I could attempt to update my patchset so that, instead of expanding the
linear region at boot, we add the page tables in arch_add_memory(). That
would make more sense given the direction you want to take setup_vm().

Logan

2019-04-25 11:42:27

by Palmer Dabbelt

Subject: Re: [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P

On Wed, 27 Mar 2019 14:36:36 PDT (-0700), [email protected] wrote:
> Hi,
>
> This patchset enables P2P on the RISC-V architecture. To do this on the
> current kernel, we only need to be able to back IO memory with struct
> pages using devm_memremap_pages(). This requires ARCH_HAS_ZONE_DEVICE,
> ARCH_ENABLE_MEMORY_HOTPLUG, and ARCH_ENABLE_MEMORY_HOTREMOVE; which in
> turn requires ARCH_SPARSEMEM_ENABLE. We also need to ensure that the
> IO memory regions in hardware can be covered by the linear region
> so that there is a linear relationship between the virtual address and
> the struct page address in the vmemmap region.
>
> While our reason to do this work is for P2P, these features are all
> useful, more generally, and also enable other kernel features.
>
> The first patch in the series implements sparsemem. It was submitted
> and reviewed last cycle but was forgotten; it has been rebased
> onto v5.1-rc2.
>
> Patches 2 through 4 rework the architecture's virtual address space
> mapping, trying to get as many of the IO regions as possible covered
> by the linear mapping. With Sv39, we do not have enough address space
> to cover all the typical hardware regions, but we can get the majority
> of them.
>
> Patches 5 and 6 implement memory hotplug and remove. These are
> relatively straightforward additions similar to other arches.
>
> Patch 7 implements pte_devmap which allows us to set
> ARCH_HAS_ZONE_DEVICE.
>
> The patchset was tested in QEMU and on a HiFive Unleashed board.
> However, we were unable to actually test P2P transactions with this
> exact set because we have been unable to get PCI working with v5.1-rc2.
> We were able to get it running on a 4.19 era kernel (with a bunch of
> out-of-tree patches for PCI on a Microsemi PolarFire board).
>
> This series is based on v5.1-rc2 and a git tree is available here:
>
> https://github.com/sbates130272/linux-p2pmem riscv-p2p-v1

Looks like these don't build on rv32 when applied on top of 5.1-rc6. We now
have rv32_defconfig, which should make it easier to test these sorts of
things.

2019-04-26 16:39:12

by Logan Gunthorpe

Subject: Re: [PATCH 0/7] RISC-V: Sparsemem, Memory Hotplug and pte_devmap for P2P



On 2019-04-24 5:23 p.m., Palmer Dabbelt wrote:
> On Wed, 27 Mar 2019 14:36:36 PDT (-0700), [email protected] wrote:
>> Hi,
>>
>> This patchset enables P2P on the RISC-V architecture. To do this on the
>> current kernel, we only need to be able to back IO memory with struct
>> pages using devm_memremap_pages(). This requires ARCH_HAS_ZONE_DEVICE,
>> ARCH_ENABLE_MEMORY_HOTPLUG, and ARCH_ENABLE_MEMORY_HOTREMOVE; which in
>> turn requires ARCH_SPARSEMEM_ENABLE. We also need to ensure that the
>> IO memory regions in hardware can be covered by the linear region
>> so that there is a linear relationship between the virtual address and
>> the struct page address in the vmemmap region.
>>
>> While our reason to do this work is for P2P, these features are all
>> useful, more generally, and also enable other kernel features.
>>
>> The first patch in the series implements sparsemem. It was submitted
>> and reviewed last cycle but was forgotten; it has been rebased
>> onto v5.1-rc2.
>>
>> Patches 2 through 4 rework the architecture's virtual address space
>> mapping, trying to get as many of the IO regions as possible covered
>> by the linear mapping. With Sv39, we do not have enough address space
>> to cover all the typical hardware regions, but we can get the majority
>> of them.
>>
>> Patches 5 and 6 implement memory hotplug and remove. These are
>> relatively straightforward additions similar to other arches.
>>
>> Patch 7 implements pte_devmap which allows us to set
>> ARCH_HAS_ZONE_DEVICE.
>>
>> The patchset was tested in QEMU and on a HiFive Unleashed board.
>> However, we were unable to actually test P2P transactions with this
>> exact set because we have been unable to get PCI working with v5.1-rc2.
>> We were able to get it running on a 4.19 era kernel (with a bunch of
>> out-of-tree patches for PCI on a Microsemi PolarFire board).
>>
>> This series is based on v5.1-rc2 and a git tree is available here:
>>
>> https://github.com/sbates130272/linux-p2pmem riscv-p2p-v1
>
> Looks like these don't build on rv32 when applied on top of 5.1-rc6.  We
> now
> have rv32_defconfig, which should make it easier to test these sorts of
> things.

Thanks for the note. I've queued up the fixes for this. However, I'm
still a bit stuck on the memory hot remove stuff. I'm waiting for the
similar work in arm64 to be done so I can reuse some of it[1].

The first patch of this series, which implements sparsemem, builds on
rv32 and was generally accepted in previous cycles, so I'd appreciate it
if you could pick that one up for v5.2.

Thanks.

Logan


[1]
https://lore.kernel.org/lkml/[email protected]/T/#u