Happy New Year! This is v3 of the series to allow lowmem and vmalloc virtual
address space to be intermixed.
v3: Lots of changes here
- A bit of code refactoring on the ARM side
- Fixed Kconfig per Dave Hansen. Changed the name to something slightly more
descriptive (bike shedding still welcome)
- changed is_vmalloc_addr to just use a bitmap per suggestions from both Dave
and Andrew
- get_vmalloc_info now updated. Given what get_vmalloc_info is actually trying
to achieve, lowmem regions are omitted from the accounting.
- VMALLOC_TOTAL now accounted for correctly
- introduction of for_each_potential_vmalloc_area. This is used for places
  where code needs to do something on each vmalloc range (formerly
  VMALLOC_START, VMALLOC_END). A brief usage sketch follows this list.
- getting rid of users of VMALLOC_START. The decision of which clients to
change was based on whether VMALLOC_START was being used as the start of
vmalloc region (converted over) or the end of the direct mapped area
(left alone).
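For reference, a minimal usage sketch of the new iterator (the
do_something_with_range() helper is purely illustrative and not part of any
patch; the real conversions are in the fs/proc/kcore.c and arch/arm patches
below):

	unsigned long vstart, vend;
	int i;

	/* visit every range that vmalloc may hand out */
	for_each_potential_vmalloc_area(&vstart, &vend, &i)
		do_something_with_range(vstart, vend);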
v2: Addressed several comments from Kyungmin Park, which led me to discover
several issues with the is_vmalloc_addr implementation. is_vmalloc_addr
is probably the ugliest part of the entire series and I debated if
adding extra vmalloc flags would make it less ugly.
Currently on 32-bit systems we have
Virtual Physical
PAGE_OFFSET +--------------+ PHYS_OFFSET +------------+
| | | |
| | | |
| | | |
| lowmem | | direct |
| | | mapped |
| | | |
| | | |
| | | |
+--------------+------------------>x------------>
| | | |
| | | |
| | | not-direct|
| | | mapped |
| vmalloc | | |
| | | |
| | | |
| | | |
+--------------+ +------------+
Where part of the virtual space above PAGE_OFFSET is reserved for direct
mapped lowmem and part of the virtual address space is reserved for vmalloc.
Obviously, we want to optimize for having as much direct mapped memory as
possible since there is a penalty for mapping/unmapping highmem. Unfortunately
system constraints often give memory layouts such as
Virtual Physical
PAGE_OFFSET +--------------+ PHYS_OFFSET +------------+
| | | |
| | | |
| | |xxxxxxxxxxxx|
| lowmem | |xxxxxxxxxxxx|
| | |xxxxxxxxxxxx|
| | |xxxxxxxxxxxx|
| | | |
| | | |
+--------------+------------------>x------------>
| | | |
| | | |
| | | not-direct|
| | | mapped |
| vmalloc | | |
| | | |
| | | |
| | | |
+--------------+ +------------+
(x = Linux cannot touch this memory)
where part of the physical region that would be direct mapped as lowmem is not
actually in use by Linux.
This means that even though the system is not actually accessing the memory,
we are still losing that portion of the direct mapped lowmem space. What this
series does is treat the virtual address space that would have been taken up
by that lowmem as vmalloc space, which allows more lowmem to be mapped:
Virtual Physical
PAGE_OFFSET +--------------+ PHYS_OFFSET +------------+
| | | |
| lowmem | | |
<----------------------------------+xxxxxxxxxxxx|
| | |xxxxxxxxxxxx|
| vmalloc | |xxxxxxxxxxxx|
<----------------------------------+xxxxxxxxxxxx|
| | | |
| lowmem | | |
| | | |
| | | |
| | | |
| | | |
+----------------------------------------------->
| vmalloc | | |
| | | not-direct|
| | | mapped |
| | | |
+--------------+ +------------+
The goal here is to allow as much lowmem to be mapped as if the block of memory
was not reserved from the physical lowmem region. Previously, we had been
hacking up the direct virt <-> phys translation to ignore a large region of
memory. This did not scale for multiple holes of memory however.
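For reference, the old single-hole hack looked roughly like the sketch below
(illustrative only; mem_hole_end and mem_hole_size are made-up names, not
real kernel symbols). Every translation special-cases the one hole, and each
additional hole would add yet another comparison:

#define __phys_to_virt(phys)						\
	((unsigned long)(phys) >= mem_hole_end ?			\
	 (unsigned long)(phys) - PHYS_OFFSET + PAGE_OFFSET - mem_hole_size : \
	 (unsigned long)(phys) - PHYS_OFFSET + PAGE_OFFSET)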
Open issues:
- vmalloc=<size> will account for all vmalloc now. This may have the
side effect of shrinking 'traditional' vmalloc too much for regular
static mappings. We were debating if this is just part of finding the
correct size for vmalloc or if there is a need for vmalloc_upper=
- People who like bike shedding more than I do can suggest better
config names if there is sufficient interest in the series.
Laura Abbott (11):
mce: acpi/apei: Use get_vm_area directly
iommu/omap: Use get_vm_area directly
percpu: use VMALLOC_TOTAL instead of VMALLOC_END - VMALLOC_START
dm: Use VMALLOC_TOTAL instead of VMALLOC_END - VMALLOC_START
staging: lustre: Use is_vmalloc_addr
arm: use is_vmalloc_addr
arm: mm: Add iotable_init_novmreserve
mm/vmalloc.c: Allow lowmem to be tracked in vmalloc
arm: mm: Track lowmem in vmalloc
arm: Use for_each_potential_vmalloc_area
fs/proc/kcore.c: Use for_each_potential_vmalloc_area
arch/arm/Kconfig | 3 +
arch/arm/include/asm/mach/map.h | 2 +
arch/arm/kvm/mmu.c | 12 ++-
arch/arm/mm/dma-mapping.c | 2 +-
arch/arm/mm/init.c | 104 ++++++++++++-----
arch/arm/mm/iomap.c | 3 +-
arch/arm/mm/ioremap.c | 17 ++-
arch/arm/mm/mm.h | 3 +-
arch/arm/mm/mmu.c | 55 ++++++++-
drivers/acpi/apei/ghes.c | 4 +-
drivers/iommu/omap-iovmm.c | 2 +-
drivers/md/dm-bufio.c | 4 +-
drivers/md/dm-stats.c | 2 +-
.../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 3 +-
fs/proc/kcore.c | 20 +++-
include/linux/mm.h | 6 +
include/linux/vmalloc.h | 31 +++++
mm/Kconfig | 6 +
mm/percpu.c | 4 +-
mm/vmalloc.c | 119 +++++++++++++++++---
20 files changed, 320 insertions(+), 82 deletions(-)
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
There's no need to use VMALLOC_START and VMALLOC_END with
__get_vm_area when get_vm_area does the exact same thing.
Convert over.
Signed-off-by: Laura Abbott <[email protected]>
---
drivers/acpi/apei/ghes.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index a30bc31..6e784b7 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -149,8 +149,8 @@ static atomic_t ghes_estatus_cache_alloced;
static int ghes_ioremap_init(void)
{
- ghes_ioremap_area = __get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
- VM_IOREMAP, VMALLOC_START, VMALLOC_END);
+ ghes_ioremap_area = get_vm_area(PAGE_SIZE * GHES_IOREMAP_PAGES,
+ VM_IOREMAP);
if (!ghes_ioremap_area) {
pr_err(GHES_PFX "Failed to allocate virtual memory area for atomic ioremap.\n");
return -ENOMEM;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
is_vmalloc_addr already does the range checking against VMALLOC_START and
VMALLOC_END. Use it.
Signed-off-by: Laura Abbott <[email protected]>
---
arch/arm/mm/iomap.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mm/iomap.c b/arch/arm/mm/iomap.c
index 4614208..4bf5457 100644
--- a/arch/arm/mm/iomap.c
+++ b/arch/arm/mm/iomap.c
@@ -34,8 +34,7 @@ EXPORT_SYMBOL(pcibios_min_mem);
void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
{
- if ((unsigned long)addr >= VMALLOC_START &&
- (unsigned long)addr < VMALLOC_END)
+ if (is_vmalloc_addr(addr))
iounmap(addr);
}
EXPORT_SYMBOL(pci_iounmap);
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
There is no need to call __get_vm_area with VMALLOC_START and
VMALLOC_END when get_vm_area already does that. Call get_vm_area
directly.
Signed-off-by: Laura Abbott <[email protected]>
---
drivers/iommu/omap-iovmm.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
index d147259..6280d50 100644
--- a/drivers/iommu/omap-iovmm.c
+++ b/drivers/iommu/omap-iovmm.c
@@ -214,7 +214,7 @@ static void *vmap_sg(const struct sg_table *sgt)
if (!total)
return ERR_PTR(-EINVAL);
- new = __get_vm_area(total, VM_IOREMAP, VMALLOC_START, VMALLOC_END);
+ new = get_vm_area(total, VM_IOREMAP);
if (!new)
return ERR_PTR(-ENOMEM);
va = (u32)new->addr;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
Instead of manually checking the bounds of VMALLOC_START and
VMALLOC_END, just use is_vmalloc_addr. That's what the function
was designed for.
Signed-off-by: Laura Abbott <[email protected]>
---
.../staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
index 26b49a2..9364863 100644
--- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
+++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
@@ -529,8 +529,7 @@ kiblnd_kvaddr_to_page (unsigned long vaddr)
{
struct page *page;
- if (vaddr >= VMALLOC_START &&
- vaddr < VMALLOC_END) {
+ if (is_vmalloc_addr((void *)vaddr)) {
page = vmalloc_to_page ((void *)vaddr);
LASSERT (page != NULL);
return page;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
vmalloc already gives a useful macro to calculate the total vmalloc
size. Use it.
Signed-off-by: Laura Abbott <[email protected]>
---
mm/percpu.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/percpu.c b/mm/percpu.c
index 0d10def..afbf352 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -1686,10 +1686,10 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
max_distance += ai->unit_size;
/* warn if maximum distance is further than 75% of vmalloc space */
- if (max_distance > (VMALLOC_END - VMALLOC_START) * 3 / 4) {
+ if (max_distance > VMALLOC_TOTAL * 3 / 4) {
pr_warning("PERCPU: max_distance=0x%zx too large for vmalloc "
"space 0x%lx\n", max_distance,
- (unsigned long)(VMALLOC_END - VMALLOC_START));
+ VMALLOC_TOTAL);
#ifdef CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK
/* and fail if we have fallback */
rc = -EINVAL;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
vmalloc already provides a macro to calculate the total vmalloc size,
VMALLOC_TOTAL. Use it.
Signed-off-by: Laura Abbott <[email protected]>
---
drivers/md/dm-bufio.c | 4 ++--
drivers/md/dm-stats.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 54bdd923..cd677f2 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1736,8 +1736,8 @@ static int __init dm_bufio_init(void)
* Get the size of vmalloc space the same way as VMALLOC_TOTAL
* in fs/proc/internal.h
*/
- if (mem > (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100)
- mem = (VMALLOC_END - VMALLOC_START) * DM_BUFIO_VMALLOC_PERCENT / 100;
+ if (mem > VMALLOC_TOTAL * DM_BUFIO_VMALLOC_PERCENT / 100)
+ mem = VMALLOC_TOTAL * DM_BUFIO_VMALLOC_PERCENT / 100;
#endif
dm_bufio_default_cache_size = mem;
diff --git a/drivers/md/dm-stats.c b/drivers/md/dm-stats.c
index 28a9012..378ffb6 100644
--- a/drivers/md/dm-stats.c
+++ b/drivers/md/dm-stats.c
@@ -80,7 +80,7 @@ static bool __check_shared_memory(size_t alloc_size)
if (a >> PAGE_SHIFT > totalram_pages / DM_STATS_MEMORY_FACTOR)
return false;
#ifdef CONFIG_MMU
- if (a > (VMALLOC_END - VMALLOC_START) / DM_STATS_VMALLOC_FACTOR)
+ if (a > VMALLOC_TOTAL / DM_STATS_VMALLOC_FACTOR)
return false;
#endif
return true;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
With CONFIG_INTERMIX_VMALLOC, we can no longer assume all vmalloc
is contained between VMALLOC_START and VMALLOC_END. For code that
relies on operating on the vmalloc space, use
for_each_potential_vmalloc_area to track each area separately.
Signed-off-by: Laura Abbott <[email protected]>
---
fs/proc/kcore.c | 20 +++++++++++++++-----
1 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 5ed0e52..9be81a8 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -585,8 +585,6 @@ static struct notifier_block kcore_callback_nb __meminitdata = {
.priority = 0,
};
-static struct kcore_list kcore_vmalloc;
-
#ifdef CONFIG_ARCH_PROC_KCORE_TEXT
static struct kcore_list kcore_text;
/*
@@ -621,6 +619,11 @@ static void __init add_modules_range(void)
static int __init proc_kcore_init(void)
{
+ struct kcore_list *kcore_vmalloc;
+ unsigned long vstart;
+ unsigned long vend;
+ int i;
+
proc_root_kcore = proc_create("kcore", S_IRUSR, NULL,
&proc_kcore_operations);
if (!proc_root_kcore) {
@@ -629,9 +632,16 @@ static int __init proc_kcore_init(void)
}
/* Store text area if it's special */
proc_kcore_text_init();
- /* Store vmalloc area */
- kclist_add(&kcore_vmalloc, (void *)VMALLOC_START,
- VMALLOC_END - VMALLOC_START, KCORE_VMALLOC);
+ for_each_potential_vmalloc_area(&vstart, &vend, &i) {
+ kcore_vmalloc = kzalloc(sizeof(*kcore_vmalloc), GFP_KERNEL);
+ if (!kcore_vmalloc)
+ return 0;
+
+ /* Store vmalloc area */
+ kclist_add(kcore_vmalloc, (void *)vstart,
+ vend - vstart, KCORE_VMALLOC);
+ }
+
add_modules_range();
/* Store direct-map area from physical memory map */
kcore_update_ram();
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
With CONFIG_INTERMIX_VMALLOC it is no longer the case that all
vmalloc is contained between VMALLOC_START and VMALLOC_END.
Some portions of code still rely on operating on all those regions
however. Use for_each_potential_vmalloc_area where appropriate to
do whatever is necessary to those regions.
Signed-off-by: Laura Abbott <[email protected]>
---
arch/arm/kvm/mmu.c | 12 ++++++++----
arch/arm/mm/ioremap.c | 12 ++++++++----
arch/arm/mm/mmu.c | 9 +++++++--
3 files changed, 23 insertions(+), 10 deletions(-)
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 58090698..4d2ca7e 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -225,16 +225,20 @@ void free_boot_hyp_pgd(void)
void free_hyp_pgds(void)
{
unsigned long addr;
+ int i;
+ unsigned long vstart, vend;
free_boot_hyp_pgd();
mutex_lock(&kvm_hyp_pgd_mutex);
if (hyp_pgd) {
- for (addr = PAGE_OFFSET; virt_addr_valid(addr); addr += PGDIR_SIZE)
- unmap_range(NULL, hyp_pgd, KERN_TO_HYP(addr), PGDIR_SIZE);
- for (addr = VMALLOC_START; is_vmalloc_addr((void*)addr); addr += PGDIR_SIZE)
- unmap_range(NULL, hyp_pgd, KERN_TO_HYP(addr), PGDIR_SIZE);
+ for_each_potential_nonvmalloc_area(&vstart, &vend, &i)
+ for (addr = vstart; addr < vend; addr += PGDIR_SIZE)
+ unmap_range(NULL, hyp_pgd, KERN_TO_HYP(addr), PGDIR_SIZE);
+ for_each_potential_vmalloc_area(&vstart, &vend, &i)
+ for (addr = vstart; addr < vend; addr += PGDIR_SIZE)
+ unmap_range(NULL, hyp_pgd, KERN_TO_HYP(addr), PGDIR_SIZE);
kfree(hyp_pgd);
hyp_pgd = NULL;
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index ad92d4f..892bc82 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -115,13 +115,17 @@ EXPORT_SYMBOL(ioremap_page);
void __check_vmalloc_seq(struct mm_struct *mm)
{
unsigned int seq;
+ int i;
+ unsigned long vstart, vend;
do {
seq = init_mm.context.vmalloc_seq;
- memcpy(pgd_offset(mm, VMALLOC_START),
- pgd_offset_k(VMALLOC_START),
- sizeof(pgd_t) * (pgd_index(VMALLOC_END) -
- pgd_index(VMALLOC_START)));
+
+ for_each_potential_vmalloc_area(&vstart, &vend, &i)
+ memcpy(pgd_offset(mm, vstart),
+ pgd_offset_k(vstart),
+ sizeof(pgd_t) * (pgd_index(vend) -
+ pgd_index(vstart)));
mm->context.vmalloc_seq = seq;
} while (seq != init_mm.context.vmalloc_seq);
}
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 55bd742..af8e43c 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1217,6 +1217,8 @@ static void __init devicemaps_init(const struct machine_desc *mdesc)
struct map_desc map;
unsigned long addr;
void *vectors;
+ unsigned long vstart, vend;
+ int i;
/*
* Allocate the vector page early.
@@ -1225,8 +1227,11 @@ static void __init devicemaps_init(const struct machine_desc *mdesc)
early_trap_init(vectors);
- for (addr = VMALLOC_START; addr; addr += PMD_SIZE)
- pmd_clear(pmd_off_k(addr));
+
+ for_each_potential_vmalloc_area(&vstart, &vend, &i)
+ for (addr = vstart; addr < vend; addr += PMD_SIZE) {
+ pmd_clear(pmd_off_k(addr));
+ }
/*
* Map the kernel if it is XIP.
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
iotable_init is currently used by dma_contiguous_remap to remap
CMA memory appropriately. This has the side effect of reserving
the area of CMA in the vmalloc tracking structures. This is fine
under normal circumstances but it creates conflicts if we want
to track lowmem in vmalloc. Since dma_contiguous_remap is only
really concerned with the remapping, introduce iotable_init_novmreserve
to allow remapping of pages without reserving the virtual address
in vmalloc space.
Signed-off-by: Laura Abbott <[email protected]>
---
arch/arm/include/asm/mach/map.h | 2 ++
arch/arm/mm/dma-mapping.c | 2 +-
arch/arm/mm/ioremap.c | 5 +++--
arch/arm/mm/mm.h | 2 +-
arch/arm/mm/mmu.c | 17 ++++++++++++++---
5 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index 2fe141f..02e3509 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -37,6 +37,7 @@ struct map_desc {
#ifdef CONFIG_MMU
extern void iotable_init(struct map_desc *, int);
+extern void iotable_init_novmreserve(struct map_desc *, int);
extern void vm_reserve_area_early(unsigned long addr, unsigned long size,
void *caller);
@@ -56,6 +57,7 @@ extern int ioremap_page(unsigned long virt, unsigned long phys,
const struct mem_type *mtype);
#else
#define iotable_init(map,num) do { } while (0)
+#define iotable_init_novmreserve(map,num) do { } while(0)
#define vm_reserve_area_early(a,s,c) do { } while (0)
#endif
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index f61a570..c4c9f4b 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -470,7 +470,7 @@ void __init dma_contiguous_remap(void)
addr += PMD_SIZE)
pmd_clear(pmd_off_k(addr));
- iotable_init(&map, 1);
+ iotable_init_novmreserve(&map, 1);
}
}
diff --git a/arch/arm/mm/ioremap.c b/arch/arm/mm/ioremap.c
index f123d6e..ad92d4f 100644
--- a/arch/arm/mm/ioremap.c
+++ b/arch/arm/mm/ioremap.c
@@ -84,14 +84,15 @@ struct static_vm *find_static_vm_vaddr(void *vaddr)
return NULL;
}
-void __init add_static_vm_early(struct static_vm *svm)
+void __init add_static_vm_early(struct static_vm *svm, bool add_to_vm)
{
struct static_vm *curr_svm;
struct vm_struct *vm;
void *vaddr;
vm = &svm->vm;
- vm_area_add_early(vm);
+ if (add_to_vm)
+ vm_area_add_early(vm);
vaddr = vm->addr;
list_for_each_entry(curr_svm, &static_vmlist, list) {
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index d5a982d..6f9d28b 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -75,7 +75,7 @@ struct static_vm {
extern struct list_head static_vmlist;
extern struct static_vm *find_static_vm_vaddr(void *vaddr);
-extern __init void add_static_vm_early(struct static_vm *svm);
+extern __init void add_static_vm_early(struct static_vm *svm, bool add_to_vm);
#endif
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 580ef2d..5450b43 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -819,7 +819,8 @@ static void __init create_mapping(struct map_desc *md)
/*
* Create the architecture specific mappings
*/
-void __init iotable_init(struct map_desc *io_desc, int nr)
+static void __init __iotable_init(struct map_desc *io_desc, int nr,
+ bool add_to_vm)
{
struct map_desc *md;
struct vm_struct *vm;
@@ -840,10 +841,20 @@ void __init iotable_init(struct map_desc *io_desc, int nr)
vm->flags = VM_IOREMAP | VM_ARM_STATIC_MAPPING;
vm->flags |= VM_ARM_MTYPE(md->type);
vm->caller = iotable_init;
- add_static_vm_early(svm++);
+ add_static_vm_early(svm++, add_to_vm);
}
}
+void __init iotable_init(struct map_desc *io_desc, int nr)
+{
+ return __iotable_init(io_desc, nr, true);
+}
+
+void __init iotable_init_novmreserve(struct map_desc *io_desc, int nr)
+{
+ return __iotable_init(io_desc, nr, false);
+}
+
void __init vm_reserve_area_early(unsigned long addr, unsigned long size,
void *caller)
{
@@ -857,7 +868,7 @@ void __init vm_reserve_area_early(unsigned long addr, unsigned long size,
vm->size = size;
vm->flags = VM_IOREMAP | VM_ARM_EMPTY_MAPPING;
vm->caller = caller;
- add_static_vm_early(svm);
+ add_static_vm_early(svm, true);
}
#ifndef CONFIG_ARM_LPAE
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
vmalloc is currently assumed to be a completely separate address space
from the lowmem region. While this may be true in the general case,
there are some instances where intermixing lowmem and vmalloc address space
provides gains. One example is needing to steal a large chunk of physical
lowmem for another purpose outside the system's usage. Rather than
waste the precious lowmem space on a 32-bit system, we can allow the
virtual holes created by the physical holes to be used by vmalloc
for virtual addressing. Track lowmem allocations in vmalloc to
allow mixing of lowmem and vmalloc.
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Neeti Desai <[email protected]>
---
include/linux/mm.h | 6 ++
include/linux/vmalloc.h | 31 ++++++++++++
mm/Kconfig | 6 ++
mm/vmalloc.c | 119 ++++++++++++++++++++++++++++++++++++++++------
4 files changed, 146 insertions(+), 16 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3552717..3c2368d6 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -333,6 +333,10 @@ unsigned long vmalloc_to_pfn(const void *addr);
* On nommu, vmalloc/vfree wrap through kmalloc/kfree directly, so there
* is no special casing required.
*/
+
+#ifdef CONFIG_VMALLOC_INTERMIX
+extern int is_vmalloc_addr(const void *x);
+#else
static inline int is_vmalloc_addr(const void *x)
{
#ifdef CONFIG_MMU
@@ -343,6 +347,8 @@ static inline int is_vmalloc_addr(const void *x)
return 0;
#endif
}
+#endif
+
#ifdef CONFIG_MMU
extern int is_vmalloc_or_module_addr(const void *x);
#else
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 4b8a891..995041c 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -16,6 +16,7 @@ struct vm_area_struct; /* vma defining user mapping in mm_types.h */
#define VM_USERMAP 0x00000008 /* suitable for remap_vmalloc_range */
#define VM_VPAGES 0x00000010 /* buffer for pages was vmalloc'ed */
#define VM_UNINITIALIZED 0x00000020 /* vm_struct is not fully initialized */
+#define VM_LOWMEM 0x00000040 /* Tracking of direct mapped lowmem */
/* bits [20..32] reserved for arch specific ioremap internals */
/*
@@ -150,6 +151,31 @@ extern long vwrite(char *buf, char *addr, unsigned long count);
extern struct list_head vmap_area_list;
extern __init void vm_area_add_early(struct vm_struct *vm);
extern __init void vm_area_register_early(struct vm_struct *vm, size_t align);
+#ifdef CONFIG_VMALLOC_INTERMIX
+extern void __vmalloc_calc_next_area(int *v, unsigned long *start, unsigned long *end, bool is_vmalloc);
+extern void mark_vmalloc_reserved_area(void *addr, unsigned long size);
+
+#define for_each_potential_vmalloc_area(start, end, i) \
+ for (*i = 0, __vmalloc_calc_next_area((i), (start), (end), true); \
+ *start; \
+ __vmalloc_calc_next_area((i), (start), (end), true))
+
+#define for_each_potential_nonvmalloc_area(start, end, i) \
+ for (*i = 0, __vmalloc_calc_next_area((i), (start), (end), false); \
+ *start; \
+ __vmalloc_calc_next_area((i), (start), (end), false))
+
+#else
+static inline void mark_vmalloc_reserved_area(void *addr, unsigned long size)
+{ };
+
+#define for_each_potential_vmalloc_area(start, end, i) \
+ for (*i = 0, *start = VMALLOC_START, *end = VMALLOC_END; *i == 0; *i = 1)
+
+#define for_each_potential_nonvmalloc_area(start, end, i) \
+ for (*i = 0, *start = PAGE_OFFSET, *end = (unsigned long)high_memory; *i == 0; *i = 1)
+
+#endif
#ifdef CONFIG_SMP
# ifdef CONFIG_MMU
@@ -180,7 +206,12 @@ struct vmalloc_info {
};
#ifdef CONFIG_MMU
+#ifdef CONFIG_VMALLOC_INTERMIX
+extern unsigned long total_vmalloc_size;
+#define VMALLOC_TOTAL total_vmalloc_size
+#else
#define VMALLOC_TOTAL (VMALLOC_END - VMALLOC_START)
+#endif
extern void get_vmalloc_info(struct vmalloc_info *vmi);
#else
diff --git a/mm/Kconfig b/mm/Kconfig
index 723bbe0..e3c37c4 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -552,3 +552,9 @@ config MEM_SOFT_DIRTY
it can be cleared by hands.
See Documentation/vm/soft-dirty.txt for more details.
+
+# Some architectures (mostly 32-bit) may wish to allow holes in the memory
+# map to be used as vmalloc to save on precious virtual address space.
+config VMALLOC_INTERMIX
+ def_bool n
+ depends on ARCH_TRACKS_VMALLOC
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 0fdf968..811f629 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -282,6 +282,82 @@ static unsigned long cached_align;
static unsigned long vmap_area_pcpu_hole;
+#ifdef CONFIG_VMALLOC_INTERMIX
+#define POSSIBLE_VMALLOC_START PAGE_OFFSET
+
+#define VMALLOC_BITMAP_SIZE ((VMALLOC_END - PAGE_OFFSET) >> \
+ PAGE_SHIFT)
+#define VMALLOC_TO_BIT(addr) (((addr) - PAGE_OFFSET) >> PAGE_SHIFT)
+#define BIT_TO_VMALLOC(i) (PAGE_OFFSET + ((i) * PAGE_SIZE))
+
+unsigned long total_vmalloc_size;
+unsigned long vmalloc_reserved;
+
+/*
+ * Bitmap of kernel virtual address space. A set bit indicates a region is
+ * part of the direct mapped region and should not be treated as vmalloc.
+ */
+DECLARE_BITMAP(possible_areas, VMALLOC_BITMAP_SIZE);
+
+void __vmalloc_calc_next_area(int *v, unsigned long *start, unsigned long *end,
+ bool want_vmalloc)
+{
+ int i = *v;
+ int next;
+
+ if (want_vmalloc)
+ next = find_next_zero_bit(possible_areas, VMALLOC_BITMAP_SIZE, i);
+ else
+ next = find_next_bit(possible_areas, VMALLOC_BITMAP_SIZE, i);
+
+ if (next >= VMALLOC_BITMAP_SIZE) {
+ *start = 0;
+ *end = 0;
+ return;
+ }
+
+ *start = BIT_TO_VMALLOC(next);
+
+ if (want_vmalloc)
+ *v = find_next_bit(possible_areas, VMALLOC_BITMAP_SIZE, next);
+ else
+ *v = find_next_zero_bit(possible_areas, VMALLOC_BITMAP_SIZE, next);
+
+ *end = BIT_TO_VMALLOC(*v);
+}
+
+void mark_vmalloc_reserved_area(void *x, unsigned long size)
+{
+ unsigned long addr = (unsigned long)x;
+
+ bitmap_set(possible_areas, VMALLOC_TO_BIT(addr), size >> PAGE_SHIFT);
+ vmalloc_reserved += size;
+}
+
+int is_vmalloc_addr(const void *x)
+{
+ unsigned long addr = (unsigned long)x;
+
+ if (addr < POSSIBLE_VMALLOC_START || addr >= VMALLOC_END)
+ return 0;
+
+ if (test_bit(VMALLOC_TO_BIT(addr), possible_areas))
+ return 0;
+
+ return 1;
+}
+EXPORT_SYMBOL(is_vmalloc_addr);
+
+static void calc_total_vmalloc_size(void)
+{
+ total_vmalloc_size = VMALLOC_END - POSSIBLE_VMALLOC_START -
+ vmalloc_reserved;
+}
+#else
+#define POSSIBLE_VMALLOC_START VMALLOC_START
+static void calc_total_vmalloc_size(void) { }
+#endif
+
static struct vmap_area *__find_vmap_area(unsigned long addr)
{
struct rb_node *n = vmap_area_root.rb_node;
@@ -497,7 +573,7 @@ static void __free_vmap_area(struct vmap_area *va)
* here too, consider only end addresses which fall inside
* vmalloc area proper.
*/
- if (va->va_end > VMALLOC_START && va->va_end <= VMALLOC_END)
+ if (va->va_end > POSSIBLE_VMALLOC_START && va->va_end <= VMALLOC_END)
vmap_area_pcpu_hole = max(vmap_area_pcpu_hole, va->va_end);
kfree_rcu(va, rcu_head);
@@ -785,7 +861,7 @@ static RADIX_TREE(vmap_block_tree, GFP_ATOMIC);
static unsigned long addr_to_vb_idx(unsigned long addr)
{
- addr -= VMALLOC_START & ~(VMAP_BLOCK_SIZE-1);
+ addr -= POSSIBLE_VMALLOC_START & ~(VMAP_BLOCK_SIZE-1);
addr /= VMAP_BLOCK_SIZE;
return addr;
}
@@ -806,7 +882,7 @@ static struct vmap_block *new_vmap_block(gfp_t gfp_mask)
return ERR_PTR(-ENOMEM);
va = alloc_vmap_area(VMAP_BLOCK_SIZE, VMAP_BLOCK_SIZE,
- VMALLOC_START, VMALLOC_END,
+ POSSIBLE_VMALLOC_START, VMALLOC_END,
node, gfp_mask);
if (IS_ERR(va)) {
kfree(vb);
@@ -1062,7 +1138,7 @@ void vm_unmap_ram(const void *mem, unsigned int count)
unsigned long addr = (unsigned long)mem;
BUG_ON(!addr);
- BUG_ON(addr < VMALLOC_START);
+ BUG_ON(addr < POSSIBLE_VMALLOC_START);
BUG_ON(addr > VMALLOC_END);
BUG_ON(addr & (PAGE_SIZE-1));
@@ -1099,7 +1175,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t pro
} else {
struct vmap_area *va;
va = alloc_vmap_area(size, PAGE_SIZE,
- VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
+ POSSIBLE_VMALLOC_START, VMALLOC_END, node, GFP_KERNEL);
if (IS_ERR(va))
return NULL;
@@ -1158,8 +1234,8 @@ void __init vm_area_register_early(struct vm_struct *vm, size_t align)
static size_t vm_init_off __initdata;
unsigned long addr;
- addr = ALIGN(VMALLOC_START + vm_init_off, align);
- vm_init_off = PFN_ALIGN(addr + vm->size) - VMALLOC_START;
+ addr = ALIGN(POSSIBLE_VMALLOC_START + vm_init_off, align);
+ vm_init_off = PFN_ALIGN(addr + vm->size) - POSSIBLE_VMALLOC_START;
vm->addr = (void *)addr;
@@ -1196,6 +1272,7 @@ void __init vmalloc_init(void)
vmap_area_pcpu_hole = VMALLOC_END;
+ calc_total_vmalloc_size();
vmap_initialized = true;
}
@@ -1363,16 +1440,17 @@ struct vm_struct *__get_vm_area_caller(unsigned long size, unsigned long flags,
*/
struct vm_struct *get_vm_area(unsigned long size, unsigned long flags)
{
- return __get_vm_area_node(size, 1, flags, VMALLOC_START, VMALLOC_END,
- NUMA_NO_NODE, GFP_KERNEL,
+ return __get_vm_area_node(size, 1, flags, POSSIBLE_VMALLOC_START,
+ VMALLOC_END, NUMA_NO_NODE, GFP_KERNEL,
__builtin_return_address(0));
}
struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags,
const void *caller)
{
- return __get_vm_area_node(size, 1, flags, VMALLOC_START, VMALLOC_END,
- NUMA_NO_NODE, GFP_KERNEL, caller);
+ return __get_vm_area_node(size, 1, flags, POSSIBLE_VMALLOC_START,
+ VMALLOC_END, NUMA_NO_NODE, GFP_KERNEL,
+ caller);
}
/**
@@ -1683,8 +1761,8 @@ static void *__vmalloc_node(unsigned long size, unsigned long align,
gfp_t gfp_mask, pgprot_t prot,
int node, const void *caller)
{
- return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
- gfp_mask, prot, node, caller);
+ return __vmalloc_node_range(size, align, POSSIBLE_VMALLOC_START,
+ VMALLOC_END, gfp_mask, prot, node, caller);
}
void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot)
@@ -2355,7 +2433,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets,
const size_t *sizes, int nr_vms,
size_t align)
{
- const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align);
+ const unsigned long vmalloc_start = ALIGN(POSSIBLE_VMALLOC_START, align);
const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1);
struct vmap_area **vas, *prev, *next;
struct vm_struct **vms;
@@ -2625,6 +2703,9 @@ static int s_show(struct seq_file *m, void *p)
if (v->flags & VM_VPAGES)
seq_printf(m, " vpages");
+ if (v->flags & VM_LOWMEM)
+ seq_printf(m, " lowmem");
+
show_numa_info(m, v);
seq_putc(m, '\n');
return 0;
@@ -2679,7 +2760,7 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
vmi->used = 0;
vmi->largest_chunk = 0;
- prev_end = VMALLOC_START;
+ prev_end = 0;
spin_lock(&vmap_area_lock);
@@ -2694,7 +2775,7 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
/*
* Some archs keep another range for modules in vmalloc space
*/
- if (addr < VMALLOC_START)
+ if (addr < POSSIBLE_VMALLOC_START)
continue;
if (addr >= VMALLOC_END)
break;
@@ -2702,6 +2783,12 @@ void get_vmalloc_info(struct vmalloc_info *vmi)
if (va->flags & (VM_LAZY_FREE | VM_LAZY_FREEING))
continue;
+ if (va->vm && va->vm->flags & VM_LOWMEM)
+ continue;
+
+ if (prev_end == 0)
+ prev_end = va->va_start;
+
vmi->used += (va->va_end - va->va_start);
free_area_size = addr - prev_end;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
Rather than always keeping lowmem and vmalloc separate, we can
now allow the two to be mixed. This means that all lowmem areas
need to be explicitly tracked in vmalloc to avoid over allocating.
Additionally, adjust the vmalloc reserve to account for the fact
that there may be a hole in the middle consisting of vmalloc.
Signed-off-by: Laura Abbott <[email protected]>
Signed-off-by: Neeti Desai <[email protected]>
---
arch/arm/Kconfig | 3 +
arch/arm/mm/init.c | 104 ++++++++++++++++++++++++++++++++++++----------------
arch/arm/mm/mm.h | 1 +
arch/arm/mm/mmu.c | 29 ++++++++++++++
4 files changed, 105 insertions(+), 32 deletions(-)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index c1f1a7e..fc7aef2 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -277,6 +277,9 @@ config GENERIC_BUG
def_bool y
depends on BUG
+config ARCH_TRACKS_VMALLOC
+ bool
+
source "init/Kconfig"
source "kernel/Kconfig.freezer"
diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
index 1f7b19a..ddfab22 100644
--- a/arch/arm/mm/init.c
+++ b/arch/arm/mm/init.c
@@ -574,6 +574,46 @@ static void __init free_highpages(void)
#endif
}
+#define MLK(b, t) b, t, ((t) - (b)) >> 10
+#define MLM(b, t) b, t, ((t) - (b)) >> 20
+#define MLK_ROUNDUP(b, t) b, t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
+
+#ifdef CONFIG_VMALLOC_INTERMIX
+void print_vmalloc_lowmem_info(void)
+{
+ int i;
+ void *va_start, *va_end;
+
+ printk(KERN_NOTICE
+ " vmalloc : 0x%08lx - 0x%08lx (%4ld MB)\n",
+ MLM(VMALLOC_START, VMALLOC_END));
+
+ for (i = meminfo.nr_banks - 1; i >= 0; i--) {
+ if (!meminfo.bank[i].highmem) {
+ va_start = __va(meminfo.bank[i].start);
+ va_end = __va(meminfo.bank[i].start +
+ meminfo.bank[i].size);
+ printk(KERN_NOTICE
+ " lowmem : 0x%08lx - 0x%08lx (%4ld MB)\n",
+ MLM((unsigned long)va_start, (unsigned long)va_end));
+ }
+ if (i && ((meminfo.bank[i-1].start + meminfo.bank[i-1].size) !=
+ meminfo.bank[i].start)) {
+ if (meminfo.bank[i-1].start + meminfo.bank[i-1].size
+ <= MAX_HOLE_ADDRESS) {
+ va_start = __va(meminfo.bank[i-1].start
+ + meminfo.bank[i-1].size);
+ va_end = __va(meminfo.bank[i].start);
+ printk(KERN_NOTICE
+ " vmalloc : 0x%08lx - 0x%08lx (%4ld MB)\n",
+ MLM((unsigned long)va_start,
+ (unsigned long)va_end));
+ }
+ }
+ }
+}
+#endif
+
/*
* mem_init() marks the free areas in the mem_map and tells us how much
* memory is free. This is done after various parts of the system have
@@ -602,55 +642,52 @@ void __init mem_init(void)
mem_init_print_info(NULL);
-#define MLK(b, t) b, t, ((t) - (b)) >> 10
-#define MLM(b, t) b, t, ((t) - (b)) >> 20
-#define MLK_ROUNDUP(b, t) b, t, DIV_ROUND_UP(((t) - (b)), SZ_1K)
-
printk(KERN_NOTICE "Virtual kernel memory layout:\n"
" vector : 0x%08lx - 0x%08lx (%4ld kB)\n"
#ifdef CONFIG_HAVE_TCM
" DTCM : 0x%08lx - 0x%08lx (%4ld kB)\n"
" ITCM : 0x%08lx - 0x%08lx (%4ld kB)\n"
#endif
- " fixmap : 0x%08lx - 0x%08lx (%4ld kB)\n"
- " vmalloc : 0x%08lx - 0x%08lx (%4ld MB)\n"
- " lowmem : 0x%08lx - 0x%08lx (%4ld MB)\n"
-#ifdef CONFIG_HIGHMEM
- " pkmap : 0x%08lx - 0x%08lx (%4ld MB)\n"
-#endif
-#ifdef CONFIG_MODULES
- " modules : 0x%08lx - 0x%08lx (%4ld MB)\n"
-#endif
- " .text : 0x%p" " - 0x%p" " (%4d kB)\n"
- " .init : 0x%p" " - 0x%p" " (%4d kB)\n"
- " .data : 0x%p" " - 0x%p" " (%4d kB)\n"
- " .bss : 0x%p" " - 0x%p" " (%4d kB)\n",
-
+ " fixmap : 0x%08lx - 0x%08lx (%4ld kB)\n",
MLK(UL(CONFIG_VECTORS_BASE), UL(CONFIG_VECTORS_BASE) +
(PAGE_SIZE)),
#ifdef CONFIG_HAVE_TCM
MLK(DTCM_OFFSET, (unsigned long) dtcm_end),
MLK(ITCM_OFFSET, (unsigned long) itcm_end),
#endif
- MLK(FIXADDR_START, FIXADDR_TOP),
- MLM(VMALLOC_START, VMALLOC_END),
- MLM(PAGE_OFFSET, (unsigned long)high_memory),
+ MLK(FIXADDR_START, FIXADDR_TOP));
+#ifdef CONFIG_VMALLOC_INTERMIX
+ print_vmalloc_lowmem_info();
+#else
+ printk(KERN_NOTICE
+ " vmalloc : 0x%08lx - 0x%08lx (%4ld MB)\n"
+ " lowmem : 0x%08lx - 0x%08lx (%4ld MB)\n",
+ MLM(VMALLOC_START, VMALLOC_END),
+ MLM(PAGE_OFFSET, (unsigned long)high_memory));
+#endif
#ifdef CONFIG_HIGHMEM
- MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP) *
+ printk(KERN_NOTICE
+ " pkmap : 0x%08lx - 0x%08lx (%4ld MB)\n"
+#endif
+#ifdef CONFIG_MODULES
+ " modules : 0x%08lx - 0x%08lx (%4ld MB)\n"
+#endif
+ " .text : 0x%p" " - 0x%p" " (%4d kB)\n"
+ " .init : 0x%p" " - 0x%p" " (%4d kB)\n"
+ " .data : 0x%p" " - 0x%p" " (%4d kB)\n"
+ " .bss : 0x%p" " - 0x%p" " (%4d kB)\n",
+#ifdef CONFIG_HIGHMEM
+ MLM(PKMAP_BASE, (PKMAP_BASE) + (LAST_PKMAP) *
(PAGE_SIZE)),
#endif
#ifdef CONFIG_MODULES
- MLM(MODULES_VADDR, MODULES_END),
+ MLM(MODULES_VADDR, MODULES_END),
#endif
- MLK_ROUNDUP(_text, _etext),
- MLK_ROUNDUP(__init_begin, __init_end),
- MLK_ROUNDUP(_sdata, _edata),
- MLK_ROUNDUP(__bss_start, __bss_stop));
-
-#undef MLK
-#undef MLM
-#undef MLK_ROUNDUP
+ MLK_ROUNDUP(_text, _etext),
+ MLK_ROUNDUP(__init_begin, __init_end),
+ MLK_ROUNDUP(_sdata, _edata),
+ MLK_ROUNDUP(__bss_start, __bss_stop));
/*
* Check boundaries twice: Some fundamental inconsistencies can
@@ -658,7 +695,7 @@ void __init mem_init(void)
*/
#ifdef CONFIG_MMU
BUILD_BUG_ON(TASK_SIZE > MODULES_VADDR);
- BUG_ON(TASK_SIZE > MODULES_VADDR);
+ BUG_ON(TASK_SIZE > MODULES_VADDR);
#endif
#ifdef CONFIG_HIGHMEM
@@ -677,6 +714,9 @@ void __init mem_init(void)
}
}
+#undef MLK
+#undef MLM
+#undef MLK_ROUNDUP
void free_initmem(void)
{
#ifdef CONFIG_HAVE_TCM
diff --git a/arch/arm/mm/mm.h b/arch/arm/mm/mm.h
index 6f9d28b..ba825b0 100644
--- a/arch/arm/mm/mm.h
+++ b/arch/arm/mm/mm.h
@@ -87,6 +87,7 @@ extern unsigned long arm_dma_pfn_limit;
#define arm_dma_pfn_limit (~0ul >> PAGE_SHIFT)
#endif
+#define MAX_HOLE_ADDRESS (PHYS_OFFSET + 0x10000000)
extern phys_addr_t arm_lowmem_limit;
void __init bootmem_init(void);
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 5450b43..55bd742 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1006,6 +1006,19 @@ void __init sanity_check_meminfo(void)
int i, j, highmem = 0;
phys_addr_t vmalloc_limit = __pa(vmalloc_min - 1) + 1;
+#ifdef CONFIG_ARCH_TRACKS_VMALLOC
+ unsigned long hole_start;
+ for (i = 0; i < (meminfo.nr_banks - 1); i++) {
+ hole_start = meminfo.bank[i].start + meminfo.bank[i].size;
+ if (hole_start != meminfo.bank[i+1].start) {
+ if (hole_start <= MAX_HOLE_ADDRESS) {
+ vmalloc_min = (void *) (vmalloc_min +
+ (meminfo.bank[i+1].start - hole_start));
+ }
+ }
+ }
+#endif
+
for (i = 0, j = 0; i < meminfo.nr_banks; i++) {
struct membank *bank = &meminfo.bank[j];
phys_addr_t size_limit;
@@ -1304,6 +1317,21 @@ static void __init kmap_init(void)
#endif
}
+static void __init reserve_virtual_lowmem(phys_addr_t start, phys_addr_t end)
+{
+#ifdef CONFIG_ARCH_TRACKS_VMALLOC
+ struct vm_struct *vm;
+
+ vm = early_alloc_aligned(sizeof(*vm), __alignof__(*vm));
+ vm->addr = (void *)__phys_to_virt(start);
+ vm->size = end - start;
+ vm->flags = VM_LOWMEM;
+ vm->caller = reserve_virtual_lowmem;
+ vm_area_add_early(vm);
+ mark_vmalloc_reserved_area(vm->addr, vm->size);
+#endif
+}
+
static void __init map_lowmem(void)
{
struct memblock_region *reg;
@@ -1325,6 +1353,7 @@ static void __init map_lowmem(void)
map.type = MT_MEMORY;
create_mapping(&map);
+ reserve_virtual_lowmem(start, end);
}
}
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation
On 01/02/2014 01:53 PM, Laura Abbott wrote:
> is_vmalloc_addr already does the range checking against VMALLOC_START and
> VMALLOC_END. Use it.
FWIW, these first 6 look completely sane and should get merged
regardless of what gets done with the rest.
On Thu, Jan 02, 2014 at 01:53:19PM -0800, Laura Abbott wrote:
> There's no need to use VMALLOC_START and VMALLOC_END with
> __get_vm_area when get_vm_area does the exact same thing.
> Convert over.
>
> Signed-off-by: Laura Abbott <[email protected]>
Ack-by: Chen, Gong <[email protected]>
On Thu, Jan 02, 2014 at 01:53:20PM -0800, Laura Abbott wrote:
> diff --git a/drivers/iommu/omap-iovmm.c b/drivers/iommu/omap-iovmm.c
> index d147259..6280d50 100644
> --- a/drivers/iommu/omap-iovmm.c
> +++ b/drivers/iommu/omap-iovmm.c
> @@ -214,7 +214,7 @@ static void *vmap_sg(const struct sg_table *sgt)
> if (!total)
> return ERR_PTR(-EINVAL);
>
> - new = __get_vm_area(total, VM_IOREMAP, VMALLOC_START, VMALLOC_END);
> + new = get_vm_area(total, VM_IOREMAP);
This driver is a module but get_vm_area is not exported. You need to add
one extra EXPORT_SYMBOL_GPL(get_vm_area).
On 01/02/2014 01:53 PM, Laura Abbott wrote:
> The goal here is to allow as much lowmem to be mapped as if the block of memory
> was not reserved from the physical lowmem region. Previously, we had been
> hacking up the direct virt <-> phys translation to ignore a large region of
> memory. This did not scale for multiple holes of memory however.
How much lowmem do these holes end up eating up in practice, ballpark?
I'm curious how painful this is going to get.
On 1/3/2014 10:23 AM, Dave Hansen wrote:
> On 01/02/2014 01:53 PM, Laura Abbott wrote:
>> The goal here is to allow as much lowmem to be mapped as if the block of memory
>> was not reserved from the physical lowmem region. Previously, we had been
>> hacking up the direct virt <-> phys translation to ignore a large region of
>> memory. This did not scale for multiple holes of memory however.
>
> How much lowmem do these holes end up eating up in practice, ballpark?
> I'm curious how painful this is going to get.
>
In total, the worst case can be close to 100M with an average case
around 70M-80M. The split and number of holes vary with the layout but
end up with 60M-80M one hole and the rest in the other.
Thanks,
Laura
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
Hello,
On Fri, Jan 03, 2014 at 02:08:52PM -0800, Laura Abbott wrote:
> On 1/3/2014 10:23 AM, Dave Hansen wrote:
> >On 01/02/2014 01:53 PM, Laura Abbott wrote:
> >>The goal here is to allow as much lowmem to be mapped as if the block of memory
> >>was not reserved from the physical lowmem region. Previously, we had been
> >>hacking up the direct virt <-> phys translation to ignore a large region of
> >>memory. This did not scale for multiple holes of memory however.
> >
> >How much lowmem do these holes end up eating up in practice, ballpark?
> >I'm curious how painful this is going to get.
> >
>
> In total, the worst case can be close to 100M with an average case
> around 70M-80M. The split and number of holes vary with the layout
> but end up with 60M-80M one hole and the rest in the other.
One more thing I'd like to know is how badly the direct virt <-> phys
translation scales and how often virt <-> phys translation is called in your
workload, i.e. what's the gain from this patch?
Thanks.
On 1/3/2014 11:31 PM, Minchan Kim wrote:
> Hello,
>
> On Fri, Jan 03, 2014 at 02:08:52PM -0800, Laura Abbott wrote:
>> On 1/3/2014 10:23 AM, Dave Hansen wrote:
>>> On 01/02/2014 01:53 PM, Laura Abbott wrote:
>>>> The goal here is to allow as much lowmem to be mapped as if the block of memory
>>>> was not reserved from the physical lowmem region. Previously, we had been
>>>> hacking up the direct virt <-> phys translation to ignore a large region of
>>>> memory. This did not scale for multiple holes of memory however.
>>>
>>> How much lowmem do these holes end up eating up in practice, ballpark?
>>> I'm curious how painful this is going to get.
>>>
>>
>> In total, the worst case can be close to 100M with an average case
>> around 70M-80M. The split and number of holes vary with the layout
>> but end up with 60M-80M one hole and the rest in the other.
>
> One more thing I'd like to know is how badly the direct virt <-> phys
> translation scales and how often virt <-> phys translation is called in your
> workload, i.e. what's the gain from this patch?
>
> Thanks.
>
With one hole we did
#define __phys_to_virt(phys)
phys >= mem_hole_end ? mem_hole : normal
We had a single global variable to check for the bounds; doing something
similar with multiple holes would make the worst case O(number of holes).
This would also all need to be macroized. Detection and
accounting for these holes in other data structures (e.g. ARM meminfo)
would be increasingly complex and lead to delays in bootup. The
error/sanity checking for bad memory configurations would also be
messier. Non-linear lowmem mappings also make debugging more difficult.
virt <-> phys translation is used on hot paths in IOMMU mapping so we
want to keep virt <-> phys as fast as possible and not have to walk an
array of addresses every time.
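To make that concrete, here is a purely hypothetical sketch of what the
multi-hole variant would have to look like (none of these symbols exist
anywhere); every single translation walks the hole table:

struct mem_hole {
	phys_addr_t start;
	phys_addr_t end;
};

extern struct mem_hole mem_holes[];
extern int nr_mem_holes;

static inline unsigned long multi_hole_phys_to_virt(phys_addr_t phys)
{
	unsigned long adjust = 0;
	int i;

	/* O(number of holes) on a hot path */
	for (i = 0; i < nr_mem_holes; i++)
		if (phys >= mem_holes[i].end)
			adjust += mem_holes[i].end - mem_holes[i].start;

	return (unsigned long)phys - PHYS_OFFSET + PAGE_OFFSET - adjust;
}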
Thanks,
Laura
--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation
Hello,
On Mon, Jan 06, 2014 at 11:08:26AM -0800, Laura Abbott wrote:
> On 1/3/2014 11:31 PM, Minchan Kim wrote:
> >Hello,
> >
> >On Fri, Jan 03, 2014 at 02:08:52PM -0800, Laura Abbott wrote:
> >>On 1/3/2014 10:23 AM, Dave Hansen wrote:
> >>>On 01/02/2014 01:53 PM, Laura Abbott wrote:
> >>>>The goal here is to allow as much lowmem to be mapped as if the block of memory
> >>>>was not reserved from the physical lowmem region. Previously, we had been
> >>>>hacking up the direct virt <-> phys translation to ignore a large region of
> >>>>memory. This did not scale for multiple holes of memory however.
> >>>
> >>>How much lowmem do these holes end up eating up in practice, ballpark?
> >>>I'm curious how painful this is going to get.
> >>>
> >>
> >>In total, the worst case can be close to 100M with an average case
> >>around 70M-80M. The split and number of holes vary with the layout
> >>but end up with 60M-80M one hole and the rest in the other.
> >
> >One more thing I'd like to know is how badly the direct virt <-> phys
> >translation scales and how often virt <-> phys translation is called in your
> >workload, i.e. what's the gain from this patch?
> >
> >Thanks.
> >
>
> With one hole we did
>
> #define __phys_to_virt(phys)
> phys >= mem_hole_end ? mem_hole : normal
>
> We had a single global variable to check for the bounds and to do
> something similar with multiple holes the worst case would be
> O(number of holes). This would also all need to be macroized.
> Detection and accounting for these holes in other data structures
> (e.g. ARM meminfo) would be increasingly complex and lead to delays
> in bootup. The error/sanity checking for bad memory configurations
> would also be messier. Non-linear lowmem mappings also make
> debugging more difficult.
>
> virt <-> phys translation is used on hot paths in IOMMU mapping so
> we want to keep virt <-> phys as fast as possible and not have to
> walk an array of addresses every time.
When you send the formal patch, please include the things you mentioned
here in the description, rather than just "This did not scale for multiple
holes of memory however", to justify your motivation. Please also include
the numbers you got from this patch: it's mainly a performance-enhancement
patch but doesn't include any numbers (yes, you sent it as an RFC, so I
don't mind for now), and numbers would make it easier to judge whether we
need this patch compared to the added complexity.
Thanks.
>
> Thanks,
> Laura
>
> --
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> hosted by The Linux Foundation
>
--
Kind regards,
Minchan Kim
On Thu, Jan 02, 2014 at 01:53:21PM -0800, Laura Abbott wrote:
> vmalloc already gives a useful macro to calculate the total vmalloc
> size. Use it.
>
> Signed-off-by: Laura Abbott <[email protected]>
Applied to percpu/for-3.14. Thanks.
--
tejun