2015-06-10 06:32:22

by wenwei tao

Subject: [RFC PATCH 0/6] add defer mechanism to ksm to make it more suitable for Android devices

On Android mobile devices I observe that KSM is unlikely to merge new
pages from an area that has already been scanned twice, so it is a waste
of power to keep scanning such areas at high frequency. This patchset
introduces a defer mechanism into KSM, borrowed from page compaction.

A new slot list called active_slot is added to ksm_scan. An MM whose VMAs
are marked for merging via madvise() is added to active_slot (if the MM is
new to KSM) or moved there (if it is already on the ksm_scan.mm_slot list).
In scan_get_next_rmap_item(), the active_slot list is scanned first unless
it is empty, then the mm_slot list. MMs on the active_slot list are scanned
twice and then moved to the mm_slot list. Once KSM is scanning the mm_slot
list, the defer mechanism is activated:

a) if KSM scans "ksm_thread_pages_to_scan" pages and none of them gets
merged or becomes unstable, ksm_defer_shift (a new member of ksm_scan) is
increased by one (capped at 6 for now). For the next
"1UL << ksm_scan.ksm_defer_shift" times ksmd is scheduled or woken up, it
does not do the actual scan, compare and merge work; it simply schedules
out.

b) if KSM scans "ksm_thread_pages_to_scan" pages and at least one of them
gets merged or becomes unstable, ksm_defer_shift and ksm_considered are
reset to zero. (A short sketch of the resulting backoff schedule follows.)
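
To illustrate, the skipped wakeups grow exponentially with ksm_defer_shift;
below is a minimal user-space sketch of the schedule, assuming the cap of 6
(KSM_MAX_DEFER_SHIFT) used in patch 1 -- illustration only, not part of the
series:

#include <stdio.h>

#define KSM_MAX_DEFER_SHIFT 6	/* cap used in patch 1 */

int main(void)
{
	unsigned int shift;

	/*
	 * Each scan round that merges nothing raises the shift by one
	 * (up to the cap), so ksmd skips 1, 2, 4, ... 64 wakeups before
	 * it tries a full scan again.
	 */
	for (shift = 0; shift <= KSM_MAX_DEFER_SHIFT; shift++)
		printf("defer_shift=%u -> skip %lu wakeups\n",
		       shift, 1UL << shift);
	return 0;
}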

Some applications keep producing new mergeable VMAs. To avoid rescanning
the VMAs of such applications that have already been scanned twice, we
reuse VM_HUGETLB to mark new mergeable VMAs, since hugetlb VMAs are not
supported by KSM anyway.
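
For reference, an area enters KSM scanning through the existing
MADV_MERGEABLE advice; a minimal user-space sketch (buffer size and fill
pattern are arbitrary, chosen only for illustration):

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64 * 4096;		/* 64 pages, arbitrary */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	/* Identical page contents give ksmd something to merge. */
	memset(buf, 0x5a, len);

	/*
	 * Register the area with KSM; with this series the covering VMA
	 * is also marked as "new mergeable" and its MM goes onto the
	 * active_slot list.
	 */
	if (madvise(buf, len, MADV_MERGEABLE) != 0)
		perror("madvise(MADV_MERGEABLE)");

	return 0;
}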

Wenwei Tao (6):
mm: add defer mechanism to ksm to make it more suitable
mm: change the condition of identifying hugetlb vm
perf: change the condition of identifying hugetlb vm
fs/binfmt_elf.c: change the condition of identifying hugetlb vm
x86/mm: change the condition of identifying hugetlb vm
powerpc/kvm: change the condition of identifying hugetlb vm

arch/powerpc/kvm/e500_mmu_host.c | 3 +-
arch/x86/mm/tlb.c | 3 +-
fs/binfmt_elf.c | 2 +-
include/linux/hugetlb_inline.h | 2 +-
include/linux/mempolicy.h | 2 +-
kernel/events/core.c | 2 +-
mm/gup.c | 6 +-
mm/huge_memory.c | 17 ++-
mm/ksm.c | 230 +++++++++++++++++++++++++++++++++-----
mm/madvise.c | 6 +-
mm/memory.c | 5 +-
mm/mprotect.c | 6 +-
12 files changed, 238 insertions(+), 46 deletions(-)

--
1.7.9.5


2015-06-10 06:32:40

by wenwei tao

Subject: [RFC PATCH 1/6] mm: add defer mechanism to ksm to make it more suitable

On Android mobile devices I observe that KSM is unlikely to merge new
pages from an area that has already been scanned twice, so it is a waste
of power to keep scanning such areas at high frequency. This patch
introduces a defer mechanism into KSM, borrowed from page compaction,
which automatically lowers the scan frequency in that case.

Signed-off-by: Wenwei Tao <[email protected]>
---
mm/ksm.c | 230 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--------
1 file changed, 203 insertions(+), 27 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 4162dce..54ffcb2 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -104,6 +104,7 @@ struct mm_slot {
struct list_head mm_list;
struct rmap_item *rmap_list;
struct mm_struct *mm;
+ unsigned long seqnr;
};

/**
@@ -117,9 +118,12 @@ struct mm_slot {
*/
struct ksm_scan {
struct mm_slot *mm_slot;
+ struct mm_slot *active_slot;
unsigned long address;
struct rmap_item **rmap_list;
unsigned long seqnr;
+ unsigned long ksm_considered;
+ unsigned int ksm_defer_shift;
};

/**
@@ -182,6 +186,11 @@ struct rmap_item {
#define UNSTABLE_FLAG 0x100 /* is a node of the unstable tree */
#define STABLE_FLAG 0x200 /* is listed from the stable tree */

+#define ACTIVE_SLOT_FLAG 0x100
+#define ACTIVE_SLOT_SEQNR 0x200
+#define KSM_MAX_DEFER_SHIFT 6
+
+
/* The stable and unstable tree heads */
static struct rb_root one_stable_tree[1] = { RB_ROOT };
static struct rb_root one_unstable_tree[1] = { RB_ROOT };
@@ -197,14 +206,22 @@ static DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
static struct mm_slot ksm_mm_head = {
.mm_list = LIST_HEAD_INIT(ksm_mm_head.mm_list),
};
+
+static struct mm_slot ksm_mm_active = {
+ .mm_list = LIST_HEAD_INIT(ksm_mm_active.mm_list),
+};
+
static struct ksm_scan ksm_scan = {
.mm_slot = &ksm_mm_head,
+ .active_slot = &ksm_mm_active,
};

static struct kmem_cache *rmap_item_cache;
static struct kmem_cache *stable_node_cache;
static struct kmem_cache *mm_slot_cache;

+static bool ksm_merged_or_unstable;
+
/* The number of nodes in the stable tree */
static unsigned long ksm_pages_shared;

@@ -336,6 +353,23 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm,
hash_add(mm_slots_hash, &mm_slot->link, (unsigned long)mm);
}

+static void move_to_active_list(struct mm_slot *mm_slot)
+{
+ if (mm_slot && !(mm_slot->seqnr & ACTIVE_SLOT_FLAG)) {
+ if (ksm_run & KSM_RUN_UNMERGE && mm_slot->rmap_list)
+ return;
+ if (mm_slot == ksm_scan.mm_slot) {
+ if (ksm_scan.active_slot == &ksm_mm_active)
+ return;
+ ksm_scan.mm_slot = list_entry(mm_slot->mm_list.next,
+ struct mm_slot, mm_list);
+ }
+ list_move_tail(&mm_slot->mm_list,
+ &ksm_scan.active_slot->mm_list);
+ mm_slot->seqnr |= (ACTIVE_SLOT_FLAG | ACTIVE_SLOT_SEQNR);
+ }
+}
+
/*
* ksmd, and unmerge_and_remove_all_rmap_items(), must not touch an mm's
* page tables after it has passed through ksm_exit() - which, if necessary,
@@ -772,6 +806,15 @@ static int unmerge_and_remove_all_rmap_items(void)
int err = 0;

spin_lock(&ksm_mmlist_lock);
+ mm_slot = list_entry(ksm_mm_active.mm_list.next,
+ struct mm_slot, mm_list);
+ while (mm_slot != &ksm_mm_active) {
+ list_move_tail(&mm_slot->mm_list, &ksm_mm_head.mm_list);
+ mm_slot->seqnr &= ~(ACTIVE_SLOT_FLAG | ACTIVE_SLOT_SEQNR);
+ mm_slot = list_entry(ksm_mm_active.mm_list.next,
+ struct mm_slot, mm_list);
+ }
+ ksm_scan.active_slot = &ksm_mm_active;
ksm_scan.mm_slot = list_entry(ksm_mm_head.mm_list.next,
struct mm_slot, mm_list);
spin_unlock(&ksm_mmlist_lock);
@@ -790,8 +833,8 @@ static int unmerge_and_remove_all_rmap_items(void)
if (err)
goto error;
}
-
remove_trailing_rmap_items(mm_slot, &mm_slot->rmap_list);
+ mm_slot->seqnr = 0;

spin_lock(&ksm_mmlist_lock);
ksm_scan.mm_slot = list_entry(mm_slot->mm_list.next,
@@ -806,6 +849,7 @@ static int unmerge_and_remove_all_rmap_items(void)
up_read(&mm->mmap_sem);
mmdrop(mm);
} else {
+ move_to_active_list(mm_slot);
spin_unlock(&ksm_mmlist_lock);
up_read(&mm->mmap_sem);
}
@@ -1401,6 +1445,9 @@ static void stable_tree_append(struct rmap_item *rmap_item,
ksm_pages_sharing++;
else
ksm_pages_shared++;
+
+ if (!ksm_merged_or_unstable)
+ ksm_merged_or_unstable = true;
}

/*
@@ -1468,6 +1515,9 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
checksum = calc_checksum(page);
if (rmap_item->oldchecksum != checksum) {
rmap_item->oldchecksum = checksum;
+ if ((rmap_item->address & UNSTABLE_FLAG) &&
+ !ksm_merged_or_unstable)
+ ksm_merged_or_unstable = true;
return;
}

@@ -1504,6 +1554,31 @@ static void cmp_and_merge_page(struct page *page, struct rmap_item *rmap_item)
}
}

+static bool skip_active_slot_vma(struct rmap_item ***rmap_list,
+ struct vm_area_struct *vma, struct mm_slot *mm_slot)
+{
+ unsigned char age;
+ struct rmap_item *rmap_item = **rmap_list;
+
+ age = (unsigned char)(ksm_scan.seqnr - mm_slot->seqnr);
+ if (age > 0)
+ return false;
+ if (!(vma->vm_flags & VM_HUGETLB)) {
+ while (rmap_item && rmap_item->address < vma->vm_end) {
+ if (rmap_item->address < vma->vm_start) {
+ **rmap_list = rmap_item->rmap_list;
+ remove_rmap_item_from_tree(rmap_item);
+ free_rmap_item(rmap_item);
+ rmap_item = **rmap_list;
+ } else {
+ *rmap_list = &rmap_item->rmap_list;
+ rmap_item = rmap_item->rmap_list;
+ }
+ }
+ return true;
+ } else
+ return false;
+}
static struct rmap_item *get_next_rmap_item(struct mm_slot *mm_slot,
struct rmap_item **rmap_list,
unsigned long addr)
@@ -1535,15 +1610,18 @@ static struct rmap_item *get_next_rmap_item(struct mm_slot *mm_slot,
static struct rmap_item *scan_get_next_rmap_item(struct page **page)
{
struct mm_struct *mm;
- struct mm_slot *slot;
+ struct mm_slot *slot, *next_slot;
struct vm_area_struct *vma;
struct rmap_item *rmap_item;
int nid;

- if (list_empty(&ksm_mm_head.mm_list))
+ if (list_empty(&ksm_mm_head.mm_list) &&
+ list_empty(&ksm_mm_active.mm_list))
return NULL;
-
- slot = ksm_scan.mm_slot;
+ if (ksm_scan.active_slot != &ksm_mm_active)
+ slot = ksm_scan.active_slot;
+ else
+ slot = ksm_scan.mm_slot;
if (slot == &ksm_mm_head) {
/*
* A number of pages can hang around indefinitely on per-cpu
@@ -1582,8 +1660,16 @@ static struct rmap_item *scan_get_next_rmap_item(struct page **page)
root_unstable_tree[nid] = RB_ROOT;

spin_lock(&ksm_mmlist_lock);
- slot = list_entry(slot->mm_list.next, struct mm_slot, mm_list);
- ksm_scan.mm_slot = slot;
+ if (unlikely(ksm_scan.seqnr == 0 &&
+ !list_empty(&ksm_mm_active.mm_list))) {
+ slot = list_entry(ksm_mm_active.mm_list.next,
+ struct mm_slot, mm_list);
+ ksm_scan.active_slot = slot;
+ } else {
+ slot = list_entry(slot->mm_list.next,
+ struct mm_slot, mm_list);
+ ksm_scan.mm_slot = slot;
+ }
spin_unlock(&ksm_mmlist_lock);
/*
* Although we tested list_empty() above, a racing __ksm_exit
@@ -1608,7 +1694,8 @@ next_mm:
continue;
if (ksm_scan.address < vma->vm_start)
ksm_scan.address = vma->vm_start;
- if (!vma->anon_vma)
+ if (!vma->anon_vma || (ksm_scan.active_slot == slot &&
+ skip_active_slot_vma(&ksm_scan.rmap_list, vma, slot)))
ksm_scan.address = vma->vm_end;

while (ksm_scan.address < vma->vm_end) {
@@ -1639,6 +1726,9 @@ next_mm:
ksm_scan.address += PAGE_SIZE;
cond_resched();
}
+ if ((slot->seqnr & (ACTIVE_SLOT_FLAG | ACTIVE_SLOT_SEQNR)) ==
+ ACTIVE_SLOT_FLAG && vma->vm_flags & VM_HUGETLB)
+ vma->vm_flags &= ~VM_HUGETLB;
}

if (ksm_test_exit(mm)) {
@@ -1652,8 +1742,25 @@ next_mm:
remove_trailing_rmap_items(slot, ksm_scan.rmap_list);

spin_lock(&ksm_mmlist_lock);
- ksm_scan.mm_slot = list_entry(slot->mm_list.next,
- struct mm_slot, mm_list);
+ slot->seqnr &= ~SEQNR_MASK;
+ slot->seqnr |= (ksm_scan.seqnr & SEQNR_MASK);
+ next_slot = list_entry(slot->mm_list.next,
+ struct mm_slot, mm_list);
+ if (slot == ksm_scan.active_slot) {
+ if (slot->seqnr & ACTIVE_SLOT_SEQNR)
+ slot->seqnr &= ~ACTIVE_SLOT_SEQNR;
+ else {
+ slot->seqnr &= ~ACTIVE_SLOT_FLAG;
+ list_move_tail(&slot->mm_list,
+ &ksm_scan.mm_slot->mm_list);
+ }
+ ksm_scan.active_slot = next_slot;
+ } else
+ ksm_scan.mm_slot = next_slot;
+
+ if (ksm_scan.active_slot == &ksm_mm_active)
+ ksm_scan.active_slot = list_entry(ksm_mm_active.mm_list.next,
+ struct mm_slot, mm_list);
if (ksm_scan.address == 0) {
/*
* We've completed a full scan of all vmas, holding mmap_sem
@@ -1664,6 +1771,9 @@ next_mm:
* or when all VM_MERGEABLE areas have been unmapped (and
* mmap_sem then protects against race with MADV_MERGEABLE).
*/
+ if (ksm_scan.active_slot == slot)
+ ksm_scan.active_slot = list_entry(slot->mm_list.next,
+ struct mm_slot, mm_list);
hash_del(&slot->link);
list_del(&slot->mm_list);
spin_unlock(&ksm_mmlist_lock);
@@ -1678,10 +1788,13 @@ next_mm:
}

/* Repeat until we've completed scanning the whole list */
- slot = ksm_scan.mm_slot;
- if (slot != &ksm_mm_head)
+ if (ksm_scan.active_slot != &ksm_mm_active) {
+ slot = ksm_scan.active_slot;
goto next_mm;
-
+ } else if (ksm_scan.mm_slot != &ksm_mm_head) {
+ slot = ksm_scan.mm_slot;
+ goto next_mm;
+ }
ksm_scan.seqnr++;
return NULL;
}
@@ -1705,9 +1818,40 @@ static void ksm_do_scan(unsigned int scan_npages)
}
}

+/* This is copied from page compaction */
+
+static inline void defer_ksm(void)
+{
+ ksm_scan.ksm_considered = 0;
+ if (++ksm_scan.ksm_defer_shift > KSM_MAX_DEFER_SHIFT)
+ ksm_scan.ksm_defer_shift = KSM_MAX_DEFER_SHIFT;
+
+}
+
+static inline bool ksm_defered(void)
+{
+ unsigned long defer_limit = 1UL << ksm_scan.ksm_defer_shift;
+
+ if (++ksm_scan.ksm_considered > defer_limit)
+ ksm_scan.ksm_considered = defer_limit;
+ return ksm_scan.ksm_considered < defer_limit &&
+ list_empty(&ksm_mm_active.mm_list);
+}
+
+static inline void reset_ksm_defer(void)
+{
+ if (ksm_scan.ksm_defer_shift != 0) {
+ ksm_scan.ksm_considered = 0;
+ ksm_scan.ksm_defer_shift = 0;
+ }
+}
+
+
static int ksmd_should_run(void)
{
- return (ksm_run & KSM_RUN_MERGE) && !list_empty(&ksm_mm_head.mm_list);
+ return (ksm_run & KSM_RUN_MERGE) &&
+ !(list_empty(&ksm_mm_head.mm_list) &&
+ list_empty(&ksm_mm_active.mm_list));
}

static int ksm_scan_thread(void *nothing)
@@ -1718,8 +1862,14 @@ static int ksm_scan_thread(void *nothing)
while (!kthread_should_stop()) {
mutex_lock(&ksm_thread_mutex);
wait_while_offlining();
- if (ksmd_should_run())
+ if (ksmd_should_run() && !ksm_defered()) {
+ ksm_merged_or_unstable = false;
ksm_do_scan(ksm_thread_pages_to_scan);
+ if (ksm_merged_or_unstable)
+ reset_ksm_defer();
+ else
+ defer_ksm();
+ }
mutex_unlock(&ksm_thread_mutex);

try_to_freeze();
@@ -1739,6 +1889,8 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, unsigned long *vm_flags)
{
struct mm_struct *mm = vma->vm_mm;
+ unsigned long vma_length =
+ (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
int err;

switch (advice) {
@@ -1761,8 +1913,19 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
if (err)
return err;
}
-
- *vm_flags |= VM_MERGEABLE;
+ /*
+ * Since hugetlb VMAs are not supported by KSM,
+ * use VM_HUGETLB to indicate a new mergeable VMA
+ */
+ *vm_flags |= VM_MERGEABLE | VM_HUGETLB;
+ if (vma_length > ksm_thread_pages_to_scan) {
+ struct mm_slot *mm_slot;
+
+ spin_lock(&ksm_mmlist_lock);
+ mm_slot = get_mm_slot(mm);
+ move_to_active_list(mm_slot);
+ spin_unlock(&ksm_mmlist_lock);
+ }
break;

case MADV_UNMERGEABLE:
@@ -1775,7 +1938,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
return err;
}

- *vm_flags &= ~VM_MERGEABLE;
+ *vm_flags &= ~(VM_MERGEABLE | VM_HUGETLB);
break;
}

@@ -1792,7 +1955,8 @@ int __ksm_enter(struct mm_struct *mm)
return -ENOMEM;

/* Check ksm_run too? Would need tighter locking */
- needs_wakeup = list_empty(&ksm_mm_head.mm_list);
+ needs_wakeup = list_empty(&ksm_mm_head.mm_list) &&
+ list_empty(&ksm_mm_active.mm_list);

spin_lock(&ksm_mmlist_lock);
insert_to_mm_slots_hash(mm, mm_slot);
@@ -1806,10 +1970,8 @@ int __ksm_enter(struct mm_struct *mm)
* scanning cursor, otherwise KSM pages in newly forked mms will be
* missed: then we might as well insert at the end of the list.
*/
- if (ksm_run & KSM_RUN_UNMERGE)
- list_add_tail(&mm_slot->mm_list, &ksm_mm_head.mm_list);
- else
- list_add_tail(&mm_slot->mm_list, &ksm_scan.mm_slot->mm_list);
+ list_add_tail(&mm_slot->mm_list, &ksm_scan.active_slot->mm_list);
+ mm_slot->seqnr |= (ACTIVE_SLOT_FLAG | ACTIVE_SLOT_SEQNR);
spin_unlock(&ksm_mmlist_lock);

set_bit(MMF_VM_MERGEABLE, &mm->flags);
@@ -1823,7 +1985,7 @@ int __ksm_enter(struct mm_struct *mm)

void __ksm_exit(struct mm_struct *mm)
{
- struct mm_slot *mm_slot;
+ struct mm_slot *mm_slot, *current_slot;
int easy_to_free = 0;

/*
@@ -1837,14 +1999,28 @@ void __ksm_exit(struct mm_struct *mm)

spin_lock(&ksm_mmlist_lock);
mm_slot = get_mm_slot(mm);
- if (mm_slot && ksm_scan.mm_slot != mm_slot) {
+ if (ksm_scan.active_slot != &ksm_mm_active)
+ current_slot = ksm_scan.active_slot;
+ else
+ current_slot = ksm_scan.mm_slot;
+ if (mm_slot && mm_slot != current_slot) {
if (!mm_slot->rmap_list) {
hash_del(&mm_slot->link);
list_del(&mm_slot->mm_list);
easy_to_free = 1;
} else {
- list_move(&mm_slot->mm_list,
- &ksm_scan.mm_slot->mm_list);
+ if (mm_slot == ksm_scan.mm_slot)
+ ksm_scan.mm_slot =
+ list_entry(mm_slot->mm_list.next,
+ struct mm_slot, mm_list);
+ if (ksm_run & KSM_RUN_UNMERGE)
+ list_move(&mm_slot->mm_list,
+ &ksm_scan.mm_slot->mm_list);
+ else {
+ list_move(&mm_slot->mm_list,
+ &ksm_scan.active_slot->mm_list);
+ mm_slot->seqnr |= ACTIVE_SLOT_FLAG;
+ }
}
}
spin_unlock(&ksm_mmlist_lock);
--
1.7.9.5

2015-06-10 06:33:05

by wenwei tao

Subject: [RFC PATCH 2/6] mm: change the condition of identifying hugetlb vm

Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA with VM_HUGETLB set is a hugetlb VMA
only if VM_MERGEABLE is not set as well.

Signed-off-by: Wenwei Tao <[email protected]>
---
include/linux/hugetlb_inline.h | 2 +-
include/linux/mempolicy.h | 2 +-
mm/gup.c | 6 ++++--
mm/huge_memory.c | 17 ++++++++++++-----
mm/madvise.c | 6 ++++--
mm/memory.c | 5 +++--
mm/mprotect.c | 6 ++++--
7 files changed, 29 insertions(+), 15 deletions(-)

diff --git a/include/linux/hugetlb_inline.h b/include/linux/hugetlb_inline.h
index 2bb681f..08dff6f 100644
--- a/include/linux/hugetlb_inline.h
+++ b/include/linux/hugetlb_inline.h
@@ -7,7 +7,7 @@

static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
{
- return !!(vma->vm_flags & VM_HUGETLB);
+ return !!((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB);
}

#else
diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
index 3d385c8..40ad136 100644
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -178,7 +178,7 @@ static inline int vma_migratable(struct vm_area_struct *vma)
return 0;

#ifndef CONFIG_ARCH_ENABLE_HUGEPAGE_MIGRATION
- if (vma->vm_flags & VM_HUGETLB)
+ if ((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB)
return 0;
#endif

diff --git a/mm/gup.c b/mm/gup.c
index a6e24e2..5803dab 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -166,7 +166,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
pud = pud_offset(pgd, address);
if (pud_none(*pud))
return no_page_table(vma, flags);
- if (pud_huge(*pud) && vma->vm_flags & VM_HUGETLB) {
+ if (pud_huge(*pud) &&
+ (vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB) {
page = follow_huge_pud(mm, address, pud, flags);
if (page)
return page;
@@ -178,7 +179,8 @@ struct page *follow_page_mask(struct vm_area_struct *vma,
pmd = pmd_offset(pud, address);
if (pmd_none(*pmd))
return no_page_table(vma, flags);
- if (pmd_huge(*pmd) && vma->vm_flags & VM_HUGETLB) {
+ if (pmd_huge(*pmd) &&
+ (vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB) {
page = follow_huge_pmd(mm, address, pmd, flags);
if (page)
return page;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index fc00c8c..5a9de7f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1910,7 +1910,6 @@ out:
return ret;
}

-#define VM_NO_THP (VM_SPECIAL | VM_HUGETLB | VM_SHARED | VM_MAYSHARE)

int hugepage_madvise(struct vm_area_struct *vma,
unsigned long *vm_flags, int advice)
@@ -1929,7 +1928,9 @@ int hugepage_madvise(struct vm_area_struct *vma,
/*
* Be somewhat over-protective like KSM for now!
*/
- if (*vm_flags & (VM_HUGEPAGE | VM_NO_THP))
+ if (*vm_flags & (VM_HUGEPAGE | VM_SPECIAL |
+ VM_SHARED | VM_MAYSHARE) ||
+ (*vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB)
return -EINVAL;
*vm_flags &= ~VM_NOHUGEPAGE;
*vm_flags |= VM_HUGEPAGE;
@@ -1945,7 +1946,9 @@ int hugepage_madvise(struct vm_area_struct *vma,
/*
* Be somewhat over-protective like KSM for now!
*/
- if (*vm_flags & (VM_NOHUGEPAGE | VM_NO_THP))
+ if (*vm_flags & (VM_NOHUGEPAGE | VM_SPECIAL |
+ VM_SHARED | VM_MAYSHARE) ||
+ (*vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB)
return -EINVAL;
*vm_flags &= ~VM_HUGEPAGE;
*vm_flags |= VM_NOHUGEPAGE;
@@ -2052,7 +2055,8 @@ int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
if (vma->vm_ops)
/* khugepaged not yet working on file or special mappings */
return 0;
- VM_BUG_ON_VMA(vm_flags & VM_NO_THP, vma);
+ VM_BUG_ON_VMA(vm_flags & (VM_SPECIAL | VM_SHARED | VM_MAYSHARE) ||
+ (vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB, vma);
hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
hend = vma->vm_end & HPAGE_PMD_MASK;
if (hstart < hend)
@@ -2396,7 +2400,10 @@ static bool hugepage_vma_check(struct vm_area_struct *vma)
return false;
if (is_vma_temporary_stack(vma))
return false;
- VM_BUG_ON_VMA(vma->vm_flags & VM_NO_THP, vma);
+ VM_BUG_ON_VMA(vma->vm_flags &
+ (VM_SPECIAL | VM_SHARED | VM_MAYSHARE) ||
+ (vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) ==
+ VM_HUGETLB, vma);
return true;
}

diff --git a/mm/madvise.c b/mm/madvise.c
index d551475..ad1081e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -278,7 +278,8 @@ static long madvise_dontneed(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
*prev = vma;
- if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
+ if (vma->vm_flags & (VM_LOCKED|VM_PFNMAP) ||
+ (vma->vm_flags & (VM_HUGETLB|VM_MERGEABLE)) == VM_HUGETLB)
return -EINVAL;

zap_page_range(vma, start, end - start, NULL);
@@ -299,7 +300,8 @@ static long madvise_remove(struct vm_area_struct *vma,

*prev = NULL; /* tell sys_madvise we drop mmap_sem */

- if (vma->vm_flags & (VM_LOCKED | VM_HUGETLB))
+ if (vma->vm_flags & VM_LOCKED ||
+ (vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB)
return -EINVAL;

f = vma->vm_file;
diff --git a/mm/memory.c b/mm/memory.c
index 8068893..266456c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1021,8 +1021,9 @@ int copy_page_range(struct mm_struct *dst_mm, struct mm_struct *src_mm,
* readonly mappings. The tradeoff is that copy_page_range is more
* efficient than faulting.
*/
- if (!(vma->vm_flags & (VM_HUGETLB | VM_PFNMAP | VM_MIXEDMAP)) &&
- !vma->anon_vma)
+ if (!(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP) ||
+ (vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB) &&
+ !vma->anon_vma)
return 0;

if (is_vm_hugetlb_page(vma))
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 4472781..09cce5b 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -273,8 +273,10 @@ mprotect_fixup(struct vm_area_struct *vma, struct vm_area_struct **pprev,
* even if read-only so there is no need to account for them here
*/
if (newflags & VM_WRITE) {
- if (!(oldflags & (VM_ACCOUNT|VM_WRITE|VM_HUGETLB|
- VM_SHARED|VM_NORESERVE))) {
+ if (!(oldflags &
+ (VM_ACCOUNT|VM_WRITE|VM_SHARED|VM_NORESERVE) ||
+ (oldflags & (VM_HUGETLB | VM_MERGEABLE)) ==
+ VM_HUGETLB)) {
charged = nrpages;
if (security_vm_enough_memory_mm(mm, charged))
return -ENOMEM;
--
1.7.9.5

2015-06-10 06:45:56

by wenwei tao

Subject: [RFC PATCH 3/6] perf: change the condition of identifying hugetlb vm

Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA with VM_HUGETLB set is a hugetlb VMA
only if VM_MERGEABLE is not set as well.

Signed-off-by: Wenwei Tao <[email protected]>
---
kernel/events/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f04daab..6313bdd 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5624,7 +5624,7 @@ static void perf_event_mmap_event(struct perf_mmap_event *mmap_event)
flags |= MAP_EXECUTABLE;
if (vma->vm_flags & VM_LOCKED)
flags |= MAP_LOCKED;
- if (vma->vm_flags & VM_HUGETLB)
+ if ((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB)
flags |= MAP_HUGETLB;

goto got_name;
--
1.7.9.5

2015-06-10 06:32:54

by wenwei tao

Subject: [RFC PATCH 4/6] fs/binfmt_elf.c: change the condition of identifying hugetlb vm

Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA with VM_HUGETLB set is a hugetlb VMA
only if VM_MERGEABLE is not set as well.

Signed-off-by: Wenwei Tao <[email protected]>
---
fs/binfmt_elf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 995986b..f529c8e 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1242,7 +1242,7 @@ static unsigned long vma_dump_size(struct vm_area_struct *vma,
return 0;

/* Hugetlb memory check */
- if (vma->vm_flags & VM_HUGETLB) {
+ if ((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB) {
if ((vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_SHARED))
goto whole;
if (!(vma->vm_flags & VM_SHARED) && FILTER(HUGETLB_PRIVATE))
--
1.7.9.5

2015-06-10 06:33:59

by wenwei tao

Subject: [RFC PATCH 5/6] x86/mm: change the condition of identifying hugetlb vm

Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA with VM_HUGETLB set is a hugetlb VMA
only if VM_MERGEABLE is not set as well.

Signed-off-by: Wenwei Tao <[email protected]>
---
arch/x86/mm/tlb.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3250f23..0247916 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -195,7 +195,8 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
goto out;
}

- if ((end != TLB_FLUSH_ALL) && !(vmflag & VM_HUGETLB))
+ if ((end != TLB_FLUSH_ALL) &&
+ !((vmflag & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB))
base_pages_to_flush = (end - start) >> PAGE_SHIFT;

if (base_pages_to_flush > tlb_single_page_flush_ceiling) {
--
1.7.9.5

2015-06-10 06:33:13

by wenwei tao

Subject: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

Hugetlb VMAs are not mergeable; that means a VMA cannot have VM_HUGETLB and
VM_MERGEABLE set at the same time. So we use VM_HUGETLB to indicate new
mergeable VMAs. Because of that, a VMA with VM_HUGETLB set is a hugetlb VMA
only if VM_MERGEABLE is not set as well.

Signed-off-by: Wenwei Tao <[email protected]>
---
arch/powerpc/kvm/e500_mmu_host.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index cc536d4..d76f518 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -423,7 +423,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
break;
}
} else if (vma && hva >= vma->vm_start &&
- (vma->vm_flags & VM_HUGETLB)) {
+ ((vma->vm_flags & (VM_HUGETLB | VM_MERGEABLE)) ==
+ VM_HUGETLB)) {
unsigned long psize = vma_kernel_pagesize(vma);

tsize = (gtlbe->mas1 & MAS1_TSIZE_MASK) >>
--
1.7.9.5

2015-07-02 21:49:56

by Scott Wood

Subject: Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

On Wed, 2015-06-10 at 14:27 +0800, Wenwei Tao wrote:
> Hugetlb VMAs are not mergeable, that means a VMA couldn't have VM_HUGETLB
> and
> VM_MERGEABLE been set in the same time. So we use VM_HUGETLB to indicate new
> mergeable VMAs. Because of that a VMA which has VM_HUGETLB been set is a
> hugetlb
> VMA only if it doesn't have VM_MERGEABLE been set in the same time.

Eww.

If you must overload such bit combinations, please hide it behind a
vm_is_hugetlb() function.

-Scott

2015-07-03 08:47:50

by wenwei tao

Subject: Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

Hi Scott

Thank you for your comments.

The kernel already has that function: is_vm_hugetlb_page(). The original
code didn't use it, and to keep the coding style of the original code I
didn't use it either.

For an expression like "vma->vm_flags & VM_HUGETLB", hiding it behind
is_vm_hugetlb_page() is fine. But for an expression like
"vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)", which appears in
patch 2/6, is it better to hide the bit combination behind
is_vm_hugetlb_page()? In my patch I just replaced it with
"vma->vm_flags & (VM_LOCKED|VM_PFNMAP) || (vma->vm_flags &
(VM_HUGETLB|VM_MERGEABLE)) == VM_HUGETLB".

I am a newbie to the Linux kernel; do you have any good suggestions for
this situation?

Thank you
Wenwei

2015-07-03 5:49 GMT+08:00 Scott Wood <[email protected]>:
> On Wed, 2015-06-10 at 14:27 +0800, Wenwei Tao wrote:
>> Hugetlb VMAs are not mergeable, that means a VMA couldn't have VM_HUGETLB
>> and
>> VM_MERGEABLE been set in the same time. So we use VM_HUGETLB to indicate new
>> mergeable VMAs. Because of that a VMA which has VM_HUGETLB been set is a
>> hugetlb
>> VMA only if it doesn't have VM_MERGEABLE been set in the same time.
>
> Eww.
>
> If you must overload such bit combinations, please hide it behind a
> vm_is_hugetlb() function.
>
> -Scott
>

2015-07-06 21:34:56

by Scott Wood

Subject: Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

On Fri, 2015-07-03 at 16:47 +0800, wenwei tao wrote:
> Hi Scott
>
> Thank you for your comments.
>
> Kernel already has that function: is_vm_hugetlb_page() , but the
> original code didn't use it,
> in order to keep the coding style of the original code, I didn't use it
> either.
>
> For the sentence like: "vma->vm_flags & VM_HUGETLB" , hiding it behind
> 'is_vm_hugetlb_page()' is ok,
> but the sentence like: "vma->vm_flags &
> (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)" appears in the patch 2/6,
> is it better to hide the bit combinations behind the
> is_vm_hugetlb_page() ? In my patch I just replaced it with
> "vma->vm_flags & (VM_LOCKED|VM_PFNMAP) || (vma->vm_flags &
> (VM_HUGETLB|VM_MERGEABLE)) == VM_HUGETLB".

If you're going to do non-obvious things with the flags, it should be done in
one place rather than throughout the code. Why would you do the above and
not "vma->vm_flags & (VM_LOCKED | VM_PFNMAP) || is_vm_hugetlb_page(vma)"?

-Scott

2015-07-07 08:06:01

by wenwei tao

Subject: Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

Hi Scott

I understand what you said.

I will use is_vm_hugetlb_page() to hide the bit combinations in the next
version of the patch set, as you suggest.

But in a situation like the one below there is no vma structure at hand,
so using is_vm_hugetlb_page() may be costly or even impossible:
void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
				unsigned long end, unsigned long vmflag)
{
	...

	if (end == TLB_FLUSH_ALL || tlb_flushall_shift == -1
			|| vmflag & VM_HUGETLB) {
		local_flush_tlb();
		goto flush_all;
	}
	...
}


Thank you
Wenwei

2015-07-07 5:34 GMT+08:00 Scott Wood <[email protected]>:
> On Fri, 2015-07-03 at 16:47 +0800, wenwei tao wrote:
>> Hi Scott
>>
>> Thank you for your comments.
>>
>> Kernel already has that function: is_vm_hugetlb_page() , but the
>> original code didn't use it,
>> in order to keep the coding style of the original code, I didn't use it
>> either.
>>
>> For the sentence like: "vma->vm_flags & VM_HUGETLB" , hiding it behind
>> 'is_vm_hugetlb_page()' is ok,
>> but the sentence like: "vma->vm_flags &
>> (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)" appears in the patch 2/6,
>> is it better to hide the bit combinations behind the
>> is_vm_hugetlb_page() ? In my patch I just replaced it with
>> "vma->vm_flags & (VM_LOCKED|VM_PFNMAP) || (vma->vm_flags &
>> (VM_HUGETLB|VM_MERGEABLE)) == VM_HUGETLB".
>
> If you're going to do non-obvious things with the flags, it should be done in
> one place rather than throughout the code. Why would you do the above and
> not "vma->vm_flags & (VM_LOCKED | VM_PFNMAP) || is_vm_hugetlb_page(vma)"?
>
> -Scott
>

2015-07-07 19:48:00

by Scott Wood

Subject: Re: [RFC PATCH 6/6] powerpc/kvm: change the condition of identifying hugetlb vm

On Tue, 2015-07-07 at 16:05 +0800, wenwei tao wrote:
> Hi Scott
>
> I understand what you said.
>
> I will use the function 'is_vm_hugetlb_page()' to hide the bit
> combinations according to your comments in the next version of patch
> set.
>
> But for the situation like below, there isn't an obvious structure
> 'vma', using 'is_vm_hugetlb_page()' maybe costly or even not possible.
> void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
> unsigned long end, unsigned long vmflag)
> {
> ...
>
> if (end == TLB_FLUSH_ALL || tlb_flushall_shift == -1
> || vmflag & VM_HUGETLB) {
> local_flush_tlb();
> goto flush_all;
> }
> ...
> }

Add a function that operates on the flags directly, then.

-Scott
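
A helper that operates on the flags directly, as Scott suggests, might
look like the sketch below. The helper name is hypothetical and not from
the posted series; it assumes kernel context (linux/mm.h for the VM_*
flags and struct vm_area_struct):

/* Sketch only: hypothetical helper, not part of the posted patches. */
static inline bool vm_flags_is_hugetlb(unsigned long vm_flags)
{
	/*
	 * In this series VM_HUGETLB doubles as the "new mergeable" mark,
	 * so it only means hugetlb when VM_MERGEABLE is not also set.
	 */
	return (vm_flags & (VM_HUGETLB | VM_MERGEABLE)) == VM_HUGETLB;
}

is_vm_hugetlb_page() would then become a thin wrapper around it:

static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
{
	return vm_flags_is_hugetlb(vma->vm_flags);
}

and a flags-only call site such as the flush_tlb_mm_range() hunk in patch
5/6 could read, for example:

	if ((end != TLB_FLUSH_ALL) && !vm_flags_is_hugetlb(vmflag))
		base_pages_to_flush = (end - start) >> PAGE_SHIFT;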