Morally, the question of whether an address lies in a gate vma should be asked
with respect to an mm, not a particular task.
Practically, dropping the dependency on task_struct will help make current and
future operations on mm's more flexible and convenient. In particular, it
allows some code paths to avoid the need to hold task_lock.
The only architecture this change impacts in any significant way is x86_64.
The principle change on that architecture is to mirror TIF_IA32 via
a new flag in mm_context_t.
This is the first of a two part series that implements safe writes to
/proc/pid/mem. I will be posting the second series to lkml shortly. These
patches are based on v2.6.38-rc8. The general approach used here was suggested
to me by Alexander Viro, but any mistakes present in these patches are entirely
my own.
--
steve
Stephen Wilson (5):
x86: add context tag to mark mm when running a task in 32-bit compatibility mode
x86: mark associated mm when running a task in 32 bit compatibility mode
mm: arch: make get_gate_vma take an mm_struct instead of a task_struct
mm: arch: make in_gate_area take an mm_struct instead of a task_struct
mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm
arch/powerpc/kernel/vdso.c | 6 +++---
arch/s390/kernel/vdso.c | 6 +++---
arch/sh/kernel/vsyscall/vsyscall.c | 6 +++---
arch/x86/ia32/ia32_aout.c | 1 +
arch/x86/include/asm/mmu.h | 6 ++++++
arch/x86/kernel/process_64.c | 8 ++++++++
arch/x86/mm/init_64.c | 16 ++++++++--------
arch/x86/vdso/vdso32-setup.c | 15 ++++++++-------
fs/binfmt_elf.c | 2 +-
fs/proc/task_mmu.c | 8 +++++---
include/linux/mm.h | 10 +++++-----
kernel/kallsyms.c | 4 ++--
mm/memory.c | 8 ++++----
mm/mlock.c | 4 ++--
mm/nommu.c | 2 +-
15 files changed, 61 insertions(+), 42 deletions(-)
This tag is intended to mirror the thread info TIF_IA32 flag. Will be used to
identify mm's which support 32 bit tasks running in compatibility mode without
requiring a reference to the task itself.
Signed-off-by: Stephen Wilson <[email protected]>
---
arch/x86/include/asm/mmu.h | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 80a1dee..8a2e5b4 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -13,6 +13,12 @@ typedef struct {
int size;
struct mutex lock;
void *vdso;
+
+#ifdef CONFIG_X86_64
+ /* True if mm supports a task running in 32 bit compatibility mode. */
+ unsigned short compat;
+#endif
+
} mm_context_t;
#ifdef CONFIG_SMP
--
1.7.3.5
This patch simply follows the same practice as for setting the TIF_IA32 flag.
In particular, an mm is marked as holding 32-bit tasks when a 32-bit binary is
exec'ed. Both ELF and a.out formats are updated.
Signed-off-by: Stephen Wilson <[email protected]>
---
arch/x86/ia32/ia32_aout.c | 1 +
arch/x86/kernel/process_64.c | 8 ++++++++
2 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/arch/x86/ia32/ia32_aout.c b/arch/x86/ia32/ia32_aout.c
index 2d93bdb..95cde43 100644
--- a/arch/x86/ia32/ia32_aout.c
+++ b/arch/x86/ia32/ia32_aout.c
@@ -298,6 +298,7 @@ static int load_aout_binary(struct linux_binprm *bprm, struct pt_regs *regs)
/* OK, This is the point of no return */
set_personality(PER_LINUX);
set_thread_flag(TIF_IA32);
+ current->mm->context.compat = 1;
setup_new_exec(bprm);
diff --git a/arch/x86/kernel/process_64.c b/arch/x86/kernel/process_64.c
index bd387e8..35f1221 100644
--- a/arch/x86/kernel/process_64.c
+++ b/arch/x86/kernel/process_64.c
@@ -501,6 +501,10 @@ void set_personality_64bit(void)
/* Make sure to be in 64bit mode */
clear_thread_flag(TIF_IA32);
+ /* Ensure the corresponding mm is not marked. */
+ if (current->mm)
+ current->mm->context.compat = 0;
+
/* TBD: overwrites user setup. Should have two bits.
But 64bit processes have always behaved this way,
so it's not too bad. The main problem is just that
@@ -516,6 +520,10 @@ void set_personality_ia32(void)
set_thread_flag(TIF_IA32);
current->personality |= force_personality32;
+ /* Mark the associated mm as containing 32-bit tasks. */
+ if (current->mm)
+ current->mm->context.compat = 1;
+
/* Prepare the first "return" to user space */
current_thread_info()->status |= TS_COMPAT;
}
--
1.7.3.5
Morally, the presence of a gate vma is more an attribute of a particular mm than
a particular task. Moreover, dropping the dependency on task_struct will help
make both existing and future operations on mm's more flexible and convenient.
Signed-off-by: Stephen Wilson <[email protected]>
---
arch/powerpc/kernel/vdso.c | 2 +-
arch/s390/kernel/vdso.c | 2 +-
arch/sh/kernel/vsyscall/vsyscall.c | 2 +-
arch/x86/mm/init_64.c | 6 +++---
arch/x86/vdso/vdso32-setup.c | 11 ++++++-----
fs/binfmt_elf.c | 2 +-
fs/proc/task_mmu.c | 8 +++++---
include/linux/mm.h | 2 +-
mm/memory.c | 4 ++--
mm/mlock.c | 4 ++--
10 files changed, 23 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index fd87287..6169f17 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -830,7 +830,7 @@ int in_gate_area(struct task_struct *task, unsigned long addr)
return 0;
}
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
return NULL;
}
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index f438d74..d19f305 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -347,7 +347,7 @@ int in_gate_area(struct task_struct *task, unsigned long addr)
return 0;
}
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
return NULL;
}
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index 242117c..3f9b6f4 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -94,7 +94,7 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return NULL;
}
-struct vm_area_struct *get_gate_vma(struct task_struct *task)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
return NULL;
}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 71a5929..2c1799f 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -870,10 +870,10 @@ static struct vm_area_struct gate_vma = {
.vm_flags = VM_READ | VM_EXEC
};
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
#ifdef CONFIG_IA32_EMULATION
- if (test_tsk_thread_flag(tsk, TIF_IA32))
+ if (!mm || mm->context.compat)
return NULL;
#endif
return &gate_vma;
@@ -881,7 +881,7 @@ struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
int in_gate_area(struct task_struct *task, unsigned long addr)
{
- struct vm_area_struct *vma = get_gate_vma(task);
+ struct vm_area_struct *vma = get_gate_vma(task->mm);
if (!vma)
return 0;
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index 36df991..1f651f6 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -417,11 +417,12 @@ const char *arch_vma_name(struct vm_area_struct *vma)
return NULL;
}
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
- struct mm_struct *mm = tsk->mm;
-
- /* Check to see if this task was created in compat vdso mode */
+ /*
+ * Check to see if the corresponding task was created in compat vdso
+ * mode.
+ */
if (mm && mm->context.vdso == (void *)VDSO_HIGH_BASE)
return &gate_vma;
return NULL;
@@ -429,7 +430,7 @@ struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
int in_gate_area(struct task_struct *task, unsigned long addr)
{
- const struct vm_area_struct *vma = get_gate_vma(task);
+ const struct vm_area_struct *vma = get_gate_vma(task->mm);
return vma && addr >= vma->vm_start && addr < vma->vm_end;
}
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d5b640b..bbabdcc 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -1906,7 +1906,7 @@ static int elf_core_dump(struct coredump_params *cprm)
segs = current->mm->map_count;
segs += elf_core_extra_phdrs();
- gate_vma = get_gate_vma(current);
+ gate_vma = get_gate_vma(current->mm);
if (gate_vma != NULL)
segs++;
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 60b9148..bb548d4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -126,7 +126,7 @@ static void *m_start(struct seq_file *m, loff_t *pos)
return NULL;
down_read(&mm->mmap_sem);
- tail_vma = get_gate_vma(priv->task);
+ tail_vma = get_gate_vma(priv->task->mm);
priv->tail_vma = tail_vma;
/* Start with last addr hint */
@@ -277,7 +277,8 @@ static int show_map(struct seq_file *m, void *v)
show_map_vma(m, vma);
if (m->count < m->size) /* vma is copied successfully */
- m->version = (vma != get_gate_vma(task))? vma->vm_start: 0;
+ m->version = (vma != get_gate_vma(task->mm))
+ ? vma->vm_start : 0;
return 0;
}
@@ -436,7 +437,8 @@ static int show_smap(struct seq_file *m, void *v)
(unsigned long)(mss.pss >> (10 + PSS_SHIFT)) : 0);
if (m->count < m->size) /* vma is copied successfully */
- m->version = (vma != get_gate_vma(task)) ? vma->vm_start : 0;
+ m->version = (vma != get_gate_vma(task->mm))
+ ? vma->vm_start : 0;
return 0;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f6385fc..b571921 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1568,7 +1568,7 @@ static inline bool kernel_page_present(struct page *page) { return true; }
#endif /* CONFIG_HIBERNATION */
#endif
-extern struct vm_area_struct *get_gate_vma(struct task_struct *tsk);
+extern struct vm_area_struct *get_gate_vma(struct mm_struct *mm);
#ifdef __HAVE_ARCH_GATE_AREA
int in_gate_area_no_task(unsigned long addr);
int in_gate_area(struct task_struct *task, unsigned long addr);
diff --git a/mm/memory.c b/mm/memory.c
index 5823698..aec7cbd 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1439,7 +1439,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
vma = find_extend_vma(mm, start);
if (!vma && in_gate_area(tsk, start)) {
unsigned long pg = start & PAGE_MASK;
- struct vm_area_struct *gate_vma = get_gate_vma(tsk);
+ struct vm_area_struct *gate_vma = get_gate_vma(tsk->mm);
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
@@ -3439,7 +3439,7 @@ static int __init gate_vma_init(void)
__initcall(gate_vma_init);
#endif
-struct vm_area_struct *get_gate_vma(struct task_struct *tsk)
+struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
{
#ifdef AT_SYSINFO_EHDR
return &gate_vma;
diff --git a/mm/mlock.c b/mm/mlock.c
index c3924c7f..2689a08c 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -237,7 +237,7 @@ long mlock_vma_pages_range(struct vm_area_struct *vma,
if (!((vma->vm_flags & (VM_DONTEXPAND | VM_RESERVED)) ||
is_vm_hugetlb_page(vma) ||
- vma == get_gate_vma(current))) {
+ vma == get_gate_vma(current->mm))) {
__mlock_vma_pages_range(vma, start, end, NULL);
@@ -332,7 +332,7 @@ static int mlock_fixup(struct vm_area_struct *vma, struct vm_area_struct **prev,
int lock = newflags & VM_LOCKED;
if (newflags == vma->vm_flags || (vma->vm_flags & VM_SPECIAL) ||
- is_vm_hugetlb_page(vma) || vma == get_gate_vma(current))
+ is_vm_hugetlb_page(vma) || vma == get_gate_vma(current->mm))
goto out; /* don't set VM_LOCKED, don't count */
pgoff = vma->vm_pgoff + ((start - vma->vm_start) >> PAGE_SHIFT);
--
1.7.3.5
Morally, the question of whether an address lies in a gate vma should be asked
with respect to an mm, not a particular task. Moreover, dropping the dependency
on task_struct will help make existing and future operations on mm's more
flexible and convenient.
Signed-off-by: Stephen Wilson <[email protected]>
---
arch/powerpc/kernel/vdso.c | 2 +-
arch/s390/kernel/vdso.c | 2 +-
arch/sh/kernel/vsyscall/vsyscall.c | 2 +-
arch/x86/mm/init_64.c | 4 ++--
arch/x86/vdso/vdso32-setup.c | 4 ++--
include/linux/mm.h | 4 ++--
mm/memory.c | 2 +-
7 files changed, 10 insertions(+), 10 deletions(-)
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 6169f17..467aa9e 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -825,7 +825,7 @@ int in_gate_area_no_task(unsigned long addr)
return 0;
}
-int in_gate_area(struct task_struct *task, unsigned long addr)
+int in_gate_area(struct mm_struct *mm, unsigned long addr)
{
return 0;
}
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index d19f305..9006e96 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -342,7 +342,7 @@ int in_gate_area_no_task(unsigned long addr)
return 0;
}
-int in_gate_area(struct task_struct *task, unsigned long addr)
+int in_gate_area(struct mm_struct *mm, unsigned long addr)
{
return 0;
}
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index 3f9b6f4..62c36a8 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -99,7 +99,7 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
return NULL;
}
-int in_gate_area(struct task_struct *task, unsigned long address)
+int in_gate_area(struct mm_struct *mm, unsigned long address)
{
return 0;
}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 2c1799f..3af0da3 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -879,9 +879,9 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
return &gate_vma;
}
-int in_gate_area(struct task_struct *task, unsigned long addr)
+int in_gate_area(struct mm_struct *mm, unsigned long addr)
{
- struct vm_area_struct *vma = get_gate_vma(task->mm);
+ struct vm_area_struct *vma = get_gate_vma(mm);
if (!vma)
return 0;
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index 1f651f6..f849bb2 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -428,9 +428,9 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
return NULL;
}
-int in_gate_area(struct task_struct *task, unsigned long addr)
+int in_gate_area(struct mm_struct *mm, unsigned long addr)
{
- const struct vm_area_struct *vma = get_gate_vma(task->mm);
+ const struct vm_area_struct *vma = get_gate_vma(mm);
return vma && addr >= vma->vm_start && addr < vma->vm_end;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index b571921..c700bbb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1571,10 +1571,10 @@ static inline bool kernel_page_present(struct page *page) { return true; }
extern struct vm_area_struct *get_gate_vma(struct mm_struct *mm);
#ifdef __HAVE_ARCH_GATE_AREA
int in_gate_area_no_task(unsigned long addr);
-int in_gate_area(struct task_struct *task, unsigned long addr);
+int in_gate_area(struct mm_struct *mm, unsigned long addr);
#else
int in_gate_area_no_task(unsigned long addr);
-#define in_gate_area(task, addr) ({(void)task; in_gate_area_no_task(addr);})
+#define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_task(addr);})
#endif /* __HAVE_ARCH_GATE_AREA */
int drop_caches_sysctl_handler(struct ctl_table *, int,
diff --git a/mm/memory.c b/mm/memory.c
index aec7cbd..d1347ac 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1437,7 +1437,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
struct vm_area_struct *vma;
vma = find_extend_vma(mm, start);
- if (!vma && in_gate_area(tsk, start)) {
+ if (!vma && in_gate_area(tsk->mm, start)) {
unsigned long pg = start & PAGE_MASK;
struct vm_area_struct *gate_vma = get_gate_vma(tsk->mm);
pgd_t *pgd;
--
1.7.3.5
Now that gate vma's are referenced with respect to a particular mm and not a
particular task it only makes sense to propagate the change to this predicate as
well.
Signed-off-by: Stephen Wilson <[email protected]>
---
arch/powerpc/kernel/vdso.c | 2 +-
arch/s390/kernel/vdso.c | 2 +-
arch/sh/kernel/vsyscall/vsyscall.c | 2 +-
arch/x86/mm/init_64.c | 8 ++++----
arch/x86/vdso/vdso32-setup.c | 2 +-
include/linux/mm.h | 6 +++---
kernel/kallsyms.c | 4 ++--
mm/memory.c | 2 +-
mm/nommu.c | 2 +-
9 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
index 467aa9e..142ab10 100644
--- a/arch/powerpc/kernel/vdso.c
+++ b/arch/powerpc/kernel/vdso.c
@@ -820,7 +820,7 @@ static int __init vdso_init(void)
}
arch_initcall(vdso_init);
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return 0;
}
diff --git a/arch/s390/kernel/vdso.c b/arch/s390/kernel/vdso.c
index 9006e96..d73630b 100644
--- a/arch/s390/kernel/vdso.c
+++ b/arch/s390/kernel/vdso.c
@@ -337,7 +337,7 @@ static int __init vdso_init(void)
}
arch_initcall(vdso_init);
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return 0;
}
diff --git a/arch/sh/kernel/vsyscall/vsyscall.c b/arch/sh/kernel/vsyscall/vsyscall.c
index 62c36a8..1d6d51a 100644
--- a/arch/sh/kernel/vsyscall/vsyscall.c
+++ b/arch/sh/kernel/vsyscall/vsyscall.c
@@ -104,7 +104,7 @@ int in_gate_area(struct mm_struct *mm, unsigned long address)
return 0;
}
-int in_gate_area_no_task(unsigned long address)
+int in_gate_area_no_mm(unsigned long address)
{
return 0;
}
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 3af0da3..d3b5e2c 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -890,11 +890,11 @@ int in_gate_area(struct mm_struct *mm, unsigned long addr)
}
/*
- * Use this when you have no reliable task/vma, typically from interrupt
- * context. It is less reliable than using the task's vma and may give
- * false positives:
+ * Use this when you have no reliable mm, typically from interrupt
+ * context. It is less reliable than using a task's mm and may give
+ * false positives.
*/
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return (addr >= VSYSCALL_START) && (addr < VSYSCALL_END);
}
diff --git a/arch/x86/vdso/vdso32-setup.c b/arch/x86/vdso/vdso32-setup.c
index f849bb2..468d591 100644
--- a/arch/x86/vdso/vdso32-setup.c
+++ b/arch/x86/vdso/vdso32-setup.c
@@ -435,7 +435,7 @@ int in_gate_area(struct mm_struct *mm, unsigned long addr)
return vma && addr >= vma->vm_start && addr < vma->vm_end;
}
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return 0;
}
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c700bbb..694512d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1570,11 +1570,11 @@ static inline bool kernel_page_present(struct page *page) { return true; }
extern struct vm_area_struct *get_gate_vma(struct mm_struct *mm);
#ifdef __HAVE_ARCH_GATE_AREA
-int in_gate_area_no_task(unsigned long addr);
+int in_gate_area_no_mm(unsigned long addr);
int in_gate_area(struct mm_struct *mm, unsigned long addr);
#else
-int in_gate_area_no_task(unsigned long addr);
-#define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_task(addr);})
+int in_gate_area_no_mm(unsigned long addr);
+#define in_gate_area(mm, addr) ({(void)mm; in_gate_area_no_mm(addr);})
#endif /* __HAVE_ARCH_GATE_AREA */
int drop_caches_sysctl_handler(struct ctl_table *, int,
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c
index 6f6d091..b9d0fd1 100644
--- a/kernel/kallsyms.c
+++ b/kernel/kallsyms.c
@@ -64,14 +64,14 @@ static inline int is_kernel_text(unsigned long addr)
if ((addr >= (unsigned long)_stext && addr <= (unsigned long)_etext) ||
arch_is_kernel_text(addr))
return 1;
- return in_gate_area_no_task(addr);
+ return in_gate_area_no_mm(addr);
}
static inline int is_kernel(unsigned long addr)
{
if (addr >= (unsigned long)_stext && addr <= (unsigned long)_end)
return 1;
- return in_gate_area_no_task(addr);
+ return in_gate_area_no_mm(addr);
}
static int is_ksym_addr(unsigned long addr)
diff --git a/mm/memory.c b/mm/memory.c
index d1347ac..3863e86 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3448,7 +3448,7 @@ struct vm_area_struct *get_gate_vma(struct mm_struct *mm)
#endif
}
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
#ifdef AT_SYSINFO_EHDR
if ((addr >= FIXADDR_USER_START) && (addr < FIXADDR_USER_END))
diff --git a/mm/nommu.c b/mm/nommu.c
index f59e142..e629143 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -1963,7 +1963,7 @@ error:
return -ENOMEM;
}
-int in_gate_area_no_task(unsigned long addr)
+int in_gate_area_no_mm(unsigned long addr)
{
return 0;
}
--
1.7.3.5
On Tue, Mar 8, 2011 at 4:31 PM, Stephen Wilson <[email protected]> wrote:
> Morally, the question of whether an address lies in a gate vma should be asked
> with respect to an mm, not a particular task.
>
> Practically, dropping the dependency on task_struct will help make current and
> future operations on mm's more flexible and convenient. ?In particular, it
> allows some code paths to avoid the need to hold task_lock.
Reviewed-by: Michel Lespinasse <[email protected]>
May I suggest ia32_compat instead of just compat for the flag name ?
--
Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.
On Wed, Mar 09, 2011 at 05:09:09AM -0800, Michel Lespinasse wrote:
> On Tue, Mar 8, 2011 at 4:31 PM, Stephen Wilson <[email protected]> wrote:
> > Morally, the question of whether an address lies in a gate vma should be asked
> > with respect to an mm, not a particular task.
> >
> > Practically, dropping the dependency on task_struct will help make current and
> > future operations on mm's more flexible and convenient. ?In particular, it
> > allows some code paths to avoid the need to hold task_lock.
>
> Reviewed-by: Michel Lespinasse <[email protected]>
>
> May I suggest ia32_compat instead of just compat for the flag name ?
Yes, sounds good to me. Will change in the next iteration.
Thanks for the review!
> --
> Michel "Walken" Lespinasse
> A program is never fully debugged until the last user dies.
--
steve
On Tue, Mar 08, 2011 at 07:31:56PM -0500, Stephen Wilson wrote:
>
> Morally, the question of whether an address lies in a gate vma should be asked
> with respect to an mm, not a particular task.
>
> Practically, dropping the dependency on task_struct will help make current and
> future operations on mm's more flexible and convenient. In particular, it
> allows some code paths to avoid the need to hold task_lock.
>
> The only architecture this change impacts in any significant way is x86_64.
> The principle change on that architecture is to mirror TIF_IA32 via
> a new flag in mm_context_t.
The problem is -- you're adding a likely cache miss on mm_struct for
every 32bit compat syscall now, even if they don't need mm_struct
currently (and a lot of them do not) Unless there's a very good
justification to make up for this performance issue elsewhere
(including numbers) this seems like a bad idea.
> This is the first of a two part series that implements safe writes to
> /proc/pid/mem. I will be posting the second series to lkml shortly. These
Making every syscall slower for /proc/pid/mem doesn't seem like a good
tradeoff to me. Please solve this in some other way.
-Andi
On Thu, Mar 10, 2011 at 08:00:32AM -0800, Andi Kleen wrote:
> On Tue, Mar 08, 2011 at 07:31:56PM -0500, Stephen Wilson wrote:
> >
> > Morally, the question of whether an address lies in a gate vma should be asked
> > with respect to an mm, not a particular task.
> >
> > Practically, dropping the dependency on task_struct will help make current and
> > future operations on mm's more flexible and convenient. In particular, it
> > allows some code paths to avoid the need to hold task_lock.
> >
> > The only architecture this change impacts in any significant way is x86_64.
> > The principle change on that architecture is to mirror TIF_IA32 via
> > a new flag in mm_context_t.
>
> The problem is -- you're adding a likely cache miss on mm_struct for
> every 32bit compat syscall now, even if they don't need mm_struct
> currently (and a lot of them do not) Unless there's a very good
> justification to make up for this performance issue elsewhere
> (including numbers) this seems like a bad idea.
Hmm I see you're only setting it on exec time actually on rereading
the patches. I thought you were changing TS_COMPAT which is in
the syscall path.
Never mind. I have no problems with doing such a change on exec
time.
-Andi
On Thu, Mar 10, 2011 at 08:00:32AM -0800, Andi Kleen wrote:
> On Tue, Mar 08, 2011 at 07:31:56PM -0500, Stephen Wilson wrote:
> >
> > Morally, the question of whether an address lies in a gate vma should be asked
> > with respect to an mm, not a particular task.
> >
> > Practically, dropping the dependency on task_struct will help make current and
> > future operations on mm's more flexible and convenient. In particular, it
> > allows some code paths to avoid the need to hold task_lock.
> >
> > The only architecture this change impacts in any significant way is x86_64.
> > The principle change on that architecture is to mirror TIF_IA32 via
> > a new flag in mm_context_t.
>
> The problem is -- you're adding a likely cache miss on mm_struct for
> every 32bit compat syscall now, even if they don't need mm_struct
> currently (and a lot of them do not) Unless there's a very good
> justification to make up for this performance issue elsewhere
> (including numbers) this seems like a bad idea.
I do not think this will result in cache misses on the scale you
suggest. I am simply mirroring the *state* of the TIF_IA32 flag in
mm_struct, not testing/accessing it in the same way.
The only place where this flag is accessed (outside the exec() syscall
path) is in x86/mm/init_64.c, get_gate_vma(), which in turn is needed
by a few, relatively heavy weight, page locking/pinning routines on the
mm side (get_user_pages, for example). Patches 3 and 4 in the series
show the extent of the change.
Or am I missing something?
> > /proc/pid/mem. I will be posting the second series to lkml shortly. These
>
> Making every syscall slower for /proc/pid/mem doesn't seem like a good
> tradeoff to me. Please solve this in some other way.
>
> -Andi
--
steve
On Thu, Mar 10, 2011 at 08:38:09AM -0800, Andi Kleen wrote:
> On Thu, Mar 10, 2011 at 08:00:32AM -0800, Andi Kleen wrote:
> > On Tue, Mar 08, 2011 at 07:31:56PM -0500, Stephen Wilson wrote:
> > > The only architecture this change impacts in any significant way is x86_64.
> > > The principle change on that architecture is to mirror TIF_IA32 via
> > > a new flag in mm_context_t.
> >
> > The problem is -- you're adding a likely cache miss on mm_struct for
> > every 32bit compat syscall now, even if they don't need mm_struct
> > currently (and a lot of them do not) Unless there's a very good
> > justification to make up for this performance issue elsewhere
> > (including numbers) this seems like a bad idea.
>
> Hmm I see you're only setting it on exec time actually on rereading
> the patches. I thought you were changing TS_COMPAT which is in
> the syscall path.
>
> Never mind. I have no problems with doing such a change on exec
> time.
OK. Great! Does this mean I have your ACK'ed by or reviewed by?
Thanks for taking a look!
--
steve
On Thu, Mar 10, 2011 at 11:54:14AM -0500, Stephen Wilson wrote:
>
> On Thu, Mar 10, 2011 at 08:38:09AM -0800, Andi Kleen wrote:
> > On Thu, Mar 10, 2011 at 08:00:32AM -0800, Andi Kleen wrote:
> > > On Tue, Mar 08, 2011 at 07:31:56PM -0500, Stephen Wilson wrote:
> > > > The only architecture this change impacts in any significant way is x86_64.
> > > > The principle change on that architecture is to mirror TIF_IA32 via
> > > > a new flag in mm_context_t.
> > >
> > > The problem is -- you're adding a likely cache miss on mm_struct for
> > > every 32bit compat syscall now, even if they don't need mm_struct
> > > currently (and a lot of them do not) Unless there's a very good
> > > justification to make up for this performance issue elsewhere
> > > (including numbers) this seems like a bad idea.
> >
> > Hmm I see you're only setting it on exec time actually on rereading
> > the patches. I thought you were changing TS_COMPAT which is in
> > the syscall path.
> >
> > Never mind. I have no problems with doing such a change on exec
> > time.
>
> OK. Great! Does this mean I have your ACK'ed by or reviewed by?
I didn't read it all, but the first two patches looked ok.
-Andi