Date: Thu, 6 Mar 2008 09:18:55 -0500
From: Mathieu Desnoyers
To: Ingo Molnar
Cc: Srinivasa DS, Andrew Morton, linux-kernel@vger.kernel.org, ananth@in.ibm.com,
    Jim Keniston, srikar@linux.vnet.ibm.com, SystemTAP, Andi Kleen,
    pageexec@freemail.hu, Thomas Gleixner, Ingo Molnar, "H. Peter Anvin",
    Jeremy Fitzhardinge, Arjan van de Ven
Subject: Re: [PATCH] x86 - Enhance DEBUG_RODATA support - alternatives
Message-ID: <20080306141855.GA5767@Krystal>
References: <200803061811.27555.srinivasa@in.ibm.com> <20080306125551.GA1597@Krystal>
 <200803061903.09003.srinivasa@in.ibm.com> <20080306134849.GA4088@Krystal>
 <20080306140105.GC28438@elte.hu>
In-Reply-To: <20080306140105.GC28438@elte.hu>

* Ingo Molnar (mingo@elte.hu) wrote:
>
> [ Cc:-ed Arjan - he wrote and handles the DEBUG_RODATA bits. ]
>
> * Mathieu Desnoyers wrote:
>
> > Hrm, yeah, that's because http://lkml.org/lkml/2008/2/2/227
> >
> >   x86 - Enhance DEBUG_RODATA support - alternatives
> >
> > has been pulled out of the x86 tree. It implements the correct
> > text_poke required to support this. It's been removed because Xen does
> > not support the WP bit in guest kernels.
> >
> > Can you try my new replacement implementation? It creates another
> > mapping instead of modifying the WP bit.
>
> thanks, applied it to x86.git.
>
> note, there were rejects and i had to hand-merge it - in general it's
> best to send development x86 patches against:
>
>   http://people.redhat.com/mingo/x86.git/README
>
> the merged patch is below - i hope i merged it right ;)
>

Yep, it looks good. OK, I'll try to update my patches against the x86
tree as much as possible in the future before submission. Given the
number of trees I work in (mainline for LTTng, vm.git for slub
development, x86...), I sometimes forget to port my patches to the
right tree. I'll do my best.

Thanks,

Mathieu

> 	Ingo
>
> ------------->
> Subject: x86: enhance DEBUG_RODATA support - alternatives
> From: Mathieu Desnoyers
> Date: Thu, 6 Mar 2008 08:48:49 -0500
>
> Fix a memcpy that should be a text_poke (in apply_alternatives).
>
> Use kernel_wp_save/kernel_wp_restore in text_poke to support DEBUG_RODATA
> correctly and so the CPU HOTPLUG special case can be removed.
>
> Add text_poke_early, for alternatives and paravirt boot-time and module load
> time patching.
>
> Changelog:
>
> - Fix text_set and text_poke alignment check (mixed up bitwise AND and OR).
> - Remove text_set.
> - Export add_nops, so it can be used by others.
> - Document text_poke_early.
> - Remove clflush, since it breaks some VIA architectures and is not strictly
>   necessary.
> - Add kerneldoc to text_poke and text_poke_early.
> - Create a second vmap instead of using the WP bit to support Xen and VMI.
> - Move local_irq disable within text_poke and text_poke_early to be able to
>   be sleepable in these functions.
>
> Signed-off-by: Mathieu Desnoyers
> CC: Andi Kleen
> CC: pageexec@freemail.hu
> CC: H. Peter Anvin
> CC: Jeremy Fitzhardinge
> Signed-off-by: Ingo Molnar
> ---
>  arch/x86/kernel/alternative.c |   88 +++++++++++++++++++++++++++++++-----------
>  include/asm-x86/alternative.h |   23 ++++++++++
>  2 files changed, 87 insertions(+), 24 deletions(-)
>
> Index: linux-x86.q/arch/x86/kernel/alternative.c
> ===================================================================
> --- linux-x86.q.orig/arch/x86/kernel/alternative.c
> +++ linux-x86.q/arch/x86/kernel/alternative.c
> @@ -11,6 +11,8 @@
>  #include
>  #include
>  #include
> +#include
> +#include
>
>  #define MAX_PATCH_LEN (255-1)
>
> @@ -173,7 +175,7 @@ static const unsigned char*const * find_
>  #endif /* CONFIG_X86_64 */
>
>  /* Use this to add nops to a buffer, then text_poke the whole buffer. */
> -static void add_nops(void *insns, unsigned int len)
> +void add_nops(void *insns, unsigned int len)
>  {
>          const unsigned char *const *noptable = find_nop_table();
>
> @@ -186,6 +188,7 @@ static void add_nops(void *insns, unsign
>                  len -= noplen;
>          }
>  }
> +EXPORT_SYMBOL_GPL(add_nops);
>
>  extern struct alt_instr __alt_instructions[], __alt_instructions_end[];
>  extern u8 *__smp_locks[], *__smp_locks_end[];
> @@ -219,7 +222,7 @@ void apply_alternatives(struct alt_instr
>                  memcpy(insnbuf, a->replacement, a->replacementlen);
>                  add_nops(insnbuf + a->replacementlen,
>                           a->instrlen - a->replacementlen);
> -                text_poke(instr, insnbuf, a->instrlen);
> +                text_poke_early(instr, insnbuf, a->instrlen);
>          }
>  }
>
> @@ -280,7 +283,6 @@ void alternatives_smp_module_add(struct
>                                 void *text,  void *text_end)
>  {
>          struct smp_alt_module *smp;
> -        unsigned long flags;
>
>          if (noreplace_smp)
>                  return;
> @@ -306,39 +308,37 @@ void alternatives_smp_module_add(struct
>                  __func__, smp->locks, smp->locks_end,
>                  smp->text, smp->text_end, smp->name);
>
> -        spin_lock_irqsave(&smp_alt, flags);
> +        spin_lock(&smp_alt);
>          list_add_tail(&smp->next, &smp_alt_modules);
>          if (boot_cpu_has(X86_FEATURE_UP))
>                  alternatives_smp_unlock(smp->locks, smp->locks_end,
>                                          smp->text, smp->text_end);
> -        spin_unlock_irqrestore(&smp_alt, flags);
> +        spin_unlock(&smp_alt);
>  }
>
>  void alternatives_smp_module_del(struct module *mod)
>  {
>          struct smp_alt_module *item;
> -        unsigned long flags;
>
>          if (smp_alt_once || noreplace_smp)
>                  return;
>
> -        spin_lock_irqsave(&smp_alt, flags);
> +        spin_lock(&smp_alt);
>          list_for_each_entry(item, &smp_alt_modules, next) {
>                  if (mod != item->mod)
>                          continue;
>                  list_del(&item->next);
> -                spin_unlock_irqrestore(&smp_alt, flags);
> +                spin_unlock(&smp_alt);
>                  DPRINTK("%s: %s\n", __func__, item->name);
>                  kfree(item);
>                  return;
>          }
> -        spin_unlock_irqrestore(&smp_alt, flags);
> +        spin_unlock(&smp_alt);
>  }
>
>  void alternatives_smp_switch(int smp)
>  {
>          struct smp_alt_module *mod;
> -        unsigned long flags;
>
>  #ifdef CONFIG_LOCKDEP
>          /*
> @@ -355,7 +355,7 @@ void alternatives_smp_switch(int smp)
>                  return;
>          BUG_ON(!smp && (num_online_cpus() > 1));
>
> -        spin_lock_irqsave(&smp_alt, flags);
> +        spin_lock(&smp_alt);
>
>          /*
>           * Avoid unnecessary switches because it forces JIT based VMs to
> @@ -379,7 +379,7 @@ void alternatives_smp_switch(int smp)
>                                               mod->text, mod->text_end);
>          }
>          smp_mode = smp;
> -        spin_unlock_irqrestore(&smp_alt, flags);
> +        spin_unlock(&smp_alt);
>  }
>
>  #endif
>
> @@ -407,7 +407,7 @@ void apply_paravirt(struct paravirt_patc
>
>                  /* Pad the rest with nops */
>                  add_nops(insnbuf + used, p->len - used);
> -                text_poke(p->instr, insnbuf, p->len);
> +                text_poke_early(p->instr, insnbuf, p->len);
>          }
>  }
>  extern struct paravirt_patch_site __start_parainstructions[],
> @@ -416,8 +416,6 @@ extern struct paravirt_patch_site __star
>
>  void __init alternative_instructions(void)
>  {
> -        unsigned long flags;
> -
>          /* The patching is not fully atomic, so try to avoid local interruptions
>             that might execute the to be patched code.
>             Other CPUs are not running. */
> @@ -426,7 +424,6 @@ void __init alternative_instructions(voi
>          stop_mce();
>  #endif
>
> -        local_irq_save(flags);
>          apply_alternatives(__alt_instructions, __alt_instructions_end);
>
>          /* switch to patch-once-at-boottime-only mode and free the
> @@ -458,7 +455,6 @@ void __init alternative_instructions(voi
>          }
>  #endif
>          apply_paravirt(__parainstructions, __parainstructions_end);
> -        local_irq_restore(flags);
>
>          if (smp_alt_once)
>                  free_init_pages("SMP alternatives",
> @@ -471,18 +467,64 @@ void __init alternative_instructions(voi
>  #endif
>  }
>
> -/*
> - * Warning:
> +/**
> + * text_poke_early - Update instructions on a live kernel at boot time
> + * @addr: address to modify
> + * @opcode: source of the copy
> + * @len: length to copy
> + *
>   * When you use this code to patch more than one byte of an instruction
>   * you need to make sure that other CPUs cannot execute this code in parallel.
> - * Also no thread must be currently preempted in the middle of these instructions.
> - * And on the local CPU you need to be protected again NMI or MCE handlers
> - * seeing an inconsistent instruction while you patch.
> + * Also no thread must be currently preempted in the middle of these
> + * instructions. And on the local CPU you need to be protected again NMI or MCE
> + * handlers seeing an inconsistent instruction while you patch.
>   */
> -void __kprobes text_poke(void *addr, unsigned char *opcode, int len)
> +void *text_poke_early(void *addr, const void *opcode, size_t len)
>  {
> +        unsigned long flags;
> +        local_irq_save(flags);
>          memcpy(addr, opcode, len);
> +        local_irq_restore(flags);
> +        sync_core();
> +        /* Could also do a CLFLUSH here to speed up CPU recovery; but
> +           that causes hangs on some VIA CPUs. */
> +        return addr;
> +}
> +
> +/**
> + * text_poke - Update instructions on a live kernel
> + * @addr: address to modify
> + * @opcode: source of the copy
> + * @len: length to copy
> + *
> + * Only atomic text poke/set should be allowed when not doing early patching.
> + * It means the size must be writable atomically and the address must be aligned
> + * in a way that permits an atomic write. It also makes sure we fit on a single
> + * page.
> + */
> +void *__kprobes text_poke(void *addr, const void *opcode, size_t len)
> +{
> +        unsigned long flags;
> +        char *vaddr;
> +        int nr_pages = 2;
> +
> +        BUG_ON(len > sizeof(long));
> +        BUG_ON((((long)addr + len - 1) & ~(sizeof(long) - 1))
> +                - ((long)addr & ~(sizeof(long) - 1)));
> +        {
> +                struct page *pages[2] = { virt_to_page(addr),
> +                                          virt_to_page(addr + PAGE_SIZE) };
> +                if (!pages[1])
> +                        nr_pages = 1;
> +                vaddr = vmap(pages, nr_pages, VM_MAP, PAGE_KERNEL);
> +                WARN_ON(!vaddr);
> +                local_irq_save(flags);
> +                memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> +                local_irq_restore(flags);
> +                vunmap(vaddr);
> +        }
>          sync_core();
>          /* Could also do a CLFLUSH here to speed up CPU recovery; but
>             that causes hangs on some VIA CPUs. */
> +        return addr;
>  }
> Index: linux-x86.q/include/asm-x86/alternative.h
> ===================================================================
> --- linux-x86.q.orig/include/asm-x86/alternative.h
> +++ linux-x86.q/include/asm-x86/alternative.h
> @@ -156,6 +156,27 @@ apply_paravirt(struct paravirt_patch_sit
>  #define __parainstructions_end NULL
>  #endif
>
> -extern void text_poke(void *addr, unsigned char *opcode, int len);
> +extern void add_nops(void *insns, unsigned int len);
> +
> +/*
> + * Clear and restore the kernel write-protection flag on the local CPU.
> + * Allows the kernel to edit read-only pages.
> + * Side-effect: any interrupt handler running between save and restore will have
> + * the ability to write to read-only pages.
> + *
> + * Warning:
> + * Code patching in the UP case is safe if NMIs and MCE handlers are stopped and
> + * no thread can be preempted in the instructions being modified (no iret to an
> + * invalid instruction possible) or if the instructions are changed from a
> + * consistent state to another consistent state atomically.
> + * More care must be taken when modifying code in the SMP case because of
> + * Intel's errata.
> + * On the local CPU you need to be protected again NMI or MCE handlers seeing an
> + * inconsistent instruction while you patch.
> + * The _early version expects the memory to already be RW.
> + */
> +
> +extern void *text_poke(void *addr, const void *opcode, size_t len);
> +extern void *text_poke_early(void *addr, const void *opcode, size_t len);
>
>  #endif /* _ASM_X86_ALTERNATIVE_H */

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
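For illustration, a minimal, hypothetical sketch of how a caller such as kprobes
could use the text_poke() prototype introduced above to patch a single opcode
byte in read-only kernel text. The helper name, the INT3_OPCODE macro and the
call site are invented for this example and are not part of the patch:

    #include <asm/alternative.h>    /* text_poke() prototype added above */

    #define INT3_OPCODE 0xcc        /* x86 breakpoint instruction */

    /* Hypothetical helper: arm a software breakpoint at a kernel text address. */
    static void example_arm_breakpoint(void *kernel_text_addr)
    {
            unsigned char int3 = INT3_OPCODE;

            /*
             * text_poke() builds a temporary RW alias of the target page(s)
             * with vmap(), so this one-byte write succeeds even when
             * DEBUG_RODATA has marked kernel text read-only.  A single byte
             * also satisfies the "fits in one aligned long" constraint
             * enforced by the BUG_ON() checks.
             */
            text_poke(kernel_text_addr, &int3, 1);
    }

Because the write goes through a separate mapping rather than clearing the WP
bit, the same call pattern also works on Xen and VMI guests, which is what this
revision of the patch is about.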