Date:   Thu, 10 Jan 2019 11:20:04 -0600
From:   Josh Poimboeuf <jpoimboe@redhat.com>
To:     Nadav Amit <namit@vmware.com>
Cc:     X86 ML <x86@kernel.org>, LKML <linux-kernel@vger.kernel.org>,
        Ard Biesheuvel <ard.biesheuvel@linaro.org>,
        Andy Lutomirski <luto@kernel.org>,
        Steven Rostedt <rostedt@goodmis.org>,
        Peter Zijlstra <peterz@infradead.org>,
        Ingo Molnar <mingo@kernel.org>,
        Thomas Gleixner <tglx@linutronix.de>,
        Linus Torvalds <torvalds@linux-foundation.org>,
        Masami Hiramatsu <mhiramat@kernel.org>,
        Jason Baron <jbaron@akamai.com>, Jiri Kosina <jkosina@suse.cz>,
        David Laight <David.Laight@ACULAB.COM>,
        Borislav Petkov <bp@alien8.de>,
        Julia Cartwright <julia@ni.com>, Jessica Yu <jeyu@kernel.org>,
        "H. Peter Anvin" <hpa@zytor.com>,
        Rasmus Villemoes <linux@rasmusvillemoes.dk>,
        Edward Cree <ecree@solarflare.com>,
        Daniel Bristot de Oliveira <bristot@redhat.com>
Subject: Re: [PATCH v3 5/6] x86/alternative: Use a single access in
 text_poke() where possible
Message-ID: <20190110172004.wuh45xoafynfm2df@treble>
References: <cover.1547073843.git.jpoimboe@redhat.com>
 <279b8003f7f0a6831d090ab822d37bc958f974de.1547073843.git.jpoimboe@redhat.com>
 <8138A1EE-359D-4CD2-8E96-5BF00313AB3B@vmware.com>
In-Reply-To: <8138A1EE-359D-4CD2-8E96-5BF00313AB3B@vmware.com>

On Thu, Jan 10, 2019 at 09:32:23AM +0000, Nadav Amit wrote:
> > @@ -714,14 +714,39 @@ void *text_poke(void *addr, const void *opcode, size_t len)
> > 	}
> > 	BUG_ON(!pages[0]);
> > 	local_irq_save(flags);
> > +
> > 	set_fixmap(FIX_TEXT_POKE0, page_to_phys(pages[0]));
> > 	if (pages[1])
> > 		set_fixmap(FIX_TEXT_POKE1, page_to_phys(pages[1]));
> > -	vaddr = (char *)fix_to_virt(FIX_TEXT_POKE0);
> > -	memcpy(&vaddr[(unsigned long)addr & ~PAGE_MASK], opcode, len);
> > +
> > +	vaddr = fix_to_virt(FIX_TEXT_POKE0) + ((unsigned long)addr & ~PAGE_MASK);
> > +
> > +	/*
> > +	 * Use a single access where possible.  Note that a single unaligned
> > +	 * multi-byte write will not necessarily be atomic on x86-32, or if the
> > +	 * address crosses a cache line boundary.
> > +	 */
> > +	switch (len) {
> > +	case 1:
> > +		WRITE_ONCE(*(u8 *)vaddr, *(u8 *)opcode);
> > +		break;
> > +	case 2:
> > +		WRITE_ONCE(*(u16 *)vaddr, *(u16 *)opcode);
> > +		break;
> > +	case 4:
> > +		WRITE_ONCE(*(u32 *)vaddr, *(u32 *)opcode);
> > +		break;
> > +	case 8:
> > +		WRITE_ONCE(*(u64 *)vaddr, *(u64 *)opcode);
> > +		break;
> > +	default:
> > +		memcpy((void *)vaddr, opcode, len);
> > +	}
> > +
> 
> Even if Intel and AMD CPUs are guaranteed to run instructions from L1
> atomically, this may break instruction emulators, such as those that
> hypervisors use. They might not read instructions atomically on SMP VMs,
> where the VM's text_poke() can race with the emulated instruction fetch.
> 
> While I can't find a reason for hypervisors to emulate this instruction,
> smarter people might find ways to turn it into a security exploit.

Interesting point... but I wonder if it's a realistic concern.  BTW,
text_poke_bp() also relies on undocumented behavior.

The entire instruction doesn't need to be read atomically, only the
32-bit call destination (the rel32 displacement).  Assuming the
hypervisor is x86-64, and it uses a single 32-bit access to read the
call destination (which seems logical), that read will be atomic as long
as it doesn't cross a cache line boundary, as stated in the SDM.

If the above assumptions are not true, and the hypervisor reads the call
destination non-atomically (which seems unlikely IMO), I still don't see
how it could realistically be exploited.  The guest would just oops after
calling a corrupt address.

-- 
Josh