Message-ID: <1493170764.36058.19.camel@ranerica-desktop>
Subject: Re: [v6 PATCH 02/21] x86/mpx: Do not use SIB index if index points
 to R/ESP
From: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
To: Borislav Petkov <bp@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>, Thomas Gleixner <tglx@linutronix.de>,
        "H. Peter Anvin" <hpa@zytor.com>, Andy Lutomirski <luto@kernel.org>,
        Peter Zijlstra <peterz@infradead.org>,
        Andrew Morton <akpm@linux-foundation.org>,
        Brian Gerst <brgerst@gmail.com>, Chris Metcalf <cmetcalf@mellanox.com>,
        Dave Hansen <dave.hansen@linux.intel.com>,
        Paolo Bonzini <pbonzini@redhat.com>, Liang Z Li <liang.z.li@intel.com>,
        Masami Hiramatsu <mhiramat@kernel.org>, Huang Rui <ray.huang@amd.com>,
        Jiri Slaby <jslaby@suse.cz>, Jonathan Corbet <corbet@lwn.net>,
        "Michael S. Tsirkin" <mst@redhat.com>,
        Paul Gortmaker <paul.gortmaker@windriver.com>,
        Vlastimil Babka <vbabka@suse.cz>, Chen Yucong <slaoub@gmail.com>,
        Alexandre Julliard <julliard@winehq.org>, Stas Sergeev <stsp@list.ru>,
        Fenghua Yu <fenghua.yu@intel.com>,
        "Ravi V. Shankar" <ravi.v.shankar@intel.com>,
        Shuah Khan <shuah@kernel.org>, linux-kernel@vger.kernel.org,
        x86@kernel.org, linux-msdos@vger.kernel.org, wine-devel@winehq.org,
        Adam Buchbinder <adam.buchbinder@gmail.com>,
        Colin Ian King <colin.king@canonical.com>,
        Lorenzo Stoakes <lstoakes@gmail.com>,
        Qiaowei Ren <qiaowei.ren@intel.com>,
        Nathan Howard <liverlint@gmail.com>,
        Adan Hawthorn <adanhawthorn@gmail.com>, Joe Perches <joe@perches.com>
Date: Tue, 25 Apr 2017 18:39:24 -0700
In-Reply-To: <20170411113126.ygq52fdoe6rp6jzw@pd.tnic>
References: <20170308003254.27833-1-ricardo.neri-calderon@linux.intel.com>
         <20170308003254.27833-3-ricardo.neri-calderon@linux.intel.com>
         <20170411113126.ygq52fdoe6rp6jzw@pd.tnic>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 6152
Lines: 158

Hi Boris,

I am sorry I missed your feedback earlier. Thanks for commenting!

On Tue, 2017-04-11 at 13:31 +0200, Borislav Petkov wrote:
> On Tue, Mar 07, 2017 at 04:32:35PM -0800, Ricardo Neri wrote:
> > Section 2.2.1.2 of the Intel 64 and IA-32 Architectures Software
> > Developer's Manual volume 2A states that when memory addressing is used
> > (i.e., mod part of ModR/M is not 3), a SIB byte is used and the index of
> > the SIB byte points to the R/ESP (i.e., index = 4), the index should not be
> > used in the computation of the memory address.
> > 
> > In these cases the address is simply the value present in the register
> > pointed by the base part of the SIB byte plus the displacement byte.
> > 
> > An example of such instruction could be
> > 
> >     insn -0x80(%rsp)
> > 
> > This is represented as:
> > 
> >      [opcode] 4c 23 80
> > 
> >       ModR/M=0x4c: mod: 0x1, reg: 0x1: r/m: 0x4(R/ESP)
> >       SIB=0x23: sc: 0, index: 0x100(R/ESP), base: 0x11(R/EBX):
> >       Displacement -0x80
> > 
> > The correct address is (base) + displacement; no index is used.
> > 
> > We can achieve the desired effect of not using the index by making
> > get_reg_offset return -EDOM in this particular case. This value indicates
> > callers that they should not use the index to calculate the address.
> > EINVAL continues to indicate that an error when decoding the SIB byte.
> > 
> > Care is taken to allow R12 to be used as index, which is a valid scenario.
> > 
> > Cc: Dave Hansen <dave.hansen@linux.intel.com>
> > Cc: Adam Buchbinder <adam.buchbinder@gmail.com>
> > Cc: Colin Ian King <colin.king@canonical.com>
> > Cc: Lorenzo Stoakes <lstoakes@gmail.com>
> > Cc: Qiaowei Ren <qiaowei.ren@intel.com>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: Nathan Howard <liverlint@gmail.com>
> > Cc: Adan Hawthorn <adanhawthorn@gmail.com>
> > Cc: Joe Perches <joe@perches.com>
> > Cc: Ravi V. Shankar <ravi.v.shankar@intel.com>
> > Cc: x86@kernel.org
> > Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
> > ---
> >  arch/x86/mm/mpx.c | 19 +++++++++++++++++--
> >  1 file changed, 17 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/mm/mpx.c b/arch/x86/mm/mpx.c
> > index ff112e3..d9e92d6 100644
> > --- a/arch/x86/mm/mpx.c
> > +++ b/arch/x86/mm/mpx.c
> > @@ -110,6 +110,13 @@ static int get_reg_offset(struct insn *insn, struct pt_regs *regs,
> >  		regno = X86_SIB_INDEX(insn->sib.value);
> >  		if (X86_REX_X(insn->rex_prefix.value))
> >  			regno += 8;
> > +		/*
> > +		 * If mod !=3, register R/ESP (regno=4) is not used as index in
> > +		 * the address computation. Check is done after looking at REX.X
> > +		 * This is because R12 (regno=12) can be used as an index.
> > +		 */
> > +		if (regno == 4 && X86_MODRM_MOD(insn->modrm.value) != 3)
> > +			return -EDOM;
> 
> Hmm, ok, so this is a bit confusing, to me at least. Maybe you're saying
> the same things but here's how I see it:
> 
> 1. When ModRM.mod != 11b and ModRM.rm == 100b, all that does mean
> is that you have a SIB byte following. I.e., you have indexed
> register-indirect addressing.

Yes, callers of this function already know that there is a SIB byte
because they saw ModRM.mod != 11b and ModRM.rm == 100b and struct
insn.sib.nbytes is non zero.
> 
> Now, you still need to decode the SIB byte and it goes this way:
> 
> SIB.index == 100b means that the index register specification is
> null, i.e., the scale*index portion of that indexed register-indirect
> addressing is null, i.e., you have an offset following the SIB byte.
> Now, depending on ModRM.mod, that offset is:

Yes, for this reason if ModRM.rm != 11b and an index of 100b is found
the function return -EDOM to indicate callers to not use the index. We
need to return -EDOM because this function returns an offset from the
base of struct pt_regs for successful cases. A negative value indicates
to not use the offset.

Perhaps a better wording is to say as you propose: the scale*index
portion that indexed register-indirect addressing is null. I will take
your wording!
> 
> ModRM.mod == 01b -> 1 byte offset
> ModRM.mod == 10b -> 4 bytes offset

Callers will now the size of the offset based on struct
insn.displacement.value.

> 
> That's why for an instruction like this one (let's use your example) you
> have:
> 
> 	8b 4c 23 80             mov    -0x80(%rbx,%riz,1),%ecx
> 
> That's basically a binutils hack to state that the SIB index register is
> null.
> 
> Another SIB index register works, of course:
> 
> 	 8b 4c 03 80             mov -0x80(%rbx,%rax,1),%ecx
> 
> Ok, so far so good.
> 
> 2. Now, the %r12 thing is part of the REX implications to those
> encodings: That's the REX.X bit which adds a fourth bit to the encoding
> of the SIB base register, i.e., if you specify a register with
> SIB.index, you want to be able to specify all 16 regs, thus the 4th
> bit. That's why it says that the SIB byte is required for %r12-based
> addressing.
> 
> I.e., you can still have a SIB.index == 100b addressing with an index
> register which is not null but that is only because SIB.index is now
> {REX.X=1b, 100b}, i.e.:
> 
> Prefixes:
>  REX:                   0x43 { 4 [w]: 0 [r]: 0 [x]: 1 [b]: 1 }
> Opcode:                 0x8b
> ModRM:                  0x4c  [mod:1b][.R:0b,reg:1b][.B:1b,r/m:1100b]
>                         register-indirect mode, 1-byte offset in displ. field
> SIB:                    0x63 [.B:1b,base:1011b][.X:1b,idx:1100b][scale: 1]
> 
>  MOV Gv,Ev; MOV reg{16,32,64} reg/mem{16,32,64}
>                0:       43 8b 4c 63 80          mov -0x80(%r11,%r12,2),%ecx

Correct, that is why we check the value of regno (the value of ModRM.rm)
after correcting its value in case we find REX.X set. In this way we can
use %r12.
> 
> So, I'm not saying your version is necessarily wrong - I'm just saying
> that it could explain the situation a bit more verbose.

Sure. I will be more verbose in my commit message.
> 
> Btw, I'd flip the if-test above:
> 
> 	if (X86_MODRM_MOD(insn->modrm.value) != 3 && regno == 4)
> 
> to make it just like the order the conditions are specified in the
> manuals.

Will do.

Thanks and BR,
Ricardo