2020-03-04 02:17:17

by Peter Xu

Subject: [PATCH] KVM: X86: Avoid explicitly fetching instructions in x86_decode_insn()

insn_fetch() will always implicitly refill the instruction buffer when
it is empty, so x86_decode_insn() does not need an explicit fetch even
when insn_len == 0.

Signed-off-by: Peter Xu <[email protected]>
---
arch/x86/kvm/emulate.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index dd19fb3539e0..04f33c1ca926 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -5175,11 +5175,6 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
ctxt->opcode_len = 1;
if (insn_len > 0)
memcpy(ctxt->fetch.data, insn, insn_len);
- else {
- rc = __do_insn_fetch_bytes(ctxt, 1);
- if (rc != X86EMUL_CONTINUE)
- goto done;
- }

switch (mode) {
case X86EMUL_MODE_REAL:
--
2.24.1
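The rationale can be sketched with a tiny self-contained model of the
fetch cache (names like fetch_cache, refill and fetch_byte here are
illustrative, not the actual kvm/emulate.c identifiers): an accessor
that implicitly refills an empty buffer makes an explicit priming
call redundant.

```c
#include <assert.h>

/* Simplified, hypothetical model of the emulator's 15-byte fetch
 * cache; not the real kvm/emulate.c code. */
struct fetch_cache {
	unsigned char data[15];
	int ptr;	/* next byte to hand out */
	int end;	/* number of valid bytes in data[] */
};

/* Pull one more byte of the "guest" instruction into the cache. */
static int refill(struct fetch_cache *fc, const unsigned char *insn,
		  int insn_len)
{
	if (fc->end >= insn_len || fc->end >= 15)
		return -1;	/* out of bytes, or hit the 15-byte limit */
	fc->data[fc->end] = insn[fc->end];
	fc->end++;
	return 0;
}

/* insn_fetch()-style accessor: refills implicitly when the cache is
 * empty, so the caller never needs an explicit priming fetch. */
static int fetch_byte(struct fetch_cache *fc, const unsigned char *insn,
		      int insn_len, unsigned char *out)
{
	if (fc->ptr == fc->end && refill(fc, insn, insn_len))
		return -1;
	*out = fc->data[fc->ptr++];
	return 0;
}
```

Starting from an empty cache, the very first fetch_byte() call already
succeeds, which is why the explicit __do_insn_fetch_bytes(ctxt, 1) in
x86_decode_insn() looked removable.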


2020-03-04 02:37:38

by Miaohe Lin

Subject: Re: [PATCH] KVM: X86: Avoid explicitly fetching instructions in x86_decode_insn()

Hi:
Peter Xu <[email protected]> writes:
>insn_fetch() will always implicitly refill instruction buffer properly when the buffer is empty, so we don't need to explicitly fetch it even if insn_len==0 for x86_decode_insn().
>
>Signed-off-by: Peter Xu <[email protected]>
>---
> arch/x86/kvm/emulate.c | 5 -----
> 1 file changed, 5 deletions(-)
>
>diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index dd19fb3539e0..04f33c1ca926 100644
>--- a/arch/x86/kvm/emulate.c
>+++ b/arch/x86/kvm/emulate.c
>@@ -5175,11 +5175,6 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
> ctxt->opcode_len = 1;
> if (insn_len > 0)
> memcpy(ctxt->fetch.data, insn, insn_len);
>- else {
>- rc = __do_insn_fetch_bytes(ctxt, 1);
>- if (rc != X86EMUL_CONTINUE)
>- goto done;
>- }
>
> switch (mode) {
> case X86EMUL_MODE_REAL:

Looks good, thanks. But it seems we should also take care of the comment in __do_insn_fetch_bytes(), as we no
longer load the instruction at the beginning of x86_decode_insn(), which makes the comment misleading:
/*
* One instruction can only straddle two pages,
* and one has been loaded at the beginning of
* x86_decode_insn. So, if not enough bytes
* still, we must have hit the 15-byte boundary.
*/
if (unlikely(size < op_size))
return emulate_gp(ctxt, 0);

2020-03-04 07:31:17

by Paolo Bonzini

Subject: Re: [PATCH] KVM: X86: Avoid explicitly fetching instructions in x86_decode_insn()

On 04/03/20 03:37, linmiaohe wrote:
> Hi:
> Peter Xu <[email protected]> writes:
>> insn_fetch() will always implicitly refill instruction buffer properly when the buffer is empty, so we don't need to explicitly fetch it even if insn_len==0 for x86_decode_insn().
>>
>> Signed-off-by: Peter Xu <[email protected]>
>> ---
>> arch/x86/kvm/emulate.c | 5 -----
>> 1 file changed, 5 deletions(-)
>>
>> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index dd19fb3539e0..04f33c1ca926 100644
>> --- a/arch/x86/kvm/emulate.c
>> +++ b/arch/x86/kvm/emulate.c
>> @@ -5175,11 +5175,6 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
>> ctxt->opcode_len = 1;
>> if (insn_len > 0)
>> memcpy(ctxt->fetch.data, insn, insn_len);
>> - else {
>> - rc = __do_insn_fetch_bytes(ctxt, 1);
>> - if (rc != X86EMUL_CONTINUE)
>> - goto done;
>> - }
>>
>> switch (mode) {
>> case X86EMUL_MODE_REAL:

This is a small (but measurable) optimization; going through
__do_insn_fetch_bytes instead of do_insn_fetch_bytes is a little bit
faster because it lets you mark the branch in do_insn_fetch_bytes as
unlikely, and in general it allows the branch predictor to do a better job.
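For reference, the unlikely() annotation mentioned above is a thin
wrapper around GCC's __builtin_expect, which biases code layout so the
predicted-common path stays straight-line. A minimal generic-C sketch
(the consume() helper is made up for illustration, not kernel code):

```c
#include <assert.h>

/* Same shape as the kernel's unlikely() macro: tell the compiler the
 * condition is almost always false, so it lays out the common path
 * as fall-through code. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Toy hot path: the "buffer empty, must refill" case is annotated as
 * the rare one, mirroring how do_insn_fetch_bytes() can treat an
 * empty fetch cache as unlikely. */
static int consume(int *remaining)
{
	if (unlikely(*remaining == 0))
		return -1;	/* slow path: refill or fault */
	(*remaining)--;
	return 0;
}
```

The hint changes nothing functionally; it only helps the compiler (and
thus the branch predictor) favor the non-refill path.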

Paolo

> Looks good, thanks. But it seems we should also take care of the comment in __do_insn_fetch_bytes(), as we do not
> load instruction at the beginning of x86_decode_insn() now, which may be misleading:
> /*
> * One instruction can only straddle two pages,
> * and one has been loaded at the beginning of
> * x86_decode_insn. So, if not enough bytes
> * still, we must have hit the 15-byte boundary.
> */
> if (unlikely(size < op_size))
> return emulate_gp(ctxt, 0);
>

2020-03-04 15:19:23

by Peter Xu

Subject: Re: [PATCH] KVM: X86: Avoid explicitly fetching instructions in x86_decode_insn()

On Wed, Mar 04, 2020 at 08:30:49AM +0100, Paolo Bonzini wrote:
> On 04/03/20 03:37, linmiaohe wrote:
> > Hi:
> > Peter Xu <[email protected]> writes:
> >> insn_fetch() will always implicitly refill instruction buffer properly when the buffer is empty, so we don't need to explicitly fetch it even if insn_len==0 for x86_decode_insn().
> >>
> >> Signed-off-by: Peter Xu <[email protected]>
> >> ---
> >> arch/x86/kvm/emulate.c | 5 -----
> >> 1 file changed, 5 deletions(-)
> >>
> >> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index dd19fb3539e0..04f33c1ca926 100644
> >> --- a/arch/x86/kvm/emulate.c
> >> +++ b/arch/x86/kvm/emulate.c
> >> @@ -5175,11 +5175,6 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
> >> ctxt->opcode_len = 1;
> >> if (insn_len > 0)
> >> memcpy(ctxt->fetch.data, insn, insn_len);
> >> - else {
> >> - rc = __do_insn_fetch_bytes(ctxt, 1);
> >> - if (rc != X86EMUL_CONTINUE)
> >> - goto done;
> >> - }
> >>
> >> switch (mode) {
> >> case X86EMUL_MODE_REAL:
>
> This is a small (but measurable) optimization; going through
> __do_insn_fetch_bytes instead of do_insn_fetch_bytes is a little bit
> faster because it lets you mark the branch in do_insn_fetch_bytes as
> unlikely, and in general it allows the branch predictor to do a better job.

Ah I see, that makes sense. Thanks!

--
Peter Xu

2020-03-04 15:33:27

by Peter Xu

Subject: Re: [PATCH] KVM: X86: Avoid explicitly fetching instructions in x86_decode_insn()

On Wed, Mar 04, 2020 at 02:37:06AM +0000, linmiaohe wrote:
> Hi:
> Peter Xu <[email protected]> writes:
> >insn_fetch() will always implicitly refill instruction buffer properly when the buffer is empty, so we don't need to explicitly fetch it even if insn_len==0 for x86_decode_insn().
> >
> >Signed-off-by: Peter Xu <[email protected]>
> >---
> > arch/x86/kvm/emulate.c | 5 -----
> > 1 file changed, 5 deletions(-)
> >
> >diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index dd19fb3539e0..04f33c1ca926 100644
> >--- a/arch/x86/kvm/emulate.c
> >+++ b/arch/x86/kvm/emulate.c
> >@@ -5175,11 +5175,6 @@ int x86_decode_insn(struct x86_emulate_ctxt *ctxt, void *insn, int insn_len)
> > ctxt->opcode_len = 1;
> > if (insn_len > 0)
> > memcpy(ctxt->fetch.data, insn, insn_len);
> >- else {
> >- rc = __do_insn_fetch_bytes(ctxt, 1);
> >- if (rc != X86EMUL_CONTINUE)
> >- goto done;
> >- }
> >
> > switch (mode) {
> > case X86EMUL_MODE_REAL:
>
> Looks good, thanks. But it seems we should also take care of the comment in __do_insn_fetch_bytes(), as we do not
> load instruction at the beginning of x86_decode_insn() now, which may be misleading:
> /*
> * One instruction can only straddle two pages,
> * and one has been loaded at the beginning of
> * x86_decode_insn. So, if not enough bytes
> * still, we must have hit the 15-byte boundary.
> */
> if (unlikely(size < op_size))
> return emulate_gp(ctxt, 0);

Right, thanks for spotting that (even if the patch is to be dropped :).

I guess it is not only the comment: the check might even fail if we
apply the patch. When the fetch is the first attempt and the
instruction unluckily straddles a page boundary (we only fetch up to
either 15 bytes or the page boundary, whichever comes first), that
single fetch could return fewer bytes than the op_size provided.
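The failure mode is easy to see with a back-of-the-envelope model of
the fetch-size clamp (assumed from the discussion above, not the
exact kernel code): a single fetch stops at either the 15-byte
instruction limit or the end of the current page.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL

/* Hypothetical model: one __do_insn_fetch_bytes()-style call can
 * return at most min(bytes left before the 15-byte insn limit,
 *                    bytes left on the current page). */
static int max_single_fetch(unsigned long rip, int already_fetched)
{
	int to_page_end = (int)(PAGE_SIZE - (rip & (PAGE_SIZE - 1)));
	int to_insn_limit = 15 - already_fetched;

	return to_page_end < to_insn_limit ? to_page_end : to_insn_limit;
}
```

With rip one byte before the end of a page, a first fetch can return
only 1 byte; without the priming fetch, a decode step asking for
op_size == 2 would see size < op_size and wrongly inject #GP. After
priming one byte, the next fetch starts page-aligned and may return
up to 14 more bytes, so size < op_size really does imply the 15-byte
limit was hit.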

Thanks,

--
Peter Xu

2020-03-04 18:21:06

by Paolo Bonzini

Subject: Re: [PATCH] KVM: X86: Avoid explicitly fetching instructions in x86_decode_insn()

On 04/03/20 16:32, Peter Xu wrote:
>> Looks good, thanks. But it seems we should also take care of the comment in __do_insn_fetch_bytes(), as we do not
>> load instruction at the beginning of x86_decode_insn() now, which may be misleading:
>> /*
>> * One instruction can only straddle two pages,
>> * and one has been loaded at the beginning of
>> * x86_decode_insn. So, if not enough bytes
>> * still, we must have hit the 15-byte boundary.
>> */
>> if (unlikely(size < op_size))
>> return emulate_gp(ctxt, 0);
> Right, thanks for spotting that (even if the patch to be dropped :).
>
> I guess not only the comment, but the check might even fail if we
> apply the patch. Because when the fetch is the 1st attempt and
> unluckily that acrosses one page boundary (because we'll only fetch
> until either 15 bytes or the page boundary), so that single fetch
> could be smaller than op_size provided.

Right, priming the decode cache with one byte from the current page
cannot fail, and then we know that the next call must be at the
beginning of the next page.

Paolo

2020-03-04 20:30:51

by Nadav Amit

Subject: Re: [PATCH] KVM: X86: Avoid explicitly fetching instructions in x86_decode_insn()

> On Mar 4, 2020, at 10:19 AM, Paolo Bonzini <[email protected]> wrote:
>
> On 04/03/20 16:32, Peter Xu wrote:
>>> Looks good, thanks. But it seems we should also take care of the comment in __do_insn_fetch_bytes(), as we do not
>>> load instruction at the beginning of x86_decode_insn() now, which may be misleading:
>>> /*
>>> * One instruction can only straddle two pages,
>>> * and one has been loaded at the beginning of
>>> * x86_decode_insn. So, if not enough bytes
>>> * still, we must have hit the 15-byte boundary.
>>> */
>>> if (unlikely(size < op_size))
>>> return emulate_gp(ctxt, 0);
>> Right, thanks for spotting that (even if the patch to be dropped :).
>>
>> I guess not only the comment, but the check might even fail if we
>> apply the patch. Because when the fetch is the 1st attempt and
>> unluckily that acrosses one page boundary (because we'll only fetch
>> until either 15 bytes or the page boundary), so that single fetch
>> could be smaller than op_size provided.
>
> Right, priming the decode cache with one byte from the current page
> cannot fail, and then we know that the next call must be at the
> beginning of the next page.

IIRC I encountered (and fixed) a similar KVM bug in the past. It is a shame
I never wrote a unit test (and I don’t have time now), but it would be nice
to have one.