The current JIT uses the following sequence to zero-extend into the
upper 32 bits of the destination register for BPF_LDX BPF_{B,H,W},
when the destination register is not on the stack:
EMIT3(0xC7, add_1reg(0xC0, dst_hi), 0);
However, this is not a valid instruction on x86.
This patch fixes the problem by instead emitting "xor dst_hi,dst_hi"
to clear the upper 32 bits.
This bug may not be currently triggerable as BPF_REG_AX is the only
register not stored on the stack and the verifier uses it in a limited
way, and the verifier implements a zero-extension optimization. But the
JIT should avoid emitting invalid instructions regardless.
Fixes: 03f5781be2c7b ("bpf, x86_32: add eBPF JIT compiler for ia32")
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Luke Nelson <[email protected]>
---
arch/x86/net/bpf_jit_comp32.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index 4d2a7a764602..cc9ad3892ea6 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -1854,7 +1854,9 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
STACK_VAR(dst_hi));
EMIT(0x0, 4);
} else {
- EMIT3(0xC7, add_1reg(0xC0, dst_hi), 0);
+ /* xor dst_hi,dst_hi */
+ EMIT2(0x33,
+ add_2reg(0xC0, dst_hi, dst_hi));
}
break;
case BPF_DW:
--
2.17.1
The current JIT clobbers the destination register for BPF_JSET BPF_X
and BPF_K by using "and" and "or" instructions. This is fine when the
destination register is a temporary loaded from a register stored on
the stack but not otherwise.
This patch fixes the problem (for both BPF_K and BPF_X) by always loading
the destination register into temporaries since BPF_JSET should not
modify the destination register.
This bug may not be currently triggerable as BPF_REG_AX is the only
register not stored on the stack and the verifier uses it in a limited
way.
Fixes: 03f5781be2c7b ("bpf, x86_32: add eBPF JIT compiler for ia32")
Signed-off-by: Xi Wang <[email protected]>
Signed-off-by: Luke Nelson <[email protected]>
---
arch/x86/net/bpf_jit_comp32.c | 22 ++++++++++++++++++----
1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/arch/x86/net/bpf_jit_comp32.c b/arch/x86/net/bpf_jit_comp32.c
index cc9ad3892ea6..ba7d9ccfc662 100644
--- a/arch/x86/net/bpf_jit_comp32.c
+++ b/arch/x86/net/bpf_jit_comp32.c
@@ -2015,8 +2015,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
case BPF_JMP | BPF_JSET | BPF_X:
case BPF_JMP32 | BPF_JSET | BPF_X: {
bool is_jmp64 = BPF_CLASS(insn->code) == BPF_JMP;
- u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
- u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+ u8 dreg_lo = IA32_EAX;
+ u8 dreg_hi = IA32_EDX;
u8 sreg_lo = sstk ? IA32_ECX : src_lo;
u8 sreg_hi = sstk ? IA32_EBX : src_hi;
@@ -2028,6 +2028,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
add_2reg(0x40, IA32_EBP,
IA32_EDX),
STACK_VAR(dst_hi));
+ } else {
+ /* mov dreg_lo,dst_lo */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, dst_lo));
+ if (is_jmp64)
+ /* mov dreg_hi,dst_hi */
+ EMIT2(0x89,
+ add_2reg(0xC0, dreg_hi, dst_hi));
}
if (sstk) {
@@ -2052,8 +2059,8 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
case BPF_JMP | BPF_JSET | BPF_K:
case BPF_JMP32 | BPF_JSET | BPF_K: {
bool is_jmp64 = BPF_CLASS(insn->code) == BPF_JMP;
- u8 dreg_lo = dstk ? IA32_EAX : dst_lo;
- u8 dreg_hi = dstk ? IA32_EDX : dst_hi;
+ u8 dreg_lo = IA32_EAX;
+ u8 dreg_hi = IA32_EDX;
u8 sreg_lo = IA32_ECX;
u8 sreg_hi = IA32_EBX;
u32 hi;
@@ -2066,6 +2073,13 @@ static int do_jit(struct bpf_prog *bpf_prog, int *addrs, u8 *image,
add_2reg(0x40, IA32_EBP,
IA32_EDX),
STACK_VAR(dst_hi));
+ } else {
+ /* mov dreg_lo,dst_lo */
+ EMIT2(0x89, add_2reg(0xC0, dreg_lo, dst_lo));
+ if (is_jmp64)
+ /* mov dreg_hi,dst_hi */
+ EMIT2(0x89,
+ add_2reg(0xC0, dreg_hi, dst_hi));
}
/* mov ecx,imm32 */
--
2.17.1
On 2020-04-21 10:15, Luke Nelson wrote:
> The current JIT uses the following sequence to zero-extend into the
> upper 32 bits of the destination register for BPF_LDX BPF_{B,H,W},
> when the destination register is not on the stack:
>
> EMIT3(0xC7, add_1reg(0xC0, dst_hi), 0);
>
> However, this is not a valid instruction on x86.
>
> This patch fixes the problem by instead emitting "xor dst_hi,dst_hi"
> to clear the upper 32 bits.
x32 is not x86-32. In Linux we generally call the latter "i386".
C7 /0 imm32 is a valid instruction on i386. However, it is also
inefficient when the destination is a register, because B8+r imm32 is
equivalent, and when the value is zero, XOR is indeed more efficient.
The real error is using EMIT3() instead of EMIT2_off32(), but XOR is
more efficient. However, let's make the bug statement *correct*, or it
is going to confuse the Hades out of people in the future.
-hpa
On Tue, Apr 21, 2020 at 10:39 AM H. Peter Anvin <[email protected]> wrote:
> x32 is not x86-32. In Linux we generally call the latter "i386".
Agreed. Most of the previous patches to this file use "x32" and this
one just wanted to be consistent.
> C7 /0 imm32 is a valid instruction on i386. However, it is also
> inefficient when the destination is a register, because B8+r imm32 is
> equivalent, and when the value is zero, XOR is indeed more efficient.
>
> The real error is using EMIT3() instead of EMIT2_off32(), but XOR is
> more efficient. However, let's make the bug statement *correct*, or it
> is going to confuse the Hades out of people in the future.
I don't see how the bug statement is incorrect, which merely points
out that "C7 C0 0" is an invalid instruction, regardless of whether
the JIT intended to emit C7 /0 imm32, B8+r imm32, 31 /r, 33 /r, or any
other equivalent form.
On Tue, Apr 21, 2020 at 3:32 PM Xi Wang <[email protected]> wrote:
>
> On Tue, Apr 21, 2020 at 10:39 AM H. Peter Anvin <[email protected]> wrote:
> > x32 is not x86-32. In Linux we generally call the latter "i386".
>
> Agreed. Most of the previous patches to this file use "x32" and this
> one just wanted to be consistent.
>
> > C7 /0 imm32 is a valid instruction on i386. However, it is also
> > inefficient when the destination is a register, because B8+r imm32 is
> > equivalent, and when the value is zero, XOR is indeed more efficient.
> >
> > The real error is using EMIT3() instead of EMIT2_off32(), but XOR is
> > more efficient. However, let's make the bug statement *correct*, or it
> > is going to confuse the Hades out of people in the future.
>
> I don't see how the bug statement is incorrect, which merely points
> out that "C7 C0 0" is an invalid instruction, regardless of whether
> the JIT intended to emit C7 /0 imm32, B8+r imm32, 31 /r, 33 /r, or any
> other equivalent form.
You should explain the reason it is invalid, ie. the instruction
encoding needs a 32-bit immediate but the current code only emits an
8-bit immediate.
--
Brian Gerst
On Tue, Apr 21, 2020 at 8:22 PM Brian Gerst <[email protected]> wrote:
> You should explain the reason it is invalid, ie. the instruction
> encoding needs a 32-bit immediate but the current code only emits an
> 8-bit immediate.
Thanks! Will do in v2.
On April 21, 2020 12:26:12 PM PDT, Xi Wang <[email protected]> wrote:
>On Tue, Apr 21, 2020 at 10:39 AM H. Peter Anvin <[email protected]> wrote:
>> x32 is not x86-32. In Linux we generally call the latter "i386".
>
>Agreed. Most of the previous patches to this file use "x32" and this
>one just wanted to be consistent.
>
>> C7 /0 imm32 is a valid instruction on i386. However, it is also
>> inefficient when the destination is a register, because B8+r imm32 is
>> equivalent, and when the value is zero, XOR is indeed more efficient.
>>
>> The real error is using EMIT3() instead of EMIT2_off32(), but XOR is
>> more efficient. However, let's make the bug statement *correct*, or
>it
>> is going to confuse the Hades out of people in the future.
>
>I don't see how the bug statement is incorrect, which merely points
>out that "C7 C0 0" is an invalid instruction, regardless of whether
>the JIT intended to emit C7 /0 imm32, B8+r imm32, 31 /r, 33 /r, or any
>other equivalent form.
C7 C0 0 is *not* an invalid instruction, although it is incomplete. It is a different, but arguably even more serious, problem.
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Wed, Apr 22, 2020 at 12:13 AM <[email protected]> wrote:
> C7 C0 0 is *not* an invalid instruction, although it is incomplete. It is a different, but arguably even more serious, problem.
Yep, it would "eat" three bytes coming after that. :)