Date: Wed, 7 Jun 2017 19:32:42 -0700
From: Alexei Starovoitov
To: Edward Cree
Cc: davem@davemloft.net, Alexei Starovoitov, Daniel Borkmann,
    netdev@vger.kernel.org, iovisor-dev, LKML
Subject: Re: [RFC PATCH net-next 2/5] bpf/verifier: rework value tracking
Message-ID: <20170608023239.lsqijtfcg5fadpai@ast-mbp>

On Wed, Jun 07, 2017 at 03:58:31PM +0100, Edward Cree wrote:
> Tracks value alignment by means of tracking known & unknown bits.
> Tightens some min/max value checks and fixes a couple of bugs therein.
>
> Signed-off-by: Edward Cree
> ---
>  include/linux/bpf.h          |   34 +-
>  include/linux/bpf_verifier.h |   40 +-
>  include/linux/tnum.h         |   58 ++
>  kernel/bpf/Makefile          |    2 +-
>  kernel/bpf/tnum.c            |  163 +++++
>  kernel/bpf/verifier.c        | 1641 +++++++++++++++++++++++-------------------
>  6 files changed, 1170 insertions(+), 768 deletions(-)

yeah! That's cool. Overall I like the direction.
I don't understand it completely yet, so only a few nits so far:

> +/* Arithmetic and logical ops */
> +/* Shift a tnum left (by a fixed shift) */
> +struct tnum tn_sl(struct tnum a, u8 shift);
> +/* Shift a tnum right (by a fixed shift) */
> +struct tnum tn_sr(struct tnum a, u8 shift);

I think in a few months we will forget what these abbreviations mean.
Can you change them to tnum_rshift, tnum_lshift, tnum_add?
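For anyone following along: the patch's tnum pairs the known bits
('value') with a mask of unknown bits. A minimal sketch of what the
renamed helpers would compute (the tnum_* names are the reviewer's
suggestion, and this is an illustration, not the patch's code):

```c
#include <stdint.h>

/* Sketch of the patch's tnum: 'value' holds the known bits,
 * 'mask' has a 1 wherever the bit is unknown.
 */
struct tnum {
	uint64_t value;
	uint64_t mask;
};

#define TNUM(v, m) ((struct tnum){ .value = (v), .mask = (m) })

/* Shifting moves both known and unknown bits; the zeros shifted
 * in from the other end are fully known.
 */
static struct tnum tnum_lshift(struct tnum a, uint8_t shift)
{
	return TNUM(a.value << shift, a.mask << shift);
}

static struct tnum tnum_rshift(struct tnum a, uint8_t shift)
{
	return TNUM(a.value >> shift, a.mask >> shift);
}

/* Addition: a result bit is unknown if either input bit is unknown
 * or an unknown carry can propagate into it.
 */
static struct tnum tnum_add(struct tnum a, struct tnum b)
{
	uint64_t sm = a.mask + b.mask;
	uint64_t sv = a.value + b.value;
	uint64_t sigma = sm + sv;
	uint64_t chi = sigma ^ sv;	/* positions carries can reach */
	uint64_t mu = chi | a.mask | b.mask;

	return TNUM(sv & ~mu, mu);
}
```

For example, adding the constant 4 to a value with two unknown low
bits (0..3) yields value 4 with the low two bits unknown, i.e. 4..7.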
> +/* half-multiply add: acc += (unknown * mask * value) */
> +static struct tnum hma(struct tnum acc, u64 value, u64 mask)

hma? Is it a standard abbreviation?

> -static void init_reg_state(struct bpf_reg_state *regs)
> +static void mark_reg_known_zero(struct bpf_reg_state *regs, u32 regno)
>  {
> -	int i;
> -
> -	for (i = 0; i < MAX_BPF_REG; i++)
> -		mark_reg_not_init(regs, i);
> -
> -	/* frame pointer */
> -	regs[BPF_REG_FP].type = FRAME_PTR;
> -
> -	/* 1st arg to a function */
> -	regs[BPF_REG_1].type = PTR_TO_CTX;
> +	BUG_ON(regno >= MAX_BPF_REG);
> +	__mark_reg_known_zero(regs + regno);

I know we have BUG_ONs in the code and they were never hit, but since
you're rewriting it, please change it to WARN_ON and set all regs to
NOT_INIT in such a case. This way if we really have a bug, it
hopefully won't crash.

> -/* check read/write into an adjusted map element */
> -static int check_map_access_adj(struct bpf_verifier_env *env, u32 regno,
> +/* check read/write into a map element with possible variable offset */
> +static int check_map_access(struct bpf_verifier_env *env, u32 regno,
>  			    int off, int size)
>  {
>  	struct bpf_verifier_state *state = &env->cur_state;
>  	struct bpf_reg_state *reg = &state->regs[regno];
>  	int err;
>
> -	/* We adjusted the register to this map value, so we
> -	 * need to change off and size to min_value and max_value
> -	 * respectively to make sure our theoretical access will be
> -	 * safe.
> +	/* We may have adjusted the register to this map value, so we
> +	 * need to try adding each of min_value and max_value to off
> +	 * to make sure our theoretical access will be safe.
>  	 */
>  	if (log_level)
>  		print_verifier_state(state);
> -	env->varlen_map_value_access = true;
> +	/* If the offset is variable, we will need to be stricter in state
> +	 * pruning from now on.
> +	 */
> +	if (reg->align.mask)
> +		env->varlen_map_value_access = true;

I think this align.mask access is used in a few places. Maybe worth
adding a static inline helper with a clear name?
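The helper asked for here could be a one-line predicate in tnum.h. A
sketch, with an illustrative name (not from the patch):

```c
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* known bits */
	uint64_t mask;	/* 1 = unknown bit */
};

/* All bits known: the tnum represents exactly one value.
 * The open-coded 'if (reg->align.mask)' checks above would then
 * read as 'if (!tnum_is_const(reg->align))'.
 */
static inline bool tnum_is_const(struct tnum a)
{
	return !a.mask;
}
```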
>  	switch (reg->type) {
>  	case PTR_TO_PACKET:
> +		/* special case, because of NET_IP_ALIGN */
>  		return check_pkt_ptr_alignment(reg, off, size, strict);
> -	case PTR_TO_MAP_VALUE_ADJ:
> -		return check_val_ptr_alignment(reg, size, strict);
> +	case PTR_TO_MAP_VALUE:
> +		pointer_desc = "value ";
> +		break;
> +	case PTR_TO_CTX:
> +		pointer_desc = "context ";
> +		break;
> +	case PTR_TO_STACK:
> +		pointer_desc = "stack ";
> +		break;

thank you for making errors more human readable.

> +		char tn_buf[48];
> +
> +		tn_strn(tn_buf, sizeof(tn_buf), reg->align);
> +		verbose("variable ctx access align=%s off=%d size=%d",
> +			tn_buf, off, size);
> +		return -EACCES;
> +	}
> +	off += reg->align.value;

I think 'align' is an odd name for this field. Maybe rename the
off/align fields into
	s32 fixed_off;
	struct tnum var_off;

>
> -	} else if (reg->type == FRAME_PTR || reg->type == PTR_TO_STACK) {
> +	} else if (reg->type == PTR_TO_STACK) {
> +		/* stack accesses must be at a fixed offset, so that we can
> +		 * determine what type of data were returned.
> +		 */
> +		if (reg->align.mask) {
> +			char tn_buf[48];
> +
> +			tn_strn(tn_buf, sizeof(tn_buf), reg->align);
> +			verbose("variable stack access align=%s off=%d size=%d",
> +				tn_buf, off, size);
> +			return -EACCES;

hmm. why this restriction? I thought one of the key points of this
diff was that the ptr+var tracking logic would now apply not only to
map_value, but to stack_ptr as well?

>  	}
>
> -	if (!err && size <= 2 && value_regno >= 0 && env->allow_ptr_leaks &&
> -	    state->regs[value_regno].type == UNKNOWN_VALUE) {
> -		/* 1 or 2 byte load zero-extends, determine the number of
> -		 * zero upper bits. Not doing it fo 4 byte load, since
> -		 * such values cannot be added to ptr_to_packet anyway.
> -		 */
> -		state->regs[value_regno].imm = 64 - size * 8;
> +	if (!err && size < BPF_REG_SIZE && value_regno >= 0 && t == BPF_READ &&
> +	    state->regs[value_regno].type == SCALAR_VALUE) {
> +		/* b/h/w load zero-extends, mark upper bits as known 0 */
> +		state->regs[value_regno].align.value &= (1ULL << (size * 8)) - 1;
> +		state->regs[value_regno].align.mask &= (1ULL << (size * 8)) - 1;

probably another helper from tnum.h is needed.

> +		/* sign bit is known zero, so we can bound the value */
> +		state->regs[value_regno].min_value = 0;
> +		state->regs[value_regno].max_value = min_t(u64,
> +			state->regs[value_regno].align.mask,
> +			BPF_REGISTER_MAX_RANGE);

min_t with mask? should it be align.value?

>  	}
>  	return err;
>  }
> @@ -1000,9 +1068,18 @@ static int check_xadd(struct bpf_verifier_env *env, struct bpf_insn *insn)
>  			       BPF_SIZE(insn->code), BPF_WRITE, -1);
>  }
>
> +/* Does this register contain a constant zero? */
> +static bool register_is_null(struct bpf_reg_state reg)
> +{
> +	return reg.type == SCALAR_VALUE && reg.align.mask == 0 &&
> +	       reg.align.value == 0;

align.mask == 0 && align.value == 0 into a helper in tnum.h?

> @@ -1024,7 +1101,15 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
>  		return -EACCES;
>  	}
>
> -	off = regs[regno].imm;
> +	/* Only allow fixed-offset stack reads */
> +	if (regs[regno].align.mask) {
> +		char tn_buf[48];
> +
> +		tn_strn(tn_buf, sizeof(tn_buf), regs[regno].align);
> +		verbose("invalid variable stack read R%d align=%s\n",
> +			regno, tn_buf);
> +	}

same question as before. Can it be relaxed? Support for
	char arr[32]; ... access of arr[n]
was requested several times and folks used map_value[n] as a
workaround. Seems with this var stack logic it's one step away, no?
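Both helpers requested in this hunk fit naturally in tnum.h under the
same value/mask representation. A sketch with illustrative names (not
the patch's):

```c
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* known bits */
	uint64_t mask;	/* 1 = unknown bit */
};

/* Truncate to the low 'size' bytes (size < 8, i.e. b/h/w loads):
 * a narrow load zero-extends, so all upper bits become known zeros
 * in both the value and the mask.
 */
static struct tnum tnum_cast(struct tnum a, uint8_t size)
{
	a.value &= (1ULL << (size * 8)) - 1;
	a.mask &= (1ULL << (size * 8)) - 1;
	return a;
}

/* Fully known and equal to 'value'? register_is_null() would then
 * be: type == SCALAR_VALUE && tnum_equals_const(reg.align, 0).
 */
static bool tnum_equals_const(struct tnum a, uint64_t value)
{
	return !a.mask && a.value == value;
}
```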
> -		if (src_reg->imm < 48) {
> -			verbose("cannot add integer value with %lld upper zero bits to ptr_to_packet\n",
> -				src_reg->imm);
> -			return -EACCES;
> -		}
> -
> -		had_id = (dst_reg->id != 0);
> -
> -		/* dst_reg stays as pkt_ptr type and since some positive
> -		 * integer value was added to the pointer, increment its 'id'
> -		 */
> -		dst_reg->id = ++env->id_gen;

great to see it's being generalized.

> +	if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) {
> +		if (!env->allow_ptr_leaks) {
> +			verbose("R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n",
> +				dst);
> +			return -EACCES;
> +		}

I guess the mark_map_reg() logic will cover the good cases, and actual
math on ptr_to_map_or_null will happen only in broken programs. It
just feels a bit fragile, since it probably depends on the order in
which we evaluate the branches. It's not an issue with this patch; we
have the same situation today. Just thinking out loud.

> +	/* Got here implies adding two SCALAR_VALUEs */
> +	if (WARN_ON_ONCE(ptr_reg)) {
> +		verbose("verifier internal error\n");
> +		return -EINVAL;
...
> +	if (WARN_ON(!src_reg)) {
> +		verbose("verifier internal error\n");
> +		return -EINVAL;
>  	}

I'm lost with these bits. Can you add a comment about the
circumstances in which this can be hit and what the consequences
would be?
> +/* Returns true if (rold safe implies rcur safe) */
> +static bool regsafe(struct bpf_reg_state *rold,
> +		    struct bpf_reg_state *rcur,
> +		    bool varlen_map_access)
> +{
> +	if (memcmp(rold, rcur, sizeof(*rold)) == 0)
> +		return true;
> +	if (rold->type == NOT_INIT)
> +		/* explored state can't have used this */
> +		return true;
> +	if (rcur->type == NOT_INIT)
> +		return false;
> +	switch (rold->type) {
> +	case SCALAR_VALUE:
> +		if (rcur->type == SCALAR_VALUE) {
> +			/* new val must satisfy old val knowledge */
> +			return range_within(rold, rcur) &&
> +			       tn_in(rold->align, rcur->align);
> +		} else {
> +			/* if we knew anything about the old value, we're not
> +			 * equal, because we can't know anything about the
> +			 * scalar value of the pointer in the new value.
> +			 */
> +			return rold->min_value == BPF_REGISTER_MIN_RANGE &&
> +			       rold->max_value == BPF_REGISTER_MAX_RANGE &&
> +			       !~rold->align.mask;
> +		}
> +	case PTR_TO_MAP_VALUE:
> +		if (varlen_map_access) {
> +			/* If the new min/max/align satisfy the old ones and
> +			 * everything else matches, we are OK.
> +			 * We don't care about the 'id' value, because nothing
> +			 * uses it for PTR_TO_MAP_VALUE (only for ..._OR_NULL)
> +			 */
> +			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0 &&
> +			       range_within(rold, rcur) &&
> +			       tn_in(rold->align, rcur->align);
> +		} else {
> +			/* If the ranges/align were not the same, but
> +			 * everything else was and we didn't do a variable
> +			 * access into a map then we are a-ok.
> +			 */
> +			return memcmp(rold, rcur, offsetof(struct bpf_reg_state, id)) == 0;
> +		}
> +	case PTR_TO_MAP_VALUE_OR_NULL:

Does this new state comparison logic help? Do you have any numbers
before/after on the number of insns it had to process for the tests
in selftests?
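For context on the tn_in() check used above: containment between two
tnums can be decided with pure bit ops. A sketch of what it has to
establish, i.e. that every concrete value the new register can take
is one the old (already-verified) register could take:

```c
#include <stdbool.h>
#include <stdint.h>

struct tnum {
	uint64_t value;	/* known bits */
	uint64_t mask;	/* 1 = unknown bit */
};

/* True if every value representable by b is also representable by a:
 * b may not be unknown anywhere a is known, and on a's known bits
 * the two must agree.
 */
static bool tnum_in(struct tnum a, struct tnum b)
{
	if (b.mask & ~a.mask)
		return false;		/* b unknown where a is known */
	b.value &= ~a.mask;		/* compare only a's known bits */
	return a.value == b.value;
}
```

So a fully-unknown old byte contains any known constant in 0..255,
while two distinct constants never contain each other; that is
exactly the "new val must satisfy old val knowledge" direction
regsafe() needs for pruning.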