Content-Type: text/plain; charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\))
Subject: Re: [iovisor-dev] [PATCH v3 net-next 02/12] bpf/verifier: rework
 value tracking
From: Nadav Amit <nadav.amit@gmail.com>
In-Reply-To: <68a2487e-f706-1b61-5c4c-20ffe6d51127@solarflare.com>
Date: Wed, 12 Jul 2017 15:07:27 -0700
Cc: "David S. Miller" <davem@davemloft.net>,
        Alexei Starovoitov <alexei.starovoitov@gmail.com>,
        Alexei Starovoitov <ast@fb.com>,
        Daniel Borkmann <daniel@iogearbox.net>, netdev@vger.kernel.org,
        iovisor-dev <iovisor-dev@lists.iovisor.org>,
        linux-kernel@vger.kernel.org
Message-Id: <12DCE590-7F67-4639-A825-61A24AC44ED3@gmail.com>
References: <adc11342-737f-4e06-bce3-f0a92b5594a5@solarflare.com>
 <2244b48b-f415-3239-6912-cb09f0abc546@solarflare.com>
 <E65DE75F-FC5F-41BC-9F7B-C03D324D8C6F@gmail.com>
 <abd3c014-f33a-6e86-aead-be38697a2cec@solarflare.com>
 <3A96914E-3009-4E19-B138-7A636A76D9C8@gmail.com>
 <68a2487e-f706-1b61-5c4c-20ffe6d51127@solarflare.com>
To: Edward Cree <ecree@solarflare.com>
Sender: linux-kernel-owner@vger.kernel.org
Content-Transfer-Encoding: 8bit
Content-Length: 8140
Lines: 154

Edward Cree <ecree@solarflare.com> wrote:

> On 07/07/17 18:45, Nadav Amit wrote:
>> For me changes such as:
>> 
>>> if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
>>> -			dst_reg->min_value -= min_val;
>>> +			dst_reg->min_value -= max_val;
>> 
>> are purely cryptic. What happened here? Was there a bug before and if so
>> what are its implications? Why can’t it be in a separate patch?
> In this specific case, there was a bug before: if (say) src and dst were
> both unknown bytes (so range 0 to 255), it would compute the new min and max
> to be 0, so it would think the result is known to be 0.  But that's wrong,
> because it could be anything from -255 to +255.  The bug's implications are
> that it could be used to construct an out-of-range offset to (say) a map
> pointer which the verifier would think was in-range and thus accept.

This sounds like a serious bug that may need to be backported to stable
versions, no? In this case I would assume it should be in a separate patch
so it could be applied separately.

> It might be possible to put it in a separate patch, but in general (not
> necessarily the case here) isolated fixes to range handling break some of
> the existing regression tests.  That is why I ended up doing patch #4,
> because I couldn't find a small fix for the patch #1 test without breaking
> others.  Essentially, this patch started out as just the tnum tracking to
> replace imm and align, and then rolled in everything I had to do to get the
> regression tests to pass again.  So some of those things are fixes that
> could go in earlier patches, but because of the order I wrote it in I'd have
> to disentangle them.  I can do that if it's necessary, but I'm not sure it'd
> really make the patch that much more readable so I'd rather avoid that work
> if I can get away with it…

I clearly understand and can relate to it. Still, your patch really stands
out:

git log --stat kernel/bpf/ | grep -G '^ kernel/bpf' | tr -s " " \
>  | sort -g -r  -k 3 | head -n 10

 kernel/bpf/verifier.c | 1075 ++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/bpf_lru_list.c | 567 ++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/core.c | 536 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/lpm_trie.c | 503 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c | 441 +++++++++++++++++++++++++++++++++++++++++++++++++-
 kernel/bpf/inode.c | 387 +++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/hashtab.c | 362 +++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/bpf/verifier.c | 329 +++++++++++++++++++++++++++++++++++++++++++++++---
 kernel/bpf/hashtab.c | 275 ++++++++++++++++++++++++++++++++++++++++++---------
 kernel/bpf/verifier.c | 266 +++++++++++++++++++-------------------------------

>> I also think that changes such as:
>>> -	s64 min_value;
>>> -	u64 max_value;
>> [snip]
>>> +	s64 min_value; /* minimum possible (s64)value */
>>> +	u64 max_value; /* maximum possible (u64)value */
>> Should have been avoided. Personally, I find this comment redundant (to say
>> the least). It does not help to reduce the diff size.
> The comment is meaningful, though perhaps badly phrased.  It's an attempt to
> define the semantics of these fields (which previously was unclear); e.g. the
> first one means "minimum value when interpreted as signed", i.e. the (s64) in
> the comment is a cast.
> Apparently those weren't the semantics the original author intended, but I'm
> not sure the original semantics were well-defined and I certainly don't
> understand them well enough to define them, hence why I defined my own here
> (and then redefined them in patch #4).

Makes more sense now.

>> In this regard, I think that refactoring should have been done first and not
>> together with the logic changes. As another example, changing UNKNOWN_VALUE
>> to SCALAR_VALUE should have been a separate, easy to understand patch.
> But SCALAR_VALUE is the union UNKNOWN_VALUE *or* CONST_IMM, and merging those
> together means all of the ripping-out of evaluate_reg_alu() and friends and
> thus depends on much of the rest of the patch.
>>> The latter is also needed for correctness in computing reg->range;
>>> if 'pkt_ptr + offset' could conceivably overflow, then the result
>>> could be < pkt_end without being a valid pointer into the packet.
>>> We thus rely on the assumption that the packet pointer will never be
>>> within MAX_PACKET_OFF of the top of the address space.  (This
>>> assumption is, again, carried over from the existing verifier.)
>> I understand the limitations (I think). I agree that CONST being spillable
>> is not directly related. As for the possible packet offsets/range:
>> intentionally or not you do make some changes that push the 64k packet size
>> limit even deeper into the code. While the packet size should be limited to
>> avoid overflow, IIUC the requirement is that:
>> 
>> 	64 > log(n_insn) + log(MAX_PACKET_OFF) + 1
> I don't think that's right, unless you also make each addition to a packet-
> pointer require a max_value <= MAX_PACKET_OFF.  It's also a very loose bound
> because it assumes every instruction is such an add.

I remembered it was like that before (you could not add to packet-pointer a
value unless the high bits were cleared). I need to revisit the code…

> I think it makes far more sense to do it the way I have done, where the bounds
> are tracked all the way through the arithmetic and then only checked against
> MAX_PACKET_OFF when doing the access (and when doing a test against a
> PTR_TO_PACKET_END, i.e. find_good_pkt_pointers(), though for some reason I
> only added that check in patch #4).
> That way we can allow things like (for the sake of example) adding $BIG_NUMBER
> to a packet pointer and then subtracting it again.
>> Such an assertion may be staticly checked (using BUILD_BUG_ON), but I don’t
>> think should propagate into the entire code, especially in a non consistent
>> way. For example:
>> 
>>> struct bpf_reg_state {
>>> 	enum bpf_reg_type type;
>>> 	union {
>>> -		/* valid when type == CONST_IMM | PTR_TO_STACK | UNKNOWN_VALUE */
>>> -		s64 imm;
>>> -
>>> -		/* valid when type == PTR_TO_PACKET* */
>>> -		struct {
>>> -			u16 off;
>>> -			u16 range;
>>> -		};
>>> +		/* valid when type == PTR_TO_PACKET */
>>> +		u32 range;
>> I would (personally) prefer range (and offsets) to be u64. I could
>> understand if you left the range as u16 (since packet size is limited to
>> 64k). But why would it be u32?
> I'm not sure; I think I did that so that it would be the same size as the
> struct it's replacing.  In other words, no reason really.  But I will have
> to think about the implications for overflow handling if it's changed.
>> Or:
>>> -	if (off < 0 || size <= 0 || off + size > reg->range) {
>>> +	if (off < 0 || size <= 0 || off > MAX_PACKET_OFF ||
>>> +	    off + size > reg->range) {
>> Why don’t you check (off + size > max(MAX_PACKET_OFF, reg->range))? Is there
>> a reason you ignore size when comparing to MAX_PACKET_OFF ?
> Actually, having thought further about this, I think only the check in
> find_good_pkt_pointers() is necessary (though possibly taking account of the
> fixed offset as well as the var_off), since that naturally limits how much
> reg->range we can acquire to MAX_PACKET_OFF.
> But you're right that checking off rather than off + size is weird, I don't
> recall why I did that and I can't see a reason for it.

These are the sort of comments that I think are important for good code
quality, but I think you would not get due to the intimidating size of the
patch.

Now, I am not the maintainer or even a serious contributor to the BPF code.
For me, such changes are just a headache when I rebase my private patches. I
will get over them. But breaking patches to small ones is a good practice.
Fixes may need to be backported to old versions. It allows reviewers to
provide useful feedback, make it easier for other developers to resolve
conflicts, allows to easily revert buggy changes, enables to find buggy
patches through bisection and so on. Take it into consideration.

Thanks,
Nadav