Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18;
MIME-Version: 1.0
References: <20210518204226.GR4441@paulmck-ThinkPad-P17-Gen-1>
 <CANpmjNN+nS1CAz=0vVdJLAr_N+zZxqp3nm5cxCCiP-SAx3uSyA@mail.gmail.com>
 <20210519185305.GC4441@paulmck-ThinkPad-P17-Gen-1> <CANpmjNMskihABCyNo=cK5c0vbNBP=fcUO5-ZqBJCiO4XGM47DA@mail.gmail.com>
 <CANpmjNMPvAucMQoZeLQAP_WiwiLT6XBoss=EZ4xAbrHnMwdt5g@mail.gmail.com>
 <c179dc74-662d-567f-0285-fcfce6adf0a5@redhat.com> <YMyC/Dy7XoxTeIWb@elver.google.com>
 <35852e24-9b19-a442-694c-42eb4b5a4387@redhat.com> <YNBqTVFpvpXUbG4z@elver.google.com>
 <01a0161a-44d2-5a32-7b7a-fdb13debfe57@redhat.com> <YNG/8EcdPBfH/Taf@elver.google.com>
 <93e7048a-209f-82f2-8d28-ff8347595695@redhat.com>
In-Reply-To: <93e7048a-209f-82f2-8d28-ff8347595695@redhat.com>
From:   Marco Elver <elver@google.com>
Date:   Thu, 24 Jun 2021 10:09:19 +0200
Message-ID: <CANpmjNNZ8rQ3W=3hMemrdS_V7fXf4a_mY785eo5XaPNeTngExg@mail.gmail.com>
Subject: Re: Functional Coverage via RV? (was: "Learning-based Controlled
 Concurrency Testing")
To:     Daniel Bristot de Oliveira <bristot@redhat.com>
Cc:     Dmitry Vyukov <dvyukov@google.com>,
        "Paul E. McKenney" <paulmck@kernel.org>,
        syzkaller <syzkaller@googlegroups.com>,
        kasan-dev <kasan-dev@googlegroups.com>,
        LKML <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Precedence: bulk

On Wed, 23 Jun 2021 at 11:10, Daniel Bristot de Oliveira
<bristot@redhat.com> wrote:
> On 6/22/21 12:48 PM, Marco Elver wrote:
> > On Mon, Jun 21, 2021 at 09:25PM +0200, Daniel Bristot de Oliveira wrote:
> >> On 6/21/21 12:30 PM, Marco Elver wrote:
> >>> On Mon, Jun 21, 2021 at 10:23AM +0200, Daniel Bristot de Oliveira wrote:
> >>> [...]
> >>>>> Yes, unlike code/structural coverage (which is what we have today via
> >>>>> KCOV) functional coverage checks if some interesting states were reached
> >>>>> (e.g. was buffer full/empty, did we observe transition a->b etc.).
> >>>>
> >>>> So you want to observe a given a->b transition, not that B was visited?
> >>>
> >>> An a->b transition would imply that a and b were visited.
> >>
> >> HA! let's try again with a less abstract example...
> >
> > Terminology misunderstanding.
> >
> > I mean "state transition". Writing "a->b transition" led me to infer 'a'
> > and 'b' are states, but from below I infer that you meant an "event
> > trace" (viz. event sequence).  So it seems I was wrong.
> >
> > Let me be clearer: transition A -[a]-> B implies states A and B were
> > visited.
>
> right
>
> Hence, knowing that event 'a' occurred is sufficient, and
> > actually provides a little more information than just "A and B were
> > visited".
>
> iff [a] happens only from A to B...
>
> >
> >>
> >>   |   +------------ on --+----------------+
> >>   v   ^                  +--------v       v
> >> +========+               |        +===========+>--- suspend ---->+===========+
> >> |  OFF   |               +- on --<|     ON    |                  | SUSPENDED |
> >> +========+ <------ shutdown -----<+===========+<----- on -------<+===========+
> >>     ^                                    v                             v
> >>     +--------------- off ----------------+-----------------------------+
> >>
> >> Do you care about:
> >>
> >> 1) states [OFF|ON|SUSPENDED] being visited a # of times; or
> >> 2) the occurrence of the [on|suspend|off] events a # of times; or
> >> 3) the language generated by the "state machine"; like:
> >>
> >>    the occurrence of *"on -> suspend -> on -> off"*
> >>
> >>          which is != of
> >>
> >>    the occurrence of *"on -> on -> suspend -> off"*
> >>
> >>          although the same events and states occurred the same # of times
> >> ?
> >
> > They are all interesting, but unrealistic for a fuzzer to keep track of.
> > We can't realistically keep track of all possible event traces. Nor that
> > some state or event was visited # of times.
>
> We can track this easily via RV, and doing that is already on my todo list. But
> now I got that we do not need all these information for the functional coverage.
>
> > What I did mean is as described above: the simple occurrence of an
> > event, as it implies some previous and next state were visited.
> >
> > The fuzzer then builds up knowledge of which inputs cause some events to
> > occur. Because it knows it has inputs for such events, it will then try
> > to further combine these inputs hoping to reach new coverage. This leads
> > to various distinct event traces using the events it has already
> > observed. All of this is somewhat random of course, because fuzzers are
> > not meant to be model checkers.
> >
> > If someone wants something more complex as you describe, it'd have to
> > explicitly become part of the model (if possible?). The problem of
> > coverage explosion applies, and we may not recommend such usage anyway.
>
> I did not mean to make GCOV/the fuzzer to keep track of these information. I was
> trying to understand what are the best way to provide the information that you
> all need.
>
> >> RV can give you all... but the way to inform this might be different.
> >>
> >>>> I still need to understand what you are aiming to verify, and what is the
> >>>> approach that you would like to use to express the specifications of the systems...
> >>>>
> >>>> Can you give me a simple example?
> >>>
> >>> The older discussion started around a discussion how to get the fuzzer
> >>> into more interesting states in complex concurrent algorithms. But
> >>> otherwise I have no idea ... we were just brainstorming and got to the
> >>> point where it looked like "functional coverage" would improve automated
> >>> test generation in general. And then I found RV which pretty much can
> >>> specify "functional coverage" and almost gets that information to KCOV
> >>> "for free".
> >>
> >> I think we will end up having an almost for free solution, but worth the price.
> >>
> >>>> so, you want to have a different function for every transition so KCOV can
> >>>> observe that?
> >>>
> >>> Not a different function, just distinct "basic blocks". KCOV uses
> >>> compiler instrumentation, and a sequence of non-branching instructions
> >>> denote one point of coverage; at the next branch (conditional or otherwise)
> >>> it then records which branch was taken and therefore we know which code
> >>> paths were covered.
> >>
> >> ah, got it. But can't KCOV be extended with another source of information?
> >
> > Not without changing KCOV. And I think we're weary of something like
> > that due to the potential for coverage explosion. -fsanitize-coverage
> > has various options to capture different types of coverage actually, not
> > purely basic block based coverage. (KCOV already supports
> > KCOV_ENABLE_COMPARISONS, perhaps that could help somehow. It captures
> > arguments of comparisons.)
> >
> >>>>>
> >>>>> From what I can tell this doesn't quite happen today, because
> >>>>> automaton::function is a lookup table as an array.
> >>>>
> >>>> It is a the transition function of the formal automaton definition. Check this:
> >>>>
> >>>> https://bristot.me/wp-content/uploads/2020/01/JSA_preprint.pdf
> >>>>
> >>>> page 9.
> >>>>
> >>>> Could this just
> >>>>> become a generated function with a switch statement? Because then I
> >>>>> think we'd pretty much have all the ingredients we need.
> >>>>
> >>>> a switch statement that would.... call a different function for each transition?
> >>>
> >>> No, just a switch statement that returns the same thing as it does
> >>> today. But KCOV wouldn't see different different coverage with the
> >>> current version because it's all in one basic block because it looks up
> >>> the next state given the current state out of the array. If it was a
> >>> switch statement doing the same thing, the compiler will turn the thing
> >>> into conditional branches and KCOV then knows which code path
> >>> (effectively the transition) was covered.
> >
> > Per Dmitry's comment, yes we need to be careful that the compiler
> > doesn't collapse the switch statement somehow. But this should be
> > achievable with a bunch or 'barrier()' after every 'case ...:'.
>
> Changing the "function" will add some overhead for the runtime monitor use-case.
> For example, for the safety-critical systems that will run with a monitor
> enabled to detect a failure and react to it.
>
> But! I can extend the idea of the reactor to receive the successful state
> transitions or create the "observer" abstraction, to which we can attach a
> generic that will make the switch statements. This function can be
> auto-generated by dot2k as well...
>
> This reactor/observer can be enabed/disabled so... we can add as much annotation
> and barriers as we want.
>
> Thoughts?

That sounds reasonable. Simply having an option (Kconfig would be
ideal) to enable the KCOV-friendly version of the transition function
is good enough for the fuzzer usecase. The kernels built for fuzzing
usually include lots of other debug options anyway, and aren't
production kernels.

Thanks,
-- Marco