MIME-Version: 1.0
In-Reply-To: <20170627180353.Horde.IIRh_TLNBuucfm8xZhbOEix@gator4166.hostgator.com>
References: <20170627180353.Horde.IIRh_TLNBuucfm8xZhbOEix@gator4166.hostgator.com>
From: Frans Klaver <fransklaver@gmail.com>
Date: Wed, 28 Jun 2017 07:35:35 +0200
Message-ID: <CAH6sp9N6bEA1P4061JpyDOpLbFAoQjg0+tozG3sKL0FxdwDcUg@mail.gmail.com>
Subject: Re: [kernel-sched-cputime] question about probable bug in cputime_adjust()
To: "Gustavo A. R. Silva" <garsilva@embeddedor.com>
Cc: Ingo Molnar <mingo@redhat.com>, Peter Zijlstra <peterz@infradead.org>,
        "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 4702
Lines: 139

On Wed, Jun 28, 2017 at 1:03 AM, Gustavo A. R. Silva
<garsilva@embeddedor.com> wrote:
>
> Hello everybody,
>
> While looking into Coverity ID 1371643 I ran into the following piece of
> code at kernel/sched/cputime.c:568:
>
> 568/*
> 569 * Adjust tick based cputime random precision against scheduler runtime
> 570 * accounting.
> 571 *
> 572 * Tick based cputime accounting depend on random scheduling timeslices
> of a
> 573 * task to be interrupted or not by the timer.  Depending on these
> 574 * circumstances, the number of these interrupts may be over or
> 575 * under-optimistic, matching the real user and system cputime with a
> variable
> 576 * precision.
> 577 *
> 578 * Fix this by scaling these tick based values against the total runtime
> 579 * accounted by the CFS scheduler.
> 580 *
> 581 * This code provides the following guarantees:
> 582 *
> 583 *   stime + utime == rtime
> 584 *   stime_i+1 >= stime_i, utime_i+1 >= utime_i
> 585 *
> 586 * Assuming that rtime_i+1 >= rtime_i.
> 587 */
> 588static void cputime_adjust(struct task_cputime *curr,
> 589                           struct prev_cputime *prev,
> 590                           u64 *ut, u64 *st)
> 591{
> 592        u64 rtime, stime, utime;
> 593        unsigned long flags;
> 594
> 595        /* Serialize concurrent callers such that we can honour our
> guarantees */
> 596        raw_spin_lock_irqsave(&prev->lock, flags);
> 597        rtime = curr->sum_exec_runtime;
> 598
> 599        /*
> 600         * This is possible under two circumstances:
> 601         *  - rtime isn't monotonic after all (a bug);
> 602         *  - we got reordered by the lock.
> 603         *
> 604         * In both cases this acts as a filter such that the rest of the
> code
> 605         * can assume it is monotonic regardless of anything else.
> 606         */
> 607        if (prev->stime + prev->utime >= rtime)
> 608                goto out;
> 609
> 610        stime = curr->stime;
> 611        utime = curr->utime;
> 612
> 613        /*
> 614         * If either stime or both stime and utime are 0, assume all
> runtime is
> 615         * userspace. Once a task gets some ticks, the monotonicy code at
> 616         * 'update' will ensure things converge to the observed ratio.
> 617         */
> 618        if (stime == 0) {
> 619                utime = rtime;
> 620                goto update;
> 621        }
> 622
> 623        if (utime == 0) {
> 624                stime = rtime;
> 625                goto update;
> 626        }
> 627
> 628        stime = scale_stime(stime, rtime, stime + utime);
> 629
> 630update:
> 631        /*
> 632         * Make sure stime doesn't go backwards; this preserves
> monotonicity
> 633         * for utime because rtime is monotonic.
> 634         *
> 635         *  utime_i+1 = rtime_i+1 - stime_i
> 636         *            = rtime_i+1 - (rtime_i - utime_i)
> 637         *            = (rtime_i+1 - rtime_i) + utime_i
> 638         *            >= utime_i
> 639         */
> 640        if (stime < prev->stime)
> 641                stime = prev->stime;
> 642        utime = rtime - stime;
> 643
> 644        /*
> 645         * Make sure utime doesn't go backwards; this still preserves
> 646         * monotonicity for stime, analogous argument to above.
> 647         */
> 648        if (utime < prev->utime) {
> 649                utime = prev->utime;
> 650                stime = rtime - utime;
> 651        }
> 652
> 653        prev->stime = stime;
> 654        prev->utime = utime;
> 655out:
> 656        *ut = prev->utime;
> 657        *st = prev->stime;
> 658        raw_spin_unlock_irqrestore(&prev->lock, flags);
> 659}
>
>
> The issue here is that the value assigned to variable utime at line 619 is
> overwritten at line 642, which would make such variable assignment useless.

It isn't completely useless, since utime is used in line 628 to calculate stime.


> But I'm suspicious that such assignment is actually correct and that line
> 642 should be included into the IF block at line 640. Something similar to
> the following patch:
>
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -637,9 +637,10 @@ static void cputime_adjust(struct task_cputime *curr,
>          *            = (rtime_i+1 - rtime_i) + utime_i
>          *            >= utime_i
>          */
> -       if (stime < prev->stime)
> +       if (stime < prev->stime) {
>                 stime = prev->stime;
> -       utime = rtime - stime;
> +               utime = rtime - stime;
> +       }
>
>
> If you confirm this, I will send a patch in a full and proper form.
>
> I'd really appreciate your comments.

If you do that, how would you meet the guarantee made in line 583?

Frans