Date: Wed, 28 Jun 2017 18:57:19 -0500
From: "Gustavo A. R. Silva"
To: Frans Klaver
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org
Subject: Re: [kernel-sched-cputime] question about probable bug in cputime_adjust()

Hi Frans,

Quoting Frans Klaver:

> On Wed, Jun 28, 2017 at 7:35 AM, Frans Klaver wrote:
>> On Wed, Jun 28, 2017 at 1:03 AM, Gustavo A. R. Silva wrote:
>>>
>>> Hello everybody,
>>>
>>> While looking into Coverity ID 1371643 I ran into the following piece of
>>> code at kernel/sched/cputime.c:568:
>>>
>>> 568 /*
>>> 569  * Adjust tick based cputime random precision against scheduler runtime
>>> 570  * accounting.
>>> 571  *
>>> 572  * Tick based cputime accounting depend on random scheduling timeslices of a
>>> 573  * task to be interrupted or not by the timer. Depending on these
>>> 574  * circumstances, the number of these interrupts may be over or
>>> 575  * under-optimistic, matching the real user and system cputime with a variable
>>> 576  * precision.
>>> 577  *
>>> 578  * Fix this by scaling these tick based values against the total runtime
>>> 579  * accounted by the CFS scheduler.
>>> 580  *
>>> 581  * This code provides the following guarantees:
>>> 582  *
>>> 583  *   stime + utime == rtime
>>> 584  *   stime_i+1 >= stime_i, utime_i+1 >= utime_i
>>> 585  *
>>> 586  * Assuming that rtime_i+1 >= rtime_i.
>>> 587  */
>>> 588 static void cputime_adjust(struct task_cputime *curr,
>>> 589                            struct prev_cputime *prev,
>>> 590                            u64 *ut, u64 *st)
>>> 591 {
>>> 592         u64 rtime, stime, utime;
>>> 593         unsigned long flags;
>>> 594
>>> 595         /* Serialize concurrent callers such that we can honour our guarantees */
>>> 596         raw_spin_lock_irqsave(&prev->lock, flags);
>>> 597         rtime = curr->sum_exec_runtime;
>>> 598
>>> 599         /*
>>> 600          * This is possible under two circumstances:
>>> 601          *  - rtime isn't monotonic after all (a bug);
>>> 602          *  - we got reordered by the lock.
>>> 603          *
>>> 604          * In both cases this acts as a filter such that the rest of the code
>>> 605          * can assume it is monotonic regardless of anything else.
>>> 606          */
>>> 607         if (prev->stime + prev->utime >= rtime)
>>> 608                 goto out;
>>> 609
>>> 610         stime = curr->stime;
>>> 611         utime = curr->utime;
>>> 612
>>> 613         /*
>>> 614          * If either stime or both stime and utime are 0, assume all runtime is
>>> 615          * userspace. Once a task gets some ticks, the monotonicy code at
>>> 616          * 'update' will ensure things converge to the observed ratio.
>>> 617          */
>>> 618         if (stime == 0) {
>>> 619                 utime = rtime;
>>> 620                 goto update;
>>> 621         }
>>> 622
>>> 623         if (utime == 0) {
>>> 624                 stime = rtime;
>>> 625                 goto update;
>>> 626         }
>>> 627
>>> 628         stime = scale_stime(stime, rtime, stime + utime);
>>> 629
>>> 630 update:
>>> 631         /*
>>> 632          * Make sure stime doesn't go backwards; this preserves monotonicity
>>> 633          * for utime because rtime is monotonic.
>>> 634          *
>>> 635          *  utime_i+1 = rtime_i+1 - stime_i
>>> 636          *            = rtime_i+1 - (rtime_i - utime_i)
>>> 637          *            = (rtime_i+1 - rtime_i) + utime_i
>>> 638          *            >= utime_i
>>> 639          */
>>> 640         if (stime < prev->stime)
>>> 641                 stime = prev->stime;
>>> 642         utime = rtime - stime;
>>> 643
>>> 644         /*
>>> 645          * Make sure utime doesn't go backwards; this still preserves
>>> 646          * monotonicity for stime, analogous argument to above.
>>> 647          */
>>> 648         if (utime < prev->utime) {
>>> 649                 utime = prev->utime;
>>> 650                 stime = rtime - utime;
>>> 651         }
>>> 652
>>> 653         prev->stime = stime;
>>> 654         prev->utime = utime;
>>> 655 out:
>>> 656         *ut = prev->utime;
>>> 657         *st = prev->stime;
>>> 658         raw_spin_unlock_irqrestore(&prev->lock, flags);
>>> 659 }
>>>
>>> The issue here is that the value assigned to variable utime at line 619 is
>>> overwritten at line 642, which would make such variable assignment useless.
>>
>> It isn't completely useless, since utime is used in line 628 to
>> calculate stime.
>
> Oh, I missed 'goto update'. Never mind about this one.
>
>>> But I'm suspicious that such assignment is actually correct and that line
>>> 642 should be included into the IF block at line 640. Something similar to
>>> the following patch:
>>>
>>> --- a/kernel/sched/cputime.c
>>> +++ b/kernel/sched/cputime.c
>>> @@ -637,9 +637,10 @@ static void cputime_adjust(struct task_cputime *curr,
>>>           *            = (rtime_i+1 - rtime_i) + utime_i
>>>           *            >= utime_i
>>>           */
>>> -       if (stime < prev->stime)
>>> +       if (stime < prev->stime) {
>>>                 stime = prev->stime;
>>> -       utime = rtime - stime;
>>> +               utime = rtime - stime;
>>> +       }
>>>
>>> If you confirm this, I will send a patch in a full and proper form.
>>>
>>> I'd really appreciate your comments.
>>
>> If you do that, how would you meet the guarantee made in line 583?

You are right, I see now.
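To convince myself, I replayed the 'update' section with my change in a
tiny userspace sketch (plain C with a stand-in for scale_stime(); the
numbers and the helper are made up for illustration, this is not the
kernel code):

#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Stand-in mirroring what scale_stime() computes: stime * rtime / total */
static u64 scale_stime(u64 stime, u64 rtime, u64 total)
{
        return stime * rtime / total;
}

int main(void)
{
        /* Sample state: rtime grew, and the scaled stime already exceeds prev->stime */
        u64 rtime = 100, stime = 60, utime = 30;
        u64 prev_stime = 50, prev_utime = 20;

        stime = scale_stime(stime, rtime, stime + utime); /* 60 * 100 / 90 = 66 */

        /* My proposed change: utime only recomputed inside the if block */
        if (stime < prev_stime) {
                stime = prev_stime;
                utime = rtime - stime;
        }
        /* stime (66) >= prev_stime (50), so utime keeps the stale value 30 */

        if (utime < prev_utime) { /* 30 < 20 is false, so no correction here */
                utime = prev_utime;
                stime = rtime - utime;
        }

        prev_stime = stime;
        prev_utime = utime;

        /* Prints 96 != 100: the guarantee from line 583 is broken */
        printf("stime + utime = %llu, rtime = %llu\n",
               (unsigned long long)(prev_stime + prev_utime),
               (unsigned long long)rtime);
        return 0;
}

With the code as it is today, line 642 would instead set utime = 100 - 66
= 34 unconditionally, and the sum stays equal to rtime.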
Then in this case the following patch would be the way to go:

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -615,10 +615,8 @@ static void cputime_adjust(struct task_cputime *curr,
          * userspace. Once a task gets some ticks, the monotonicy code at
          * 'update' will ensure things converge to the observed ratio.
          */
-       if (stime == 0) {
-               utime = rtime;
+       if (stime == 0)
                goto update;
-       }
 
        if (utime == 0) {
                stime = rtime;

but I think this one is even better:

--- a/kernel/sched/cputime.c
+++ b/kernel/sched/cputime.c
@@ -615,19 +615,11 @@ static void cputime_adjust(struct task_cputime *curr,
          * userspace. Once a task gets some ticks, the monotonicy code at
          * 'update' will ensure things converge to the observed ratio.
          */
-       if (stime == 0) {
-               utime = rtime;
-               goto update;
-       }
-
-       if (utime == 0) {
+       if (stime != 0 && utime == 0)
                stime = rtime;
-               goto update;
-       }
-
-       stime = scale_stime(stime, rtime, stime + utime);
+       else
+               stime = scale_stime(stime, rtime, stime + utime);
 
-update:
        /*
         * Make sure stime doesn't go backwards; this preserves monotonicity
         * for utime because rtime is monotonic.
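With that second hunk applied, the classification would collapse to the
shape below (just my reading of the resulting code, untested):

        /* No system ticks yet: attribute all runtime to userspace */
        if (stime != 0 && utime == 0)
                stime = rtime;
        else
                stime = scale_stime(stime, rtime, stime + utime);

and execution falls straight into the old 'update' section, which
recomputes utime as rtime - stime either way.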
What do you think?

Thank you!
--
Gustavo A. R. Silva