Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751611AbdF1Ffo (ORCPT ); Wed, 28 Jun 2017 01:35:44 -0400 Received: from mail-ua0-f194.google.com ([209.85.217.194]:35119 "EHLO mail-ua0-f194.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751501AbdF1Ffg (ORCPT ); Wed, 28 Jun 2017 01:35:36 -0400 MIME-Version: 1.0 In-Reply-To: <20170627180353.Horde.IIRh_TLNBuucfm8xZhbOEix@gator4166.hostgator.com> References: <20170627180353.Horde.IIRh_TLNBuucfm8xZhbOEix@gator4166.hostgator.com> From: Frans Klaver Date: Wed, 28 Jun 2017 07:35:35 +0200 Message-ID: Subject: Re: [kernel-sched-cputime] question about probable bug in cputime_adjust() To: "Gustavo A. R. Silva" Cc: Ingo Molnar , Peter Zijlstra , "linux-kernel@vger.kernel.org" Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4702 Lines: 139 On Wed, Jun 28, 2017 at 1:03 AM, Gustavo A. R. Silva wrote: > > Hello everybody, > > While looking into Coverity ID 1371643 I ran into the following piece of > code at kernel/sched/cputime.c:568: > > 568/* > 569 * Adjust tick based cputime random precision against scheduler runtime > 570 * accounting. > 571 * > 572 * Tick based cputime accounting depend on random scheduling timeslices > of a > 573 * task to be interrupted or not by the timer. Depending on these > 574 * circumstances, the number of these interrupts may be over or > 575 * under-optimistic, matching the real user and system cputime with a > variable > 576 * precision. > 577 * > 578 * Fix this by scaling these tick based values against the total runtime > 579 * accounted by the CFS scheduler. > 580 * > 581 * This code provides the following guarantees: > 582 * > 583 * stime + utime == rtime > 584 * stime_i+1 >= stime_i, utime_i+1 >= utime_i > 585 * > 586 * Assuming that rtime_i+1 >= rtime_i. > 587 */ > 588static void cputime_adjust(struct task_cputime *curr, > 589 struct prev_cputime *prev, > 590 u64 *ut, u64 *st) > 591{ > 592 u64 rtime, stime, utime; > 593 unsigned long flags; > 594 > 595 /* Serialize concurrent callers such that we can honour our > guarantees */ > 596 raw_spin_lock_irqsave(&prev->lock, flags); > 597 rtime = curr->sum_exec_runtime; > 598 > 599 /* > 600 * This is possible under two circumstances: > 601 * - rtime isn't monotonic after all (a bug); > 602 * - we got reordered by the lock. > 603 * > 604 * In both cases this acts as a filter such that the rest of the > code > 605 * can assume it is monotonic regardless of anything else. > 606 */ > 607 if (prev->stime + prev->utime >= rtime) > 608 goto out; > 609 > 610 stime = curr->stime; > 611 utime = curr->utime; > 612 > 613 /* > 614 * If either stime or both stime and utime are 0, assume all > runtime is > 615 * userspace. Once a task gets some ticks, the monotonicy code at > 616 * 'update' will ensure things converge to the observed ratio. > 617 */ > 618 if (stime == 0) { > 619 utime = rtime; > 620 goto update; > 621 } > 622 > 623 if (utime == 0) { > 624 stime = rtime; > 625 goto update; > 626 } > 627 > 628 stime = scale_stime(stime, rtime, stime + utime); > 629 > 630update: > 631 /* > 632 * Make sure stime doesn't go backwards; this preserves > monotonicity > 633 * for utime because rtime is monotonic. > 634 * > 635 * utime_i+1 = rtime_i+1 - stime_i > 636 * = rtime_i+1 - (rtime_i - utime_i) > 637 * = (rtime_i+1 - rtime_i) + utime_i > 638 * >= utime_i > 639 */ > 640 if (stime < prev->stime) > 641 stime = prev->stime; > 642 utime = rtime - stime; > 643 > 644 /* > 645 * Make sure utime doesn't go backwards; this still preserves > 646 * monotonicity for stime, analogous argument to above. > 647 */ > 648 if (utime < prev->utime) { > 649 utime = prev->utime; > 650 stime = rtime - utime; > 651 } > 652 > 653 prev->stime = stime; > 654 prev->utime = utime; > 655out: > 656 *ut = prev->utime; > 657 *st = prev->stime; > 658 raw_spin_unlock_irqrestore(&prev->lock, flags); > 659} > > > The issue here is that the value assigned to variable utime at line 619 is > overwritten at line 642, which would make such variable assignment useless. It isn't completely useless, since utime is used in line 628 to calculate stime. > But I'm suspicious that such assignment is actually correct and that line > 642 should be included into the IF block at line 640. Something similar to > the following patch: > > --- a/kernel/sched/cputime.c > +++ b/kernel/sched/cputime.c > @@ -637,9 +637,10 @@ static void cputime_adjust(struct task_cputime *curr, > * = (rtime_i+1 - rtime_i) + utime_i > * >= utime_i > */ > - if (stime < prev->stime) > + if (stime < prev->stime) { > stime = prev->stime; > - utime = rtime - stime; > + utime = rtime - stime; > + } > > > If you confirm this, I will send a patch in a full and proper form. > > I'd really appreciate your comments. If you do that, how would you meet the guarantee made in line 583? Frans