From: Frans Klaver
Date: Wed, 28 Jun 2017 08:03:07 +0200
Subject: Re: [kernel-sched-cputime] question about probable bug in cputime_adjust()
To: "Gustavo A. R. Silva"
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org

On Wed, Jun 28, 2017 at 7:35 AM, Frans Klaver wrote:
> On Wed, Jun 28, 2017 at 1:03 AM, Gustavo A. R. Silva wrote:
>>
>> Hello everybody,
>>
>> While looking into Coverity ID 1371643 I ran into the following piece of
>> code at kernel/sched/cputime.c:568:
>>
>> 568 /*
>> 569  * Adjust tick based cputime random precision against scheduler runtime
>> 570  * accounting.
>> 571  *
>> 572  * Tick based cputime accounting depends on random scheduling timeslices of a
>> 573  * task to be interrupted or not by the timer. Depending on these
>> 574  * circumstances, the number of these interrupts may be over- or
>> 575  * under-optimistic, matching the real user and system cputime with a variable
>> 576  * precision.
>> 577  *
>> 578  * Fix this by scaling these tick based values against the total runtime
>> 579  * accounted by the CFS scheduler.
>> 580  *
>> 581  * This code provides the following guarantees:
>> 582  *
>> 583  *   stime + utime == rtime
>> 584  *   stime_i+1 >= stime_i, utime_i+1 >= utime_i
>> 585  *
>> 586  * Assuming that rtime_i+1 >= rtime_i.
>> 587  */
>> 588 static void cputime_adjust(struct task_cputime *curr,
>> 589                            struct prev_cputime *prev,
>> 590                            u64 *ut, u64 *st)
>> 591 {
>> 592         u64 rtime, stime, utime;
>> 593         unsigned long flags;
>> 594
>> 595         /* Serialize concurrent callers such that we can honour our guarantees */
>> 596         raw_spin_lock_irqsave(&prev->lock, flags);
>> 597         rtime = curr->sum_exec_runtime;
>> 598
>> 599         /*
>> 600          * This is possible under two circumstances:
>> 601          *  - rtime isn't monotonic after all (a bug);
>> 602          *  - we got reordered by the lock.
>> 603          *
>> 604          * In both cases this acts as a filter such that the rest of the code
>> 605          * can assume it is monotonic regardless of anything else.
>> 606          */
>> 607         if (prev->stime + prev->utime >= rtime)
>> 608                 goto out;
>> 609
>> 610         stime = curr->stime;
>> 611         utime = curr->utime;
>> 612
>> 613         /*
>> 614          * If either stime or both stime and utime are 0, assume all runtime is
>> 615          * userspace. Once a task gets some ticks, the monotonicity code at
>> 616          * 'update' will ensure things converge to the observed ratio.
>> 617          */
>> 618         if (stime == 0) {
>> 619                 utime = rtime;
>> 620                 goto update;
>> 621         }
>> 622
>> 623         if (utime == 0) {
>> 624                 stime = rtime;
>> 625                 goto update;
>> 626         }
>> 627
>> 628         stime = scale_stime(stime, rtime, stime + utime);
>> 629
>> 630 update:
>> 631         /*
>> 632          * Make sure stime doesn't go backwards; this preserves monotonicity
>> 633          * for utime because rtime is monotonic.
>> 634          *
>> 635          *  utime_i+1 = rtime_i+1 - stime_i
>> 636          *            = rtime_i+1 - (rtime_i - utime_i)
>> 637          *            = (rtime_i+1 - rtime_i) + utime_i
>> 638          *            >= utime_i
>> 639          */
>> 640         if (stime < prev->stime)
>> 641                 stime = prev->stime;
>> 642         utime = rtime - stime;
>> 643
>> 644         /*
>> 645          * Make sure utime doesn't go backwards; this still preserves
>> 646          * monotonicity for stime, analogous argument to above.
>> 647          */
>> 648         if (utime < prev->utime) {
>> 649                 utime = prev->utime;
>> 650                 stime = rtime - utime;
>> 651         }
>> 652
>> 653         prev->stime = stime;
>> 654         prev->utime = utime;
>> 655 out:
>> 656         *ut = prev->utime;
>> 657         *st = prev->stime;
>> 658         raw_spin_unlock_irqrestore(&prev->lock, flags);
>> 659 }
>>
>> The issue here is that the value assigned to variable utime at line 619 is
>> overwritten at line 642, which would make that assignment useless.
>
> It isn't completely useless, since utime is used at line 628 to calculate
> stime.

Oh, I missed the 'goto update'. Never mind about this one.
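For anyone following along, the scaling step at line 628 splits rtime in the
tick-observed stime:utime ratio. Here is a minimal userspace model of that
step (illustration only: scale_stime_model() is a made-up name, and the
kernel's scale_stime() avoids the 64x64 multiply overflow with div64_u64-style
helpers rather than relying on compiler 128-bit integer support):

#include <stdio.h>
#include <stdint.h>

static uint64_t scale_stime_model(uint64_t stime, uint64_t rtime, uint64_t total)
{
        /* stime * rtime can overflow 64 bits, so widen for this sketch */
        return (uint64_t)(((unsigned __int128)stime * rtime) / total);
}

int main(void)
{
        uint64_t stime = 20, utime = 10, rtime = 100;

        stime = scale_stime_model(stime, rtime, stime + utime);
        utime = rtime - stime;  /* the unconditional recompute from line 642 */

        printf("stime=%llu utime=%llu\n",
               (unsigned long long)stime, (unsigned long long)utime);
        /* prints stime=66 utime=34: 20:10 ticks spread over 100 units */
        return 0;
}

So for a 2:1 tick ratio and rtime = 100 this yields stime = 66 and, after
line 642, utime = 34, which keeps stime + utime == rtime.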
>> But I'm suspicious that such an assignment is actually correct and that
>> line 642 should be included in the IF block at line 640. Something similar
>> to the following patch:
>>
>> --- a/kernel/sched/cputime.c
>> +++ b/kernel/sched/cputime.c
>> @@ -637,9 +637,10 @@ static void cputime_adjust(struct task_cputime *curr,
>>   *            = (rtime_i+1 - rtime_i) + utime_i
>>   *            >= utime_i
>>   */
>> - if (stime < prev->stime)
>> + if (stime < prev->stime) {
>>   stime = prev->stime;
>> - utime = rtime - stime;
>> + utime = rtime - stime;
>> + }
>>
>> If you confirm this, I will send a patch in a full and proper form.
>>
>> I'd really appreciate your comments.
>
> If you do that, how would you meet the guarantee made in line 583?
>
> Frans
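To put numbers on that question: take the 2:1 example above (scaled
stime = 66, raw utime = 10, rtime = 100) and run it through the 'update'
section with the proposed change applied. A quick userspace sketch, with
struct and function names invented for the illustration:

#include <stdio.h>
#include <stdint.h>

/* Invented stand-in for struct prev_cputime; just the two fields we need. */
struct prev_model {
        uint64_t stime;
        uint64_t utime;
};

/* The 'update' section with the proposed patch applied. */
static void update_patched(struct prev_model *prev, uint64_t rtime,
                           uint64_t stime, uint64_t utime)
{
        if (stime < prev->stime) {
                stime = prev->stime;
                utime = rtime - stime;  /* now only runs when stime is clamped */
        }
        if (utime < prev->utime) {
                utime = prev->utime;
                stime = rtime - utime;
        }
        prev->stime = stime;
        prev->utime = utime;
}

int main(void)
{
        struct prev_model prev = { 0, 0 };

        /* rtime = 100; scale_stime(20, 100, 30) already yielded 66 */
        update_patched(&prev, 100, 66, 10);

        printf("stime=%llu utime=%llu sum=%llu (rtime=100)\n",
               (unsigned long long)prev.stime,
               (unsigned long long)prev.utime,
               (unsigned long long)(prev.stime + prev.utime));
        /* prints sum=76: stime + utime == rtime no longer holds */
        return 0;
}

Because stime moved forward on its own, the patched clamp never recomputes
utime, and the reported sum is 76 rather than 100. The unconditional
'utime = rtime - stime;' at line 642 is precisely what pins stime + utime
to rtime.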