Received: by 2002:ac0:a5a6:0:0:0:0:0 with SMTP id m35-v6csp3293912imm; Fri, 24 Aug 2018 14:19:39 -0700 (PDT) X-Google-Smtp-Source: ANB0VdbadhqehfZA0KFrFkTTGCQU0r9iwxxTcEMSA6bAek2ZJxTnxEHUknaZYmjkEsCpvRazDCnJ X-Received: by 2002:a62:c182:: with SMTP id i124-v6mr3568202pfg.248.1535145579314; Fri, 24 Aug 2018 14:19:39 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1535145579; cv=none; d=google.com; s=arc-20160816; b=TxxfGqokX8ZdCvrG/tNT+zyQawjfohF2B0Pz9U5+xLJK1GIG1+JmWVE+1PcZMi6m4d MRLIwIxB612oD3BicDcSGWzA8T1ZPaO7y1GgCqA89QGFcblYOPmLiZNdJZkIhxoSGCPM ICZ+oFP5V7oKVC3TSuuWRzYYD1otdpZFtBSzdKbdGAKHaGdEe+QUTWlP5ovUwHJjTgeU Ycmpfd30OC4YH0Zmv42fBSRKvFPq55wpjJkvkuBaIutibGEiTVTFmtz9gKl6z81j4MTj 36ZbfvGrLdyaY7bbt4j45oRiM1MOUDKS6+A0/hEN+Q7UUpohmoQLoAKuKsVFHRwgTSwE +mgw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:from:references:cc:to:subject:dkim-signature :arc-authentication-results; bh=uuvu11NH8YH24kGqArIE/COOEs5QbC624UmtiN8D7ps=; b=rtMG/AkZwR1x11VlDeHUec1TRQ/WmOu6y7erSOG/bqqogeYZvzRsu1t2xspz/HFhLM bNF786Irdn1ZM2UHYrfefRUX4iCbG+55+kdbYuj4516xGLXjMn3c0NnKof+Cr1wW2qN7 /XP/Go1IMLDTCxpF2gi00s6Y/10RsOz4Ai+CRcbwk1f1m+2V3l0nF8hkaLFkZn2ZukK9 5J4ued56rzFp8xWm6SB/deHwfVPO+16O7RMy/dwQwI/+3Hdn85mzHpCgJMjiV9ZIhi1S 5v6wt9R0G3lmGgcOJyQvA0e51W56Ix/+NlOpI4+IosaIfklgVj8exLpsVSfJFJcrXJNe 5zhQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=gHNWJFTy; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k130-v6si8720764pfc.282.2018.08.24.14.19.24; Fri, 24 Aug 2018 14:19:39 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@google.com header.s=20161025 header.b=gHNWJFTy; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=REJECT sp=REJECT dis=NONE) header.from=google.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727965AbeHYAxw (ORCPT + 99 others); Fri, 24 Aug 2018 20:53:52 -0400 Received: from mail-pg1-f196.google.com ([209.85.215.196]:46024 "EHLO mail-pg1-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726508AbeHYAxw (ORCPT ); Fri, 24 Aug 2018 20:53:52 -0400 Received: by mail-pg1-f196.google.com with SMTP id m4-v6so4305391pgv.12 for ; Fri, 24 Aug 2018 14:17:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=uuvu11NH8YH24kGqArIE/COOEs5QbC624UmtiN8D7ps=; b=gHNWJFTyA+P95G2/Qg9CGDCcB9kTRSWVn2wC1GTkKgxjAe8cSA3waGU/2FqqQY4Xvd RWPHw7ldWLHkAopsM61F5MSv54wDZhlWaNJj7VqzNoq2lcKMTgxNbH/IuF25/6eTViCM qTZ6yTS7ts2vHEmddUxIxzV6sO1KvraLYFeosHZAcA1InRwclSW1sNJL5pNfcdLpQIEE qoGknx8zfrYYiRhY+U2G9xgX+i9oUIdw1n/nk7upaMS3ECecqg0fVU2blaCLRFpd6zCH 39kfrZUnrLWzvoJ9eAIcOhyjOykUOzpJjmvpKLyz5ejfJQnQqIMu/0bXNdXrmkZN0TAJ SfqA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=uuvu11NH8YH24kGqArIE/COOEs5QbC624UmtiN8D7ps=; b=aqjPQj/7UUx9hyTXXBVFMebLsd43MPdinqMC/C5+0RiOGTbKGBQG00+cl02B0McU92 EqdR7uq62AdyxRSVebFzsBhczgU5htpNufzzAYCqIBwKLIbc/d8HNr8wx/++wDfoiTJY zL1303MoD0OGQFs0dj+bh99FFk/k6tqPDxRNSmGb5RpijiBLom3poOrglhk/8FhBJRw6 wk3YjhWk5ogquFklKJ3Gi0JQuGrQOYGs/AGK5DI6XCYlSQ7sHcLzRbenNHu+ZCrwxL7g N6b2O8s+xT1E5KuIu0T6aPhjvr9oSAhnfFmoKwAGjDiHoZXl3aAFbasEuznndCVFRbPc WB3A== X-Gm-Message-State: APzg51DFsCKL6EVNF458qi0CNcqM1n1u+FZ7D3Ktz7GSrsDbAEbPJk7r JiHFBLH9apu7s/leg0QU6srwHQ== X-Received: by 2002:a62:c182:: with SMTP id i124-v6mr3561677pfg.248.1535145451894; Fri, 24 Aug 2018 14:17:31 -0700 (PDT) Received: from smuckle.san.corp.google.com ([2620:15c:2d:3:fa74:b312:5fef:6cbf]) by smtp.gmail.com with ESMTPSA id e7-v6sm17747975pgc.55.2018.08.24.14.17.30 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 24 Aug 2018 14:17:31 -0700 (PDT) Subject: Re: [PATCH] sched/fair: vruntime should normalize when switching from fair To: Juri Lelli , Dietmar Eggemann Cc: Miguel de Dios , Peter Zijlstra , Ingo Molnar , linux-kernel@vger.kernel.org, kernel-team@android.com, Todd Kjos , Paul Turner , Quentin Perret , Patrick Bellasi , Chris Redpath , Morten Rasmussen , John Dias References: <20180817182728.76129-1-smuckle@google.com> <20180824065419.GB24860@localhost.localdomain> From: Steve Muckle Message-ID: <2ddcbabf-d78f-d497-d6d4-8aa6a8947bef@google.com> Date: Fri, 24 Aug 2018 14:17:29 -0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.6.0 MIME-Version: 1.0 In-Reply-To: <20180824065419.GB24860@localhost.localdomain> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-GB Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 08/23/2018 11:54 PM, Juri Lelli wrote: >> I tried to catch this issue on my Arm64 Juno board using pi_test (and a >> slightly adapted pip_test (usleep_val = 1500 and keep low as cfs)) from >> rt-tests but wasn't able to do so. >> >> # pi_stress --inversions=1 --duration=1 --groups=1 --sched id=low,policy=cfs >> >> Starting PI Stress Test >> Number of thread groups: 1 >> Duration of test run: 1 seconds >> Number of inversions per group: 1 >> Admin thread SCHED_FIFO priority 4 >> 1 groups of 3 threads will be created >> High thread SCHED_FIFO priority 3 >> Med thread SCHED_FIFO priority 2 >> Low thread SCHED_OTHER nice 0 >> >> # ./pip_stress >> >> In both cases, the cfs task entering rt_mutex_setprio() is queued, so >> dequeue_task_fair()->dequeue_entity(), which subtracts cfs_rq->min_vruntime >> from se->vruntime, is called on it before it gets the rt prio. >> >> Maybe it requires a very specific use of the pthread library to provoke this >> issue by making sure that the cfs tasks really blocks/sleeps? > > Maybe one could play with rt-app to recreate such specific use case? > > https://github.com/scheduler-tools/rt-app/blob/master/doc/tutorial.txt#L459 This was reproduced for me on tip of mainline by using the program at the end of this mail. It was run in a 2 CPU virtualbox VM. Relevant annotated bits of the trace: low-prio thread vruntime is 752ms pi-vruntime-tes-598 [001] d... 520.572459: sched_stat_runtime: comm=pi-vruntime-tes pid=598 runtime=29953 [ns] vruntime=752888705 [ns] low-prio thread waits on a_sem pi-vruntime-tes-598 [001] d... 520.572465: sched_switch: prev_comm=pi-vruntime-tes prev_pid=598 prev_prio=120 prev_state=D ==> next_comm=swapper/1 next_pid=0 next_prio=120 high prio thread finishes wakeup, then sleeps for 1ms -0 [000] dNh. 520.572483: sched_wakeup: comm=pi-vruntime-tes pid=597 prio=19 target_cpu=000 -0 [000] d... 520.572486: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=pi-vruntime-tes next_pid=597 next_prio=19 pi-vruntime-tes-597 [000] d... 520.572498: sched_switch: prev_comm=pi-vruntime-tes prev_pid=597 prev_prio=19 prev_state=D ==> next_comm=swapper/0 next_pid=0 next_prio=120 high prio thread wakes up after 1ms sleep, posts a_sem which starts to wake low-prio thread, then tries to grab pi_mutex, which low-prio thread has -0 [000] d.h. 520.573876: sched_waking: comm=pi-vruntime-tes pid=597 prio=19 target_cpu=000 -0 [000] dNh. 520.573879: sched_wakeup: comm=pi-vruntime-tes pid=597 prio=19 target_cpu=000 -0 [000] d... 520.573887: sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=pi-vruntime-tes next_pid=597 next_prio=19 pi-vruntime-tes-597 [000] d... 520.573895: sched_waking: comm=pi-vruntime-tes pid=598 prio=120 target_cpu=001 low-prio thread pid 598 gets pi_mutex priority inheritance, this happens while low-prio thread is in waking state pi-vruntime-tes-597 [000] d... 520.573911: sched_pi_setprio: comm=pi-vruntime-tes pid=598 oldprio=120 newprio=19 high-prio thread sleeps on pi_mutex pi-vruntime-tes-597 [000] d... 520.573919: sched_switch: prev_comm=pi-vruntime-tes prev_pid=597 prev_prio=19 prev_state=D ==> next_comm=swapper/0 next_pid=0 next_prio=120 low-prio thread finishes wakeup -0 [001] dNh. 520.573932: sched_wakeup: comm=pi-vruntime-tes pid=598 prio=19 target_cpu=001 -0 [001] d... 520.573936: sched_switch: prev_comm=swapper/1 prev_pid=0 prev_prio=120 prev_state=S ==> next_comm=pi-vruntime-tes next_pid=598 next_prio=19 low-prio thread releases pi-mutex, loses pi boost, high-prio thread wakes for pi-mutex pi-vruntime-tes-598 [001] d... 520.573946: sched_pi_setprio: comm=pi-vruntime-tes pid=598 oldprio=19 newprio=120 pi-vruntime-tes-598 [001] dN.. 520.573954: sched_waking: comm=pi-vruntime-tes pid=597 prio=19 target_cpu=000 low-prio thread vruntime is 1505ms pi-vruntime-tes-598 [001] dN.. 520.573966: sched_stat_runtime: comm=pi-vruntime-tes pid=598 runtime=20150 [ns] vruntime=1505797560 [ns] The timing is quite sensitive since the task being boosted has to be caught in the TASK_WAKING state. The program: /* * Test case for vruntime management during rtmutex priority inheritance * promotion and demotion. * * build with -lpthread */ #define _GNU_SOURCE #include #include #include #include #include #define ERROR_CHECK(x) \ if (x) \ fprintf(stderr, "Error at line %d", __LINE__); pthread_mutex_t pi_mutex; sem_t a_sem; sem_t b_sem; void *rt_thread_func(void *arg) { int policy; int i = 0; cpu_set_t cpuset; CPU_ZERO(&cpuset); CPU_SET(0, &cpuset); ERROR_CHECK(pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset)); while(i++ < 100) { sem_wait(&b_sem); usleep(1000); sem_post(&a_sem); pthread_mutex_lock(&pi_mutex); pthread_mutex_unlock(&pi_mutex); } } void *low_prio_thread_func(void *arg) { int i = 0; cpu_set_t cpuset; CPU_ZERO(&cpuset); CPU_SET(1, &cpuset); ERROR_CHECK(pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset)); while(i++ < 100) { pthread_mutex_lock(&pi_mutex); sem_post(&b_sem); sem_wait(&a_sem); pthread_mutex_unlock(&pi_mutex); } } int main() { pthread_t rt_thread; pthread_t low_prio_thread; pthread_attr_t rt_thread_attrs; pthread_attr_t low_prio_thread_attrs; struct sched_param rt_thread_sched_params; struct sched_param low_prio_thread_sched_params; pthread_mutexattr_t mutex_attrs; sem_init(&a_sem, 0, 0); sem_init(&b_sem, 0, 0); ERROR_CHECK(pthread_mutexattr_init(&mutex_attrs)); ERROR_CHECK(pthread_mutexattr_setprotocol(&mutex_attrs, PTHREAD_PRIO_INHERIT)); ERROR_CHECK(pthread_mutex_init(&pi_mutex, &mutex_attrs)); rt_thread_sched_params.sched_priority = 80; low_prio_thread_sched_params.sched_priority = 0; pthread_attr_init(&rt_thread_attrs); pthread_attr_init(&low_prio_thread_attrs); ERROR_CHECK(pthread_attr_setinheritsched(&rt_thread_attrs, PTHREAD_EXPLICIT_SCHED)); ERROR_CHECK(pthread_attr_setinheritsched(&low_prio_thread_attrs, PTHREAD_EXPLICIT_SCHED)); ERROR_CHECK(pthread_attr_setschedpolicy(&rt_thread_attrs, SCHED_FIFO)); ERROR_CHECK(pthread_attr_setschedpolicy(&low_prio_thread_attrs, SCHED_OTHER)); ERROR_CHECK(pthread_attr_setschedparam(&rt_thread_attrs, &rt_thread_sched_params)); ERROR_CHECK(pthread_attr_setschedparam(&low_prio_thread_attrs, &low_prio_thread_sched_params)); ERROR_CHECK(pthread_create(&rt_thread, &rt_thread_attrs, rt_thread_func, NULL)); ERROR_CHECK(pthread_create(&low_prio_thread, &low_prio_thread_attrs, low_prio_thread_func, NULL)); ERROR_CHECK(pthread_join(rt_thread, NULL)); ERROR_CHECK(pthread_join(low_prio_thread, NULL)); return 0; }