Subject: Re: [PATCH v2] sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
From: Dietmar Eggemann
Date: Tue, 1 Feb 2022 16:28:10 +0100
To: Valentin Schneider, linux-kernel@vger.kernel.org, linux-rt-users@vger.kernel.org
Cc: John Keeping, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Steven Rostedt, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira
In-Reply-To: <20220127154059.974729-1-valentin.schneider@arm.com>

On 27/01/2022 16:40, Valentin Schneider wrote:
> John reported that push_rt_task() can end up invoking
> find_lowest_rq(rq->curr) when curr is not an RT task (in this case a CFS
> one), which causes mayhem down convert_prio().
>
> This can happen when current gets demoted to e.g. CFS when releasing an
> rt_mutex, and the local CPU gets hit with an rto_push_work irqwork before
> getting the chance to reschedule. Exactly who triggers this work isn't
> entirely clear to me - switched_from_rt() only invokes rt_queue_pull_task()
> if there are no RT tasks on the local RQ, which means the local CPU can't
> be in the rto_mask.
>
> My current suspected sequence is something along the lines of the below,
> with the demoted task being current.
>
>   mark_wakeup_next_waiter()
>     rt_mutex_adjust_prio()
>       rt_mutex_setprio() // deboost originally-CFS task
>         check_class_changed()
>           switched_from_rt() // Only rt_queue_pull_task() if !rq->rt.rt_nr_running
>           switched_to_fair() // Sets need_resched
>       __balance_callbacks() // if pull_rt_task(), tell_cpu_to_push() can't select local CPU per the above
>       raw_spin_rq_unlock(rq)
>
>        // need_resched is set, so task_woken_rt() can't
>        // invoke push_rt_tasks(). Best I can come up with is
>        // local CPU has rt_nr_migratory >= 2 after the demotion, so stays
>        // in the rto_mask, and then:
>
>
>          push_rt_task()
>            // breakage follows here as rq->curr is CFS
>
> Move an existing check to check rq->curr vs the next pushable task's
> priority before getting anywhere near find_lowest_rq(). While at it, add an
> explicit sched_class of rq->curr check prior to invoking
> find_lowest_rq(rq->curr). Align the DL logic to also reschedule regardless
> of next_task's migratability.
>
> Link: http://lore.kernel.org/r/Yb3vXx3DcqVOi+EA@donbot
> Fixes: a7c81556ec4d ("sched: Fix migrate_disable() vs rt/dl balancing")
> Reported-by: John Keeping
> Signed-off-by: Valentin Schneider
> Tested-by: John Keeping
> ---
> v1 -> v2: Reworded comments, added DL part (Dietmar)
> ---

LGTM.

Rescheduling when rq->curr has a lower priority than next_task (which
includes CFS tasks), and bailing out when rq->curr is a DL or stop task,
prevents the bug from happening.

The only small issue is that, unlike in push_rt_task(), the DL logic only
compares DL tasks (the if (dl_task(rq->curr) ...) condition), so you miss
rescheduling when rq->curr is a lower-priority non-DL task.

Reviewed-by: Dietmar Eggemann

[...]