Subject: Re: 5.6-rc3: WARNING: CPU: 48 PID: 17435 at kernel/sched/fair.c:380 enqueue_task_fair+0x328/0x440
From: Christian Borntraeger
To: Vincent Guittot, Dietmar Eggemann
Cc: Ingo Molnar, Peter Zijlstra, "linux-kernel@vger.kernel.org"
Date: Thu, 5 Mar 2020 14:18:27 +0100
Message-Id: <8a83d9ff-ee7b-c1a5-a315-c97f21724ae3@de.ibm.com>
References: <2108173c-beaa-6b84-1bc3-8f575fb95954@de.ibm.com>
 <7be92e79-731b-220d-b187-d38bde80ad16@arm.com>
 <805cbe05-2424-7d74-5e11-37712c189eb6@de.ibm.com>
 <20200305093003.GA32088@vingu-book>
 <15252de5-9a2d-19ae-607a-594ee88d1ba1@de.ibm.com>
 <20200305123351.GB32088@vingu-book>
X-Mailing-List: linux-kernel@vger.kernel.org

On 05.03.20 13:48, Christian Borntraeger wrote:
>
>
> On 05.03.20 13:33, Vincent Guittot wrote:
>> On Thursday 05 March 2020 at 13:12:39 (+0100), Dietmar Eggemann wrote:
>>> On 05/03/2020 12:28, Christian Borntraeger wrote:
>>>>
>>>> On 05.03.20 10:30, Vincent Guittot wrote:
>>>>> On Wednesday 04 March 2020 at 20:59:33 (+0100), Christian Borntraeger wrote:
>>>>>>
>>>>>> On 04.03.20 20:38, Christian Borntraeger wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 04.03.20 20:19, Dietmar Eggemann wrote:
>>>
>>> [...]
>>>
>>>> It seems to speed up the issue when I do a compile job in parallel on the host:
>>>>
>>>> Do you also need the sysfs tree?
>>>
>>> [ 87.932552] CPU23 path=/machine.slice/machine-test.slice/machine-qemu\x2d18\x2dtest10. on_list=1 nr_running=1 throttled=0 p=[CPU 2/KVM 2662]
>>> [ 87.932559] CPU23 path=/machine.slice/machine-test.slice/machine-qemu\x2d18\x2dtest10. on_list=0 nr_running=3 throttled=0 p=[CPU 2/KVM 2662]
>>> [ 87.932562] CPU23 path=/machine.slice/machine-test.slice on_list=1 nr_running=1 throttled=1 p=[CPU 2/KVM 2662]
>>> [ 87.932564] CPU23 path=/machine.slice on_list=1 nr_running=0 throttled=0 p=[CPU 2/KVM 2662]
>>> [ 87.932566] CPU23 path=/ on_list=1 nr_running=1 throttled=0 p=[CPU 2/KVM 2662]
>>> [ 87.951872] CPU23 path=/ on_list=1 nr_running=2 throttled=0 p=[ksoftirqd/23 126]
>>> [ 87.987528] CPU23 path=/user.slice on_list=1 nr_running=2 throttled=0 p=[as 6737]
>>> [ 87.987533] CPU23 path=/ on_list=1 nr_running=1 throttled=0 p=[as 6737]
>>>
>>> Arrh, looks like 'char path[64]' is too small to hold 'machine.slice/machine-test.slice/machine-qemu\x2d18\x2dtest10.scope/vcpuX' !
>>> ^
>>> But I guess that the 'on_list=0' for 'machine-qemu\x2d18\x2dtest10.scope' could be the missing hint?
>>
>> yes the if (cfs_bandwidth_used()) at the end of enqueue_task_fair is not enough
>> to ensure that all cfs will be added back. It will "work" for the 1st enqueue
>> because the throttled cfs will be added and will reset tmp_alone_branch but not
>> for the next one
>>
>> Compare to the previous proposed fix, we can optimize it a bit with:
>>
>> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
>> index 9ccde775e02e..3b19e508641d 100644
>> --- a/kernel/sched/fair.c
>> +++ b/kernel/sched/fair.c
>> @@ -4035,10 +4035,16 @@ enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>>                 __enqueue_entity(cfs_rq, se);
>>         se->on_rq = 1;
>>
>> -       if (cfs_rq->nr_running == 1) {
>> +       /*
>> +        * When bandwidth control is enabled, cfs might have been removed because of
>> +        * a parent been throttled but cfs->nr_running > 1. Try to add it
>> +        * unconditionnally.
>> +        */
>> +       if (cfs_rq->nr_running == 1 || cfs_bandwidth_used())
>
> This needs a forward declaration for cfs_bandwidth_used, but with that it compiles fine
> and its seems to work fine so far. Will keep it running for while.

So I have not been able to reproduce this issue for the last 30 minutes.
As I was able to reproduce it pretty quickly in the latest trials (more
guests, more gcc threads), it looks like this patch fixes the issue.
I will keep it running for a day or so, but I think I can already say:

Tested-by: Christian Borntraeger