Date: Mon, 26 Apr 2021 16:00:32 +0530
From: Srikar Dronamraju
To: Mel Gorman
Cc: Ingo Molnar, Peter Zijlstra, LKML, Rik van Riel, Thomas Gleixner,
 Valentin Schneider, Vincent Guittot, Dietmar Eggemann, Michael Ellerman,
 Gautham R Shenoy, Parth Shah
Subject: Re: [PATCH 00/10] sched/fair: wake_affine improvements
Message-ID: <20210426103032.GI2633526@linux.vnet.ibm.com>
Reply-To: Srikar Dronamraju
References: <20210422102326.35889-1-srikar@linux.vnet.ibm.com>
 <20210423082532.GA4239@techsingularity.net>
 <20210423103129.GH2633526@linux.vnet.ibm.com>
 <20210423123854.GC4239@techsingularity.net>
In-Reply-To: <20210423123854.GC4239@techsingularity.net>
X-Mailing-List: linux-kernel@vger.kernel.org

* Mel Gorman [2021-04-23 13:38:55]:

Hi Mel,

> On Fri, Apr 23, 2021 at 04:01:29PM +0530, Srikar Dronamraju wrote:
> > > The series also oopses a *lot* and didn't get through a run of basic
> > > workloads on x86 on any of three machines. An example oops is
> > >
> >
> > Can you pass me your failing config. I am somehow not been seeing this
> > either on x86 or on Powerpc on multiple systems.
>
> The machines have since moved onto testing something else (Rik's patch
> for newidle) but the attached config should be close enough.
>
> > Also if possible cat /proc/schedstat and cat
> > /proc/sys/kernel/sched_domain/cpu0/domain*/name
> >
>
> For the vanilla kernel
>
> SMT
> MC
> NUMA

I was able to reproduce the problem and analyze why it panics in
cpus_share_cache().

In my patch(es), we have code snippets like this:

	if (tsds->idle_core != -1) {
		if (cpumask_test_cpu(tsds->idle_core, p->cpus_ptr))
			return tsds->idle_core;
		return this_cpu;
	}

Here, tsds->idle_core was not -1 when we compared it against -1 and
passed it to cpumask_test_cpu(). However, the field is read again for
the return statement, and by then it could have been set to -1
concurrently. cpus_share_cache() then tries to look up sd_llc_id for
CPU -1 and crashes.

It is easier to reproduce this on a machine with more cores per LLC
than, say, a Power10/Power9. Hence we are hitting this more often on
x86.

One way could be to save idle_core to a local variable, but that negates
the whole purpose, since the saved value may be stale and we may end up
choosing a busy CPU. I will find a way to fix this problem.

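For reference, the local-variable option mentioned above can be sketched
in userspace C as follows. This is only an illustration of the
single-read pattern, not the fix for the series: struct llc_shared,
cpu_allowed(), pick_cpu() and NR_CPUS are hypothetical stand-ins, and a
C11 atomic load stands in for what would be READ_ONCE() in the kernel.
It guarantees that the value that passed the checks is the value
returned, so -1 can never escape, but the staleness concern described
above still applies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 8	/* hypothetical, for illustration only */

/* Hypothetical stand-in for the shared per-LLC state; -1 means "no idle core". */
struct llc_shared {
	atomic_int idle_core;
};

/* Hypothetical stand-in for cpumask_test_cpu(): bit N set means CPU N is allowed. */
static bool cpu_allowed(int cpu, unsigned long allowed_mask)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return (allowed_mask >> cpu) & 1UL;
}

/*
 * Load idle_core exactly once and use only the local copy afterwards.
 * A concurrent store of -1 after the load cannot change what is returned.
 */
static int pick_cpu(struct llc_shared *sds, unsigned long allowed_mask,
		    int this_cpu)
{
	int core = atomic_load_explicit(&sds->idle_core, memory_order_relaxed);

	if (core != -1 && cpu_allowed(core, allowed_mask))
		return core;

	return this_cpu;
}
```

The trade-off is the one already noted: the snapshot can refer to a core
that stopped being idle right after the load, so the caller may still be
handed a busy CPU.
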
-- 
Thanks and Regards
Srikar Dronamraju