Subject: Re: [PATCH] powerpc/perf: Fix core-imc hotplug callback failure during imc initialization
From: Anju T Sudhakar
To: Michael Ellerman
Cc: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org, maddy@linux.vnet.ibm.com
Date: Thu, 2 Nov 2017 14:18:48 +0530
Message-Id: <62edaba7-c5ff-3b0b-3f46-8f44cba2f6d5@linux.vnet.ibm.com>
In-Reply-To: <87efpi95wb.fsf@concordia.ellerman.id.au>
Hi,

On Wednesday 01 November 2017 06:22 AM, Michael Ellerman wrote:
> Anju T Sudhakar writes:
>
>> Call trace observed during boot:
> What's the actual oops?

The actual oops is:

[ 0.750749] PCI: CLS 0 bytes, default 128
[ 0.750855] Unpacking initramfs...
[ 1.570445] Freeing initrd memory: 23168K
[ 1.571090] rtas_flash: no firmware flash support
[ 1.573873] nest_capp0_imc performance monitor hardware support registered
[ 1.574006] nest_capp1_imc performance monitor hardware support registered
[ 1.579616] core_imc memory allocation for cpu 56 failed
[ 1.579730] Unable to handle kernel paging request for data at address 0xffa400010
[ 1.579797] Faulting instruction address: 0xc000000000bf3294
0:mon> e
cpu 0x0: Vector: 300 (Data Access) at [c000000ff38ff8d0]
    pc: c000000000bf3294: mutex_lock+0x34/0x90
    lr: c000000000bf3288: mutex_lock+0x28/0x90
    sp: c000000ff38ffb50
   msr: 9000000002009033
   dar: ffa400010
 dsisr: 80000
  current = 0xc000000ff383de00
  paca    = 0xc000000007ae0000   softe: 0   irq_happened: 0x01
    pid   = 13, comm = cpuhp/0
Linux version 4.11.0-39.el7a.ppc64le (mockbuild@ppc-058.build.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Tue Oct 3 07:42:44 EDT 2017
0:mon> t
[c000000ff38ffb80] c0000000002ddfac perf_pmu_migrate_context+0xac/0x470
[c000000ff38ffc40] c00000000011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0
[c000000ff38ffc90] c000000000125758 cpuhp_invoke_callback+0x198/0x5d0
[c000000ff38ffd00] c00000000012782c cpuhp_thread_fun+0x8c/0x3d0
[c000000ff38ffd60] c0000000001678d0 smpboot_thread_fn+0x290/0x2a0
[c000000ff38ffdc0] c00000000015ee78 kthread+0x168/0x1b0
[c000000ff38ffe30] c00000000000b368 ret_from_kernel_thread+0x5c/0x74

>> [c000000ff38ffb80] c0000000002ddfac perf_pmu_migrate_context+0xac/0x470
>> [c000000ff38ffc40] c00000000011385c ppc_core_imc_cpu_offline+0x1ac/0x1e0
>> [c000000ff38ffc90] c000000000125758 cpuhp_invoke_callback+0x198/0x5d0
>> [c000000ff38ffd00] c00000000012782c cpuhp_thread_fun+0x8c/0x3d0
>> [c000000ff38ffd60] c0000000001678d0 smpboot_thread_fn+0x290/0x2a0
>> [c000000ff38ffdc0] c00000000015ee78 kthread+0x168/0x1b0
>> [c000000ff38ffe30] c00000000000b368 ret_from_kernel_thread+0x5c/0x74
>>
>> While registering the cpuhotplug callbacks for core-imc, if we fail
>> in the cpuhotplug online path for any core (either because the OPAL
>> call to initialize the core-imc counters fails or because memory
>> allocation fails for that core), ppc_core_imc_cpu_offline() gets
>> invoked for the other CPUs that successfully returned from the
>> cpuhotplug online path.
>>
>> But in the ppc_core_imc_cpu_offline() path we try to migrate the
>> event context even though the core-imc counters were never
>> initialized, which causes the stack dump above.
>>
>> Add a check in the cpuhotplug offline path for whether the core-imc
>> counters are enabled before migrating the context, to handle this
>> failure scenario.
> Why do we need a bool to track this? Can't we just check the data
> structure we're deinitialising has been initialised?
>
> Doesn't this also mean we won't cleanup the initialisation for any CPUs
> that have been initialised?

We do the cleanup in the failure case.

Thanks for the review.
Thanks,
Anju

> cheers
>
>> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
>> index 8812624..08139f9 100644
>> --- a/arch/powerpc/perf/imc-pmu.c
>> +++ b/arch/powerpc/perf/imc-pmu.c
>> @@ -30,6 +30,7 @@ static struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
>>  static cpumask_t nest_imc_cpumask;
>>  struct imc_pmu_ref *nest_imc_refc;
>>  static int nest_pmus;
>> +static bool core_imc_enabled;
>>
>>  /* Core IMC data structures and variables */
>>
>> @@ -607,6 +608,19 @@ static int ppc_core_imc_cpu_offline(unsigned int cpu)
>>  	if (!cpumask_test_and_clear_cpu(cpu, &core_imc_cpumask))
>>  		return 0;
>>
>> +	/*
>> +	 * See if core imc counters are enabled or not.
>> +	 *
>> +	 * Suppose we reach here from core_imc_cpumask_init(),
>> +	 * since we failed at the cpuhotplug online path for any random
>> +	 * core (either because opal call to initialize the core-imc counters
>> +	 * failed or because memory allocation failed).
>> +	 * We need to check whether core imc counters are enabled or not before
>> +	 * migrating the event context from cpus in the other cores.
>> +	 */
>> +	if (!core_imc_enabled)
>> +		return 0;
>> +
>>  	/* Find any online cpu in that core except the current "cpu" */
>>  	ncpu = cpumask_any_but(cpu_sibling_mask(cpu), cpu);
>>
>> @@ -1299,6 +1313,7 @@ int init_imc_pmu(struct device_node *parent, struct imc_pmu *pmu_ptr, int pmu_id
>>  			return ret;
>>  		}
>>
>> +		core_imc_enabled = true;
>>  		break;
>>  	case IMC_DOMAIN_THREAD:
>>  		ret = thread_imc_cpu_init();
>> --
>> 2.7.4
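The failure path described in the patch can be modeled in userspace. The C sketch below is not kernel code. It assumes, as the patch description says, that when the online callback fails for one CPU during registration, the offline callback runs for the CPUs that had already come online. setup_state() is only a loose stand-in for cpuhp_setup_state(), and a per-CPU flag array is a simplification of core_imc_cpumask. Every name other than core_imc_enabled (reset_model(), core_imc_online(), core_imc_offline(), init_core_imc(), cpu_in_mask, migrations, fail_cpu) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 8
#define NO_FAILURE (-1)

/* Hypothetical model state; only core_imc_enabled mirrors a name from the patch. */
static bool core_imc_enabled;      /* set only after registration succeeds */
static bool cpu_in_mask[NR_CPUS];  /* simplified stand-in for core_imc_cpumask */
static int migrations;             /* counts would-be context migrations */
static int fail_cpu = NO_FAILURE;  /* CPU whose online callback is forced to fail */

/* Reset all model state and choose which CPU (if any) fails to come online. */
static void reset_model(int failing_cpu)
{
	core_imc_enabled = false;
	migrations = 0;
	fail_cpu = failing_cpu;
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		cpu_in_mask[cpu] = false;
}

/* Hypothetical online callback: fails on fail_cpu, as if allocation failed. */
static int core_imc_online(int cpu)
{
	if (cpu == fail_cpu)
		return -ENOMEM;
	cpu_in_mask[cpu] = true;
	return 0;
}

/* Hypothetical offline callback, shaped like ppc_core_imc_cpu_offline(). */
static int core_imc_offline(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!cpu_in_mask[cpu])
		return 0;
	cpu_in_mask[cpu] = false;

	/* Guard added by the patch: counters never enabled, nothing to migrate. */
	if (!core_imc_enabled)
		return 0;

	migrations++;  /* the kernel would migrate the event context here */
	return 0;
}

/*
 * Loose stand-in for cpuhp_setup_state(): run the online callback on every
 * CPU and, if one fails, run the offline callback on the CPUs that succeeded.
 */
static int setup_state(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		int ret = core_imc_online(cpu);

		if (ret) {
			for (int done = 0; done < cpu; done++)
				core_imc_offline(done);
			return ret;
		}
	}
	return 0;
}

/* Like the patched init_imc_pmu(): set the flag only after setup succeeds. */
static int init_core_imc(void)
{
	int ret = setup_state();

	if (ret)
		return ret;
	core_imc_enabled = true;
	return 0;
}
```

In this model, reset_model(5) followed by init_core_imc() returns -ENOMEM and leaves migrations at 0, because the rollback hits the guard before any migration. If the core_imc_enabled check is removed, the same rollback counts five migrations (CPUs 0 to 4). That is the model's equivalent of the perf_pmu_migrate_context() call that faulted in the trace above.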