Subject: Re: [PATCH] sched: fix rq lock recursion issue
To: Steven Rostedt
References: <20220624074240.13108-1-quic_satyap@quicinc.com>
From: Satya Durga Srinivasu Prabhala
Message-ID: <037be4d3-0474-ba9f-fd0f-4bd9af3e835d@quicinc.com>
Date: Tue, 5 Jul 2022 20:47:33 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On 6/30/22 3:37 PM, Steven Rostedt wrote:
> On Fri, Jun 24, 2022 at 12:42:40AM -0700, Satya Durga Srinivasu Prabhala wrote:
>> Below recursion is observed in a rare scenario where __schedule()
>> takes rq lock, at around same time task's affinity is being changed,
>> bpf function for tracing sched_switch calls migrate_enabled(),
>> checks for affinity change (cpus_ptr != cpus_mask) lands into
>> __set_cpus_allowed_ptr which tries acquire rq lock and causing the
>> recursion bug.
>>
>> Fix the issue by switching to preempt_enable/disable() for non-RT
>> Kernels.
>>
>> -010 |spin_bug(lock = ???, msg = ???)
>> -011 |debug_spin_lock_before(inline)
>> -011 |do_raw_spin_lock(lock = 0xFFFFFF89323BB600)
>> -012 |_raw_spin_lock(inline)
>> -012 |raw_spin_rq_lock_nested(inline)
>> -012 |raw_spin_rq_lock(inline)
>> -012 |task_rq_lock(p = 0xFFFFFF88CFF1DA00, rf = 0xFFFFFFC03707BBE8)
>> -013 |__set_cpus_allowed_ptr(inline)
>> -013 |migrate_enable()
>> -014 |trace_call_bpf(call = ?, ctx = 0xFFFFFFFDEF954600)
>> -015 |perf_trace_run_bpf_submit(inline)
>> -015 |perf_trace_sched_switch(__data = 0xFFFFFFE82CF0BCB8, preempt = FALSE, prev = ?, next = ?)
>> -016 |__traceiter_sched_switch(inline)
>> -016 |trace_sched_switch(inline)
> trace_sched_switch() disables preemption.
>
> So how is this a fix?

Thanks for your time and comments. I was looking mainly at non-RT
kernels, where switching to preempt_disable()/preempt_enable() helps
because it is just an increment/decrement of a count. I agree, this
isn't the right fix.

I'm still looking for an easy way to reproduce the issue. I will check
further and get back.

>
> -- Steve
>
>> -016 |__schedule(sched_mode = ?)
>> -017 |schedule()
>> -018 |arch_local_save_flags(inline)
>> -018 |arch_irqs_disabled(inline)
>> -018 |__raw_spin_lock_irq(inline)
>> -018 |_raw_spin_lock_irq(inline)
>> -018 |worker_thread(__worker = 0xFFFFFF88CE251300)
>> -019 |kthread(_create = 0xFFFFFF88730A5A80)
>> -020 |ret_from_fork(asm)
>>
>> Signed-off-by: Satya Durga Srinivasu Prabhala
>> ---
>>  kernel/sched/core.c | 8 ++++++++
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
>> index bfa7452ca92e..e254e9227341 100644
>> --- a/kernel/sched/core.c
>> +++ b/kernel/sched/core.c
>> @@ -2223,6 +2223,7 @@ static void migrate_disable_switch(struct rq *rq, struct task_struct *p)
>>
>>  void migrate_disable(void)
>>  {
>> +#ifdef CONFIG_PREEMPT_RT
>>  	struct task_struct *p = current;
>>
>>  	if (p->migration_disabled) {
>> @@ -2234,11 +2235,15 @@ void migrate_disable(void)
>>  	this_rq()->nr_pinned++;
>>  	p->migration_disabled = 1;
>>  	preempt_enable();
>> +#else
>> +	preempt_disable();
>> +#endif
>>  }
>>  EXPORT_SYMBOL_GPL(migrate_disable);
>>
>>  void migrate_enable(void)
>>  {
>> +#ifdef CONFIG_PREEMPT_RT
>>  	struct task_struct *p = current;
>>
>>  	if (p->migration_disabled > 1) {
>> @@ -2265,6 +2270,9 @@ void migrate_enable(void)
>>  	p->migration_disabled = 0;
>>  	this_rq()->nr_pinned--;
>>  	preempt_enable();
>> +#else
>> +	preempt_enable();
>> +#endif
>>  }
>>  EXPORT_SYMBOL_GPL(migrate_enable);
>>
>> --
>> 2.36.1