Subject: Re: [PATCH v3] sched/debug: Use sched_debug_lock to serialize use of cgroup_path[] only
From: Waiman Long
To: Steven Rostedt
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Ben Segall, Mel Gorman, Daniel Bristot de Oliveira, Bharata B Rao, Phil Auld, Daniel Thompson, linux-kernel@vger.kernel.org
References: <20210401181030.7689-1-longman@redhat.com> <20210402164014.53c84f05@gandalf.local.home> <20210404120231.13843854@oasis.local.home> <4014fe97-5875-f64a-7b68-854a2b08394e@redhat.com>
Organization: Red Hat
Message-ID: <03f8cfd4-bb73-a7e5-83f8-7b0731071ae8@redhat.com>
Date: Sun, 4 Apr 2021 21:29:43 -0400
In-Reply-To: <4014fe97-5875-f64a-7b68-854a2b08394e@redhat.com>

On 4/4/21 9:27 PM, Waiman Long wrote:
> On 4/4/21 12:02 PM, Steven Rostedt wrote:
>> On Fri, 2 Apr 2021 23:09:09 -0400
>> Waiman Long wrote:
>>
>>> The main problem with sched_debug_lock is that under certain
>>> circumstances, a lock waiter may wait a long time to acquire the lock
>>> (in seconds). We can't insert touch_nmi_watchdog() while the cpu is
>>> waiting for the spinlock.
>> The problem I have with the patch is that it seems to be a hack (as it
>> doesn't fix the issue in all cases). Since sched_debug_lock is
>> "special", perhaps we can add wrappers to take it, and instead of doing
>> the spin_lock_irqsave(), do a trylock loop. Add lockdep annotation to
>> tell lockdep that this is not a try lock (so that it can still detect
>> deadlocks).
>>
>> Then have the strategically placed touch_nmi_watchdog() also increment
>> a counter. Then in that trylock loop, if it sees the counter get
>> incremented, it knows that forward progress is being made by the lock
>> holder, and it too can call touch_nmi_watchdog().
>
> Thanks for the suggestion, but it also sounds complicated.
>
> I think we can fix this lockup problem if we are willing to lose some
> information in case of contention. As you have suggested, a trylock
> will be used to acquire sched_debug_lock. If it succeeds, all is good.
> Otherwise, a shorter stack buffer will be used for cgroup path. The
> path may be truncated in this case. If we detect that the full length
> of the buffer is used, we assume truncation and add, e.g. "...", to
> indicate there is more to the actual path.
>
> Do you think this is an acceptable compromise?

Actually, I don't really need to disable interrupts under this
situation as deadlock can't happen.

Cheers,
Longman
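
For illustration only, here is a rough userspace sketch of the
trylock-with-fallback idea described above. It is not the kernel patch:
pthread_mutex_trylock() stands in for a trylock on sched_debug_lock, and
print_cgroup_path(), fill_path(), SHORT_PATH_LEN and both buffer sizes
are hypothetical names and values picked for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define FULL_PATH_LEN  4096  /* shared buffer, only used under the lock (assumed size) */
#define SHORT_PATH_LEN 128   /* on-stack fallback buffer (assumed size) */

/* Stand-ins for sched_debug_lock and the shared path buffer. */
static pthread_mutex_t debug_lock = PTHREAD_MUTEX_INITIALIZER;
static char shared_path[FULL_PATH_LEN];

/*
 * Copy src into buf. If the whole buffer ends up in use, assume the
 * path was truncated and overwrite the tail with "...".
 */
static void fill_path(char *buf, size_t len, const char *src)
{
	assert(len >= 4);                    /* need room for "..." plus NUL */
	snprintf(buf, len, "%s", src);
	if (strlen(buf) == len - 1)
		memcpy(buf + len - 4, "...", 4); /* the 4th byte is the NUL */
}

/*
 * Write the (possibly truncated) path into out.
 * Returns 1 if the lock was taken and the full-size buffer was used,
 * 0 if the lock was contended and the short stack buffer was used.
 */
static int print_cgroup_path(const char *path, char *out, size_t outlen)
{
	char short_path[SHORT_PATH_LEN];

	if (pthread_mutex_trylock(&debug_lock) == 0) {
		fill_path(shared_path, sizeof(shared_path), path);
		snprintf(out, outlen, "%s", shared_path);
		pthread_mutex_unlock(&debug_lock);
		return 1;
	}

	/* Contended: never wait for the lock, accept a shorter path. */
	fill_path(short_path, sizeof(short_path), path);
	snprintf(out, outlen, "%s", short_path);
	return 0;
}
```

Since the contended path never waits, a waiter cannot spin long enough
to trip the watchdog. The cost is that a path of exactly
SHORT_PATH_LEN - 1 characters would also be marked with "...", which
matches the "assume truncation" wording above.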