Date: Tue, 1 Sep 2020 10:49:38 -0700
From: "Paul E. McKenney" <paulmck@kernel.org>
To: peterz@infradead.org, mingo@redhat.com, will@kernel.org
Cc: linux-kernel@vger.kernel.org
Subject: Re: Question on task_blocks_on_rt_mutex()
Message-ID: <20200901174938.GA8158@paulmck-ThinkPad-P72>
Reply-To: paulmck@kernel.org
References: <20200831224911.GA13114@paulmck-ThinkPad-P72>
 <20200831232130.GA28456@paulmck-ThinkPad-P72>
In-Reply-To: <20200831232130.GA28456@paulmck-ThinkPad-P72>

On Mon, Aug 31, 2020 at 04:21:30PM -0700, Paul E. McKenney wrote:
> On Mon, Aug 31, 2020 at 03:49:11PM -0700, Paul E. McKenney wrote:
> > Hello!
> >
> > The task_blocks_on_rt_mutex() function uses rt_mutex_owner() to
> > take a snapshot of the lock owner right up front.  At this point,
> > the ->wait_lock is held, which at first glance prevents the owner
> > from leaving.  Except that if there are not yet any waiters (that is,
> > the low-order bit of ->owner is zero), rt_mutex_fastunlock() might
> > locklessly clear the ->owner field.  And in that case, it looks like
> > task_blocks_on_rt_mutex() will blithely continue using the ex-owner's
> > task_struct structure, without anything that I can see preventing
> > the ex-owner from exiting.
> >
> > What am I missing here?
>
> One thing I missed is that the low-order bit of ->owner would already
> be set by this point.
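For anyone following along, the encoding in question looks roughly like
this.  This is a paraphrase from my reading of
kernel/locking/rtmutex_common.h and rtmutex.c, so the exact names might
have drifted in your tree:

	/*
	 * The low-order bit of ->owner encodes "this lock has waiters"
	 * and is folded into the owner task_struct pointer:
	 */
	#define RT_MUTEX_HAS_WAITERS	1UL

	static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
	{
		unsigned long owner = (unsigned long)READ_ONCE(lock->owner);

		return (struct task_struct *)(owner & ~RT_MUTEX_HAS_WAITERS);
	}

And the unlock fastpath clears ->owner only via a cmpxchg against
exactly "current", that is, only while RT_MUTEX_HAS_WAITERS is clear.
So once a waiter is enqueued and that bit is set, the fastpath fails
and the unlock is forced into the slowpath under ->wait_lock:

	/* Sketch of the check at the top of rt_mutex_fastunlock(): */
	if (likely(rt_mutex_cmpxchg_release(lock, current, NULL)))
		return;	/* No waiters, so nothing to wake. */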
> > The reason that I am looking into this is that locktorture scenario
> > LOCK05 hangs, and does so leaving the torture_rtmutex.waiters field
> > equal to 0x1.  This is of course a legal transitional state, but I
> > would not expect it to persist for more than three minutes.  Yet it
> > often does.
> >
> > This leads me to believe that there is a way for an unlock to fail
> > to wake up a task concurrently acquiring the lock.  This seems to be
> > repaired by later lock acquisitions, and in fact setting the
> > locktorture.stutter module parameter to zero avoids the hang.  Except
> > that I first found the above apparently unprotected access to what
> > was recently the owner task.
> >
> > Thoughts?
>
> Some breakage elsewhere, presumably...

And it might be a lost wakeup, given that ->locktorture_wake_up is equal
to 1 for one of the locktorture writer tasks under the patch below.

Yes, this is a virtual environment.  Except that none of the other
locktorture scenarios make this happen, nor the rcutorture scenario,
nor the scftorture scenarios.  Maybe just the wrong timing?  Who knows,
looking further.

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 93ecd93..054b6f8 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -720,6 +720,7 @@ struct task_struct {
 	struct list_head rcu_node_entry;
 	struct rcu_node *rcu_blocked_node;
 #endif /* #ifdef CONFIG_PREEMPT_RCU */
+	u8 locktorture_wake_up;
 #ifdef CONFIG_TASKS_RCU
 	unsigned long rcu_tasks_nvcsw;
diff --git a/kernel/locking/locktorture.c b/kernel/locking/locktorture.c
index 316531d..7f183be 100644
--- a/kernel/locking/locktorture.c
+++ b/kernel/locking/locktorture.c
@@ -617,6 +617,7 @@ static int lock_torture_writer(void *arg)
 
 	VERBOSE_TOROUT_STRING("lock_torture_writer task started");
 	set_user_nice(current, MAX_NICE);
+	current->locktorture_wake_up = 0;
 
 	do {
 		if ((torture_random(&rand) & 0xfffff) == 0)
@@ -635,6 +636,7 @@ static int lock_torture_writer(void *arg)
 		lock_is_write_held = false;
 		WRITE_ONCE(last_lock_release, jiffies);
 		cxt.cur_ops->writeunlock();
+		current->locktorture_wake_up = 0;
 		stutter_wait("lock_torture_writer");
 	} while (!torture_must_stop());
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index cfdd5b9..4a0b046 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1053,6 +1053,7 @@ static void mark_wakeup_next_waiter(struct wake_q_head *wake_q,
 	 */
 	preempt_disable();
 	wake_q_add(wake_q, waiter->task);
+	waiter->task->locktorture_wake_up = 1;
 	raw_spin_unlock(&current->pi_lock);
 }
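P.S.  For my future self, the intended semantics of the debug flag:
mark_wakeup_next_waiter() sets it when it queues the waiter's wakeup,
and the writer clears it only once it is running again after its
critical section.  So a hypothetical check along the following lines
(not part of the patch above, and check_lost_wakeup() is a name I just
made up) would flag a writer whose wakeup was queued but apparently
never delivered:

	/*
	 * Hypothetical helper, not in the patch above: report a task
	 * that was queued for wakeup via wake_q_add() but is still
	 * blocked.
	 */
	static void check_lost_wakeup(struct task_struct *t)
	{
		if (READ_ONCE(t->locktorture_wake_up) &&
		    t->state != TASK_RUNNING)
			pr_info("Possible lost wakeup: %s/%d\n",
				t->comm, t->pid);
	}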