From: Hillf Danton
To: Benjamin Segall
Cc: Peter Zijlstra, Ingo Molnar, Will Deacon, Waiman Long, Boqun Feng,
    linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] locking/percpu-rwsem: do not do lock handoff in percpu_up_write
Date: Thu, 25 Jan 2024 19:04:56 +0800
Message-Id: <20240125110456.783-1-hdanton@sina.com>
References: <20240123150541.1508-1-hdanton@sina.com>

On Wed, 24 Jan 2024 14:10:43 -0800 Benjamin Segall wrote:
> Hillf Danton writes:
> > On Mon, 22 Jan 2024 14:59:14 -0800 Benjamin Segall wrote:
> >> So the actual problem we saw was that one job had severe slowdowns
> >> during startup with certain other jobs on the machine, and the
> >> slowdowns turned out to be some cgroup moves it did during startup.
> >> The antagonist jobs were spawning huge numbers of threads and some
> >> other internal bugs were exacerbating their contention. The lock
> >> handoff meant that a batch of antagonist threads would receive the
> >> read lock of cgroup_threadgroup_rwsem and at least some of those
> >> threads would take a long time to be scheduled.
> >
> > If you want to avoid starved lock waiters, take a look at
> > RWSEM_FLAG_HANDOFF in rwsem_down_read_slowpath().
>
> rwsem's HANDOFF flag is the exact opposite of what this patch is doing.

You and I are not on the same page.

> Percpu-rwsem's current code has perfect handoff for read->write, and a very
> short window for write->read (or write->write) to be beaten by a new writer.

Given that there is no chance to spin on the owner, who may legally take a
ten-minute nap, the right thing to do on behalf of starved waiters is to add
a HANDOFF mechanism, without any heuristic such as the one you proposed, so
that lock acquirers are forced onto the slow path.

Just food for thought.

--- x/kernel/locking/percpu-rwsem.c
+++ y/kernel/locking/percpu-rwsem.c
@@ -22,6 +22,7 @@ int __percpu_init_rwsem(struct percpu_rw
 	rcuwait_init(&sem->writer);
 	init_waitqueue_head(&sem->waiters);
 	atomic_set(&sem->block, 0);
+	atomic_set(&sem->ww, 0); /* write waiters */
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 	debug_check_no_locks_freed((void *)sem, sizeof(*sem));
 	lockdep_init_map(&sem->dep_map, name, key, 0);
@@ -135,6 +136,9 @@ static int percpu_rwsem_wake_function(st
 	wake_up_process(p);
 	put_task_struct(p);
 
+	if (!reader)
+		atomic_dec(&sem->ww);
+
 	return !reader; /* wake (readers until) 1 writer */
 }
 
@@ -148,8 +152,10 @@ static void percpu_rwsem_wait(struct per
 	 * Serialize against the wakeup in percpu_up_write(), if we fail
 	 * the trylock, the wakeup must see us on the list.
 	 */
-	wait = !__percpu_rwsem_trylock(sem, reader);
+	wait = atomic_read(&sem->ww) || !__percpu_rwsem_trylock(sem, reader);
 	if (wait) {
+		if (!reader)
+			atomic_inc(&sem->ww);
 		wq_entry.flags |= WQ_FLAG_EXCLUSIVE | reader * WQ_FLAG_CUSTOM;
 		__add_wait_queue_entry_tail(&sem->waiters, &wq_entry);
 	}
@@ -166,7 +172,7 @@ static void percpu_rwsem_wait(struct per
 
 bool __sched __percpu_down_read(struct percpu_rw_semaphore *sem, bool try)
 {
-	if (__percpu_down_read_trylock(sem))
+	if (!atomic_read(&sem->ww) && __percpu_down_read_trylock(sem))
 		return true;
 
 	if (try)
@@ -234,7 +240,7 @@ void __sched percpu_down_write(struct pe
 	 * Try set sem->block; this provides writer-writer exclusion.
 	 * Having sem->block set makes new readers block.
 	 */
-	if (!__percpu_down_write_trylock(sem))
+	if (atomic_read(&sem->ww) || !__percpu_down_write_trylock(sem))
 		percpu_rwsem_wait(sem, /* .reader = */ false);
 
 	/* smp_mb() implied by __percpu_down_write_trylock() on success -- D matches A */
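
To make the gating rule in the diff above easier to poke at outside the
kernel, here is a minimal userspace sketch. It assumes C11 atomics in place of
the per-CPU reader counts and the wait queue, and it only models the fast
paths. Every toy_* name is hypothetical and not a kernel API; only the role of
the ww counter (queued write waiters that close the fast path for newcomers)
mirrors the diff.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct percpu_rw_semaphore. */
struct toy_rwsem {
	atomic_int readers;	/* active readers (the kernel uses per-CPU counts) */
	atomic_int block;	/* 1 while a writer holds or is claiming the lock */
	atomic_int ww;		/* queued write waiters, as in the diff above */
};

static void toy_init(struct toy_rwsem *sem)
{
	atomic_init(&sem->readers, 0);
	atomic_init(&sem->block, 0);
	atomic_init(&sem->ww, 0);
}

/* Reader fast path: refused outright while any writer is queued. */
static bool toy_read_trylock(struct toy_rwsem *sem)
{
	if (atomic_load(&sem->ww))
		return false;		/* caller must take the slow path */
	atomic_fetch_add(&sem->readers, 1);
	if (atomic_load(&sem->block)) {	/* a writer got in first: back out */
		atomic_fetch_sub(&sem->readers, 1);
		return false;
	}
	return true;
}

static void toy_read_unlock(struct toy_rwsem *sem)
{
	assert(atomic_load(&sem->readers) > 0);
	atomic_fetch_sub(&sem->readers, 1);
}

/*
 * Writer fast path: a new writer may not barge past queued writers.
 * Simplification: the kernel sets block and then waits for readers to
 * drain; this toy version just fails if any reader is active.
 */
static bool toy_write_trylock(struct toy_rwsem *sem)
{
	int expected = 0;

	if (atomic_load(&sem->ww))
		return false;
	if (!atomic_compare_exchange_strong(&sem->block, &expected, 1))
		return false;
	if (atomic_load(&sem->readers)) {
		atomic_store(&sem->block, 0);
		return false;
	}
	return true;
}

static void toy_write_unlock(struct toy_rwsem *sem)
{
	atomic_store(&sem->block, 0);
}

/* Slow-path bookkeeping: a writer queues itself, the waker dequeues it. */
static void toy_write_enqueue(struct toy_rwsem *sem)
{
	atomic_fetch_add(&sem->ww, 1);
}

static void toy_write_dequeue(struct toy_rwsem *sem)
{
	atomic_fetch_sub(&sem->ww, 1);
}
```

With one writer queued via toy_write_enqueue(), both toy_read_trylock() and
toy_write_trylock() fail until toy_write_dequeue() runs, which is the
"force acquirers onto the slow path" behaviour described above. The sketch
deliberately leaves out sleeping, wakeup ordering, and memory-barrier pairing,
all of which matter in the real lock.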