Received: by 2002:a25:7ec1:0:0:0:0:0 with SMTP id z184csp744215ybc; Tue, 19 Nov 2019 08:33:58 -0800 (PST) X-Google-Smtp-Source: APXvYqwn1bde2nsvk3hKQQxxJEg+jU9o5TMXokuqf8QT+ihM7XGDIpS+kWrl3WYAOGB7oA7LXvBL X-Received: by 2002:a17:906:a396:: with SMTP id k22mr7377532ejz.66.1574181238167; Tue, 19 Nov 2019 08:33:58 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1574181238; cv=none; d=google.com; s=arc-20160816; b=c8MxP5aIB8lkst8ZF8t4bcu3mig7OcbWTU8f1RPXXH4IjQ667O9IKlo/Hx/nrzY9yl yCmJGCULRn8T/6/ukPPJCeuqMYnTOyBKJeSAEVpXqEjB7Oez3V3flEjdEubvOUUDG/F9 fbF2gJZYdFn8JsI2y15yr+uuFDLIgS9t887yt3QpSnPM2vJEO/HXwyzmgf8kAvNcKTK9 tqo6hw3h0CGxK/xg/JUqTTdIWZMPfT4aG48H362SHdO9CEoinz+OffTJ7B8si36qGmge FSbz4vsEiurlff+IpbiJUU92GdVp93Ukqb8jI1ObSB2BvJt9UpZieN4Y0Ko400Pirq39 pR7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding :content-language:in-reply-to:mime-version:user-agent:date :message-id:organization:from:references:cc:to:subject :dkim-signature; bh=tIK0+i5q+Lj6zrxj0R2LIAra704J9qSy97urLw0B4Lg=; b=CWdqY88/xQkyGELOseS9aE5ITRFNrFTN2almpk7FjltUDxDcEzoRVYTRx0YAI2sREo SBM+CMprYhV8WwWpsSYj0nq3wFsHq0yb6CPWo50+SAdyMC7MsruOnYj9DSHhu98yj+PM GMg8UXab/IynzaoiGPJQ/OUYCCoL7UUZlTvruWD/i1FAgPw0gc07L02xGfJpXG582mGW ++FRs5+4BhsvbIIMKJJ4nVyQotg+nSLKrmD/9krjD+4O3/+LaD1lJNzErncwIGBT9Kf8 m3CNKtfdjqlKircv3RRV69ndwczey/57+8L3nQF1eJutA4E0ZUBlHp/J7LeMdn4zpBv5 z+1g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=i8EkUU8r; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id w21si13883344eja.324.2019.11.19.08.33.27; Tue, 19 Nov 2019 08:33:58 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; dkim=pass header.i=@redhat.com header.s=mimecast20190719 header.b=i8EkUU8r; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=pass (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1728445AbfKSQ3G (ORCPT + 99 others); Tue, 19 Nov 2019 11:29:06 -0500 Received: from us-smtp-1.mimecast.com ([207.211.31.81]:59196 "EHLO us-smtp-delivery-1.mimecast.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1728212AbfKSQ3F (ORCPT ); Tue, 19 Nov 2019 11:29:05 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1574180945; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=tIK0+i5q+Lj6zrxj0R2LIAra704J9qSy97urLw0B4Lg=; b=i8EkUU8rpLJIV3cD1F9+NZrkwxNV8uLr+pNeGvFiob6Fy9RA36O4MviDc+wluweV/tSVVK YDy7esMp70hDc65HR5l/wY6SCIIiQamGvgwZTflwvq8l8/WcCMy0kxvwoTb9lC1ITra4yP z7ANTngIiyLVN/tZVEV5qmFsKvEpMS0= Received: from mimecast-mx01.redhat.com (mimecast-mx01.redhat.com [209.132.183.4]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-401-frUyA5U1NeOhTQjcufm6dg-1; Tue, 19 Nov 2019 11:29:03 -0500 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.phx2.redhat.com [10.5.11.22]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mimecast-mx01.redhat.com (Postfix) with ESMTPS id 51D13107ACC5; Tue, 19 Nov 2019 16:29:00 +0000 (UTC) Received: from llong.remote.csb (dhcp-17-59.bos.redhat.com [10.18.17.59]) by smtp.corp.redhat.com (Postfix) with ESMTP id E821B1001925; Tue, 19 Nov 2019 16:28:50 +0000 (UTC) Subject: Re: [PATCH 5/5] locking/percpu-rwsem: Remove the embedded rwsem To: Oleg Nesterov Cc: Peter Zijlstra , mingo@kernel.org, will@kernel.org, tglx@linutronix.de, linux-kernel@vger.kernel.org, bigeasy@linutronix.de, juri.lelli@redhat.com, williams@redhat.com, bristot@redhat.com, dave@stgolabs.net, jack@suse.com References: <20191113102115.116470462@infradead.org> <20191113102855.925208237@infradead.org> <20191119155826.GA4739@redhat.com> From: Waiman Long Organization: Red Hat Message-ID: Date: Tue, 19 Nov 2019 11:28:50 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.2 MIME-Version: 1.0 In-Reply-To: <20191119155826.GA4739@redhat.com> Content-Language: en-US X-Scanned-By: MIMEDefang 2.84 on 10.5.11.22 X-MC-Unique: frUyA5U1NeOhTQjcufm6dg-1 X-Mimecast-Spam-Score: 0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 11/19/19 10:58 AM, Oleg Nesterov wrote: > On 11/19, Waiman Long wrote: >> On 11/13/19 5:21 AM, Peter Zijlstra wrote: >>> +static int percpu_rwsem_wake_function(struct wait_queue_entry *wq_entr= y, >>> +=09=09=09=09 unsigned int mode, int wake_flags, >>> +=09=09=09=09 void *key) >>> +{ >>> +=09struct task_struct *p =3D get_task_struct(wq_entry->private); >>> +=09bool reader =3D wq_entry->flags & WQ_FLAG_CUSTOM; >>> +=09struct percpu_rw_semaphore *sem =3D key; >>> + >>> +=09/* concurrent against percpu_down_write(), can get stolen */ >>> +=09if (!__percpu_rwsem_trylock(sem, reader)) >>> +=09=09return 1; >>> + >>> +=09list_del_init(&wq_entry->entry); >>> +=09smp_store_release(&wq_entry->private, NULL); >>> + >>> +=09wake_up_process(p); >>> +=09put_task_struct(p); >>> + >>> +=09return !reader; /* wake 'all' readers and 1 writer */ >>> +} >>> + >> If I read the function correctly, you are setting the WQ_FLAG_EXCLUSIVE >> for both readers and writers and __wake_up() is called with an exclusive >> count of one. So only one reader or writer is woken up each time. > This depends on what percpu_rwsem_wake_function() returns. If it returns = 1, > __wake_up_common() stops, exactly because all waiters have WQ_FLAG_EXCLUS= IVE. > >> However, the comment above said we wake 'all' readers and 1 writer. That >> doesn't match the actual code, IMO. > Well, "'all' readers" probably means "all readers before writer", > >> To match the comments, you should >> have set WQ_FLAG_EXCLUSIVE flag only on writer. In this case, you >> probably don't need WQ_FLAG_CUSTOM to differentiate between readers and >> writers. > See above... > > note also the > > =09if (!__percpu_rwsem_trylock(sem, reader)) > =09=09return 1; > > at the start of percpu_rwsem_wake_function(). We want to stop wake_up_com= mon() > as soon as percpu_rwsem_trylock() fails. Because we know that if it fails= once > it can't succeed later. Although iiuc this can only happen if another (ne= w) > writer races with __wake_up(&sem->waiters). > > > I guess WQ_FLAG_CUSTOM can be avoided, percpu_rwsem_wait() could do > > =09if (read) > =09=09__add_wait_queue_entry_tail(...); > =09else { > =09=09wq_entry.flags |=3D WQ_FLAG_EXCLUSIVE; > =09=09__add_wait_queue(...); > =09} > > but this is "unfair". Thanks for the explanation. That clarifies my understanding of the patch. Cheers, Longman