Date: Wed, 20 Mar 2019 15:06:29 -0400 (EDT)
From: Mikulas Patocka
To: Nikos Tsironis
Cc: snitzer@redhat.com, agk@redhat.com, dm-devel@redhat.com, paulmck@linux.ibm.com, hch@infradead.org, iliastsi@arrikto.com, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3 0/6] dm snapshot: Improve performance using a more fine-grained locking scheme
In-Reply-To: <20190317122258.21760-1-ntsironis@arrikto.com>
References: <20190317122258.21760-1-ntsironis@arrikto.com>

Acked-by: Mikulas Patocka

On Sun, 17 Mar 2019, Nikos Tsironis wrote:

> dm-snapshot uses a single mutex to serialize every access to the
> snapshot state, including accesses to the exception hash tables. This
> mutex is a bottleneck that prevents dm-snapshot from scaling as the
> number of threads doing IO increases.
>
> The major contention points are __origin_write()/snapshot_map() and
> pending_complete(), i.e., the submission and completion of pending
> exceptions.
>
> This patchset substitutes the single mutex with:
>
>   * A read-write semaphore, which protects the mostly-read fields of the
>     snapshot structure.
>
>   * Per-bucket bit spinlocks, which protect accesses to the exception
>     hash tables.
>
> fio benchmarks using the null_blk device show significant performance
> improvements as the number of worker processes increases. Write latency
> is almost halved and write IOPS are nearly doubled.
>
> The relevant patch provides detailed benchmark results.
>
> A summary of the patchset follows:
>
>   1. The first patch removes an unnecessary use of WRITE_ONCE() in
>      hlist_add_behind().
>
>   2. The second patch adds two helper functions to linux/list_bl.h,
>      which are used to implement the per-bucket bit spinlocks in
>      dm-snapshot.
>
>   3. The third patch removes the need to sleep holding the snapshot lock
>      in pending_complete(), thus allowing us to replace the mutex with
>      the per-bucket bit spinlocks.
>
>   4. Patches 4, 5 and 6 change the locking scheme, as described
>      previously.
>
> Changes in v3:
>   - Don't use WRITE_ONCE() in hlist_bl_add_behind(), as it's not needed.
>   - Fix hlist_add_behind() to also not use WRITE_ONCE().
>   - Use uintptr_t instead of unsigned long in hlist_bl_add_before().
>
> v2: https://www.redhat.com/archives/dm-devel/2019-March/msg00007.html
>
> Changes in v2:
>   - Split third patch of v1 into three patches: 3/5, 4/5, 5/5.
>
> v1: https://www.redhat.com/archives/dm-devel/2018-December/msg00161.html
>
> Nikos Tsironis (6):
>   list: Don't use WRITE_ONCE() in hlist_add_behind()
>   list_bl: Add hlist_bl_add_before/behind helpers
>   dm snapshot: Don't sleep holding the snapshot lock
>   dm snapshot: Replace mutex with rw semaphore
>   dm snapshot: Make exception tables scalable
>   dm snapshot: Use fine-grained locking scheme
>
>  drivers/md/dm-exception-store.h |   3 +-
>  drivers/md/dm-snap.c            | 359 +++++++++++++++++++++++++++-------------
>  include/linux/list.h            |   2 +-
>  include/linux/list_bl.h         |  26 +++
>  4 files changed, 269 insertions(+), 121 deletions(-)
>
> --
> 2.11.0
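
For readers unfamiliar with the idea of per-bucket bit spinlocks, here is a
minimal userspace sketch of the general technique, assuming C11 atomics. It
is not the code from the patchset and does not use the kernel's list_bl
API: the names (struct bucket, bucket_lock(), table_insert(), and so on),
the table size, and the chunk/new_chunk fields are all hypothetical and
chosen only for illustration. Bit 0 of each bucket's head pointer serves as
that bucket's lock, so operations on different buckets never contend.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16u               /* hypothetical size, must be a power of two */
#define LOCK_BIT ((uintptr_t)1)    /* bit 0 of the head pointer is the lock */

/* Hypothetical entry; nodes are at least 8-byte aligned, so bit 0 is free. */
struct node {
    struct node *next;
    uint64_t chunk;      /* key */
    uint64_t new_chunk;  /* value */
};

struct bucket {
    _Atomic uintptr_t head;  /* (pointer to first node) | LOCK_BIT */
};

static struct bucket table[NBUCKETS];

static struct bucket *bucket_of(uint64_t chunk)
{
    return &table[chunk & (NBUCKETS - 1)];
}

/* Spin until we are the caller that flips bit 0 from clear to set. */
static void bucket_lock(struct bucket *b)
{
    while (atomic_fetch_or_explicit(&b->head, LOCK_BIT,
                                    memory_order_acquire) & LOCK_BIT)
        ;  /* another thread holds this bucket; keep spinning */
}

/* Clear bit 0, leaving the pointer bits untouched. */
static void bucket_unlock(struct bucket *b)
{
    uintptr_t old = atomic_fetch_and_explicit(&b->head, ~LOCK_BIT,
                                              memory_order_release);
    assert(old & LOCK_BIT);  /* must have been locked */
    (void)old;
}

/* Push a node at the front; the caller must hold the bucket lock. */
static void insert_locked(struct bucket *b, struct node *n)
{
    uintptr_t h = atomic_load_explicit(&b->head, memory_order_relaxed);
    assert(h & LOCK_BIT);
    n->next = (struct node *)(h & ~LOCK_BIT);
    /* Keep the lock bit set while publishing the new head. */
    atomic_store_explicit(&b->head, (uintptr_t)n | LOCK_BIT,
                          memory_order_relaxed);
}

/* Walk the chain; the caller must hold the bucket lock. */
static struct node *lookup_locked(struct bucket *b, uint64_t chunk)
{
    uintptr_t h = atomic_load_explicit(&b->head, memory_order_relaxed);
    for (struct node *n = (struct node *)(h & ~LOCK_BIT); n; n = n->next)
        if (n->chunk == chunk)
            return n;
    return NULL;
}

/* Insert under the lock of the one bucket that the key hashes to. */
static void table_insert(struct node *n)
{
    struct bucket *b = bucket_of(n->chunk);
    bucket_lock(b);
    insert_locked(b, n);
    bucket_unlock(b);
}

/* Returns 1 and fills *new_chunk if the key is present, else returns 0. */
static int table_lookup(uint64_t chunk, uint64_t *new_chunk)
{
    struct bucket *b = bucket_of(chunk);
    int found = 0;
    bucket_lock(b);
    struct node *n = lookup_locked(b, chunk);
    if (n) {
        *new_chunk = n->new_chunk;
        found = 1;
    }
    bucket_unlock(b);
    return found;
}
```

A caller would fill in a struct node and pass it to table_insert(), then
query with table_lookup(). Because the lock lives in the head pointer
itself, the table needs no extra memory per bucket, which is the usual
reason to prefer a bit spinlock over a separate lock array.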