From: Nikos Tsironis
To: snitzer@redhat.com, agk@redhat.com, dm-devel@redhat.com
Cc: mpatocka@redhat.com, paulmck@linux.ibm.com, hch@infradead.org,
    iliastsi@arrikto.com, linux-kernel@vger.kernel.org
Subject: [PATCH v3 6/6] dm snapshot: Use fine-grained locking scheme
Date: Sun, 17 Mar 2019 14:22:58 +0200
Message-Id: <20190317122258.21760-7-ntsironis@arrikto.com>
X-Mailer: git-send-email 2.11.0
In-Reply-To: <20190317122258.21760-1-ntsironis@arrikto.com>
References: <20190317122258.21760-1-ntsironis@arrikto.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Substitute the global locking scheme with a fine-grained one, employing
the read-write semaphore and the scalable exception tables with
per-bucket locks introduced by the previous two commits.

Summarizing, we now use a read-write semaphore to protect the
mostly-read fields of the snapshot structure, e.g., valid, active, etc.,
and per-bucket bit spinlocks to protect accesses to the complete and
pending exception tables.

Finally, we use an extra spinlock (pe_allocation_lock) to serialize the
allocation of new exceptions by the exception store. This allocation is
really fast, so the extra spinlock doesn't hurt performance.

This scheme allows dm-snapshot to scale better, resulting in increased
IOPS and reduced latency.
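To make the resulting lock hierarchy easier to follow, below is a
condensed sketch of the snapshot_map() path after this change. This is
an editorial illustration only, not part of the patch; "..." marks
elided code. The identifiers (s->lock, pe_allocation_lock, the
dm_exception_table_lock*() helpers and the dm-snap.c functions) are the
ones used by this series; everything else is simplified:

  /* Condensed, editorial sketch -- not part of the patch itself. */
  struct dm_snapshot {
  	struct rw_semaphore lock;		/* mostly-read fields: valid, active, ... */
  	spinlock_t pe_allocation_lock;		/* serializes allocation of new exceptions */
  	struct dm_exception_table pending;	/* buckets embed their own bit spinlocks */
  	struct dm_exception_table complete;
  	...
  };

  static int snapshot_map(struct dm_target *ti, struct bio *bio)
  {
  	struct dm_snapshot *s = ti->private;
  	struct dm_exception_table_lock lock;
  	...
  	chunk = sector_to_chunk(s->store, bio->bi_iter.bi_sector);
  	dm_exception_table_lock_init(s, chunk, &lock);
  	...
  	down_read(&s->lock);			/* was down_write() before this patch */
  	dm_exception_table_lock(&lock);		/* bit spinlock for this chunk's buckets */

  	e = dm_lookup_exception(&s->complete, chunk);
  	if (e) {
  		remap_exception(s, e, bio, chunk);
  		goto out_unlock;
  	}
  	...
  	/*
  	 * __find_pending_exception() -> __insert_pending_exception() takes
  	 * pe_allocation_lock internally, only around prepare_exception() and
  	 * the exception_start_sequence++ update.
  	 */
  	pe = __find_pending_exception(s, pe, chunk);
  	...
  out_unlock:
  	dm_exception_table_unlock(&lock);
  	up_read(&s->lock);
  	...
  }

The same pattern is used in pending_complete() and __origin_write(): the
semaphore is only taken for write when the snapshot has to be
invalidated or marked as overflowed.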
Following are some benchmark results using the null_blk device:

  modprobe null_blk gb=1024 bs=512 submit_queues=8 hw_queue_depth=4096 \
   queue_mode=2 irqmode=1 completion_nsec=1 nr_devices=1

* Benchmark fio_origin_randwrite_throughput_N, from the device mapper
  test suite [1] (direct IO, random 4K writes to origin device, IO
  engine libaio):

  +--------------+-------------+------------+
  | # of workers | IOPS Before | IOPS After |
  +--------------+-------------+------------+
  |      1       |    57708    |    66421   |
  |      2       |    63415    |    77589   |
  |      4       |    67276    |    98839   |
  |      8       |    60564    |   109258   |
  +--------------+-------------+------------+

* Benchmark fio_origin_randwrite_latency_N, from the device mapper test
  suite [1] (direct IO, random 4K writes to origin device, IO engine
  psync):

  +--------------+-----------------------+----------------------+
  | # of workers | Latency (usec) Before | Latency (usec) After |
  +--------------+-----------------------+----------------------+
  |      1       |         16.25         |         13.27        |
  |      2       |         31.65         |         25.08        |
  |      4       |         55.28         |         41.08        |
  |      8       |        121.47         |         74.44        |
  +--------------+-----------------------+----------------------+

* Benchmark fio_snapshot_randwrite_throughput_N, from the device mapper
  test suite [1] (direct IO, random 4K writes to snapshot device, IO
  engine libaio):

  +--------------+-------------+------------+
  | # of workers | IOPS Before | IOPS After |
  +--------------+-------------+------------+
  |      1       |    72593    |    84938   |
  |      2       |    97379    |   134973   |
  |      4       |    90610    |   143077   |
  |      8       |    90537    |   180085   |
  +--------------+-------------+------------+

* Benchmark fio_snapshot_randwrite_latency_N, from the device mapper
  test suite [1] (direct IO, random 4K writes to snapshot device, IO
  engine psync):

  +--------------+-----------------------+----------------------+
  | # of workers | Latency (usec) Before | Latency (usec) After |
  +--------------+-----------------------+----------------------+
  |      1       |         12.53         |         10.6         |
  |      2       |         19.78         |         14.89        |
  |      4       |         40.37         |         23.47        |
  |      8       |         89.32         |         48.48        |
  +--------------+-----------------------+----------------------+

[1] https://github.com/jthornber/device-mapper-test-suite

Signed-off-by: Nikos Tsironis
Signed-off-by: Ilias Tsitsimpis
---
 drivers/md/dm-snap.c | 84 +++++++++++++++++++++++++++-------------------------
 1 file changed, 44 insertions(+), 40 deletions(-)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c
index ab75857309aa..4530d0620a72 100644
--- a/drivers/md/dm-snap.c
+++ b/drivers/md/dm-snap.c
@@ -77,7 +77,9 @@ struct dm_snapshot {

 	atomic_t pending_exceptions_count;

-	/* Protected by "lock" */
+	spinlock_t pe_allocation_lock;
+
+	/* Protected by "pe_allocation_lock" */
 	sector_t exception_start_sequence;

 	/* Protected by kcopyd single-threaded callback */
@@ -1245,6 +1247,7 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	s->snapshot_overflowed = 0;
 	s->active = 0;
 	atomic_set(&s->pending_exceptions_count, 0);
+	spin_lock_init(&s->pe_allocation_lock);
 	s->exception_start_sequence = 0;
 	s->exception_complete_sequence = 0;
 	s->out_of_order_tree = RB_ROOT;
@@ -1522,6 +1525,13 @@ static void __invalidate_snapshot(struct dm_snapshot *s, int err)
 	dm_table_event(s->ti->table);
 }

+static void invalidate_snapshot(struct dm_snapshot *s, int err)
+{
+	down_write(&s->lock);
+	__invalidate_snapshot(s, err);
+	up_write(&s->lock);
+}
+
 static void pending_complete(void *context, int success)
 {
 	struct dm_snap_pending_exception *pe = context;
@@ -1537,8 +1547,7 @@ static void pending_complete(void *context, int success)

 	if (!success) {
 		/* Read/write error - snapshot is unusable */
-		down_write(&s->lock);
-		__invalidate_snapshot(s, -EIO);
+		invalidate_snapshot(s, -EIO);
 		error = 1;

 		dm_exception_table_lock(&lock);
@@ -1547,8 +1556,7 @@ static void pending_complete(void *context, int success)

 	e = alloc_completed_exception(GFP_NOIO);
 	if (!e) {
-		down_write(&s->lock);
-		__invalidate_snapshot(s, -ENOMEM);
+		invalidate_snapshot(s, -ENOMEM);
 		error = 1;

 		dm_exception_table_lock(&lock);
@@ -1556,11 +1564,13 @@ static void pending_complete(void *context, int success)
 	}
 	*e = pe->e;

-	down_write(&s->lock);
+	down_read(&s->lock);
 	dm_exception_table_lock(&lock);
 	if (!s->valid) {
+		up_read(&s->lock);
 		free_completed_exception(e);
 		error = 1;
+
 		goto out;
 	}

@@ -1572,13 +1582,12 @@ static void pending_complete(void *context, int success)
 	 * merging can overwrite the chunk in origin.
 	 */
 	dm_insert_exception(&s->complete, e);
+	up_read(&s->lock);

 	/* Wait for conflicting reads to drain */
 	if (__chunk_is_tracked(s, pe->e.old_chunk)) {
 		dm_exception_table_unlock(&lock);
-		up_write(&s->lock);
 		__check_for_conflicting_io(s, pe->e.old_chunk);
-		down_write(&s->lock);
 		dm_exception_table_lock(&lock);
 	}

@@ -1595,8 +1604,6 @@ static void pending_complete(void *context, int success)
 		full_bio->bi_end_io = pe->full_bio_end_io;
 	increment_pending_exceptions_done_count();

-	up_write(&s->lock);
-
 	/* Submit any pending write bios */
 	if (error) {
 		if (full_bio)
@@ -1738,8 +1745,8 @@ __lookup_pending_exception(struct dm_snapshot *s, chunk_t chunk)
 /*
  * Inserts a pending exception into the pending table.
  *
- * NOTE: a write lock must be held on snap->lock before calling
- * this.
+ * NOTE: a write lock must be held on the chunk's pending exception table slot
+ * before calling this.
  */
 static struct dm_snap_pending_exception *
 __insert_pending_exception(struct dm_snapshot *s,
@@ -1751,12 +1758,15 @@ __insert_pending_exception(struct dm_snapshot *s,
 	pe->started = 0;
 	pe->full_bio = NULL;

+	spin_lock(&s->pe_allocation_lock);
 	if (s->store->type->prepare_exception(s->store, &pe->e)) {
+		spin_unlock(&s->pe_allocation_lock);
 		free_pending_exception(pe);
 		return NULL;
 	}

 	pe->exception_sequence = s->exception_start_sequence++;
+	spin_unlock(&s->pe_allocation_lock);

 	dm_insert_exception(&s->pending, &pe->e);

@@ -1768,8 +1778,8 @@ __insert_pending_exception(struct dm_snapshot *s,
  * for this chunk, otherwise it allocates a new one and inserts
  * it into the pending table.
  *
- * NOTE: a write lock must be held on snap->lock before calling
- * this.
+ * NOTE: a write lock must be held on the chunk's pending exception table slot
+ * before calling this.
  */
 static struct dm_snap_pending_exception *
 __find_pending_exception(struct dm_snapshot *s,
@@ -1820,7 +1830,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 	if (!s->valid)
 		return DM_MAPIO_KILL;

-	down_write(&s->lock);
+	down_read(&s->lock);
 	dm_exception_table_lock(&lock);

 	if (!s->valid || (unlikely(s->snapshot_overflowed) &&
@@ -1845,17 +1855,9 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 		pe = __lookup_pending_exception(s, chunk);
 		if (!pe) {
 			dm_exception_table_unlock(&lock);
-			up_write(&s->lock);
 			pe = alloc_pending_exception(s);
-			down_write(&s->lock);
 			dm_exception_table_lock(&lock);

-			if (!s->valid || s->snapshot_overflowed) {
-				free_pending_exception(pe);
-				r = DM_MAPIO_KILL;
-				goto out_unlock;
-			}
-
 			e = dm_lookup_exception(&s->complete, chunk);
 			if (e) {
 				free_pending_exception(pe);
@@ -1866,10 +1868,15 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)

 			pe = __find_pending_exception(s, pe, chunk);
 			if (!pe) {
 				dm_exception_table_unlock(&lock);
+				up_read(&s->lock);
+
+				down_write(&s->lock);
 				if (s->store->userspace_supports_overflow) {
-					s->snapshot_overflowed = 1;
-					DMERR("Snapshot overflowed: Unable to allocate exception.");
+					if (s->valid && !s->snapshot_overflowed) {
+						s->snapshot_overflowed = 1;
+						DMERR("Snapshot overflowed: Unable to allocate exception.");
+					}
 				} else
 					__invalidate_snapshot(s, -ENOMEM);
 				up_write(&s->lock);
@@ -1887,8 +1894,10 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 		    bio->bi_iter.bi_size ==
 		    (s->store->chunk_size << SECTOR_SHIFT)) {
 			pe->started = 1;
+
 			dm_exception_table_unlock(&lock);
-			up_write(&s->lock);
+			up_read(&s->lock);
+
 			start_full_bio(pe, bio);
 			goto out;
 		}
@@ -1896,10 +1905,12 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)
 		bio_list_add(&pe->snapshot_bios, bio);

 		if (!pe->started) {
-			/* this is protected by snap->lock */
+			/* this is protected by the exception table lock */
 			pe->started = 1;
+
 			dm_exception_table_unlock(&lock);
-			up_write(&s->lock);
+			up_read(&s->lock);
+
 			start_copy(pe);
 			goto out;
 		}
@@ -1910,7 +1921,7 @@ static int snapshot_map(struct dm_target *ti, struct bio *bio)

 out_unlock:
 	dm_exception_table_unlock(&lock);
-	up_write(&s->lock);
+	up_read(&s->lock);
 out:
 	return r;
 }
@@ -2234,7 +2245,7 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
 		chunk = sector_to_chunk(snap->store, sector);
 		dm_exception_table_lock_init(snap, chunk, &lock);

-		down_write(&snap->lock);
+		down_read(&snap->lock);
 		dm_exception_table_lock(&lock);

 		/* Only deal with valid and active snapshots */
@@ -2253,16 +2264,9 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
 			goto next_snapshot;

 		dm_exception_table_unlock(&lock);
-		up_write(&snap->lock);
 		pe = alloc_pending_exception(snap);
-		down_write(&snap->lock);
 		dm_exception_table_lock(&lock);

-		if (!snap->valid) {
-			free_pending_exception(pe);
-			goto next_snapshot;
-		}
-
 		pe2 = __lookup_pending_exception(snap, chunk);

 		if (!pe2) {
@@ -2275,9 +2279,9 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,
 			pe = __insert_pending_exception(snap, pe, chunk);
 			if (!pe) {
 				dm_exception_table_unlock(&lock);
-				__invalidate_snapshot(snap, -ENOMEM);
-				up_write(&snap->lock);
+				up_read(&snap->lock);

+				invalidate_snapshot(snap, -ENOMEM);
 				continue;
 			}
 		} else {
@@ -2310,7 +2314,7 @@ static int __origin_write(struct list_head *snapshots, sector_t sector,

 next_snapshot:
 		dm_exception_table_unlock(&lock);
-		up_write(&snap->lock);
+		up_read(&snap->lock);

 		if (pe_to_start_now) {
 			start_copy(pe_to_start_now);
--
2.11.0