Date: Mon, 30 Mar 2020 12:34:04 -0400 (EDT)
From: Mikulas Patocka
To: Lukas Straub
cc: linux-kernel, dm-devel, Mike Snitzer, Alasdair Kergon
Subject: Re: [dm-devel] [PATCH v2] dm-integrity: Prevent RMW for full metadata buffer writes
In-Reply-To: <20200227142601.61f3cd54@luklap>
References: <20200227142601.61f3cd54@luklap>

Hi

I tested it on my rotational disk:

On Thu, 27 Feb 2020, Lukas Straub wrote:

> If a full metadata buffer is being written, don't read it first. This
> prevents a read-modify-write cycle and increases performance on HDDs
> considerably.
>
> To do this we now calculate the checksums for all sectors in the bio in one
> go in integrity_metadata and then pass the result to dm_integrity_rw_tag,
> which now checks if we overwrite the whole buffer.
>
> Benchmarking with a 5400RPM HDD with bitmap mode:
> integritysetup format --no-wipe --batch-mode --interleave-sectors $((64*1024)) -t 4 -s 512 -I crc32c -B /dev/sdc
> integritysetup open --buffer-sectors=1 -I crc32c -B /dev/sdc hdda_integ
> dd if=/dev/zero of=/dev/mapper/hdda_integ bs=64K count=$((16*1024*4)) conv=fsync oflag=direct status=progress
>
> Without patch:
> 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 400.326 s, 10.7 MB/s

I get 42.3 MB/s. (It seems that my disk has better prefetch than yours.)

> With patch:
> 4294967296 bytes (4.3 GB, 4.0 GiB) copied, 41.2057 s, 104 MB/s

I get 110 MB/s.

> Signed-off-by: Lukas Straub

BUT: when I removed "--buffer-sectors=1" from the command line, I got
these numbers:

without the patch: 101 MB/s
with the patch: 101 MB/s

So, you crippled performance with "--buffer-sectors=1" and then made a
patch to restore it. The patch only helps about 10%.
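For reference, the comparison above is just the quoted benchmark with the
"--buffer-sectors=1" option dropped from the open step; the device path and
all other parameters are exactly as in the commands quoted above, and the
absolute numbers will of course vary with the drive:

  # same benchmark, but let buffer-sectors take its default instead of forcing 1 sector
  integritysetup format --no-wipe --batch-mode --interleave-sectors $((64*1024)) -t 4 -s 512 -I crc32c -B /dev/sdc
  integritysetup open -I crc32c -B /dev/sdc hdda_integ
  dd if=/dev/zero of=/dev/mapper/hdda_integ bs=64K count=$((16*1024*4)) conv=fsync oflag=direct status=progress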
Mikulas

> ---
> Hello Everyone,
> So here is v2, now checking if we overwrite a whole metadata buffer instead
> of the (buggy) check if we overwrite a whole tag area before.
> Performance stayed the same (with --buffer-sectors=1).
>
> The only quirk now is that it advertises a very big optimal io size in the
> standard configuration (where buffer_sectors=128). Is this a Problem?
>
> v2:
>  -fix dm_integrity_rw_tag to check if we overwrite a whole metadata buffer
>  -fix optimal io size to check if we overwrite a whole metadata buffer
>
> Regards,
> Lukas Straub
>
>  drivers/md/dm-integrity.c | 81 +++++++++++++++++++++++----------------
>  1 file changed, 47 insertions(+), 34 deletions(-)
>
> diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c
> index b225b3e445fa..a6d3cf1406df 100644
> --- a/drivers/md/dm-integrity.c
> +++ b/drivers/md/dm-integrity.c
> @@ -1309,9 +1309,17 @@ static int dm_integrity_rw_tag(struct dm_integrity_c *ic, unsigned char *tag, se
>                  if (unlikely(r))
>                          return r;
>
> -                data = dm_bufio_read(ic->bufio, *metadata_block, &b);
> -                if (IS_ERR(data))
> -                        return PTR_ERR(data);
> +                /* Don't read metadata sectors from disk if we're going to overwrite them completely */
> +                if (op == TAG_WRITE && *metadata_offset == 0 \
> +                    && total_size >= (1U << SECTOR_SHIFT << ic->log2_buffer_sectors)) {
> +                        data = dm_bufio_new(ic->bufio, *metadata_block, &b);
> +                        if (IS_ERR(data))
> +                                return PTR_ERR(data);
> +                } else {
> +                        data = dm_bufio_read(ic->bufio, *metadata_block, &b);
> +                        if (IS_ERR(data))
> +                                return PTR_ERR(data);
> +                }
>
>                  to_copy = min((1U << SECTOR_SHIFT << ic->log2_buffer_sectors) - *metadata_offset, total_size);
>                  dp = data + *metadata_offset;
> @@ -1514,6 +1522,8 @@ static void integrity_metadata(struct work_struct *w)
>  {
>          struct dm_integrity_io *dio = container_of(w, struct dm_integrity_io, work);
>          struct dm_integrity_c *ic = dio->ic;
> +        unsigned sectors_to_process = dio->range.n_sectors;
> +        sector_t sector = dio->range.logical_sector;
>
>          int r;
>
> @@ -1522,16 +1532,14 @@ static void integrity_metadata(struct work_struct *w)
>                  struct bio_vec bv;
>                  unsigned digest_size = crypto_shash_digestsize(ic->internal_hash);
>                  struct bio *bio = dm_bio_from_per_bio_data(dio, sizeof(struct dm_integrity_io));
> -                char *checksums;
> +                char *checksums, *checksums_ptr;
>                  unsigned extra_space = unlikely(digest_size > ic->tag_size) ? digest_size - ic->tag_size : 0;
>                  char checksums_onstack[HASH_MAX_DIGESTSIZE];
> -                unsigned sectors_to_process = dio->range.n_sectors;
> -                sector_t sector = dio->range.logical_sector;
>
>                  if (unlikely(ic->mode == 'R'))
>                          goto skip_io;
>
> -                checksums = kmalloc((PAGE_SIZE >> SECTOR_SHIFT >> ic->sb->log2_sectors_per_block) * ic->tag_size + extra_space,
> +                checksums = kmalloc((dio->range.n_sectors >> ic->sb->log2_sectors_per_block) * ic->tag_size + extra_space,
>                                      GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN);
>                  if (!checksums) {
>                          checksums = checksums_onstack;
> @@ -1542,49 +1550,45 @@
>                          }
>                  }
>
> +                checksums_ptr = checksums;
>                  __bio_for_each_segment(bv, bio, iter, dio->orig_bi_iter) {
>                          unsigned pos;
> -                        char *mem, *checksums_ptr;
> -
> -again:
> +                        char *mem;
>                          mem = (char *)kmap_atomic(bv.bv_page) + bv.bv_offset;
>                          pos = 0;
> -                        checksums_ptr = checksums;
>                          do {
>                                  integrity_sector_checksum(ic, sector, mem + pos, checksums_ptr);
> -                                checksums_ptr += ic->tag_size;
> -                                sectors_to_process -= ic->sectors_per_block;
> +
> +                                if (likely(checksums != checksums_onstack)) {
> +                                        checksums_ptr += ic->tag_size;
> +                                } else {
> +                                        r = dm_integrity_rw_tag(ic, checksums, &dio->metadata_block, &dio->metadata_offset,
> +                                                                ic->tag_size, !dio->write ? TAG_CMP : TAG_WRITE);
> +                                        if (unlikely(r))
> +                                                goto internal_hash_error;
> +                                }
> +
>                                  pos += ic->sectors_per_block << SECTOR_SHIFT;
>                                  sector += ic->sectors_per_block;
> -                        } while (pos < bv.bv_len && sectors_to_process && checksums != checksums_onstack);
> +                                sectors_to_process -= ic->sectors_per_block;
> +                        } while (pos < bv.bv_len && sectors_to_process);
>                          kunmap_atomic(mem);
>
> -                        r = dm_integrity_rw_tag(ic, checksums, &dio->metadata_block, &dio->metadata_offset,
> -                                                checksums_ptr - checksums, !dio->write ? TAG_CMP : TAG_WRITE);
> -                        if (unlikely(r)) {
> -                                if (r > 0) {
> -                                        DMERR_LIMIT("Checksum failed at sector 0x%llx",
> -                                                    (unsigned long long)(sector - ((r + ic->tag_size - 1) / ic->tag_size)));
> -                                        r = -EILSEQ;
> -                                        atomic64_inc(&ic->number_of_mismatches);
> -                                }
> -                                if (likely(checksums != checksums_onstack))
> -                                        kfree(checksums);
> -                                goto error;
> -                        }
> -
>                          if (!sectors_to_process)
>                                  break;
> +                }
>
> -                        if (unlikely(pos < bv.bv_len)) {
> -                                bv.bv_offset += pos;
> -                                bv.bv_len -= pos;
> -                                goto again;
> +                if (likely(checksums != checksums_onstack)) {
> +                        r = dm_integrity_rw_tag(ic, checksums, &dio->metadata_block, &dio->metadata_offset,
> +                                                (dio->range.n_sectors >> ic->sb->log2_sectors_per_block) * ic->tag_size,
> +                                                !dio->write ? TAG_CMP : TAG_WRITE);
> +                        if (unlikely(r)) {
> +                                kfree(checksums);
> +                                goto internal_hash_error;
>                          }
> +                        kfree(checksums);
>                  }
>
> -                if (likely(checksums != checksums_onstack))
> -                        kfree(checksums);
>          } else {
>                  struct bio_integrity_payload *bip = dio->orig_bi_integrity;
>
> @@ -1615,6 +1619,13 @@ static void integrity_metadata(struct work_struct *w)
>  skip_io:
>          dec_in_flight(dio);
>          return;
> +internal_hash_error:
> +        if (r > 0) {
> +                DMERR_LIMIT("Checksum failed at sector 0x%llx",
> +                            (unsigned long long)(sector - ((r + ic->tag_size - 1) / ic->tag_size)));
> +                r = -EILSEQ;
> +                atomic64_inc(&ic->number_of_mismatches);
> +        }
>  error:
>          dio->bi_status = errno_to_blk_status(r);
>          dec_in_flight(dio);
> @@ -3019,6 +3030,8 @@ static void dm_integrity_io_hints(struct dm_target *ti, struct queue_limits *lim
>                  limits->physical_block_size = ic->sectors_per_block << SECTOR_SHIFT;
>                  blk_limits_io_min(limits, ic->sectors_per_block << SECTOR_SHIFT);
>          }
> +
> +        blk_limits_io_opt(limits, 1U << SECTOR_SHIFT << ic->log2_buffer_sectors >> ic->log2_tag_size << SECTOR_SHIFT );
>  }
>
>  static void calculate_journal_section_size(struct dm_integrity_c *ic)
> --
> 2.20.1
>
>
> --
> dm-devel mailing list
> dm-devel@redhat.com
> https://www.redhat.com/mailman/listinfo/dm-devel
>