From: Mauro Carvalho Chehab
To: Linux Doc Mailing List
Cc: Mauro Carvalho Chehab, Mauro Carvalho Chehab, linux-kernel@vger.kernel.org, Jonathan Corbet, Alasdair Kergon, Mike Snitzer, dm-devel@redhat.com
Subject: [PATCH 10/57] docs: device-mapper: convert it to ReST format
Date: Mon, 15 Apr 2019 23:55:35 -0300
Message-Id: <9dd3c4eca01489bd67ea6de88dfedef8b0e81901.1555382110.git.mchehab+samsung@kernel.org>
X-Mailer: git-send-email 2.20.1

Convert the Device Mapper documentation to ReST.

Signed-off-by: Mauro Carvalho Chehab
---
 .../device-mapper/cache-policies.txt          |  24 +-
 Documentation/device-mapper/cache.txt         | 206 +++++++++-------
 Documentation/device-mapper/delay.txt         |  29 ++-
 Documentation/device-mapper/dm-crypt.txt      |  57 +++--
 Documentation/device-mapper/dm-flakey.txt     |  45 ++--
 Documentation/device-mapper/dm-init.txt       |  75 +++---
 Documentation/device-mapper/dm-integrity.txt  |  55 +++--
 Documentation/device-mapper/dm-io.txt         |  14 +-
 Documentation/device-mapper/dm-log.txt        |   5 +-
 .../device-mapper/dm-queue-length.txt         |  25 +-
 Documentation/device-mapper/dm-raid.txt       | 225 +++++++++++------
 .../device-mapper/dm-service-time.txt         |  68 +++---
 Documentation/device-mapper/dm-uevent.txt     | 143 ++++++-----
 Documentation/device-mapper/dm-zoned.txt      |  10 +-
 Documentation/device-mapper/era.txt           |  36 +--
 Documentation/device-mapper/kcopyd.txt        |  10 +-
 Documentation/device-mapper/linear.txt        | 100 ++++----
 Documentation/device-mapper/log-writes.txt    |  91 +++----
 .../device-mapper/persistent-data.txt         |   4 +
 Documentation/device-mapper/snapshot.txt      | 116 ++++-----
 Documentation/device-mapper/statistics.txt    |  62 ++---
 Documentation/device-mapper/striped.txt       |  68 +++---
 Documentation/device-mapper/switch.txt        |  47 ++--
 .../device-mapper/thin-provisioning.txt       |  68 ++++--
 Documentation/device-mapper/unstriped.txt     | 111 +++++----
 Documentation/device-mapper/verity.txt        |  20 +-
 Documentation/device-mapper/writecache.txt    |  13 +-
 Documentation/device-mapper/zero.txt          |  14 +-
 28 files changed, 1012 insertions(+), 729 deletions(-)

diff --git a/Documentation/device-mapper/cache-policies.txt b/Documentation/device-mapper/cache-policies.txt
index 86786d87d9a8..b17fe352fc41 100644
--- a/Documentation/device-mapper/cache-policies.txt
+++ b/Documentation/device-mapper/cache-policies.txt
@@ -1,3 +1,4 @@
+=============================
 Guidance for writing policies
 =============================

@@ -30,7 +31,7 @@ multiqueue (mq)

 This policy is now an alias for smq (see below).

-The following tunables are accepted, but have no effect:
+The following tunables are accepted, but have no effect::

     'sequential_threshold <#nr_sequential_ios>'
     'random_threshold <#nr_random_ios>'
@@ -56,7 +57,9 @@ mq policy's hints to be dropped. Also, performance of the cache may
 degrade slightly until smq recalculates the origin device's hotspots
 that should be cached.

-Memory usage:
+Memory usage
+^^^^^^^^^^^^
+
 The mq policy used a lot of memory; 88 bytes per cache block on a 64
 bit machine.

@@ -69,7 +72,9 @@ cache block).
 All this means smq uses ~25bytes per cache block. Still a lot of
 memory, but a substantial improvement nontheless.

-Level balancing:
+Level balancing
+^^^^^^^^^^^^^^^
+
 mq placed entries in different levels of the multiqueue structures
 based on their hit count (~ln(hit count)). This meant the bottom
 levels generally had the most entries, and the top ones had very
@@ -94,7 +99,9 @@ is used to decide which blocks to promote. If the hotspot queue is
 performing badly then it starts moving entries more quickly between
 levels.
 This lets it adapt to new IO patterns very quickly.

-Performance:
+Performance
+^^^^^^^^^^^
+
 Testing smq shows substantially better performance than mq.

 cleaner
@@ -105,16 +112,19 @@ The cleaner writes back all dirty blocks in a cache to decommission it.
 Examples
 ========

-The syntax for a table is:
+The syntax for a table is::
+
     cache <#feature_args> []* <#policy_args> []*

-The syntax to send a message using the dmsetup command is:
+The syntax to send a message using the dmsetup command is::
+
     dmsetup message 0 sequential_threshold 1024
     dmsetup message 0 random_threshold 8

-Using dmsetup:
+Using dmsetup::
+
     dmsetup create blah --table "0 268435456 cache /dev/sdb /dev/sdc \
         /dev/sdd 512 0 mq 4 sequential_threshold 1024 random_threshold 8"
     creates a 128GB large mapped device named 'blah' with the
diff --git a/Documentation/device-mapper/cache.txt b/Documentation/device-mapper/cache.txt
index 8ae1cf8e94da..f15e5254d05b 100644
--- a/Documentation/device-mapper/cache.txt
+++ b/Documentation/device-mapper/cache.txt
@@ -1,3 +1,7 @@
+=====
+Cache
+=====
+
 Introduction
 ============

@@ -24,10 +28,13 @@ scenarios (eg. a vm image server).
 Glossary
 ========

-  Migration - Movement of the primary copy of a logical block from one
+  Migration
+               Movement of the primary copy of a logical block from one
                device to the other.
-  Promotion - Migration from slow device to fast device.
-  Demotion - Migration from fast device to slow device.
+  Promotion
+               Migration from slow device to fast device.
+  Demotion
+               Migration from fast device to slow device.
 The origin device always contains a copy of the logical block, which
 may be out of date or kept in sync with the copy on the cache device
@@ -169,45 +176,53 @@ Target interface
 Constructor
 -----------

- cache
-       <#feature args> []*
-       <#policy args> [policy args]*
+  ::

- metadata dev    : fast device holding the persistent metadata
- cache dev       : fast device holding cached data blocks
- origin dev      : slow device holding original data blocks
- block size      : cache unit size in sectors
+   cache
+         <#feature args> []*
+         <#policy args> [policy args]*

- #feature args   : number of feature arguments passed
- feature args    : writethrough or passthrough (The default is writeback.)
+ ================ =======================================================
+ metadata dev     fast device holding the persistent metadata
+ cache dev        fast device holding cached data blocks
+ origin dev       slow device holding original data blocks
+ block size       cache unit size in sectors

- policy          : the replacement policy to use
- #policy args    : an even number of arguments corresponding to
-                   key/value pairs passed to the policy
- policy args     : key/value pairs passed to the policy
-                   E.g. 'sequential_threshold 1024'
-                   See cache-policies.txt for details.
+ #feature args    number of feature arguments passed
+ feature args     writethrough or passthrough (The default is writeback.)
+
+ policy           the replacement policy to use
+ #policy args     an even number of arguments corresponding to
+                  key/value pairs passed to the policy
+ policy args      key/value pairs passed to the policy
+                  E.g. 'sequential_threshold 1024'
+                  See cache-policies.txt for details.
+ ================ =======================================================

 Optional feature arguments are:

-   writethrough : write through caching that prohibits cache block
-                  content from being different from origin block content.
-                  Without this argument, the default behaviour is to write
-                  back cache block contents later for performance reasons,
-                  so they may differ from the corresponding origin blocks.
-   passthrough  : a degraded mode useful for various cache coherency
-                  situations (e.g., rolling back snapshots of
-                  underlying storage). Reads and writes always go to
-                  the origin. If a write goes to a cached origin
-                  block, then the cache block is invalidated.
-                  To enable passthrough mode the cache must be clean.
-   metadata2    : use version 2 of the metadata. This stores the dirty bits
-                  in a separate btree, which improves speed of shutting
-                  down the cache.
+   ==================== ========================================================
+   writethrough         write through caching that prohibits cache block
+                        content from being different from origin block content.
+                        Without this argument, the default behaviour is to write
+                        back cache block contents later for performance reasons,
+                        so they may differ from the corresponding origin blocks.

-   no_discard_passdown : disable passing down discards from the cache
-                         to the origin's data device.
+   passthrough          a degraded mode useful for various cache coherency
+                        situations (e.g., rolling back snapshots of
+                        underlying storage). Reads and writes always go to
+                        the origin. If a write goes to a cached origin
+                        block, then the cache block is invalidated.
+                        To enable passthrough mode the cache must be clean.
+
+   metadata2            use version 2 of the metadata. This stores the dirty
+                        bits in a separate btree, which improves speed of
+                        shutting down the cache.
+
+   no_discard_passdown  disable passing down discards from the cache
+                        to the origin's data device.
+   ==================== ========================================================

 A policy called 'default' is always registered. This is an alias for
 the policy we currently think is giving best all round performance.
@@ -218,54 +233,61 @@ the characteristics of a specific policy, always request it by name.
 Status
 ------

- <#used metadata blocks>/<#total metadata blocks>
- <#used cache blocks>/<#total cache blocks>
-<#read hits> <#read misses> <#write hits> <#write misses>
-<#demotions> <#promotions> <#dirty> <#features> *
-<#core args> * <#policy args> *
-
+::

-metadata block size    : Fixed block size for each metadata block in
-                         sectors
-#used metadata blocks  : Number of metadata blocks used
-#total metadata blocks : Total number of metadata blocks
-cache block size       : Configurable block size for the cache device
-                         in sectors
-#used cache blocks     : Number of blocks resident in the cache
-#total cache blocks    : Total number of cache blocks
-#read hits             : Number of times a READ bio has been mapped
-                         to the cache
-#read misses           : Number of times a READ bio has been mapped
-                         to the origin
-#write hits            : Number of times a WRITE bio has been mapped
-                         to the cache
-#write misses          : Number of times a WRITE bio has been
-                         mapped to the origin
-#demotions             : Number of times a block has been removed
-                         from the cache
-#promotions            : Number of times a block has been moved to
-                         the cache
-#dirty                 : Number of blocks in the cache that differ
-                         from the origin
-#feature args          : Number of feature args to follow
-feature args           : 'writethrough' (optional)
-#core args             : Number of core arguments (must be even)
-core args              : Key/value pairs for tuning the core
-                         e.g. migration_threshold
-policy name            : Name of the policy
-#policy args           : Number of policy arguments to follow (must be even)
-policy args            : Key/value pairs e.g. sequential_threshold
-cache metadata mode    : ro if read-only, rw if read-write
-    In serious cases where even a read-only mode is deemed unsafe
-    no further I/O will be permitted and the status will just
-    contain the string 'Fail'. The userspace recovery tools
-    should then be used.
-needs_check            : 'needs_check' if set, '-' if not set
-    A metadata operation has failed, resulting in the needs_check
-    flag being set in the metadata's superblock. The metadata
-    device must be deactivated and checked/repaired before the
-    cache can be made fully operational again. '-' indicates
-    needs_check is not set.
+   <#used metadata blocks>/<#total metadata blocks>
+   <#used cache blocks>/<#total cache blocks>
+  <#read hits> <#read misses> <#write hits> <#write misses>
+  <#demotions> <#promotions> <#dirty> <#features> *
+  <#core args> * <#policy args> *
+
+
+
+========================= =====================================================
+metadata block size       Fixed block size for each metadata block in
+                          sectors
+#used metadata blocks     Number of metadata blocks used
+#total metadata blocks    Total number of metadata blocks
+cache block size          Configurable block size for the cache device
+                          in sectors
+#used cache blocks        Number of blocks resident in the cache
+#total cache blocks       Total number of cache blocks
+#read hits                Number of times a READ bio has been mapped
+                          to the cache
+#read misses              Number of times a READ bio has been mapped
+                          to the origin
+#write hits               Number of times a WRITE bio has been mapped
+                          to the cache
+#write misses             Number of times a WRITE bio has been
+                          mapped to the origin
+#demotions                Number of times a block has been removed
+                          from the cache
+#promotions               Number of times a block has been moved to
+                          the cache
+#dirty                    Number of blocks in the cache that differ
+                          from the origin
+#feature args             Number of feature args to follow
+feature args              'writethrough' (optional)
+#core args                Number of core arguments (must be even)
+core args                 Key/value pairs for tuning the core
+                          e.g. migration_threshold
+policy name               Name of the policy
+#policy args              Number of policy arguments to follow (must be even)
+policy args               Key/value pairs e.g. sequential_threshold
+cache metadata mode       ro if read-only, rw if read-write
+
+                          In serious cases where even a read-only mode is
+                          deemed unsafe no further I/O will be permitted and
+                          the status will just contain the string 'Fail'.
+                          The userspace recovery tools should then be used.
+needs_check               'needs_check' if set, '-' if not set
+                          A metadata operation has failed, resulting in the
+                          needs_check flag being set in the metadata's
+                          superblock. The metadata device must be
+                          deactivated and checked/repaired before the
+                          cache can be made fully operational again.
+                          '-' indicates needs_check is not set.
+========================= =====================================================

 Messages
 --------
@@ -274,11 +296,12 @@ Policies will have different tunables, specific to each one, so we need
 a generic way of getting and setting these. Device-mapper
 messages are used. (A sysfs interface would also be possible.)

-The message format is:
+The message format is::



-E.g.
+E.g.::
+
    dmsetup message my_cache 0 sequential_threshold 1024


@@ -290,11 +313,12 @@ of values from 5 to 9. Each cblock must be expressed as a decimal
 value, in the future a variant message that takes cblock ranges
 expressed in hexadecimal may be needed to better support efficient
 invalidation of larger caches. The cache must be in passthrough mode
-when invalidate_cblocks is used.
+when invalidate_cblocks is used::

    invalidate_cblocks [|-]*

-E.g.
+E.g.::
+
    dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789

 Examples
@@ -304,8 +328,10 @@ The test suite can be found here:

 https://github.com/jthornber/device-mapper-test-suite

-dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
-    /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
-dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
-    /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
-    mq 4 sequential_threshold 1024 random_threshold 8'
+::
+
+  dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
+      /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0'
+  dmsetup create my_cache --table '0 41943040 cache /dev/mapper/metadata \
+      /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
+      mq 4 sequential_threshold 1024 random_threshold 8'
diff --git a/Documentation/device-mapper/delay.txt b/Documentation/device-mapper/delay.txt
index 6426c45273cb..917ba8c33359 100644
--- a/Documentation/device-mapper/delay.txt
+++ b/Documentation/device-mapper/delay.txt
@@ -1,10 +1,12 @@
+========
 dm-delay
 ========

 Device-Mapper's "delay" target delays reads and/or writes
 and maps them to different devices.

-Parameters:
+Parameters::
+
     [ [ ]]

@@ -14,15 +16,16 @@ Delays are specified in milliseconds.
 Example scripts
 ===============
-[[
-#!/bin/sh
-# Create device delaying rw operation for 500ms
-echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
-]]
-
-[[
-#!/bin/sh
-# Create device delaying only write operation for 500ms and
-# splitting reads and writes to different devices $1 $2
-echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
-]]
+
+::
+
+	#!/bin/sh
+	# Create device delaying rw operation for 500ms
+	echo "0 `blockdev --getsz $1` delay $1 0 500" | dmsetup create delayed
+
+::
+
+	#!/bin/sh
+	# Create device delaying only write operation for 500ms and
+	# splitting reads and writes to different devices $1 $2
+	echo "0 `blockdev --getsz $1` delay $1 0 0 $2 0 500" | dmsetup create delayed
diff --git a/Documentation/device-mapper/dm-crypt.txt b/Documentation/device-mapper/dm-crypt.txt
index 3b3e1de21c9c..8f4a3f889d43 100644
--- a/Documentation/device-mapper/dm-crypt.txt
+++ b/Documentation/device-mapper/dm-crypt.txt
@@ -1,5 +1,6 @@
+========
 dm-crypt
-=========
+========

 Device-Mapper's "crypt" target provides transparent encryption of block devices
 using the kernel crypto API.
@@ -7,15 +8,20 @@ using the kernel crypto API.
 For a more detailed description of supported parameters see:
 https://gitlab.com/cryptsetup/cryptsetup/wikis/DMCrypt

-Parameters: \
+Parameters::
+
+	      \
 	      [<#opt_params> ]


     Encryption cipher, encryption mode and Initial Vector (IV) generator.

-    The cipher specifications format is:
+    The cipher specifications format is::
+
        cipher[:keycount]-chainmode-ivmode[:ivopts]
-    Examples:
+
+    Examples::
+
        aes-cbc-essiv:sha256
        aes-xts-plain64
        serpent-xts-plain64
@@ -25,12 +31,17 @@ Parameters: \
     as for the first format type.
     This format is mainly used for specification of authenticated modes.
-    The crypto API cipher specifications format is:
+    The crypto API cipher specifications format is::
+
        capi:cipher_api_spec-ivmode[:ivopts]
-    Examples:
+
+    Examples::
+
        capi:cbc(aes)-essiv:sha256
        capi:xts(aes)-plain64
-    Examples of authenticated modes:
+
+    Examples of authenticated modes::
+
        capi:gcm(aes)-random
        capi:authenc(hmac(sha256),xts(aes))-random
        capi:rfc7539(chacha20,poly1305)-random
@@ -142,21 +153,21 @@ LUKS (Linux Unified Key Setup) is now the preferred way to set up disk
 encryption with dm-crypt using the 'cryptsetup' utility, see
 https://gitlab.com/cryptsetup/cryptsetup

-[[
-#!/bin/sh
-# Create a crypt device using dmsetup
-dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"
-]]
+::

-[[
-#!/bin/sh
-# Create a crypt device using dmsetup when encryption key is stored in keyring service
-dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
-]]
+	#!/bin/sh
+	# Create a crypt device using dmsetup
+	dmsetup create crypt1 --table "0 `blockdev --getsz $1` crypt aes-cbc-essiv:sha256 babebabebabebabebabebabebabebabe 0 $1 0"

-[[
-#!/bin/sh
-# Create a crypt device using cryptsetup and LUKS header with default cipher
-cryptsetup luksFormat $1
-cryptsetup luksOpen $1 crypt1
-]]
+::
+
+	#!/bin/sh
+	# Create a crypt device using dmsetup when encryption key is stored in keyring service
+	dmsetup create crypt2 --table "0 `blockdev --getsize $1` crypt aes-cbc-essiv:sha256 :32:logon:my_prefix:my_key 0 $1 0"
+
+::
+
+	#!/bin/sh
+	# Create a crypt device using cryptsetup and LUKS header with default cipher
+	cryptsetup luksFormat $1
+	cryptsetup luksOpen $1 crypt1
diff --git a/Documentation/device-mapper/dm-flakey.txt b/Documentation/device-mapper/dm-flakey.txt
index 9f0e247d0877..86138735879d 100644
--- a/Documentation/device-mapper/dm-flakey.txt
+++ b/Documentation/device-mapper/dm-flakey.txt
@@ -1,3 +1,4 @@
+=========
 dm-flakey
 =========

@@ -15,17 +16,26 @@ underlying devices.

 Table parameters
 ----------------
+
+::
+
   \
     [ []]

 Mandatory parameters:
-    : Full pathname to the underlying block-device, or a
-      "major:minor" device-number.
-    : Starting sector within the device.
-    : Number of seconds device is available.
-    : Number of seconds device returns errors.
+
+    :
+        Full pathname to the underlying block-device, or a
+        "major:minor" device-number.
+    :
+        Starting sector within the device.
+    :
+        Number of seconds device is available.
+    :
+        Number of seconds device returns errors.

 Optional feature parameters:
+
   If no feature parameters are present, during the periods of
   unreliability, all I/O returns errors.

@@ -41,17 +51,24 @@ Optional feature parameters:
 	During , replace of the data of
 	each matching bio with .

-    : The offset of the byte to replace.
-      Counting starts at 1, to replace the first byte.
-    : Either 'r' to corrupt reads or 'w' to corrupt writes.
-      'w' is incompatible with drop_writes.
-    : The value (from 0-255) to write.
-    : Perform the replacement only if bio->bi_opf has all the
-      selected flags set.
+    :
+        The offset of the byte to replace.
+        Counting starts at 1, to replace the first byte.
+    :
+        Either 'r' to corrupt reads or 'w' to corrupt writes.
+        'w' is incompatible with drop_writes.
+    :
+        The value (from 0-255) to write.
+    :
+        Perform the replacement only if bio->bi_opf has all the
+        selected flags set.
 Examples:
+
+Replaces the 32nd byte of READ bios with the value 1::
+
   corrupt_bio_byte 32 r 1 0
-	- replaces the 32nd byte of READ bios with the value 1
+
+Replaces the 224th byte of REQ_META (=32) bios with the value 0::

   corrupt_bio_byte 224 w 0 32
-	- replaces the 224th byte of REQ_META (=32) bios with the value 0
diff --git a/Documentation/device-mapper/dm-init.txt b/Documentation/device-mapper/dm-init.txt
index 8464ee7c01b8..a0db2f30030e 100644
--- a/Documentation/device-mapper/dm-init.txt
+++ b/Documentation/device-mapper/dm-init.txt
@@ -1,5 +1,6 @@
+================================
 Early creation of mapped devices
-====================================
+================================

 It is possible to configure a device-mapper device to act as the root device for
 your system in two ways.
@@ -12,15 +13,17 @@ The second is to create one or more device-mappers using the module parameter

 The format is specified as a string of data separated by commas and optionally
 semi-colons, where:
+
  - a comma is used to separate fields like name, uuid, flags and table
    (specifies one device)
  - a semi-colon is used to separate devices.

-So the format will look like this:
+So the format will look like this::

  dm-mod.create=,,,,[,
+][;,,,,
[,
+]+]

-Where,
+Where::
+
 	 ::= The device name.
 	 ::= xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx | ""
 	 ::= The device minor number | ""
@@ -29,7 +32,7 @@ Where,
 	 ::= "verity" | "linear" | ... (see list below)

 The dm line should be equivalent to the one used by the dmsetup tool with the
---concise argument.
+`--concise` argument.

 Target types
 ============
@@ -38,32 +41,34 @@ Not all target types are available as there are serious risks in allowing
 activation of certain DM targets without first using userspace tools to check
 the validity of associated metadata.

-	"cache": constrained, userspace should verify cache device
-	"crypt": allowed
-	"delay": allowed
-	"era": constrained, userspace should verify metadata device
-	"flakey": constrained, meant for test
-	"linear": allowed
-	"log-writes": constrained, userspace should verify metadata device
-	"mirror": constrained, userspace should verify main/mirror device
-	"raid": constrained, userspace should verify metadata device
-	"snapshot": constrained, userspace should verify src/dst device
-	"snapshot-origin": allowed
-	"snapshot-merge": constrained, userspace should verify src/dst device
-	"striped": allowed
-	"switch": constrained, userspace should verify dev path
-	"thin": constrained, requires dm target message from userspace
-	"thin-pool": constrained, requires dm target message from userspace
-	"verity": allowed
-	"writecache": constrained, userspace should verify cache device
-	"zero": constrained, not meant for rootfs
+======================= =======================================================
+`cache`                 constrained, userspace should verify cache device
+`crypt`                 allowed
+`delay`                 allowed
+`era`                   constrained, userspace should verify metadata device
+`flakey`                constrained, meant for test
+`linear`                allowed
+`log-writes`            constrained, userspace should verify metadata device
+`mirror`                constrained, userspace should verify main/mirror device
+`raid`                  constrained, userspace should verify metadata device
+`snapshot`              constrained, userspace should verify src/dst device
+`snapshot-origin`       allowed
+`snapshot-merge`        constrained, userspace should verify src/dst device
+`striped`               allowed
+`switch`                constrained, userspace should verify dev path
+`thin`                  constrained, requires dm target message from userspace
+`thin-pool`             constrained, requires dm target message from userspace
+`verity`                allowed
+`writecache`            constrained, userspace should verify cache device
+`zero`                  constrained, not meant for rootfs
+======================= =======================================================

 If the target is not listed above, it is constrained by default (not tested).

 Examples
 ========
 An example of booting to a linear array made up of user-mode linux block
-devices:
+devices::

   dm-mod.create="lroot,,,rw, 0 4096 linear 98:16 0, 4096 4096 linear 98:32 0" root=/dev/dm-0

@@ -71,8 +76,8 @@ This will boot to a rw dm-linear target of 8192 sectors split across two block
 devices identified by their major:minor numbers. After boot, udev will rename
 this target to /dev/mapper/lroot (depending on the rules). No uuid was assigned.

-An example of multiple device-mappers, with the dm-mod.create="..." contents is shown here
-split on multiple lines for readability:
+An example of multiple device-mappers, with the dm-mod.create="..." contents
+is shown here split on multiple lines for readability::

   vroot,,,ro,
     0 1740800 verity 254:0 254:0 1740800 sha1
@@ -84,30 +89,36 @@ split on multiple lines for readability:

 Other examples (per target):

-"crypt":
+"crypt"::
+
   dm-crypt,,8,ro,
     0 1048576 crypt aes-xts-plain64
     babebabebabebabebabebabebabebabebabebabebabebabebabebabebabebabe 0
     /dev/sda 0 1 allow_discards

-"delay":
+"delay"::
+
   dm-delay,,4,ro,0 409600 delay /dev/sda1 0 500

-"linear":
+"linear"::
+
   dm-linear,,,rw,
     0 32768 linear /dev/sda1 0,
     32768 1024000 linear /dev/sda2 0,
     1056768 204800 linear /dev/sda3 0,
     1261568 512000 linear /dev/sda4 0

-"snapshot-origin":
+"snapshot-origin"::
+
   dm-snap-orig,,4,ro,0 409600 snapshot-origin 8:2

-"striped":
+"striped"::
+
   dm-striped,,4,ro,0 1638400 striped 4 4096
   /dev/sda1 0 /dev/sda2 0 /dev/sda3 0 /dev/sda4 0

-"verity":
+"verity"::
+
   dm-verity,,4,ro,
     0 1638400 verity 1 8:1 8:2 4096 4096 204800 1 sha256
     fb1a5a0f00deb908d8b53cb270858975e76cf64105d412ce764225d53b8f3cfd
diff --git a/Documentation/device-mapper/dm-integrity.txt b/Documentation/device-mapper/dm-integrity.txt
index 297251b0d2d5..4fa8d2939d46 100644
--- a/Documentation/device-mapper/dm-integrity.txt
+++ b/Documentation/device-mapper/dm-integrity.txt
@@ -1,3 +1,7 @@
+============
+dm-integrity
+============
+
 The dm-integrity target emulates a block device that has additional
 per-sector tags that can be used for storing integrity information.

@@ -28,15 +32,16 @@ zeroes. If the superblock is neither valid nor zeroed, the dm-integrity
 target can't be loaded.

 To use the target for the first time:
+
 1. overwrite the superblock with zeroes
 2. load the dm-integrity target with one-sector size, the kernel driver
-	will format the device
+   will format the device
 3. unload the dm-integrity target
 4. read the "provided_data_sectors" value from the superblock
 5. load the dm-integrity target with the the target size
-	"provided_data_sectors"
+   "provided_data_sectors"
 6. if you want to use dm-integrity with dm-crypt, load the dm-crypt target
-	with the size "provided_data_sectors"
+   with the size "provided_data_sectors"


 Target arguments:
@@ -44,22 +49,26 @@ Target arguments:
 1. the underlying block device

 2. the number of reserved sector at the beginning of the device - the
-	dm-integrity won't read of write these sectors
+   dm-integrity won't read of write these sectors

 3. the size of the integrity tag (if "-" is used, the size is taken from
-	the internal-hash algorithm)
+   the internal-hash algorithm)

 4. mode:
-	D - direct writes (without journal) - in this mode, journaling is
+
+	D - direct writes (without journal)
+		in this mode, journaling is
 		not used and data sectors and integrity tags are written
 		separately. In case of crash, it is possible that the data
 		and integrity tag doesn't match.
-	J - journaled writes - data and integrity tags are written to the
+	J - journaled writes
+		data and integrity tags are written to the
 		journal and atomicity is guaranteed. In case of crash,
 		either both data and tag or none of them are written. The
 		journaled mode degrades write throughput twice because the
 		data have to be written twice.
-	R - recovery mode - in this mode, journal is not replayed,
+	R - recovery mode
+		in this mode, journal is not replayed,
 		checksums are not checked and writes to the device are not
 		allowed. This mode is useful for data recovery if the
 		device cannot be activated in any of the other standard
@@ -154,9 +163,12 @@ and the reloaded target would be non-functional.
The layout of the formatted block device: -* reserved sectors (they are not used by this target, they can be used for - storing LUKS metadata or for other purpose), the size of the reserved - area is specified in the target arguments + +* reserved sectors + (they are not used by this target, they can be used for + storing LUKS metadata or for other purpose), the size of the reserved + area is specified in the target arguments + * superblock (4kiB) * magic string - identifies that the device was formatted * version @@ -168,34 +180,47 @@ The layout of the formatted block device: metadata and padding). The user of this target should not send bios that access data beyond the "provided data sectors" limit. * flags - a flag is set if journal_mac is used + * journal The journal is divided into sections, each section contains: + * metadata area (4kiB), it contains journal entries - every journal entry contains: + + - every journal entry contains: + * logical sector (specifies where the data and tag should be written) * last 8 bytes of data * integrity tag (the size is specified in the superblock) - every metadata sector ends with + + - every metadata sector ends with + * mac (8-bytes), all the macs in 8 metadata sectors form a 64-byte value. It is used to store hmac of sector numbers in the journal section, to protect against a possibility that the attacker tampers with sector numbers in the journal. * commit id + * data area (the size is variable; it depends on how many journal entries fit into the metadata area) - every sector in the data area contains: + + - every sector in the data area contains: + * data (504 bytes of data, the last 8 bytes are stored in the journal entry) * commit id + To test if the whole journal section was written correctly, every 512-byte sector of the journal ends with 8-byte commit id. If the commit id matches on all sectors in a journal section, then it is assumed that the section was written correctly. 
If the commit id doesn't match, the section was written partially and it should not be replayed. -* one or more runs of interleaved tags and data. Each run contains: + +* one or more runs of interleaved tags and data. + Each run contains: + * tag area - it contains integrity tags. There is one tag for each sector in the data area * data area - it contains data sectors. The number of data sectors diff --git a/Documentation/device-mapper/dm-io.txt b/Documentation/device-mapper/dm-io.txt index 3b5d9a52cdcf..d2492917a1f5 100644 --- a/Documentation/device-mapper/dm-io.txt +++ b/Documentation/device-mapper/dm-io.txt @@ -1,3 +1,4 @@ +===== dm-io ===== @@ -7,7 +8,7 @@ version. The user must set up an io_region structure to describe the desired location of the I/O. Each io_region indicates a block-device along with the starting -sector and size of the region. +sector and size of the region:: struct io_region { struct block_device *bdev; @@ -19,7 +20,7 @@ Dm-io can read from one io_region or write to one or more io_regions. Writes to multiple regions are specified by an array of io_region structures. The first I/O service type takes a list of memory pages as the data buffer for -the I/O, along with an offset into the first page. +the I/O, along with an offset into the first page:: struct page_list { struct page_list *next; @@ -35,7 +36,7 @@ the I/O, along with an offset into the first page. The second I/O service type takes an array of bio vectors as the data buffer for the I/O. This service can be handy if the caller has a pre-assembled bio, -but wants to direct different portions of the bio to different devices. +but wants to direct different portions of the bio to different devices:: int dm_io_sync_bvec(unsigned int num_regions, struct io_region *where, int rw, struct bio_vec *bvec, @@ -47,7 +48,7 @@ but wants to direct different portions of the bio to different devices. 
The third I/O service type takes a pointer to a vmalloc'd memory buffer as the data buffer for the I/O. This service can be handy if the caller needs to do I/O to a large region but doesn't want to allocate a large number of individual -memory pages. +memory pages:: int dm_io_sync_vm(unsigned int num_regions, struct io_region *where, int rw, void *data, unsigned long *error_bits); @@ -55,11 +56,11 @@ memory pages. void *data, io_notify_fn fn, void *context); Callers of the asynchronous I/O services must include the name of a completion -callback routine and a pointer to some context data for the I/O. +callback routine and a pointer to some context data for the I/O:: typedef void (*io_notify_fn)(unsigned long error, void *context); -The "error" parameter in this callback, as well as the "*error" parameter in +The "error" parameter in this callback, as well as the `*error` parameter in all of the synchronous versions, is a bitset (instead of a simple error value). In the case of an write-I/O to multiple regions, this bitset allows dm-io to indicate success or failure on each individual region. @@ -72,4 +73,3 @@ always available in order to avoid unnecessary waiting while performing I/O. When the user is finished using the dm-io services, they should call dm_io_put() and specify the same number of pages that were given on the dm_io_get() call. - diff --git a/Documentation/device-mapper/dm-log.txt b/Documentation/device-mapper/dm-log.txt index c155ac569c44..ba4fce39bc27 100644 --- a/Documentation/device-mapper/dm-log.txt +++ b/Documentation/device-mapper/dm-log.txt @@ -1,3 +1,4 @@ +===================== Device-Mapper Logging ===================== The device-mapper logging code is used by some of the device-mapper @@ -16,11 +17,13 @@ dm_dirty_log_type in include/linux/dm-dirty-log.h). Various different logging implementations are available and provide different capabilities. 
The list includes: +============== ============================================================== Type Files -==== ===== +============== ============================================================== disk drivers/md/dm-log.c core drivers/md/dm-log.c userspace drivers/md/dm-log-userspace* include/linux/dm-log-userspace.h +============== ============================================================== The "disk" log type ------------------- diff --git a/Documentation/device-mapper/dm-queue-length.txt b/Documentation/device-mapper/dm-queue-length.txt index f4db2562175c..d8e381c1cb02 100644 --- a/Documentation/device-mapper/dm-queue-length.txt +++ b/Documentation/device-mapper/dm-queue-length.txt @@ -1,3 +1,4 @@ +=============== dm-queue-length =============== @@ -6,12 +7,18 @@ which selects a path with the least number of in-flight I/Os. The path selector name is 'queue-length'. Table parameters for each path: [] + +:: + : The number of I/Os to dispatch using the selected path before switching to the next path. If not given, internal default is used. To check the default value, see the activated table. Status for each path: + +:: + : 'A' if the path is active, 'F' if the path is failed. : The number of path failures. : The number of in-flight I/Os on the path. @@ -29,11 +36,13 @@ Examples ======== In case that 2 paths (sda and sdb) are used with repeat_count == 128. 
-# echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \ - dmsetup create test -# -# dmsetup table -test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128 -# -# dmsetup status -test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0 +:: + + # echo "0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128" \ + dmsetup create test + # + # dmsetup table + test: 0 10 multipath 0 0 1 1 queue-length 0 2 1 8:0 128 8:16 128 + # + # dmsetup status + test: 0 10 multipath 2 0 0 0 1 1 E 0 2 1 8:0 A 0 0 8:16 A 0 0 diff --git a/Documentation/device-mapper/dm-raid.txt b/Documentation/device-mapper/dm-raid.txt index 2355bef14653..2fe255b130fb 100644 --- a/Documentation/device-mapper/dm-raid.txt +++ b/Documentation/device-mapper/dm-raid.txt @@ -1,3 +1,4 @@ +======= dm-raid ======= @@ -8,49 +9,66 @@ interface. Mapping Table Interface ----------------------- -The target is named "raid" and it accepts the following parameters: +The target is named "raid" and it accepts the following parameters:: <#raid_params> \ <#raid_devs> [.. 
] : + + ============= =============================================================== raid0 RAID0 striping (no resilience) raid1 RAID1 mirroring raid4 RAID4 with dedicated last parity disk raid5_n RAID5 with dedicated last parity disk supporting takeover Same as raid4 - -Transitory layout + + - Transitory layout raid5_la RAID5 left asymmetric + - rotating parity 0 with data continuation raid5_ra RAID5 right asymmetric + - rotating parity N with data continuation raid5_ls RAID5 left symmetric + - rotating parity 0 with data restart raid5_rs RAID5 right symmetric + - rotating parity N with data restart raid6_zr RAID6 zero restart + - rotating parity zero (left-to-right) with data restart raid6_nr RAID6 N restart + - rotating parity N (right-to-left) with data restart raid6_nc RAID6 N continue + - rotating parity N (right-to-left) with data continuation raid6_n_6 RAID6 with dedicate parity disks + - parity and Q-syndrome on the last 2 disks; layout for takeover from/to raid4/raid5_n raid6_la_6 Same as "raid_la" plus dedicated last Q-syndrome disk + - layout for takeover from raid5_la from/to raid6 raid6_ra_6 Same as "raid5_ra" dedicated last Q-syndrome disk + - layout for takeover from raid5_ra from/to raid6 raid6_ls_6 Same as "raid5_ls" dedicated last Q-syndrome disk + - layout for takeover from raid5_ls from/to raid6 raid6_rs_6 Same as "raid5_rs" dedicated last Q-syndrome disk + - layout for takeover from raid5_rs from/to raid6 raid10 Various RAID10 inspired algorithms chosen by additional params (see raid10_format and raid10_copies below) + - RAID10: Striped Mirrors (aka 'Striping on top of mirrors') - RAID1E: Integrated Adjacent Stripe Mirroring - RAID1E: Integrated Offset Stripe Mirroring - - and other similar RAID10 variants + - and other similar RAID10 variants + ============= =============================================================== Reference: Chapter 4 of http://www.snia.org/sites/default/files/SNIA_DDF_Technical_Position_v2.0.pdf @@ -58,33 +76,41 @@ 
The target is named "raid" and it accepts the following parameters: <#raid_params>: The number of parameters that follow. consists of + Mandatory parameters: - : Chunk size in sectors. This parameter is often known as + : + Chunk size in sectors. This parameter is often known as "stripe size". It is the only mandatory parameter and is placed first. followed by optional parameters (in any order): - [sync|nosync] Force or prevent RAID initialization. + [sync|nosync] + Force or prevent RAID initialization. - [rebuild ] Rebuild drive number 'idx' (first drive is 0). + [rebuild ] + Rebuild drive number 'idx' (first drive is 0). [daemon_sleep ] Interval between runs of the bitmap daemon that clear bits. A longer interval means less bitmap I/O but resyncing after a failure is likely to take longer. - [min_recovery_rate ] Throttle RAID initialization - [max_recovery_rate ] Throttle RAID initialization - [write_mostly ] Mark drive index 'idx' write-mostly. - [max_write_behind ] See '--write-behind=' (man mdadm) - [stripe_cache ] Stripe cache size (RAID 4/5/6 only) + [min_recovery_rate ] + Throttle RAID initialization + [max_recovery_rate ] + Throttle RAID initialization + [write_mostly ] + Mark drive index 'idx' write-mostly. + [max_write_behind ] + See '--write-behind=' (man mdadm) + [stripe_cache ] + Stripe cache size (RAID 4/5/6 only) [region_size ] The region_size multiplied by the number of regions is the logical size of the array. The bitmap records the device synchronisation state for each region. - [raid10_copies <# copies>] - [raid10_format ] + [raid10_copies <# copies>], [raid10_format ] These two options are used to alter the default layout of a RAID10 configuration. The number of copies is can be specified, but the default is 2. There are also three @@ -93,13 +119,17 @@ The target is named "raid" and it accepts the following parameters: respect to mirroring. 
If these options are left unspecified, or 'raid10_copies 2' and/or 'raid10_format near' are given, then the layouts for 2, 3 and 4 devices are: + + ======== ========== ============== 2 drives 3 drives 4 drives - -------- ---------- -------------- + ======== ========== ============== A1 A1 A1 A1 A2 A1 A1 A2 A2 A2 A2 A2 A3 A3 A3 A3 A4 A4 A3 A3 A4 A4 A5 A5 A5 A6 A6 A4 A4 A5 A6 A6 A7 A7 A8 A8 .. .. .. .. .. .. .. .. .. + ======== ========== ============== + The 2-device layout is equivalent 2-way RAID1. The 4-device layout is what a traditional RAID10 would look like. The 3-device layout is what might be called a 'RAID1E - Integrated @@ -107,8 +137,10 @@ The target is named "raid" and it accepts the following parameters: If 'raid10_copies 2' and 'raid10_format far', then the layouts for 2, 3 and 4 devices are: + + ======== ============ =================== 2 drives 3 drives 4 drives - -------- -------------- -------------------- + ======== ============ =================== A1 A2 A1 A2 A3 A1 A2 A3 A4 A3 A4 A4 A5 A6 A5 A6 A7 A8 A5 A6 A7 A8 A9 A9 A10 A11 A12 @@ -117,11 +149,14 @@ The target is named "raid" and it accepts the following parameters: A4 A3 A6 A4 A5 A6 A5 A8 A7 A6 A5 A9 A7 A8 A10 A9 A12 A11 .. .. .. .. .. .. .. .. .. + ======== ============ =================== If 'raid10_copies 2' and 'raid10_format offset', then the layouts for 2, 3 and 4 devices are: + + ======== ========== ================ 2 drives 3 drives 4 drives - -------- ------------ ----------------- + ======== ========== ================ A1 A2 A1 A2 A3 A1 A2 A3 A4 A2 A1 A3 A1 A2 A2 A1 A4 A3 A3 A4 A4 A5 A6 A5 A6 A7 A8 @@ -129,6 +164,8 @@ The target is named "raid" and it accepts the following parameters: A5 A6 A7 A8 A9 A9 A10 A11 A12 A6 A5 A9 A7 A8 A10 A9 A12 A11 .. .. .. .. .. .. .. .. .. + ======== ========== ================ + Here we see layouts closely akin to 'RAID1E - Integrated Offset Stripe Mirroring'. 
@@ -190,22 +227,25 @@ The target is named "raid" and it accepts the following parameters: Example Tables -------------- -# RAID4 - 4 data drives, 1 parity (no metadata devices) -# No metadata devices specified to hold superblock/bitmap info -# Chunk size of 1MiB -# (Lines separated for easy reading) -0 1960893648 raid \ - raid4 1 2048 \ - 5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81 +:: -# RAID4 - 4 data drives, 1 parity (with metadata devices) -# Chunk size of 1MiB, force RAID initialization, -# min recovery rate at 20 kiB/sec/disk + # RAID4 - 4 data drives, 1 parity (no metadata devices) + # No metadata devices specified to hold superblock/bitmap info + # Chunk size of 1MiB + # (Lines separated for easy reading) -0 1960893648 raid \ - raid4 4 2048 sync min_recovery_rate 20 \ - 5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82 + 0 1960893648 raid \ + raid4 1 2048 \ + 5 - 8:17 - 8:33 - 8:49 - 8:65 - 8:81 + + # RAID4 - 4 data drives, 1 parity (with metadata devices) + # Chunk size of 1MiB, force RAID initialization, + # min recovery rate at 20 kiB/sec/disk + + 0 1960893648 raid \ + raid4 4 2048 sync min_recovery_rate 20 \ + 5 8:17 8:18 8:33 8:34 8:49 8:50 8:65 8:66 8:81 8:82 Status Output @@ -219,41 +259,58 @@ Arguments that can be repeated are ordered by value. 'dmsetup status' yields information on the state and health of the array. The output is as follows (normally a single line, but expanded here for -clarity): -1: raid \ -2: <#devices> \ -3: +clarity):: + + 1: raid \ + 2: <#devices> \ + 3: Line 1 is the standard output produced by device-mapper. -Line 2 & 3 are produced by the raid target and are best explained by example: + +Line 2 & 3 are produced by the raid target and are best explained by example:: + 0 1960893648 raid raid4 5 AAAAA 2/490221568 init 0 + Here we can see the RAID type is raid4, there are 5 devices - all of which are 'A'live, and the array is 2/490221568 complete with its initial recovery. 
Here is a fuller description of the individual fields: + + =============== ========================================================= Same as the used to create the array. - One char for each device, indicating: 'A' = alive and - in-sync, 'a' = alive but not in-sync, 'D' = dead/failed. + One char for each device, indicating: + + - 'A' = alive and in-sync + - 'a' = alive but not in-sync + - 'D' = dead/failed. The ratio indicating how much of the array has undergone the process described by 'sync_action'. If the 'sync_action' is "check" or "repair", then the process of "resync" or "recover" can be considered complete. One of the following possible states: - idle - No synchronization action is being performed. - frozen - The current action has been halted. - resync - Array is undergoing its initial synchronization + + idle + - No synchronization action is being performed. + frozen + - The current action has been halted. + resync + - Array is undergoing its initial synchronization or is resynchronizing after an unclean shutdown (possibly aided by a bitmap). - recover - A device in the array is being rebuilt or + recover + - A device in the array is being rebuilt or replaced. - check - A user-initiated full check of the array is + check + - A user-initiated full check of the array is being performed. All blocks are read and checked for consistency. The number of discrepancies found are recorded in . No changes are made to the array by this action. - repair - The same as "check", but discrepancies are + repair + - The same as "check", but discrepancies are corrected. - reshape - The array is undergoing a reshape. + reshape + - The array is undergoing a reshape. The number of discrepancies found between mirror copies in RAID1/10 or wrong parity values found in RAID4/5/6. This value is valid only after a "check" of the array @@ -261,10 +318,11 @@ recovery. 
Here is a fuller description of the individual fields: The current data offset to the start of the user data on each component device of a raid set (see the respective raid parameter to support out-of-place reshaping). - 'A' - active write-through journal device. - 'a' - active write-back journal device. - 'D' - dead journal device. - '-' - no journal device. + - 'A' - active write-through journal device. + - 'a' - active write-back journal device. + - 'D' - dead journal device. + - '-' - no journal device. + =============== ========================================================= Message Interface @@ -272,12 +330,15 @@ Message Interface The dm-raid target will accept certain actions through the 'message' interface. ('man dmsetup' for more information on the message interface.) These actions include: - "idle" - Halt the current sync action. - "frozen" - Freeze the current sync action. - "resync" - Initiate/continue a resync. - "recover"- Initiate/continue a recover process. - "check" - Initiate a check (i.e. a "scrub") of the array. - "repair" - Initiate a repair of the array. + + ========= ================================================ + "idle" Halt the current sync action. + "frozen" Freeze the current sync action. + "resync" Initiate/continue a resync. + "recover" Initiate/continue a recover process. + "check" Initiate a check (i.e. a "scrub") of the array. + "repair" Initiate a repair of the array. + ========= ================================================ Discard Support @@ -307,48 +368,52 @@ increasingly whitelisted in the kernel and can thus be trusted. For trusted devices, the following dm-raid module parameter can be set to safely enable discard support for RAID 4/5/6: + 'devices_handle_discards_safely' Version History --------------- -1.0.0 Initial version. Support for RAID 4/5/6 -1.1.0 Added support for RAID 1 -1.2.0 Handle creation of arrays that contain failed devices. 
-1.3.0 Added support for RAID 10 -1.3.1 Allow device replacement/rebuild for RAID 10 -1.3.2 Fix/improve redundancy checking for RAID10 -1.4.0 Non-functional change. Removes arg from mapping function. -1.4.1 RAID10 fix redundancy validation checks (commit 55ebbb5). -1.4.2 Add RAID10 "far" and "offset" algorithm support. -1.5.0 Add message interface to allow manipulation of the sync_action. + +:: + + 1.0.0 Initial version. Support for RAID 4/5/6 + 1.1.0 Added support for RAID 1 + 1.2.0 Handle creation of arrays that contain failed devices. + 1.3.0 Added support for RAID 10 + 1.3.1 Allow device replacement/rebuild for RAID 10 + 1.3.2 Fix/improve redundancy checking for RAID10 + 1.4.0 Non-functional change. Removes arg from mapping function. + 1.4.1 RAID10 fix redundancy validation checks (commit 55ebbb5). + 1.4.2 Add RAID10 "far" and "offset" algorithm support. + 1.5.0 Add message interface to allow manipulation of the sync_action. New status (STATUSTYPE_INFO) fields: sync_action and mismatch_cnt. -1.5.1 Add ability to restore transiently failed devices on resume. -1.5.2 'mismatch_cnt' is zero unless [last_]sync_action is "check". -1.6.0 Add discard support (and devices_handle_discard_safely module param). -1.7.0 Add support for MD RAID0 mappings. -1.8.0 Explicitly check for compatible flags in the superblock metadata + 1.5.1 Add ability to restore transiently failed devices on resume. + 1.5.2 'mismatch_cnt' is zero unless [last_]sync_action is "check". + 1.6.0 Add discard support (and devices_handle_discard_safely module param). + 1.7.0 Add support for MD RAID0 mappings. + 1.8.0 Explicitly check for compatible flags in the superblock metadata and reject to start the raid set if any are set by a newer target version, thus avoiding data corruption on a raid set with a reshape in progress. -1.9.0 Add support for RAID level takeover/reshape/region size + 1.9.0 Add support for RAID level takeover/reshape/region size and set size reduction. 
-1.9.1 Fix activation of existing RAID 4/10 mapped devices -1.9.2 Don't emit '- -' on the status table line in case the constructor + 1.9.1 Fix activation of existing RAID 4/10 mapped devices + 1.9.2 Don't emit '- -' on the status table line in case the constructor fails reading a superblock. Correctly emit 'maj:min1 maj:min2' and 'D' on the status line. If '- -' is passed into the constructor, emit '- -' on the table line and '-' as the status line health character. -1.10.0 Add support for raid4/5/6 journal device -1.10.1 Fix data corruption on reshape request -1.11.0 Fix table line argument order + 1.10.0 Add support for raid4/5/6 journal device + 1.10.1 Fix data corruption on reshape request + 1.11.0 Fix table line argument order (wrong raid10_copies/raid10_format sequence) -1.11.1 Add raid4/5/6 journal write-back support via journal_mode option -1.12.1 Fix for MD deadlock between mddev_suspend() and md_write_start() available -1.13.0 Fix dev_health status at end of "recover" (was 'a', now 'A') -1.13.1 Fix deadlock caused by early md_stop_writes(). Also fix size an + 1.11.1 Add raid4/5/6 journal write-back support via journal_mode option + 1.12.1 Fix for MD deadlock between mddev_suspend() and md_write_start() available + 1.13.0 Fix dev_health status at end of "recover" (was 'a', now 'A') + 1.13.1 Fix deadlock caused by early md_stop_writes(). Also fix size an state races. -1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen -1.14.0 Fix reshape race on small devices. Fix stripe adding reshape + 1.13.2 Fix raid redundancy validation and avoid keeping raid set frozen + 1.14.0 Fix reshape race on small devices. Fix stripe adding reshape deadlock/potential data corruption. Update superblock when specific devices are requested via rebuild. Fix RAID leg rebuild errors. 
diff --git a/Documentation/device-mapper/dm-service-time.txt b/Documentation/device-mapper/dm-service-time.txt index fb1d4a0cf122..facf277fc13c 100644 --- a/Documentation/device-mapper/dm-service-time.txt +++ b/Documentation/device-mapper/dm-service-time.txt @@ -1,3 +1,4 @@ +=============== dm-service-time =============== @@ -12,25 +13,34 @@ in a path-group, and it can be specified as a table argument. The path selector name is 'service-time'. -Table parameters for each path: [ []] - : The number of I/Os to dispatch using the selected +Table parameters for each path: + + [ []] + : + The number of I/Os to dispatch using the selected path before switching to the next path. If not given, internal default is used. To check the default value, see the activated table. - : The relative throughput value of the path + : + The relative throughput value of the path among all paths in the path-group. The valid range is 0-100. If not given, minimum value '1' is used. If '0' is given, the path isn't selected while other paths having a positive value are available. -Status for each path: \ - - : 'A' if the path is active, 'F' if the path is failed. - : The number of path failures. - : The size of in-flight I/Os on the path. - : The relative throughput value of the path - among all paths in the path-group. +Status for each path: + + + : + 'A' if the path is active, 'F' if the path is failed. + : + The number of path failures. + : + The size of in-flight I/Os on the path. + : + The relative throughput value of the path + among all paths in the path-group. Algorithm @@ -39,7 +49,7 @@ Algorithm dm-service-time adds the I/O size to 'in-flight-size' when the I/O is dispatched and subtracts when completed. 
Basically, dm-service-time selects a path having minimum service time -which is calculated by: +which is calculated by:: ('in-flight-size' + 'size-of-incoming-io') / 'relative_throughput' @@ -67,25 +77,25 @@ Examples ======== In case that 2 paths (sda and sdb) are used with repeat_count == 128 and sda has an average throughput 1GB/s and sdb has 4GB/s, -'relative_throughput' value may be '1' for sda and '4' for sdb. +'relative_throughput' value may be '1' for sda and '4' for sdb:: -# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \ - dmsetup create test -# -# dmsetup table -test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4 -# -# dmsetup status -test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4 + # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4" \ + dmsetup create test + # + # dmsetup table + test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 1 8:16 128 4 + # + # dmsetup status + test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 1 8:16 A 0 0 4 -Or '2' for sda and '8' for sdb would be also true. 
+Or '2' for sda and '8' for sdb would be also true:: -# echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \ - dmsetup create test -# -# dmsetup table -test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8 -# -# dmsetup status -test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8 + # echo "0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8" \ + dmsetup create test + # + # dmsetup table + test: 0 10 multipath 0 0 1 1 service-time 0 2 2 8:0 128 2 8:16 128 8 + # + # dmsetup status + test: 0 10 multipath 2 0 0 0 1 1 E 0 2 2 8:0 A 0 0 2 8:16 A 0 0 8 diff --git a/Documentation/device-mapper/dm-uevent.txt b/Documentation/device-mapper/dm-uevent.txt index 07edbd85c714..4a8ee8d069c9 100644 --- a/Documentation/device-mapper/dm-uevent.txt +++ b/Documentation/device-mapper/dm-uevent.txt @@ -1,3 +1,7 @@ +==================== +device-mapper uevent +==================== + The device-mapper uevent code adds the capability to device-mapper to create and send kobject uevents (uevents). Previously device-mapper events were only available through the ioctl interface. The advantage of the uevents interface @@ -6,92 +10,101 @@ the event avoiding the need to query the state of the device-mapper device after the event is received. There are two functions currently for device-mapper events. The first function -listed creates the event and the second function sends the event(s). 
+listed creates the event and the second function sends the event(s):: -void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, - const char *path, unsigned nr_valid_paths) + void dm_path_uevent(enum dm_uevent_type event_type, struct dm_target *ti, + const char *path, unsigned nr_valid_paths) -void dm_send_uevents(struct list_head *events, struct kobject *kobj) + void dm_send_uevents(struct list_head *events, struct kobject *kobj) The variables added to the uevent environment are: Variable Name: DM_TARGET -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: -Value: Name of device-mapper target that generated the event. +------------------------ +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: +:Value: Name of device-mapper target that generated the event. Variable Name: DM_ACTION -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: -Value: Device-mapper specific action that caused the uevent action. - PATH_FAILED - A path has failed. +------------------------ +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: +:Value: Device-mapper specific action that caused the uevent action. + PATH_FAILED - A path has failed; PATH_REINSTATED - A path has been reinstated. Variable Name: DM_SEQNUM -Uevent Action(s): KOBJ_CHANGE -Type: unsigned integer -Description: A sequence number for this specific device-mapper device. -Value: Valid unsigned integer range. +------------------------ +:Uevent Action(s): KOBJ_CHANGE +:Type: unsigned integer +:Description: A sequence number for this specific device-mapper device. +:Value: Valid unsigned integer range. Variable Name: DM_PATH -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: Major and minor number of the path device pertaining to this -event. -Value: Path name in the form of "Major:Minor" +---------------------- +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: Major and minor number of the path device pertaining to this + event. 
+:Value: Path name in the form of "Major:Minor" Variable Name: DM_NR_VALID_PATHS -Uevent Action(s): KOBJ_CHANGE -Type: unsigned integer -Description: -Value: Valid unsigned integer range. +-------------------------------- +:Uevent Action(s): KOBJ_CHANGE +:Type: unsigned integer +:Description: +:Value: Valid unsigned integer range. Variable Name: DM_NAME -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: Name of the device-mapper device. -Value: Name +---------------------- +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: Name of the device-mapper device. +:Value: Name Variable Name: DM_UUID -Uevent Action(s): KOBJ_CHANGE -Type: string -Description: UUID of the device-mapper device. -Value: UUID. (Empty string if there isn't one.) +---------------------- +:Uevent Action(s): KOBJ_CHANGE +:Type: string +:Description: UUID of the device-mapper device. +:Value: UUID. (Empty string if there isn't one.) An example of the uevents generated as captured by udevmonitor is shown -below. +below -1.) Path failure. -UEVENT[1192521009.711215] change@/block/dm-3 -ACTION=change -DEVPATH=/block/dm-3 -SUBSYSTEM=block -DM_TARGET=multipath -DM_ACTION=PATH_FAILED -DM_SEQNUM=1 -DM_PATH=8:32 -DM_NR_VALID_PATHS=0 -DM_NAME=mpath2 -DM_UUID=mpath-35333333000002328 -MINOR=3 -MAJOR=253 -SEQNUM=1130 +1.) Path failure:: -2.) Path reinstate. -UEVENT[1192521132.989927] change@/block/dm-3 -ACTION=change -DEVPATH=/block/dm-3 -SUBSYSTEM=block -DM_TARGET=multipath -DM_ACTION=PATH_REINSTATED -DM_SEQNUM=2 -DM_PATH=8:32 -DM_NR_VALID_PATHS=1 -DM_NAME=mpath2 -DM_UUID=mpath-35333333000002328 -MINOR=3 -MAJOR=253 -SEQNUM=1131 + UEVENT[1192521009.711215] change@/block/dm-3 + ACTION=change + DEVPATH=/block/dm-3 + SUBSYSTEM=block + DM_TARGET=multipath + DM_ACTION=PATH_FAILED + DM_SEQNUM=1 + DM_PATH=8:32 + DM_NR_VALID_PATHS=0 + DM_NAME=mpath2 + DM_UUID=mpath-35333333000002328 + MINOR=3 + MAJOR=253 + SEQNUM=1130 + +2.) 
Path reinstate:: + + UEVENT[1192521132.989927] change@/block/dm-3 + ACTION=change + DEVPATH=/block/dm-3 + SUBSYSTEM=block + DM_TARGET=multipath + DM_ACTION=PATH_REINSTATED + DM_SEQNUM=2 + DM_PATH=8:32 + DM_NR_VALID_PATHS=1 + DM_NAME=mpath2 + DM_UUID=mpath-35333333000002328 + MINOR=3 + MAJOR=253 + SEQNUM=1131 diff --git a/Documentation/device-mapper/dm-zoned.txt b/Documentation/device-mapper/dm-zoned.txt index 736fcc78d193..07f56ebc1730 100644 --- a/Documentation/device-mapper/dm-zoned.txt +++ b/Documentation/device-mapper/dm-zoned.txt @@ -1,3 +1,4 @@ +======== dm-zoned ======== @@ -133,12 +134,13 @@ A zoned block device must first be formatted using the dmzadm tool. This will analyze the device zone configuration, determine where to place the metadata sets on the device and initialize the metadata sets. -Ex: +Ex:: -dmzadm --format /dev/sdxx + dmzadm --format /dev/sdxx For a formatted device, the target can be created normally with the dmsetup utility. The only parameter that dm-zoned requires is the -underlying zoned block device name. Ex: +underlying zoned block device name. Ex:: -echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | dmsetup create dmz-`basename ${dev}` + echo "0 `blockdev --getsize ${dev}` zoned ${dev}" | \ + dmsetup create dmz-`basename ${dev}` diff --git a/Documentation/device-mapper/era.txt b/Documentation/device-mapper/era.txt index 3c6d01be3560..90dd5c670b9f 100644 --- a/Documentation/device-mapper/era.txt +++ b/Documentation/device-mapper/era.txt @@ -1,3 +1,7 @@ +====== +dm-era +====== + Introduction ============ @@ -14,12 +18,14 @@ coherency after rolling back a vendor snapshot. 
Constructor =========== - era +era - metadata dev : fast device holding the persistent metadata - origin dev : device holding data blocks that may change - block size : block size of origin data device, granularity that is - tracked by the target + ================ ====================================================== + metadata dev fast device holding the persistent metadata + origin dev device holding data blocks that may change + block size block size of origin data device, granularity that is + tracked by the target + ================ ====================================================== Messages ======== @@ -49,14 +55,16 @@ Status <#used metadata blocks>/<#total metadata blocks> -metadata block size : Fixed block size for each metadata block in - sectors -#used metadata blocks : Number of metadata blocks used -#total metadata blocks : Total number of metadata blocks -current era : The current era -held metadata root : The location, in blocks, of the metadata root - that has been 'held' for userspace read - access. '-' indicates there is no held root +========================= ============================================== +metadata block size Fixed block size for each metadata block in + sectors +#used metadata blocks Number of metadata blocks used +#total metadata blocks Total number of metadata blocks +current era The current era +held metadata root The location, in blocks, of the metadata root + that has been 'held' for userspace read + access. '-' indicates there is no held root +========================= ============================================== Detailed use case ================= @@ -88,7 +96,7 @@ Memory usage The target uses a bitset to record writes in the current era. It also has a spare bitset ready for switching over to a new era. Other than -that it uses a few 4k blocks for updating metadata. 
+that it uses a few 4k blocks for updating metadata::
 
    (4 * nr_blocks) bytes + buffers
diff --git a/Documentation/device-mapper/kcopyd.txt b/Documentation/device-mapper/kcopyd.txt
index 820382c4cecf..7651d395127f 100644
--- a/Documentation/device-mapper/kcopyd.txt
+++ b/Documentation/device-mapper/kcopyd.txt
@@ -1,3 +1,4 @@
+======
 kcopyd
 ======
 
@@ -7,7 +8,7 @@ notification. It is used by dm-snapshot and dm-mirror.
 
 Users of kcopyd must first create a client and indicate how many memory pages
 to set aside for their copy jobs. This is done with a call to
-kcopyd_client_create().
+kcopyd_client_create()::
 
    int kcopyd_client_create(unsigned int num_pages,
                             struct kcopyd_client **result);
@@ -16,7 +17,7 @@ To start a copy job, the user must set up io_region structures to describe
 the source and destinations of the copy. Each io_region indicates a
 block-device along with the starting sector and size of the region. The source
 of the copy is given as one io_region structure, and the destinations of the
-copy are given as an array of io_region structures.
+copy are given as an array of io_region structures::
 
    struct io_region {
       struct block_device *bdev;
@@ -26,7 +27,7 @@ copy are given as an array of io_region structures.
 
 To start the copy, the user calls kcopyd_copy(), passing in the client
 pointer, pointers to the source and destination io_regions, the name of a
-completion callback routine, and a pointer to some context data for the copy.
+completion callback routine, and a pointer to some context data for the copy::
 
    int kcopyd_copy(struct kcopyd_client *kc, struct io_region *from,
                    unsigned int num_dests, struct io_region *dests,
@@ -41,7 +42,6 @@ write error occurred during the copy.
 
 When a user is done with all their copy jobs, they should call
 kcopyd_client_destroy() to delete the kcopyd client, which will release the
-associated memory pages.
+associated memory pages::
 
    void kcopyd_client_destroy(struct kcopyd_client *kc);
-
diff --git a/Documentation/device-mapper/linear.txt b/Documentation/device-mapper/linear.txt
index 7cb98d89d3f8..9d17fc6e64a9 100644
--- a/Documentation/device-mapper/linear.txt
+++ b/Documentation/device-mapper/linear.txt
@@ -1,3 +1,4 @@
+=========
 dm-linear
 =========
 
@@ -6,56 +7,57 @@ device onto a linear range of another device. This is the basic building
 block of logical volume managers.
 
 Parameters:
-    : Full pathname to the underlying block-device, or a
-      "major:minor" device-number.
-    : Starting sector within the device.
+    :
+        Full pathname to the underlying block-device, or a
+        "major:minor" device-number.
+    :
+        Starting sector within the device.
 
 
 Example scripts
 ===============
-[[
-#!/bin/sh
-# Create an identity mapping for a device
-echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
-]]
-
-
-[[
-#!/bin/sh
-# Join 2 devices together
-size1=`blockdev --getsz $1`
-size2=`blockdev --getsz $2`
-echo "0 $size1 linear $1 0
-$size1 $size2 linear $2 0" | dmsetup create joined
-]]
-
-
-[[
-#!/usr/bin/perl -w
-# Split a device into 4M chunks and then join them together in reverse order.
-
-my $name = "reverse";
-my $extent_size = 4 * 1024 * 2;
-my $dev = $ARGV[0];
-my $table = "";
-my $count = 0;
-
-if (!defined($dev)) {
-    die("Please specify a device.\n");
-}
-
-my $dev_size = `blockdev --getsz $dev`;
-my $extents = int($dev_size / $extent_size) -
-    (($dev_size % $extent_size) ? 1 : 0);
-
-while ($extents > 0) {
-    my $this_start = $count * $extent_size;
-    $extents--;
-    $count++;
-    my $this_offset = $extents * $extent_size;
-
-    $table .= "$this_start $extent_size linear $dev $this_offset\n";
-}
-
-`echo \"$table\" | dmsetup create $name`;
-]]
+
+::
+
+    #!/bin/sh
+    # Create an identity mapping for a device
+    echo "0 `blockdev --getsz $1` linear $1 0" | dmsetup create identity
+
+::
+
+    #!/bin/sh
+    # Join 2 devices together
+    size1=`blockdev --getsz $1`
+    size2=`blockdev --getsz $2`
+    echo "0 $size1 linear $1 0
+    $size1 $size2 linear $2 0" | dmsetup create joined
+
+::
+
+    #!/usr/bin/perl -w
+    # Split a device into 4M chunks and then join them together in reverse order.
+
+    my $name = "reverse";
+    my $extent_size = 4 * 1024 * 2;
+    my $dev = $ARGV[0];
+    my $table = "";
+    my $count = 0;
+
+    if (!defined($dev)) {
+        die("Please specify a device.\n");
+    }
+
+    my $dev_size = `blockdev --getsz $dev`;
+    my $extents = int($dev_size / $extent_size) -
+        (($dev_size % $extent_size) ? 1 : 0);
+
+    while ($extents > 0) {
+        my $this_start = $count * $extent_size;
+        $extents--;
+        $count++;
+        my $this_offset = $extents * $extent_size;
+
+        $table .= "$this_start $extent_size linear $dev $this_offset\n";
+    }
+
+    `echo \"$table\" | dmsetup create $name`;
diff --git a/Documentation/device-mapper/log-writes.txt b/Documentation/device-mapper/log-writes.txt
index b638d124be6a..23141f2ffb7c 100644
--- a/Documentation/device-mapper/log-writes.txt
+++ b/Documentation/device-mapper/log-writes.txt
@@ -1,3 +1,4 @@
+=============
 dm-log-writes
 =============
 
@@ -25,11 +26,11 @@ completed WRITEs, at the time the REQ_PREFLUSH is issued, are added in order to
 simulate the worst case scenario with regard to power failures. Consider the
 following example (W means write, C means complete):
 
-W1,W2,W3,C3,C2,Wflush,C1,Cflush
+    W1,W2,W3,C3,C2,Wflush,C1,Cflush
 
-The log would show the following
+The log would show the following:
 
-W3,W2,flush,W1....
+    W3,W2,flush,W1....
Again this is to simulate what is actually on disk, this allows us to detect cases where a power failure at a particular point in time would create an @@ -42,11 +43,11 @@ Any REQ_OP_DISCARD requests are treated like WRITE requests. Otherwise we would have all the DISCARD requests, and then the WRITE requests and then the FLUSH request. Consider the following example: -WRITE block 1, DISCARD block 1, FLUSH + WRITE block 1, DISCARD block 1, FLUSH -If we logged DISCARD when it completed, the replay would look like this +If we logged DISCARD when it completed, the replay would look like this: -DISCARD 1, WRITE 1, FLUSH + DISCARD 1, WRITE 1, FLUSH which isn't quite what happened and wouldn't be caught during the log replay. @@ -57,15 +58,19 @@ i) Constructor log-writes - dev_path : Device that all of the IO will go to normally. - log_dev_path : Device where the log entries are written to. + ============= ============================================== + dev_path Device that all of the IO will go to normally. + log_dev_path Device where the log entries are written to. + ============= ============================================== ii) Status <#logged entries> - #logged entries : Number of logged entries - highest allocated sector : Highest allocated sector + =========================== ======================== + #logged entries Number of logged entries + highest allocated sector Highest allocated sector + =========================== ======================== iii) Messages @@ -75,15 +80,15 @@ iii) Messages For example say you want to fsck a file system after every write, but first you need to replay up to the mkfs to make sure we're fsck'ing something reasonable, you would do something like - this: + this:: mkfs.btrfs -f /dev/mapper/log dmsetup message log 0 mark mkfs - This would allow you to replay the log up to the mkfs mark and - then replay from that point on doing the fsck check in the - interval that you want. 
+ This would allow you to replay the log up to the mkfs mark and + then replay from that point on doing the fsck check in the + interval that you want. Every log has a mark at the end labeled "dm-log-writes-end". @@ -97,42 +102,42 @@ Example usage ============= Say you want to test fsync on your file system. You would do something like -this: +this:: -TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" -dmsetup create log --table "$TABLE" -mkfs.btrfs -f /dev/mapper/log -dmsetup message log 0 mark mkfs + TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" + dmsetup create log --table "$TABLE" + mkfs.btrfs -f /dev/mapper/log + dmsetup message log 0 mark mkfs -mount /dev/mapper/log /mnt/btrfs-test - -dmsetup message log 0 mark fsync -md5sum /mnt/btrfs-test/foo -umount /mnt/btrfs-test + mount /dev/mapper/log /mnt/btrfs-test + + dmsetup message log 0 mark fsync + md5sum /mnt/btrfs-test/foo + umount /mnt/btrfs-test -dmsetup remove log -replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync -mount /dev/sdb /mnt/btrfs-test -md5sum /mnt/btrfs-test/foo - + dmsetup remove log + replay-log --log /dev/sdc --replay /dev/sdb --end-mark fsync + mount /dev/sdb /mnt/btrfs-test + md5sum /mnt/btrfs-test/foo + -Another option is to do a complicated file system operation and verify the file -system is consistent during the entire operation. You could do this with: + Another option is to do a complicated file system operation and verify the file + system is consistent during the entire operation. 
You could do this with: -TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" -dmsetup create log --table "$TABLE" -mkfs.btrfs -f /dev/mapper/log -dmsetup message log 0 mark mkfs + TABLE="0 $(blockdev --getsz /dev/sdb) log-writes /dev/sdb /dev/sdc" + dmsetup create log --table "$TABLE" + mkfs.btrfs -f /dev/mapper/log + dmsetup message log 0 mark mkfs -mount /dev/mapper/log /mnt/btrfs-test - -btrfs filesystem balance /mnt/btrfs-test -umount /mnt/btrfs-test -dmsetup remove log + mount /dev/mapper/log /mnt/btrfs-test + + btrfs filesystem balance /mnt/btrfs-test + umount /mnt/btrfs-test + dmsetup remove log -replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs -btrfsck /dev/sdb -replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \ + replay-log --log /dev/sdc --replay /dev/sdb --end-mark mkfs + btrfsck /dev/sdb + replay-log --log /dev/sdc --replay /dev/sdb --start-mark mkfs \ --fsck "btrfsck /dev/sdb" --check fua And that will replay the log until it sees a FUA request, run the fsck command diff --git a/Documentation/device-mapper/persistent-data.txt b/Documentation/device-mapper/persistent-data.txt index a333bcb3a6c2..2065c3c5a091 100644 --- a/Documentation/device-mapper/persistent-data.txt +++ b/Documentation/device-mapper/persistent-data.txt @@ -1,3 +1,7 @@ +=============== +Persistent data +=============== + Introduction ============ diff --git a/Documentation/device-mapper/snapshot.txt b/Documentation/device-mapper/snapshot.txt index b8bbb516f989..4c53304e72f1 100644 --- a/Documentation/device-mapper/snapshot.txt +++ b/Documentation/device-mapper/snapshot.txt @@ -1,15 +1,16 @@ +============================== Device-mapper snapshot support ============================== Device-mapper allows you, without massive data copying: -*) To create snapshots of any block device i.e. mountable, saved states of -the block device which are also writable without interfering with the -original content; -*) To create device "forks", i.e. 
multiple different versions of the -same data stream. -*) To merge a snapshot of a block device back into the snapshot's origin -device. +- To create snapshots of any block device i.e. mountable, saved states of + the block device which are also writable without interfering with the + original content; +- To create device "forks", i.e. multiple different versions of the + same data stream. +- To merge a snapshot of a block device back into the snapshot's origin + device. In the first two cases, dm copies only the chunks of data that get changed and uses a separate copy-on-write (COW) block device for @@ -22,7 +23,7 @@ the origin device. There are three dm targets available: snapshot, snapshot-origin, and snapshot-merge. -*) snapshot-origin +- snapshot-origin which will normally have one or more snapshots based on it. Reads will be mapped directly to the backing device. For each write, the @@ -30,7 +31,7 @@ original data will be saved in the of each snapshot to keep its visible content unchanged, at least until the fills up. -*) snapshot +- snapshot A snapshot of the block device is created. Changed chunks of sectors will be stored on the . Writes will @@ -83,25 +84,25 @@ When you create the first LVM2 snapshot of a volume, four dm devices are used: source volume), whose table is replaced by a "snapshot-origin" mapping from device #1. 
-A fixed naming scheme is used, so with the following commands:
+A fixed naming scheme is used, so with the following commands::
 
-lvcreate -L 1G -n base volumeGroup
-lvcreate -L 100M --snapshot -n snap volumeGroup/base
+    lvcreate -L 1G -n base volumeGroup
+    lvcreate -L 100M --snapshot -n snap volumeGroup/base
 
-we'll have this situation (with volumes in above order):
+we'll have this situation (with volumes in above order)::
 
-# dmsetup table|grep volumeGroup
+    # dmsetup table|grep volumeGroup
 
-volumeGroup-base-real: 0 2097152 linear 8:19 384
-volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
-volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
-volumeGroup-base: 0 2097152 snapshot-origin 254:11
+    volumeGroup-base-real: 0 2097152 linear 8:19 384
+    volumeGroup-snap-cow: 0 204800 linear 8:19 2097536
+    volumeGroup-snap: 0 2097152 snapshot 254:11 254:12 P 16
+    volumeGroup-base: 0 2097152 snapshot-origin 254:11
 
-# ls -lL /dev/mapper/volumeGroup-*
-brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
-brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
-brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
-brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
+    # ls -lL /dev/mapper/volumeGroup-*
+    brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
+    brw------- 1 root root 254, 12 29 ago 18:15 /dev/mapper/volumeGroup-snap-cow
+    brw------- 1 root root 254, 13 29 ago 18:15 /dev/mapper/volumeGroup-snap
+    brw------- 1 root root 254, 10 29 ago 18:14 /dev/mapper/volumeGroup-base
 
 
 How snapshot-merge is used by LVM2
@@ -114,27 +115,28 @@ merging snapshot after it completes. The "snapshot" that hands over its
 COW device to the "snapshot-merge" is deactivated (unless using lvchange
 --refresh); but if it is left active it will simply return I/O errors.
 
-A snapshot will merge into its origin with the following command:
+A snapshot will merge into its origin with the following command::
 
-lvconvert --merge volumeGroup/snap
+    lvconvert --merge volumeGroup/snap
 
-we'll now have this situation:
+we'll now have this situation::
 
-# dmsetup table|grep volumeGroup
+    # dmsetup table|grep volumeGroup
 
-volumeGroup-base-real: 0 2097152 linear 8:19 384
-volumeGroup-base-cow: 0 204800 linear 8:19 2097536
-volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16
+    volumeGroup-base-real: 0 2097152 linear 8:19 384
+    volumeGroup-base-cow: 0 204800 linear 8:19 2097536
+    volumeGroup-base: 0 2097152 snapshot-merge 254:11 254:12 P 16
 
-# ls -lL /dev/mapper/volumeGroup-*
-brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
-brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
-brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
+    # ls -lL /dev/mapper/volumeGroup-*
+    brw------- 1 root root 254, 11 29 ago 18:15 /dev/mapper/volumeGroup-base-real
+    brw------- 1 root root 254, 12 29 ago 18:16 /dev/mapper/volumeGroup-base-cow
+    brw------- 1 root root 254, 10 29 ago 18:16 /dev/mapper/volumeGroup-base
 
 
 How to determine when a merging is complete
 ===========================================
 The snapshot-merge and snapshot status lines end with:
+
   /
 
 Both and include both data and metadata.
@@ -142,35 +144,37 @@ During merging, the number of sectors allocated gets smaller and
 smaller. Merging has finished when the number of sectors holding data
 is zero, in other words == .
 
-Here is a practical example (using a hybrid of lvm and dmsetup commands): +Here is a practical example (using a hybrid of lvm and dmsetup commands):: -# lvs - LV VG Attr LSize Origin Snap% Move Log Copy% Convert - base volumeGroup owi-a- 4.00g - snap volumeGroup swi-a- 1.00g base 18.97 + # lvs + LV VG Attr LSize Origin Snap% Move Log Copy% Convert + base volumeGroup owi-a- 4.00g + snap volumeGroup swi-a- 1.00g base 18.97 -# dmsetup status volumeGroup-snap -0 8388608 snapshot 397896/2097152 1560 - ^^^^ metadata sectors + # dmsetup status volumeGroup-snap + 0 8388608 snapshot 397896/2097152 1560 + ^^^^ metadata sectors -# lvconvert --merge -b volumeGroup/snap - Merging of volume snap started. + # lvconvert --merge -b volumeGroup/snap + Merging of volume snap started. -# lvs volumeGroup/snap - LV VG Attr LSize Origin Snap% Move Log Copy% Convert - base volumeGroup Owi-a- 4.00g 17.23 + # lvs volumeGroup/snap + LV VG Attr LSize Origin Snap% Move Log Copy% Convert + base volumeGroup Owi-a- 4.00g 17.23 -# dmsetup status volumeGroup-base -0 8388608 snapshot-merge 281688/2097152 1104 + # dmsetup status volumeGroup-base + 0 8388608 snapshot-merge 281688/2097152 1104 -# dmsetup status volumeGroup-base -0 8388608 snapshot-merge 180480/2097152 712 + # dmsetup status volumeGroup-base + 0 8388608 snapshot-merge 180480/2097152 712 -# dmsetup status volumeGroup-base -0 8388608 snapshot-merge 16/2097152 16 + # dmsetup status volumeGroup-base + 0 8388608 snapshot-merge 16/2097152 16 Merging has finished. 
-# lvs - LV VG Attr LSize Origin Snap% Move Log Copy% Convert - base volumeGroup owi-a- 4.00g +:: + + # lvs + LV VG Attr LSize Origin Snap% Move Log Copy% Convert + base volumeGroup owi-a- 4.00g diff --git a/Documentation/device-mapper/statistics.txt b/Documentation/device-mapper/statistics.txt index 170ac02a1f50..3d80a9f850cc 100644 --- a/Documentation/device-mapper/statistics.txt +++ b/Documentation/device-mapper/statistics.txt @@ -1,3 +1,4 @@ +============= DM statistics ============= @@ -11,7 +12,7 @@ Individual statistics will be collected for each step-sized area within the range specified. The I/O statistics counters for each step-sized area of a region are -in the same format as /sys/block/*/stat or /proc/diskstats (see: +in the same format as `/sys/block/*/stat` or `/proc/diskstats` (see: Documentation/iostats.txt). But two extra counters (12 and 13) are provided: total time spent reading and writing. When the histogram argument is used, the 14th parameter is reported that represents the @@ -32,40 +33,45 @@ on each other's data. The creation of DM statistics will allocate memory via kmalloc or fallback to using vmalloc space. At most, 1/4 of the overall system memory may be allocated by DM statistics. The admin can see how much -memory is used by reading -/sys/module/dm_mod/parameters/stats_current_allocated_bytes +memory is used by reading: + + /sys/module/dm_mod/parameters/stats_current_allocated_bytes Messages ======== - @stats_create - [ ...] - [ []] - + @stats_create [ ...] [ []] Create a new region and return the region_id. - "-" - whole device - "+" - a range of 512-byte sectors - starting with . + "-" + whole device + "+" + a range of 512-byte sectors + starting with . - "" - the range is subdivided into areas each containing - sectors. - "/" - the range is subdivided into the specified - number of areas. + "" + the range is subdivided into areas each containing + sectors. + "/" + the range is subdivided into the specified + number of areas. 
The number of optional arguments - The following optional arguments are supported - precise_timestamps - use precise timer with nanosecond resolution + The following optional arguments are supported: + + precise_timestamps + use precise timer with nanosecond resolution instead of the "jiffies" variable. When this argument is used, the resulting times are in nanoseconds instead of milliseconds. Precise timestamps are a little bit slower to obtain than jiffies-based timestamps. - histogram:n1,n2,n3,n4,... - collect histogram of latencies. The + histogram:n1,n2,n3,n4,... + collect histogram of latencies. The numbers n1, n2, etc are times that represent the boundaries of the histogram. If precise_timestamps is not used, the times are in milliseconds, otherwise they are in @@ -96,21 +102,18 @@ Messages @stats_list message, but it doesn't use this value for anything. @stats_delete - Delete the region with the specified id. region_id returned from @stats_create @stats_clear - Clear all the counters except the in-flight i/o counters. region_id returned from @stats_create @stats_list [] - List all regions registered with @stats_create. @@ -127,7 +130,6 @@ Messages if they were specified when creating the region. @stats_print [ ] - Print counters for each step-sized area of a region. @@ -143,10 +145,11 @@ Messages Output format for each step-sized area of a region: - + counters + + + counters The first 11 counters have the same meaning as - /sys/block/*/stat or /proc/diskstats. + `/sys/block/*/stat or /proc/diskstats`. Please refer to Documentation/iostats.txt for details. @@ -163,11 +166,11 @@ Messages 11. the weighted number of milliseconds spent doing I/Os Additional counters: + 12. the total time spent reading in milliseconds 13. the total time spent writing in milliseconds @stats_print_clear [ ] - Atomically print and then clear all the counters except the in-flight i/o counters. 
Useful when the client consuming the statistics does not want to lose any statistics (those updated @@ -185,7 +188,6 @@ Messages If omitted, all lines are printed and then cleared. @stats_set_aux - Store auxiliary data aux_data for the specified region. @@ -201,23 +203,23 @@ Examples ======== Subdivide the DM device 'vol' into 100 pieces and start collecting -statistics on them: +statistics on them:: dmsetup message vol 0 @stats_create - /100 Set the auxiliary data string to "foo bar baz" (the escape for each -space must also be escaped, otherwise the shell will consume them): +space must also be escaped, otherwise the shell will consume them):: dmsetup message vol 0 @stats_set_aux 0 foo\\ bar\\ baz -List the statistics: +List the statistics:: dmsetup message vol 0 @stats_list -Print the statistics: +Print the statistics:: dmsetup message vol 0 @stats_print 0 -Delete the statistics: +Delete the statistics:: dmsetup message vol 0 @stats_delete 0 diff --git a/Documentation/device-mapper/striped.txt b/Documentation/device-mapper/striped.txt index 07ec492cceee..e9a8da192ae1 100644 --- a/Documentation/device-mapper/striped.txt +++ b/Documentation/device-mapper/striped.txt @@ -1,3 +1,4 @@ +========= dm-stripe ========= @@ -8,12 +9,16 @@ potentially provide improved I/O throughput by utilizing several physical devices in parallel. Parameters: [ ]+ - : Number of underlying devices. - : Size of each chunk of data. Must be at least as - large as the system's PAGE_SIZE. - : Full pathname to the underlying block-device, or a - "major:minor" device-number. - : Starting sector within the device. + : + Number of underlying devices. + : + Size of each chunk of data. Must be at least as + large as the system's PAGE_SIZE. + : + Full pathname to the underlying block-device, or a + "major:minor" device-number. + : + Starting sector within the device. One or more underlying devices can be specified. 
The striped device size must be a multiple of the chunk size multiplied by the number of underlying devices. @@ -22,36 +27,35 @@ be a multiple of the chunk size multiplied by the number of underlying devices. Example scripts =============== -[[ -#!/usr/bin/perl -w -# Create a striped device across any number of underlying devices. The device -# will be called "stripe_dev" and have a chunk-size of 128k. +:: -my $chunk_size = 128 * 2; -my $dev_name = "stripe_dev"; -my $num_devs = @ARGV; -my @devs = @ARGV; -my ($min_dev_size, $stripe_dev_size, $i); + #!/usr/bin/perl -w + # Create a striped device across any number of underlying devices. The device + # will be called "stripe_dev" and have a chunk-size of 128k. -if (!$num_devs) { - die("Specify at least one device\n"); -} + my $chunk_size = 128 * 2; + my $dev_name = "stripe_dev"; + my $num_devs = @ARGV; + my @devs = @ARGV; + my ($min_dev_size, $stripe_dev_size, $i); -$min_dev_size = `blockdev --getsz $devs[0]`; -for ($i = 1; $i < $num_devs; $i++) { - my $this_size = `blockdev --getsz $devs[$i]`; - $min_dev_size = ($min_dev_size < $this_size) ? - $min_dev_size : $this_size; -} + if (!$num_devs) { + die("Specify at least one device\n"); + } -$stripe_dev_size = $min_dev_size * $num_devs; -$stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs); + $min_dev_size = `blockdev --getsz $devs[0]`; + for ($i = 1; $i < $num_devs; $i++) { + my $this_size = `blockdev --getsz $devs[$i]`; + $min_dev_size = ($min_dev_size < $this_size) ? 
+ $min_dev_size : $this_size; + } -$table = "0 $stripe_dev_size striped $num_devs $chunk_size"; -for ($i = 0; $i < $num_devs; $i++) { - $table .= " $devs[$i] 0"; -} + $stripe_dev_size = $min_dev_size * $num_devs; + $stripe_dev_size -= $stripe_dev_size % ($chunk_size * $num_devs); -`echo $table | dmsetup create $dev_name`; -]] + $table = "0 $stripe_dev_size striped $num_devs $chunk_size"; + for ($i = 0; $i < $num_devs; $i++) { + $table .= " $devs[$i] 0"; + } + `echo $table | dmsetup create $dev_name`; diff --git a/Documentation/device-mapper/switch.txt b/Documentation/device-mapper/switch.txt index 5bd4831db4a8..7dde06be1a4f 100644 --- a/Documentation/device-mapper/switch.txt +++ b/Documentation/device-mapper/switch.txt @@ -1,3 +1,4 @@ +========= dm-switch ========= @@ -67,27 +68,25 @@ b-tree can achieve. Construction Parameters ======================= - [...] - [ ]+ + [...] [ ]+ + + The number of paths across which to distribute the I/O. - - The number of paths across which to distribute the I/O. + + The number of 512-byte sectors in a region. Each region can be redirected + to any of the available paths. - - The number of 512-byte sectors in a region. Each region can be redirected - to any of the available paths. + + The number of optional arguments. Currently, no optional arguments + are supported and so this must be zero. - - The number of optional arguments. Currently, no optional arguments - are supported and so this must be zero. + + The block device that represents a specific path to the device. - - The block device that represents a specific path to the device. - - - The offset of the start of data on the specific (in units - of 512-byte sectors). This number is added to the sector number when - forwarding the request to the specific path. Typically it is zero. + + The offset of the start of data on the specific (in units + of 512-byte sectors). This number is added to the sector number when + forwarding the request to the specific path. 
Typically it is zero. Messages ======== @@ -122,17 +121,21 @@ Example Assume that you have volumes vg1/switch0 vg1/switch1 vg1/switch2 with the same size. -Create a switch device with 64kB region size: +Create a switch device with 64kB region size:: + dmsetup create switch --table "0 `blockdev --getsz /dev/vg1/switch0` switch 3 128 0 /dev/vg1/switch0 0 /dev/vg1/switch1 0 /dev/vg1/switch2 0" Set mappings for the first 7 entries to point to devices switch0, switch1, -switch2, switch0, switch1, switch2, switch1: +switch2, switch0, switch1, switch2, switch1:: + dmsetup message switch 0 set_region_mappings 0:0 :1 :2 :0 :1 :2 :1 -Set repetitive mapping. This command: +Set repetitive mapping. This command:: + dmsetup message switch 0 set_region_mappings 1000:1 :2 R2,10 -is equivalent to: + +is equivalent to:: + dmsetup message switch 0 set_region_mappings 1000:1 :2 :1 :2 :1 :2 :1 :2 \ :1 :2 :1 :2 :1 :2 :1 :2 :1 :2 - diff --git a/Documentation/device-mapper/thin-provisioning.txt b/Documentation/device-mapper/thin-provisioning.txt index 883e7ca5f745..bafebf79da4b 100644 --- a/Documentation/device-mapper/thin-provisioning.txt +++ b/Documentation/device-mapper/thin-provisioning.txt @@ -1,3 +1,7 @@ +================= +Thin provisioning +================= + Introduction ============ @@ -95,6 +99,8 @@ previously.) Using an existing pool device ----------------------------- +:: + dmsetup create pool \ --table "0 20971520 thin-pool $metadata_dev $data_dev \ $data_block_size $low_water_mark" @@ -154,7 +160,7 @@ Thin provisioning i) Creating a new thinly-provisioned volume. To create a new thinly- provisioned volume you must send a message to an - active pool device, /dev/mapper/pool in this example. + active pool device, /dev/mapper/pool in this example:: dmsetup message /dev/mapper/pool 0 "create_thin 0" @@ -164,7 +170,7 @@ i) Creating a new thinly-provisioned volume. ii) Using a thinly-provisioned volume. 
- Thinly-provisioned volumes are activated using the 'thin' target: + Thinly-provisioned volumes are activated using the 'thin' target:: dmsetup create thin --table "0 2097152 thin /dev/mapper/pool 0" @@ -181,6 +187,8 @@ i) Creating an internal snapshot. must suspend it before creating the snapshot to avoid corruption. This is NOT enforced at the moment, so please be careful! + :: + dmsetup suspend /dev/mapper/thin dmsetup message /dev/mapper/pool 0 "create_snap 1 0" dmsetup resume /dev/mapper/thin @@ -198,14 +206,14 @@ ii) Using an internal snapshot. activating or removing them both. (This differs from conventional device-mapper snapshots.) - Activate it exactly the same way as any other thinly-provisioned volume: + Activate it exactly the same way as any other thinly-provisioned volume:: dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 1" External snapshots ------------------ -You can use an external _read only_ device as an origin for a +You can use an external **read only** device as an origin for a thinly-provisioned volume. Any read to an unprovisioned area of the thin device will be passed through to the origin. Writes trigger the allocation of new blocks as usual. @@ -223,11 +231,13 @@ i) Creating a snapshot of an external device This is the same as creating a thin device. You don't mention the origin at this stage. + :: + dmsetup message /dev/mapper/pool 0 "create_thin 0" ii) Using a snapshot of an external device. - Append an extra parameter to the thin target specifying the origin: + Append an extra parameter to the thin target specifying the origin:: dmsetup create snap --table "0 2097152 thin /dev/mapper/pool 0 /dev/image" @@ -240,6 +250,8 @@ Deactivation All devices using a pool must be deactivated before the pool itself can be. 
+:: + dmsetup remove thin dmsetup remove snap dmsetup remove pool @@ -252,25 +264,32 @@ Reference i) Constructor - thin-pool \ - [ []*] + :: + + thin-pool \ + [ []*] Optional feature arguments: - skip_block_zeroing: Skip the zeroing of newly-provisioned blocks. + skip_block_zeroing: + Skip the zeroing of newly-provisioned blocks. - ignore_discard: Disable discard support. + ignore_discard: + Disable discard support. - no_discard_passdown: Don't pass discards down to the underlying - data device, but just remove the mapping. + no_discard_passdown: + Don't pass discards down to the underlying + data device, but just remove the mapping. - read_only: Don't allow any changes to be made to the pool + read_only: + Don't allow any changes to be made to the pool metadata. This mode is only available after the thin-pool has been created and first used in full read/write mode. It cannot be specified on initial thin-pool creation. - error_if_no_space: Error IOs, instead of queueing, if no space. + error_if_no_space: + Error IOs, instead of queueing, if no space. Data block size must be between 64KB (128 sectors) and 1GB (2097152 sectors) inclusive. @@ -278,10 +297,12 @@ i) Constructor ii) Status - / - / - ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space - needs_check|- metadata_low_watermark + :: + + / + / + ro|rw|out_of_data_space [no_]discard_passdown [error|queue]_if_no_space + needs_check|- metadata_low_watermark transaction id: A 64-bit number used by userspace to help synchronise with metadata @@ -336,13 +357,11 @@ ii) Status iii) Messages create_thin - Create a new thinly-provisioned device. is an arbitrary unique 24-bit identifier chosen by the caller. create_snap - Create a new snapshot of another thinly-provisioned device. is an arbitrary unique 24-bit identifier chosen by the caller. @@ -350,11 +369,9 @@ iii) Messages of which the new device will be a snapshot. delete - Deletes a thin device. Irreversible. 
set_transaction_id - Userland volume managers, such as LVM, need a way to synchronise their external metadata with the internal metadata of the pool target. The thin-pool target offers to store an @@ -364,14 +381,12 @@ iii) Messages compare-and-swap message. reserve_metadata_snap - Reserve a copy of the data mapping btree for use by userland. This allows userland to inspect the mappings as they were when this message was executed. Use the pool's status command to get the root block associated with the metadata snapshot. release_metadata_snap - Release a previously reserved copy of the data mapping btree. 'thin' target @@ -379,7 +394,9 @@ iii) Messages i) Constructor - thin [] + :: + + thin [] pool dev: the thin-pool device, e.g. /dev/mapper/my_pool or 253:0 @@ -401,8 +418,7 @@ provisioned as and when needed. ii) Status - - + If the pool has encountered device errors and failed, the status will just contain the string 'Fail'. The userspace recovery tools should then be used. diff --git a/Documentation/device-mapper/unstriped.txt b/Documentation/device-mapper/unstriped.txt index 0b2a306c54ee..0a8d3eb3f072 100644 --- a/Documentation/device-mapper/unstriped.txt +++ b/Documentation/device-mapper/unstriped.txt @@ -1,3 +1,7 @@ +================================ +Device-mapper "unstriped" target +================================ + Introduction ============ @@ -34,46 +38,46 @@ striped target to combine the 4 devices into one. It then will use the unstriped target ontop of the striped device to access the individual backing loop devices. We write data to the newly exposed unstriped devices and verify the data written matches the correct -underlying device on the striped array. 
- -#!/bin/bash - -MEMBER_SIZE=$((128 * 1024 * 1024)) -NUM=4 -SEQ_END=$((${NUM}-1)) -CHUNK=256 -BS=4096 - -RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512)) -DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}" -COUNT=$((${MEMBER_SIZE} / ${BS})) - -for i in $(seq 0 ${SEQ_END}); do - dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct - losetup /dev/loop${i} member-${i} - DM_PARMS+=" /dev/loop${i} 0" -done - -echo $DM_PARMS | dmsetup create raid0 -for i in $(seq 0 ${SEQ_END}); do - echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i} -done; - -for i in $(seq 0 ${SEQ_END}); do - dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct - diff /dev/mapper/set-${i} member-${i} -done; - -for i in $(seq 0 ${SEQ_END}); do - dmsetup remove set-${i} -done - -dmsetup remove raid0 - -for i in $(seq 0 ${SEQ_END}); do - losetup -d /dev/loop${i} - rm -f member-${i} -done +underlying device on the striped array:: + + #!/bin/bash + + MEMBER_SIZE=$((128 * 1024 * 1024)) + NUM=4 + SEQ_END=$((${NUM}-1)) + CHUNK=256 + BS=4096 + + RAID_SIZE=$((${MEMBER_SIZE}*${NUM}/512)) + DM_PARMS="0 ${RAID_SIZE} striped ${NUM} ${CHUNK}" + COUNT=$((${MEMBER_SIZE} / ${BS})) + + for i in $(seq 0 ${SEQ_END}); do + dd if=/dev/zero of=member-${i} bs=${MEMBER_SIZE} count=1 oflag=direct + losetup /dev/loop${i} member-${i} + DM_PARMS+=" /dev/loop${i} 0" + done + + echo $DM_PARMS | dmsetup create raid0 + for i in $(seq 0 ${SEQ_END}); do + echo "0 1 unstriped ${NUM} ${CHUNK} ${i} /dev/mapper/raid0 0" | dmsetup create set-${i} + done; + + for i in $(seq 0 ${SEQ_END}); do + dd if=/dev/urandom of=/dev/mapper/set-${i} bs=${BS} count=${COUNT} oflag=direct + diff /dev/mapper/set-${i} member-${i} + done; + + for i in $(seq 0 ${SEQ_END}); do + dmsetup remove set-${i} + done + + dmsetup remove raid0 + + for i in $(seq 0 ${SEQ_END}); do + losetup -d /dev/loop${i} + rm -f member-${i} + done Another example --------------- @@ -81,7 +85,7 @@ Another example Intel 
NVMe drives contain two cores on the physical device. Each core of the drive has segregated access to its LBA range. The current LBA model has a RAID 0 128k chunk on each core, resulting -in a 256k stripe across the two cores: +in a 256k stripe across the two cores:: Core 0: Core 1: __________ __________ @@ -108,17 +112,24 @@ Example dmsetup usage unstriped ontop of Intel NVMe device that has 2 cores ----------------------------------------------------- -dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0' -dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0' + +:: + + dmsetup create nvmset0 --table '0 512 unstriped 2 256 0 /dev/nvme0n1 0' + dmsetup create nvmset1 --table '0 512 unstriped 2 256 1 /dev/nvme0n1 0' There will now be two devices that expose Intel NVMe core 0 and 1 -respectively: -/dev/mapper/nvmset0 -/dev/mapper/nvmset1 +respectively:: + + /dev/mapper/nvmset0 + /dev/mapper/nvmset1 unstriped ontop of striped with 4 drives using 128K chunk size -------------------------------------------------------------- -dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0' -dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0' -dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0' -dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0' + +:: + + dmsetup create raid_disk0 --table '0 512 unstriped 4 256 0 /dev/mapper/striped 0' + dmsetup create raid_disk1 --table '0 512 unstriped 4 256 1 /dev/mapper/striped 0' + dmsetup create raid_disk2 --table '0 512 unstriped 4 256 2 /dev/mapper/striped 0' + dmsetup create raid_disk3 --table '0 512 unstriped 4 256 3 /dev/mapper/striped 0' diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt index b3d2e4a42255..a4d1c1476d72 100644 --- a/Documentation/device-mapper/verity.txt +++ b/Documentation/device-mapper/verity.txt @@ -1,5 +1,6 @@ +========= 
dm-verity -========== +========= Device-Mapper's "verity" target provides transparent integrity checking of block devices using a cryptographic digest provided by the kernel crypto API. @@ -7,6 +8,9 @@ This target is read-only. Construction Parameters ======================= + +:: + @@ -160,7 +164,9 @@ calculating the parent node. The tree looks something like: -alg = sha256, num_blocks = 32768, block_size = 4096 + alg = sha256, num_blocks = 32768, block_size = 4096 + +:: [ root ] / . . . \ @@ -189,6 +195,7 @@ block boundary) are the hash blocks which are stored a depth at a time The full specification of kernel parameters and on-disk metadata format is available at the cryptsetup project's wiki page + https://gitlab.com/cryptsetup/cryptsetup/wikis/DMVerity Status @@ -198,7 +205,8 @@ If any check failed, C (for Corruption) is returned. Example ======= -Set up a device: +Set up a device:: + # dmsetup create vroot --readonly --table \ "0 2097152 verity 1 /dev/sda1 /dev/sda2 4096 4096 262144 1 sha256 "\ "4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 "\ @@ -209,11 +217,13 @@ the hash tree or activate the kernel device. This is available from the cryptsetup upstream repository https://gitlab.com/cryptsetup/cryptsetup/ (as a libcryptsetup extension). -Create hash on the device: +Create hash on the device:: + # veritysetup format /dev/sda1 /dev/sda2 ... 
Root hash: 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 -Activate the device: +Activate the device:: + # veritysetup create vroot /dev/sda1 /dev/sda2 \ 4392712ba01368efdf14b05c76f9e4df0d53664630b5d48632ed17a137f39076 diff --git a/Documentation/device-mapper/writecache.txt b/Documentation/device-mapper/writecache.txt index 01532b3008ae..d3d7690f5e8d 100644 --- a/Documentation/device-mapper/writecache.txt +++ b/Documentation/device-mapper/writecache.txt @@ -1,3 +1,7 @@ +================= +Writecache target +================= + The writecache target caches writes on persistent memory or on SSD. It doesn't cache reads because reads are supposed to be cached in page cache in normal RAM. @@ -6,15 +10,18 @@ When the device is constructed, the first sector should be zeroed or the first sector should contain valid superblock from previous invocation. Constructor parameters: + 1. type of the cache device - "p" or "s" - p - persistent memory - s - SSD + + - p - persistent memory + - s - SSD 2. the underlying device that will be cached 3. the cache device 4. block size (4096 is recommended; the maximum block size is the page size) 5. the number of optional parameters (the parameters with an argument count as two) + start_sector n (default: 0) offset from the start of cache device in 512-byte sectors high_watermark n (default: 50) @@ -43,6 +50,7 @@ Constructor parameters: applicable only to persistent memory - don't use the FUA flag when writing back data and send the FLUSH request afterwards + - some underlying devices perform better with fua, some with nofua. The user should test it @@ -60,6 +68,7 @@ Messages: flush the cache device on next suspend. Use this message when you are going to remove the cache device. The proper sequence for removing the cache device is: + 1. send the "flush_on_suspend" message 2. 
load an inactive table with a linear target that maps to the underlying device diff --git a/Documentation/device-mapper/zero.txt b/Documentation/device-mapper/zero.txt index 20fb38e7fa7e..11fb5cf4597c 100644 --- a/Documentation/device-mapper/zero.txt +++ b/Documentation/device-mapper/zero.txt @@ -1,3 +1,4 @@ +======= dm-zero ======= @@ -18,20 +19,19 @@ filesystem limitations. To create a sparse device, start by creating a dm-zero device that's the desired size of the sparse device. For this example, we'll assume a 10TB -sparse device. +sparse device:: -TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2` # 10 TB in sectors -echo "0 $TEN_TERABYTES zero" | dmsetup create zero1 + TEN_TERABYTES=`expr 10 \* 1024 \* 1024 \* 1024 \* 2` # 10 TB in sectors + echo "0 $TEN_TERABYTES zero" | dmsetup create zero1 Then create a snapshot of the zero device, using any available block-device as the COW device. The size of the COW device will determine the amount of real space available to the sparse device. For this example, we'll assume /dev/sdb1 -is an available 10GB partition. +is an available 10GB partition:: -echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \ - dmsetup create sparse1 + echo "0 $TEN_TERABYTES snapshot /dev/mapper/zero1 /dev/sdb1 p 128" | \ + dmsetup create sparse1 This will create a 10TB sparse device called /dev/mapper/sparse1 that has 10GB of actual storage space available. If more than 10GB of data is written to this device, it will start returning I/O errors. - -- 2.20.1
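A note on the sector counts that appear in the dmsetup tables above: they are all expressed in 512-byte sectors (64kB is 128 sectors, 1GB is 2097152 sectors, and the 10TB zero device is the result of the `expr` line). The following is a minimal sketch of that arithmetic, not part of the patch itself; the helper name `kib_to_sectors` is hypothetical and chosen only for illustration, and it assumes a shell with 64-bit arithmetic.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): convert a size given in KiB
# into a count of 512-byte sectors, the unit used in dmsetup tables.
kib_to_sectors() {
    # 1 KiB = 1024 bytes = 2 sectors of 512 bytes each
    echo $(( $1 * 1024 / 512 ))
}

# 64 KiB, the region size / minimum data block size quoted in the docs
REGION_SECTORS=$(kib_to_sectors 64)

# 1 GiB, the maximum thin-pool data block size quoted in the docs
MAX_BLOCK_SECTORS=$(kib_to_sectors $((1024 * 1024)))

# 10 TiB, the size of the dm-zero device in the sparse-device example
TEN_TERABYTES=$(kib_to_sectors $((10 * 1024 * 1024 * 1024)))

echo "64 KiB = ${REGION_SECTORS} sectors"
echo "1 GiB  = ${MAX_BLOCK_SECTORS} sectors"
echo "10 TiB = ${TEN_TERABYTES} sectors"
```

The script only prints numbers and does not call dmsetup, so it can be run without root to sanity-check a table line before creating a device.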