Date: Wed, 29 Feb 2012 16:16:24 -0500 (EST)
From: Mikulas Patocka
To: Mandeep Singh Baines
Cc: Alasdair G Kergon, dm-devel@redhat.com, linux-kernel@vger.kernel.org,
    Will Drewry, Elly Jones, Milan Broz, Olof Johansson, Steffen Klassert,
    Andrew Morton
Subject: Re: [PATCH] dm: verity target
In-Reply-To: <1330469872-23262-1-git-send-email-msb@chromium.org>
References: <1330469872-23262-1-git-send-email-msb@chromium.org>

Hi

This crashes if the device size is 64MiB (and the sha256 hash is used).

I tested it with the userspace utility and it doesn't work with devices of
128MiB or larger: the kernel fails to verify the output of the utility.

I ran this (/dev/vg1/verity_long_data is 128MiB):

./verity mode=create alg=sha256 payload=/dev/vg1/verity_long_data hashtree=/dev/vg1/verity_long_hash salt=1234000000000000000000000000000000000000000000000000000000000000
dm:dm bht[DEBUG] Setting block_count 32768
dm:dm bht[DEBUG] Setting depth to 3.
dm:dm bht[DEBUG] depth: 0 entries: 1
dm:dm bht[DEBUG] depth: 1 entries: 2
dm:dm bht[DEBUG] depth: 2 entries: 256
0 262144 verity payload=ROOT_DEV hashtree=HASH_DEV hashstart=262144 alg=sha256 root_hexdigest=6e46e106b288812a881a9da3f11180433f90ce264c4f1e8fa191fb40409846fb salt=1234000000000000000000000000000000000000000000000000000000000000

dmsetup -r create verity --table "0 `blockdev --getsize /dev/vg1/verity_long_data` verity 0 /dev/vg1/verity_long_data /dev/vg1/verity_long_hash 0 4096 sha256 6e46e106b288812a881a9da3f11180433f90ce264c4f1e8fa191fb40409846fb 1234000000000000000000000000000000000000000000000000000000000000"

and I get a lot of log messages "failed to verify hash (d=3,bi=0)".

If the device is smaller than 128MiB, it works (except for a 64MiB device,
where it crashes).
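For reference, here is a small standalone program that redoes the tree sizing
arithmetic from verity_tree_create()/verity_tree_initialize_entries() in the
patch below, for the device sizes mentioned above (sha256 => 32-byte digests,
4096-byte hash blocks). The fls32() helper and the size table are mine; the
formulas are taken from the patch. This is only a back-of-the-envelope
illustration, not a verified diagnosis:

/*
 * sizing.c: back-of-the-envelope check of the hash tree sizing done by
 * verity_tree_create() and verity_tree_initialize_entries() in the patch
 * below, with the parameters from this report.  Illustration only.
 */
#include <stdio.h>

/* gives the same result as the kernel's fls() for the values used here */
static unsigned int fls32(unsigned int x)
{
	unsigned int r = 0;

	while (x) {
		r++;
		x >>= 1;
	}
	return r;
}

int main(void)
{
	const unsigned int block_size = 4096, digest_size = 32;
	/* node_count_shift = fls(block_size / digest_size) - 1 = 7 */
	const unsigned int shift = fls32(block_size / digest_size) - 1;
	const unsigned int sizes_mib[] = { 32, 64, 96, 128 };
	unsigned int i, d;

	for (i = 0; i < sizeof(sizes_mib) / sizeof(sizes_mib[0]); i++) {
		unsigned int block_count = sizes_mib[i] * 1024 * 1024 / block_size;
		/* depth = DIV_ROUND_UP(fls(block_count - 1), node_count_shift) */
		unsigned int depth = (fls32(block_count - 1) + shift - 1) / shift;

		printf("%3u MiB: block_count=%5u depth=%u, entries per level:",
		       sizes_mib[i], block_count, depth);
		for (d = 0; d < depth; d++)
			/* level->count = (last >> ((depth - d) * shift)) + 1,
			 * with last = block_count as in the patch */
			printf(" %u", (block_count >> ((depth - d) * shift)) + 1);
		printf("\n");
	}
	return 0;
}

For 128MiB this gives per-level entry counts of 1 / 3 / 257, while the
utility's debug output above shows 1 / 2 / 256, so the kernel would expect
the deepest level to start one block further into the hash device than where
the utility wrote it. For 64MiB the level-0 count comes out as 2, which looks
like it would trip the BUG_ON(vt->levels[0].count != 1) in
verity_tree_create() (the counts are computed with last = block_count rather
than block_count - 1). Again, just a sketch, not a verified diagnosis.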
Mikulas

dmsetup -r create verity --table "0 131072 verity 0 /dev/vg1/verity_long_data /dev/vg1/verity_long_hash 0 4096 sha256 d821fec17e151a6e7b91c4a7a71487760185c25af49a74955a3e7a718c1f97dd 1234000000000000000000000000000000000000000000000000000000000000"

[14356.819694] ------------[ cut here ]------------
[14356.819747] kernel BUG at drivers/md/dm-verity2.c:439!
[14356.819796] invalid opcode: 0000 [#1] PREEMPT SMP
[14356.819879] CPU 5
[14356.819897] Modules linked in: dm_verity2 md5 sha1_generic cryptomgr aead sha256_generic dm_zero dm_bufio crypto_hash crypto_algapi crypto dm_loop dm_mod parport_pc parport powernow_k8 mperf cpufreq_stats cpufreq_powersave cpufreq_userspace cpufreq_conservative cpufreq_ondemand freq_table snd_usb_audio snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib snd_rawmidi snd soundcore fuse raid0 md_mod lm85 hwmon_vid ide_cd_mod cdrom ohci_hcd ehci_hcd sata_svw libata serverworks ide_core usbcore floppy usb_common rtc_cmos e100 tg3 mii libphy k10temp skge hwmon button i2c_piix4 processor unix [last unloaded: dm_verity2]
[14356.820536]
[14356.820580] Pid: 10909, comm: dmsetup Not tainted 3.2.0 #19 empty empty/S3992-E
[14356.820680] RIP: 0010:[] [] verity_ctr+0x808/0x860 [dm_verity2]
[14356.820772] RSP: 0018:ffff880141b7bcc8 EFLAGS: 00010202
[14356.820821] RAX: 0000000000000408 RBX: ffff8802afedcc28 RCX: 0000000000000002
[14356.820875] RDX: 0000000000000081 RSI: ffff88044685a540 RDI: ffff8802afc4b000
[14356.820929] RBP: ffff8802afedcc00 R08: 0000000000000000 R09: ffff8802afc4a000
[14356.820987] R10: 0000000000000001 R11: ffffff9000000018 R12: ffffffff8147a630
[14356.821041] R13: ffff88044685a558 R14: 0000000000004000 R15: 0000000000000002
[14356.821114] FS:  00007f1c9ea567a0(0000) GS:ffff880447c80000(0000) knlGS:0000000000000000
[14356.821199] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[14356.821257] CR2: 00007f1c9e199f80 CR3: 00000003fe98e000 CR4: 00000000000006e0
[14356.821330] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[14356.821394] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[14356.821464] Process dmsetup (pid: 10909, threadinfo ffff880141b7a000, task ffff88023d8dec90)
[14356.821557] Stack:
[14356.821606]  ffff880141b7bd34 0000000000000007 ffffc90012ac9040 ffff880429987340
[14356.821700]  ffffc90012ac41a4 0000000000020000 ffffc90012ac419d ffffc90012ac4162
[14356.821787]  ffffc90012ac417c ffffc90012ac41e5 0000000000020000 0000000000000000
[14356.821905] Call Trace:
[14356.821965]  [] ? dm_table_add_target+0x19b/0x450 [dm_mod]
[14356.822034]  [] ? table_clear+0x80/0x80 [dm_mod]
[14356.822090]  [] ? table_load+0xd2/0x330 [dm_mod]
[14356.822150]  [] ? table_clear+0x80/0x80 [dm_mod]
[14356.822213]  [] ? ctl_ioctl+0x159/0x2a0 [dm_mod]
[14356.822286]  [] ? ipc_addid+0x4d/0xd0
[14356.822338]  [] ? dm_ctl_ioctl+0xe/0x20 [dm_mod]
[14356.822411]  [] ? do_vfs_ioctl+0x8e/0x4f0
[14356.822474]  [] ? dput+0x20/0x230
[14356.822533]  [] ? fput+0x162/0x220
[14356.822590]  [] ? sys_ioctl+0x49/0x90
[14356.822652]  [] ? system_call_fastpath+0x16/0x1b
[14356.822702] Code: 38 ce 65 19 a0 e9 61 fb ff ff 48 c7 c7 d8 62 19 a0 31 c0 e8 d9 80 17 e1 48 c7 c7 28 63 19 a0 31 c0 e8 cb 80 17 e1 e9 40 fb ff ff <0f> 0b 48 c7 c7 b8 60 19 a0 31 c0 e8 b6 80 17 e1 e9 9b fa ff ff
[14356.823081] RIP  [] verity_ctr+0x808/0x860 [dm_verity2]
[14356.823146]  RSP
[14356.823530] ---[ end trace 773c24b9dbd5cfff ]---

On Tue, 28 Feb 2012, Mandeep Singh Baines wrote:

> The verity target provides transparent integrity checking of block devices
> using a cryptographic digest.
>
> dm-verity is meant to be setup as part of a verified boot path. This
> may be anything ranging from a boot using tboot or trustedgrub to just
> booting from a known-good device (like a USB drive or CD).
>
> dm-verity is part of ChromeOS's verified boot path. It is used to verify
> the integrity of the root filesystem on boot.
The root filesystem is > mounted on a dm-verity partition which transparently verifies each block > with a bootloader verified hash passed into the kernel at boot. > > Changes in V4: > * Discussion over phone (Alasdair G Kergon) > * copy _ioctl fix from dm-linear > * verity_status format fixes to match dm conventions > * s/dm-bht/verity_tree > * put everything into dm-verity.c > * ctr changed to dm conventions > * use hex2bin > * use conventional dm names for function > * s/dm_// > * for example: verity_ctr versus dm_verity_ctr > * use per_cpu API > Changes in V3: > * Discussion over irc (Alasdair G Kergon) > * Implement ioctl hook > Changes in V2: > * https://lkml.org/lkml/2011/11/10/85 (Steffen Klassert) > * Use shash API instead of older hash API > > Signed-off-by: Will Drewry > Signed-off-by: Elly Jones > Signed-off-by: Mandeep Singh Baines > Cc: Alasdair G Kergon > Cc: Milan Broz > Cc: Olof Johansson > Cc: Steffen Klassert > Cc: Andrew Morton > Cc: Mikulas Patocka > Cc: dm-devel@redhat.com > --- > Documentation/device-mapper/verity.txt | 149 ++++ > drivers/md/Kconfig | 16 + > drivers/md/Makefile | 1 + > drivers/md/dm-verity.c | 1411 ++++++++++++++++++++++++++++++++ > 4 files changed, 1577 insertions(+), 0 deletions(-) > create mode 100644 Documentation/device-mapper/verity.txt > create mode 100644 drivers/md/dm-verity.c > > diff --git a/Documentation/device-mapper/verity.txt b/Documentation/device-mapper/verity.txt > new file mode 100644 > index 0000000..b631f12 > --- /dev/null > +++ b/Documentation/device-mapper/verity.txt > @@ -0,0 +1,149 @@ > +dm-verity > +========== > + > +Device-Mapper's "verity" target provides transparent integrity checking of > +block devices using a cryptographic digest provided by the kernel crypto API. > +This target is read-only. > + > +Parameters: > + > + > + > + This is the version number of the on-disk format. Currently, there is > + only version 0. > + > + > + This is the device that is going to be integrity checked. It may be > + a subset of the full device as specified to dmsetup (start sector and count) > + It may be specified as a path, like /dev/sdaX, or a device number, > + :. > + > + > + This is the device that that supplies the hash tree data. It may be > + specified similarly to the device path and may be the same device. If the > + same device is used, the hash offset should be outside of the dm-verity > + configured device size. > + > + > + This is the offset, in 512-byte sectors, from the start of hash_dev to > + the root block of the hash tree. > + > + > + The size of a hash block. Also, the size of a block to be hashed. > + > + > + The cryptographic hash algorithm used for this device. This should > + be the name of the algorithm, like "sha1". > + > + > + The hexadecimal encoding of the cryptographic hash of all of the > + neighboring nodes at the first level of the tree. This hash should be > + trusted as there is no other authenticity beyond this point. > + > + > + The hexadecimal encoding of the salt value. > + > +Theory of operation > +=================== > + > +dm-verity is meant to be setup as part of a verified boot path. This > +may be anything ranging from a boot using tboot or trustedgrub to just > +booting from a known-good device (like a USB drive or CD). > + > +When a dm-verity device is configured, it is expected that the caller > +has been authenticated in some way (cryptographic signatures, etc). > +After instantiation, all hashes will be verified on-demand during > +disk access. 
If they cannot be verified up to the root node of the > +tree, the root hash, then the I/O will fail. This should identify > +tampering with any data on the device and the hash data. > + > +Cryptographic hashes are used to assert the integrity of the device on a > +per-block basis. This allows for a lightweight hash computation on first read > +into the page cache. Block hashes are stored linearly aligned to the nearest > +block the size of a page. > + > +Hash Tree > +--------- > + > +Each node in the tree is a cryptographic hash. If it is a leaf node, the hash > +is of some block data on disk. If it is an intermediary node, then the hash is > +of a number of child nodes. > + > +Each entry in the tree is a collection of neighboring nodes that fit in one > +block. The number is determined based on block_size and the size of the > +selected cryptographic digest algorithm. The hashes are linearly ordered in > +this entry and any unaligned trailing space is ignored but included when > +calculating the parent node. > + > +The tree looks something like: > + > +alg = sha256, num_blocks = 32768, block_size = 4096 > + > + [ root ] > + / . . . \ > + [entry_0] [entry_1] > + / . . . \ . . . \ > + [entry_0_0] . . . [entry_0_127] . . . . [entry_1_127] > + / ... \ / . . . \ / \ > + blk_0 ... blk_127 blk_16256 blk_16383 blk_32640 . . . blk_32767 > + > +On-disk format > +============== > + > +Below is the recommended on-disk format. The verity kernel code does not > +read the on-disk header. It only reads the hash blocks which directly > +follow the header. It is expected that a user-space tool will verify the > +integrity of the verity_header and then call dm_setup with the correct > +parameters. Alternatively, the header can be omitted and the dm_setup > +parameters can be passed via the kernel command-line in a rooted chain > +of trust where the command-line is verified. > + > +The on-disk format is especially useful in cases where the hash blocks > +are on a separate partition. The magic number allows easy identification > +of the partition contents. Alternatively, the hash blocks can be stored > +in the same partition as the data to be verified. In such a configuration > +the filesystem on the partition would be sized a little smaller than > +the full-partition, leaving room for the hash blocks. > + > +struct verity_header { > + uint64_t magic = 0x7665726974790a00; > + uint32_t version; > + uint32_t block_size; > + char digest[128]; /* in hex-ascii, null-terminated or 128-bytes */ > + char salt[128]; /* in hex-ascii, null-terminated or 128-bytes */ > +} > + > +struct verity_header_block { > + struct verity_header; > + char unused[block_size - sizeof(struct verity_header) - sizeof(sig)]; > + char sig[128]; /* in hex-ascii, null-terminated or 128-bytes */ > +} > + > +Directly following the header are the hash blocks which are stored a depth > +at a time (starting from the root), sorted in order of increasing index. > + > +Usage > +===== > + > +The API provides mechanisms for reading and verifying a tree. When reading, all > +required data for the hash tree should be populated for a block before > +attempting a verify. This can be done by calling dm_bht_populate(). When all > +data is ready, a call to dm_bht_verify_block() with the expected hash value will > +perform both the direct block hash check and the hashes of the parent and > +neighboring nodes where needed to ensure validity up to the root hash. Note, > +dm_bht_set_root_hexdigest() should be called before any verification attempts > +occur. 
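(Side note while reading this hunk: the Usage section still refers to the old
dm_bht_* names, while the code in this posting uses the renamed verity_tree_*
helpers. For what it's worth, here is a rough sketch of the call sequence the
paragraph above describes, using the new names; the read callback, error
handling and the asynchronous completion path are elided, so this is
illustrative only and is not meant to compile on its own.)

static int verify_one_block(struct verity_tree *vt, void *io_ctx,
			    unsigned int block, struct page *pg,
			    unsigned int offset)
{
	int r;

	/* The root digest must already have been set with
	 * verity_tree_set_digest() (dm_bht_set_root_hexdigest above). */

	/* Kick off reads of every hash-tree entry on this block's path
	 * (dm_bht_populate above); the read callback reports each entry
	 * with verity_tree_read_completed() as its I/O finishes. */
	r = verity_tree_populate(vt, io_ctx, block);
	if (r < 0)
		return r;

	/* ... wait until verity_tree_is_populated(vt, block) is true ... */

	/* Hash the data block and check the path up to the root digest
	 * (dm_bht_verify_block above). */
	return verity_tree_verify_block(vt, block, pg, offset);
}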
> + > +Example > +======= > + > +Setup a device; > +[[ > + dmsetup create vroot --table \ > + "0 204800 verity /dev/sda1 /dev/sda2 alg=sha1 "\ > + "root_hexdigest=9f74809a2ee7607b16fcc70d9399a4de9725a727" > +]] > + > +A command line tool is available to compute the hash tree and return the > +root hash value. > + http://git.chromium.org/cgi-bin/gitweb.cgi?p=dm-verity.git;a=tree > diff --git a/drivers/md/Kconfig b/drivers/md/Kconfig > index faa4741..b8bb690 100644 > --- a/drivers/md/Kconfig > +++ b/drivers/md/Kconfig > @@ -370,4 +370,20 @@ config DM_FLAKEY > ---help--- > A target that intermittently fails I/O for debugging purposes. > > +config DM_VERITY > + tristate "Verity target support" > + depends on BLK_DEV_DM > + select CRYPTO > + select CRYPTO_HASH > + ---help--- > + This device-mapper target allows you to create a device that > + transparently integrity checks the data on it. You'll need to > + activate the digests you're going to use in the cryptoapi > + configuration. > + > + To compile this code as a module, choose M here: the module will > + be called dm-verity. > + > + If unsure, say N. > + > endif # MD > diff --git a/drivers/md/Makefile b/drivers/md/Makefile > index 046860c..70a29af 100644 > --- a/drivers/md/Makefile > +++ b/drivers/md/Makefile > @@ -39,6 +39,7 @@ obj-$(CONFIG_DM_SNAPSHOT) += dm-snapshot.o > obj-$(CONFIG_DM_PERSISTENT_DATA) += persistent-data/ > obj-$(CONFIG_DM_MIRROR) += dm-mirror.o dm-log.o dm-region-hash.o > obj-$(CONFIG_DM_LOG_USERSPACE) += dm-log-userspace.o > +obj-$(CONFIG_DM_VERITY) += dm-verity.o > obj-$(CONFIG_DM_ZERO) += dm-zero.o > obj-$(CONFIG_DM_RAID) += dm-raid.o > obj-$(CONFIG_DM_THIN_PROVISIONING) += dm-thin-pool.o > diff --git a/drivers/md/dm-verity.c b/drivers/md/dm-verity.c > new file mode 100644 > index 0000000..87b7958 > --- /dev/null > +++ b/drivers/md/dm-verity.c > @@ -0,0 +1,1411 @@ > +/* > + * Originally based on dm-crypt.c, > + * Copyright (C) 2003 Christophe Saout > + * Copyright (C) 2004 Clemens Fruhwirth > + * Copyright (C) 2006-2008 Red Hat, Inc. All rights reserved. > + * Copyright (C) 2012 The Chromium OS Authors > + * All Rights Reserved. > + * > + * This file is released under the GPLv2. > + * > + * Implements a verifying transparent block device. > + * See Documentation/device-mapper/dm-verity.txt > + */ > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > + > +#define DM_MSG_PREFIX "verity" > + > + > +/* Helper for printing sector_t */ > +#define ULL(x) ((unsigned long long)(x)) > + > +#define MIN_IOS 32 > +#define MIN_BIOS (MIN_IOS * 2) > + > +/* To avoid allocating memory for digest tests, we just setup a > + * max to use for now. > + */ > +#define VERITY_MAX_DIGEST_SIZE 64 /* Supports up to 512-bit digests */ > +#define VERITY_SALT_SIZE 32 /* 256 bits of salt is a lot */ > + > +/* UNALLOCATED, PENDING, READY, and VERIFIED are valid states. All other > + * values are entry-related return codes. 
> + */ > +#define VERITY_TREE_ENTRY_VERIFIED 8 /* 'nodes' checked against parent */ > +#define VERITY_TREE_ENTRY_READY 4 /* 'nodes' is loaded and available */ > +#define VERITY_TREE_ENTRY_PENDING 2 /* 'nodes' is being loaded */ > +#define VERITY_TREE_ENTRY_UNALLOCATED 0 /* untouched */ > +#define VERITY_TREE_ENTRY_ERROR -1 /* entry is unsuitable for use */ > +#define VERITY_TREE_ENTRY_ERROR_IO -2 /* I/O error on load */ > + > +/* Additional possible return codes */ > +#define VERITY_TREE_ENTRY_ERROR_MISMATCH -3 /* Digest mismatch */ > + > + > +/* verity_tree_entry > + * Contains verity_tree->node_count tree nodes at a given tree depth. > + * state is used to transactionally assure that data is paged in > + * from disk. Unless verity_tree kept running crypto contexts for each > + * level, we need to load in the data for on-demand verification. > + */ > +struct verity_tree_entry { > + atomic_t state; /* see defines */ > + /* Keeping an extra pointer per entry wastes up to ~33k of > + * memory if a 1m blocks are used (or 66 on 64-bit arch) > + */ > + void *io_context; /* Reserve a pointer for use during io */ > + /* data should only be non-NULL if fully populated. */ > + void *nodes; /* The hash data used to verify the children. > + * Guaranteed to be page-aligned. > + */ > +}; > + > +/* verity_tree_level > + * Contains an array of entries which represent a page of hashes where > + * each hash is a node in the tree at the given tree depth/level. > + */ > +struct verity_tree_level { > + struct verity_tree_entry *entries; /* array of entries of tree nodes */ > + unsigned int count; /* number of entries at this level */ > + sector_t sector; /* starting sector for this level */ > +}; > + > +/* opaque context, start, databuf, sector_count */ > +typedef int(*verity_tree_callback)(void *, /* external context */ > + sector_t, /* start sector */ > + u8 *, /* destination page */ > + sector_t, /* num sectors */ > + struct verity_tree_entry *); > +/* verity_tree - Device mapper block hash tree > + * verity_tree provides a fixed interface for comparing data blocks > + * against a cryptographic hashes stored in a hash tree. It > + * optimizes the tree structure for storage on disk. > + * > + * The tree is built from the bottom up. A collection of data, > + * external to the tree, is hashed and these hashes are stored > + * as the blocks in the tree. For some number of these hashes, > + * a parent node is created by hashing them. These steps are > + * repeated. > + */ > +struct verity_tree { > + /* Configured values */ > + int depth; /* Depth of the tree including the root */ > + unsigned int block_count; /* Number of blocks hashed */ > + unsigned int block_size; /* Size of a hash block */ > + char hash_alg[CRYPTO_MAX_ALG_NAME]; > + u8 salt[VERITY_SALT_SIZE]; > + > + /* Computed values */ > + unsigned int node_count; /* Data size (in hashes) for each entry */ > + unsigned int node_count_shift; /* first bit set - 1 */ > + /* > + * There is one per CPU so that verified can be simultaneous. 
> + * Access through per_cpu_ptr() only > + */ > + struct shash_desc * __percpu *hash_desc; /* Container for hash alg */ > + unsigned int digest_size; > + sector_t sectors; /* Number of disk sectors used */ > + > + /* bool verified; Full tree is verified */ > + u8 digest[VERITY_MAX_DIGEST_SIZE]; > + struct verity_tree_level *levels; /* in reverse order */ > + /* Callback for reading from the hash device */ > + verity_tree_callback read_cb; > +}; > + > +/* per-requested-bio private data */ > +enum verity_io_flags { > + VERITY_IOFLAGS_CLONED = 0x1, /* original bio has been cloned */ > +}; > + > +struct verity_io { > + struct dm_target *target; > + struct bio *bio; > + struct delayed_work work; > + unsigned int flags; > + > + int error; > + atomic_t pending; > + > + u64 block; /* aligned block index */ > + u64 count; /* aligned count in blocks */ > +}; > + > +struct verity_config { > + struct dm_dev *dev; > + sector_t start; > + sector_t size; > + > + struct dm_dev *hash_dev; > + sector_t hash_start; > + > + struct verity_tree bht; > + > + /* Pool required for io contexts */ > + mempool_t *io_pool; > + /* Pool and bios required for making sure that backing device reads are > + * in PAGE_SIZE increments. > + */ > + struct bio_set *bs; > + > + char hash_alg[CRYPTO_MAX_ALG_NAME]; > +}; > + > + > +static struct kmem_cache *_verity_io_pool; > +static struct workqueue_struct *kveritydq, *kverityd_ioq; > + > + > +static void kverityd_verify(struct work_struct *work); > +static void kverityd_io(struct work_struct *work); > +static void kverityd_io_bht_populate(struct verity_io *io); > +static void kverityd_io_bht_populate_end(struct bio *, int error); > + > + > +/* > + * Utilities > + */ > + > +static void bin2hex(char *dst, const u8 *src, size_t count) > +{ > + while (count-- > 0) { > + sprintf(dst, "%02hhx", (int)*src); > + dst += 2; > + src++; > + } > +} > + > +/* > + * Verity Tree > + */ > + > +/* Functions for converting indices to nodes. */ > + > +static inline unsigned int verity_tree_get_level_shift(struct verity_tree *bht, > + int depth) > +{ > + return (bht->depth - depth) * bht->node_count_shift; > +} > + > +/* For the given depth, this is the entry index. At depth+1 it is the node > + * index for depth. 
> + */ > +static inline unsigned int verity_tree_index_at_level(struct verity_tree *bht, > + int depth, > + unsigned int leaf) > +{ > + return leaf >> verity_tree_get_level_shift(bht, depth); > +} > + > +static inline struct verity_tree_entry *verity_tree_get_entry( > + struct verity_tree *bht, > + int depth, > + unsigned int block) > +{ > + unsigned int index = verity_tree_index_at_level(bht, depth, block); > + struct verity_tree_level *level = &bht->levels[depth]; > + > + return &level->entries[index]; > +} > + > +static inline void *verity_tree_get_node(struct verity_tree *bht, > + struct verity_tree_entry *entry, > + int depth, unsigned int block) > +{ > + unsigned int index = verity_tree_index_at_level(bht, depth, block); > + unsigned int node_index = index % bht->node_count; > + > + return entry->nodes + (node_index * bht->digest_size); > +} > +/** > + * verity_tree_compute_hash: hashes a page of data > + */ > +static int verity_tree_compute_hash(struct verity_tree *vt, struct page *pg, > + unsigned int offset, u8 *digest) > +{ > + struct shash_desc *hash_desc; > + void *data; > + int err; > + > + hash_desc = *per_cpu_ptr(vt->hash_desc, smp_processor_id()); > + > + if (crypto_shash_init(hash_desc)) { > + DMCRIT("failed to reinitialize crypto hash (proc:%d)", > + smp_processor_id()); > + return -EINVAL; > + } > + data = kmap_atomic(pg); > + err = crypto_shash_update(hash_desc, data + offset, PAGE_SIZE); > + kunmap_atomic(data); > + if (err) { > + DMCRIT("crypto_hash_update failed"); > + return -EINVAL; > + } > + if (crypto_shash_update(hash_desc, vt->salt, sizeof(vt->salt))) { > + DMCRIT("crypto_hash_update failed"); > + return -EINVAL; > + } > + if (crypto_shash_final(hash_desc, digest)) { > + DMCRIT("crypto_hash_final failed"); > + return -EINVAL; > + } > + > + return 0; > +} > + > +static int verity_tree_initialize_entries(struct verity_tree *vt) > +{ > + /* last represents the index of the last digest store in the tree. > + * By walking the tree with that index, it is possible to compute the > + * total number of entries at each level. > + * > + * Since each entry will contain up to |node_count| nodes of the tree, > + * it is possible that the last index may not be at the end of a given > + * entry->nodes. In that case, it is assumed the value is padded. > + * > + * Note, we treat both the tree root (1 hash) and the tree leaves > + * independently from the vt data structures. Logically, the root is > + * depth=-1 and the block layer level is depth=vt->depth > + */ > + unsigned int last = vt->block_count; > + int depth; > + > + /* check that the largest level->count can't result in an int overflow > + * on allocation or sector calculation. > + */ > + if (((last >> vt->node_count_shift) + 1) > > + UINT_MAX / max((unsigned int)sizeof(struct verity_tree_entry), > + (unsigned int)to_sector(vt->block_size))) { > + DMCRIT("required entries %u is too large", last + 1); > + return -EINVAL; > + } > + > + /* Track the current sector location for each level so we don't have to > + * compute it during traversals. 
> + */ > + vt->sectors = 0; > + for (depth = 0; depth < vt->depth; ++depth) { > + struct verity_tree_level *level = &vt->levels[depth]; > + > + level->count = verity_tree_index_at_level(vt, depth, last) + 1; > + level->entries = (struct verity_tree_entry *) > + kcalloc(level->count, > + sizeof(struct verity_tree_entry), > + GFP_KERNEL); > + if (!level->entries) { > + DMERR("failed to allocate entries for depth %d", depth); > + return -ENOMEM; > + } > + level->sector = vt->sectors; > + vt->sectors += level->count * to_sector(vt->block_size); > + } > + > + return 0; > +} > + > +/** > + * verity_tree_create - prepares @vt for us > + * @vt: pointer to a verity_tree_create()d vt > + * @depth: tree depth without the root; including block hashes > + * @block_count: the number of block hashes / tree leaves > + * @alg_name: crypto hash algorithm name > + * > + * Returns 0 on success. > + * > + * Callers can offset into devices by storing the data in the io callbacks. > + */ > +static int verity_tree_create(struct verity_tree *vt, unsigned int block_count, > + unsigned int block_size, const char *alg_name) > +{ > + struct crypto_shash *tfm; > + int size, cpu, status = 0; > + > + vt->block_size = block_size; > + /* Verify that PAGE_SIZE >= block_size >= SECTOR_SIZE. */ > + if ((block_size > PAGE_SIZE) || > + (PAGE_SIZE % block_size) || > + (to_sector(block_size) == 0)) > + return -EINVAL; > + > + tfm = crypto_alloc_shash(alg_name, 0, 0); > + if (IS_ERR(tfm)) { > + DMERR("failed to allocate crypto hash '%s'", alg_name); > + return -ENOMEM; > + } > + size = sizeof(struct shash_desc) + crypto_shash_descsize(tfm); > + > + vt->hash_desc = alloc_percpu(struct shash_desc *); > + if (!vt->hash_desc) { > + DMERR("Failed to allocate per cpu hash_desc"); > + status = -ENOMEM; > + goto bad_per_cpu; > + } > + > + /* Pre-allocate per-cpu crypto contexts to avoid having to > + * kmalloc/kfree a context for every hash operation. > + */ > + for_each_possible_cpu(cpu) { > + struct shash_desc *hash_desc = kmalloc(size, GFP_KERNEL); > + > + *per_cpu_ptr(vt->hash_desc, cpu) = hash_desc; > + if (!hash_desc) { > + DMERR("failed to allocate crypto hash contexts"); > + status = -ENOMEM; > + goto bad_hash_alloc; > + } > + hash_desc->tfm = tfm; > + hash_desc->flags = 0x0; > + } > + vt->digest_size = crypto_shash_digestsize(tfm); > + /* We expect to be able to pack >=2 hashes into a block */ > + if (block_size / vt->digest_size < 2) { > + DMERR("too few hashes fit in a block"); > + status = -EINVAL; > + goto bad_arg; > + } > + > + if (vt->digest_size > VERITY_MAX_DIGEST_SIZE) { > + DMERR("VERITY_MAX_DIGEST_SIZE too small for digest"); > + status = -EINVAL; > + goto bad_arg; > + } > + > + /* Configure the tree */ > + vt->block_count = block_count; > + if (block_count == 0) { > + DMERR("block_count must be non-zero"); > + status = -EINVAL; > + goto bad_arg; > + } > + > + /* Each verity_tree_entry->nodes is one block. The node code tracks > + * how many nodes fit into one entry where a node is a single > + * hash (message digest). > + */ > + vt->node_count_shift = fls(block_size / vt->digest_size) - 1; > + /* Round down to the nearest power of two. This makes indexing > + * into the tree much less painful. > + */ > + vt->node_count = 1 << vt->node_count_shift; > + > + /* This is unlikely to happen, but with 64k pages, who knows. 
*/ > + if (vt->node_count > UINT_MAX / vt->digest_size) { > + DMERR("node_count * hash_len exceeds UINT_MAX!"); > + status = -EINVAL; > + goto bad_arg; > + } > + > + vt->depth = DIV_ROUND_UP(fls(block_count - 1), vt->node_count_shift); > + > + /* Ensure that we can safely shift by this value. */ > + if (vt->depth * vt->node_count_shift >= sizeof(unsigned int) * 8) { > + DMERR("specified depth and node_count_shift is too large"); > + status = -EINVAL; > + goto bad_arg; > + } > + > + /* Allocate levels. Each level of the tree may have an arbitrary number > + * of verity_tree_entry structs. Each entry contains node_count nodes. > + * Each node in the tree is a cryptographic digest of either node_count > + * nodes on the subsequent level or of a specific block on disk. > + */ > + vt->levels = (struct verity_tree_level *) > + kcalloc(vt->depth, > + sizeof(struct verity_tree_level), GFP_KERNEL); > + if (!vt->levels) { > + DMERR("failed to allocate tree levels"); > + status = -ENOMEM; > + goto bad_level_alloc; > + } > + > + vt->read_cb = NULL; > + > + status = verity_tree_initialize_entries(vt); > + if (status) > + goto bad_entries_alloc; > + > + /* We compute depth such that there is only be 1 block at level 0. */ > + BUG_ON(vt->levels[0].count != 1); > + > + return 0; > + > +bad_entries_alloc: > + while (vt->depth-- > 0) > + kfree(vt->levels[vt->depth].entries); > + kfree(vt->levels); > +bad_level_alloc: > +bad_arg: > +bad_hash_alloc: > + for_each_possible_cpu(cpu) > + if (*per_cpu_ptr(vt->hash_desc, cpu)) > + kfree(*per_cpu_ptr(vt->hash_desc, cpu)); > + free_percpu(vt->hash_desc); > +bad_per_cpu: > + crypto_free_shash(tfm); > + return status; > +} > + > +/** > + * verity_tree_read_completed > + * @entry: pointer to the entry that's been loaded > + * @status: I/O status. Non-zero is failure. > + * MUST always be called after a read_cb completes. > + */ > +static void verity_tree_read_completed(struct verity_tree_entry *entry, > + int status) > +{ > + if (status) { > + DMCRIT("an I/O error occurred while reading entry"); > + atomic_set(&entry->state, VERITY_TREE_ENTRY_ERROR_IO); > + return; > + } > + BUG_ON(atomic_read(&entry->state) != VERITY_TREE_ENTRY_PENDING); > + atomic_set(&entry->state, VERITY_TREE_ENTRY_READY); > +} > + > +/** > + * verity_tree_verify_block - checks that all path nodes for @block are valid > + * @vt: pointer to a verity_tree_create()d vt > + * @block: specific block data is expected from > + * @pg: page holding the block data > + * @offset: offset into the page > + * > + * Returns 0 on success, VERITY_TREE_ENTRY_ERROR_MISMATCH on error. > + */ > +static int verity_tree_verify_block(struct verity_tree *vt, unsigned int block, > + struct page *pg, unsigned int offset) > +{ > + int state, depth = vt->depth; > + u8 digest[VERITY_MAX_DIGEST_SIZE]; > + struct verity_tree_entry *entry; > + void *node; > + > + do { > + /* Need to check that the hash of the current block is accurate > + * in its parent. > + */ > + entry = verity_tree_get_entry(vt, depth - 1, block); > + state = atomic_read(&entry->state); > + /* This call is only safe if all nodes along the path > + * are already populated (i.e. READY) via verity_tree_populate. > + */ > + BUG_ON(state < VERITY_TREE_ENTRY_READY); > + node = verity_tree_get_node(vt, entry, depth, block); > + > + if (verity_tree_compute_hash(vt, pg, offset, digest) || > + memcmp(digest, node, vt->digest_size)) > + goto mismatch; > + > + /* Keep the containing block of hashes to be verified in the > + * next pass. 
> + */ > + pg = virt_to_page(entry->nodes); > + offset = offset_in_page(entry->nodes); > + } while (--depth > 0 && state != VERITY_TREE_ENTRY_VERIFIED); > + > + if (depth == 0 && state != VERITY_TREE_ENTRY_VERIFIED) { > + if (verity_tree_compute_hash(vt, pg, offset, digest) || > + memcmp(digest, vt->digest, vt->digest_size)) > + goto mismatch; > + atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED); > + } > + > + /* Mark path to leaf as verified. */ > + for (depth++; depth < vt->depth; depth++) { > + entry = verity_tree_get_entry(vt, depth, block); > + /* At this point, entry can only be in VERIFIED or READY state. > + * So it is safe to use atomic_set instead of atomic_cmpxchg. > + */ > + atomic_set(&entry->state, VERITY_TREE_ENTRY_VERIFIED); > + } > + > + return 0; > + > +mismatch: > + DMERR_LIMIT("verify_path: failed to verify hash (d=%d,bi=%u)", > + depth, block); > + return VERITY_TREE_ENTRY_ERROR_MISMATCH; > +} > + > +/** > + * verity_tree_is_populated - check that nodes needed to verify a given > + * block are all ready > + * @vt: pointer to a verity_tree_create()d vt > + * @block: specific block data is expected from > + * > + * Callers may wish to call verity_tree_is_populated() when checking an io > + * for which entries were already pending. > + */ > +static bool verity_tree_is_populated(struct verity_tree *vt, unsigned int block) > +{ > + int depth; > + > + for (depth = vt->depth - 1; depth >= 0; depth--) { > + struct verity_tree_entry *entry; > + entry = verity_tree_get_entry(vt, depth, block); > + if (atomic_read(&entry->state) < VERITY_TREE_ENTRY_READY) > + return false; > + } > + > + return true; > +} > + > +/** > + * verity_tree_populate - reads entries from disk needed to verify a given block > + * @vt: pointer to a verity_tree_create()d vt > + * @ctx: context used for all read_cb calls on this request > + * @block: specific block data is expected from > + * > + * Returns negative value on error. Returns 0 on success. 
> + */ > +static int verity_tree_populate(struct verity_tree *vt, void *ctx, > + unsigned int block) > +{ > + int depth, state; > + > + BUG_ON(block >= vt->block_count); > + > + for (depth = vt->depth - 1; depth >= 0; --depth) { > + unsigned int index; > + struct verity_tree_level *level; > + struct verity_tree_entry *entry; > + > + index = verity_tree_index_at_level(vt, depth, block); > + level = &vt->levels[depth]; > + entry = verity_tree_get_entry(vt, depth, block); > + state = atomic_cmpxchg(&entry->state, > + VERITY_TREE_ENTRY_UNALLOCATED, > + VERITY_TREE_ENTRY_PENDING); > + if (state == VERITY_TREE_ENTRY_VERIFIED) > + break; > + if (state <= VERITY_TREE_ENTRY_ERROR) > + goto error_state; > + if (state != VERITY_TREE_ENTRY_UNALLOCATED) > + continue; > + > + /* Current entry is claimed for allocation and loading */ > + entry->nodes = kmalloc(vt->block_size, GFP_NOIO); > + if (!entry->nodes) > + goto nomem; > + > + vt->read_cb(ctx, > + level->sector + to_sector(index * vt->block_size), > + entry->nodes, to_sector(vt->block_size), entry); > + } > + > + return 0; > + > +error_state: > + DMCRIT("block %u at depth %d is in an error state", block, depth); > + return -EPERM; > + > +nomem: > + DMCRIT("failed to allocate memory for entry->nodes"); > + return -ENOMEM; > +} > + > +/** > + * verity_tree_destroy - cleans up all memory used by @vt > + * @vt: pointer to a verity_tree_create()d vt > + */ > +static void verity_tree_destroy(struct verity_tree *vt) > +{ > + int depth, cpu; > + > + for (depth = 0; depth < vt->depth; depth++) { > + struct verity_tree_entry *entry = vt->levels[depth].entries; > + struct verity_tree_entry *entry_end = entry + > + vt->levels[depth].count; > + for (; entry < entry_end; ++entry) > + kfree(entry->nodes); > + kfree(vt->levels[depth].entries); > + } > + kfree(vt->levels); > + crypto_free_shash((*per_cpu_ptr(vt->hash_desc, 0))->tfm); > + for_each_possible_cpu(cpu) > + kfree(*per_cpu_ptr(vt->hash_desc, cpu)); > +} > + > +/* > + * Verity Tree Accessors > + */ > + > +/** > + * verity_tree_set_digest - sets an unverified root digest hash from hex > + * @vt: pointer to a verity_tree_create()d vt > + * @digest: string containing the digest in hex > + * Returns non-zero on error. > + */ > +static int verity_tree_set_digest(struct verity_tree *vt, const char *digest) > +{ > + /* Make sure we have at least the bytes expected */ > + if (strnlen((char *)digest, vt->digest_size * 2) != > + vt->digest_size * 2) { > + DMERR("root digest length does not match hash algorithm"); > + return -1; > + } > + return hex2bin(vt->digest, digest, vt->digest_size); > +} > + > +/** > + * verity_tree_digest - returns root digest in hex > + * @vt: pointer to a verity_tree_create()d vt > + * @digest: buffer to put into, must be of length VERITY_SALT_SIZE * 2 + 1. > + */ > +int verity_tree_digest(struct verity_tree *vt, char *digest) > +{ > + bin2hex(digest, vt->digest, vt->digest_size); > + return 0; > +} > + > +/** > + * verity_tree_set_salt - sets the salt > + * @vt: pointer to a verity_tree_create()d vt > + * @salt: string containing the salt in hex > + * Returns non-zero on error. 
> + */ > +int verity_tree_set_salt(struct verity_tree *vt, const char *salt) > +{ > + size_t saltlen = min(strlen(salt) / 2, sizeof(vt->salt)); > + memset(vt->salt, 0, sizeof(vt->salt)); > + return hex2bin(vt->salt, salt, saltlen); > +} > + > + > +/** > + * verity_tree_salt - returns the salt in hex > + * @vt: pointer to a verity_tree_create()d vt > + * @salt: buffer to put salt into, of length VERITY_SALT_SIZE * 2 + 1. > + */ > +int verity_tree_salt(struct verity_tree *vt, char *salt) > +{ > + bin2hex(salt, vt->salt, sizeof(vt->salt)); > + return 0; > +} > + > +/* > + * Allocation and utility functions > + */ > + > +static void kverityd_src_io_read_end(struct bio *clone, int error); > + > +/* Shared destructor for all internal bios */ > +static void verity_bio_destructor(struct bio *bio) > +{ > + struct verity_io *io = bio->bi_private; > + struct verity_config *vc = io->target->private; > + bio_free(bio, vc->bs); > +} > + > +static struct bio *verity_alloc_bioset(struct verity_config *vc, gfp_t gfp_mask, > + int nr_iovecs) > +{ > + return bio_alloc_bioset(gfp_mask, nr_iovecs, vc->bs); > +} > + > +static struct verity_io *verity_io_alloc(struct dm_target *ti, > + struct bio *bio) > +{ > + struct verity_config *vc = ti->private; > + sector_t sector = bio->bi_sector - ti->begin; > + struct verity_io *io; > + > + io = mempool_alloc(vc->io_pool, GFP_NOIO); > + if (unlikely(!io)) > + return NULL; > + io->flags = 0; > + io->target = ti; > + io->bio = bio; > + io->error = 0; > + > + /* Adjust the sector by the virtual starting sector */ > + io->block = to_bytes(sector) / vc->bht.block_size; > + io->count = bio->bi_size / vc->bht.block_size; > + > + atomic_set(&io->pending, 0); > + > + return io; > +} > + > +static struct bio *verity_bio_clone(struct verity_io *io) > +{ > + struct verity_config *vc = io->target->private; > + struct bio *bio = io->bio; > + struct bio *clone = verity_alloc_bioset(vc, GFP_NOIO, bio->bi_max_vecs); > + > + if (!clone) > + return NULL; > + > + __bio_clone(clone, bio); > + clone->bi_private = io; > + clone->bi_end_io = kverityd_src_io_read_end; > + clone->bi_bdev = vc->dev->bdev; > + clone->bi_sector += vc->start - io->target->begin; > + clone->bi_destructor = verity_bio_destructor; > + > + return clone; > +} > + > +/* > + * Reverse flow of requests into the device. > + * > + * (Start at the bottom with verity_map and work your way upward). > + */ > + > +static void verity_inc_pending(struct verity_io *io); > + > +static void verity_return_bio_to_caller(struct verity_io *io) > +{ > + struct verity_config *vc = io->target->private; > + > + if (io->error) > + io->error = -EIO; > + > + bio_endio(io->bio, io->error); > + mempool_free(io, vc->io_pool); > +} > + > +/* Check for any missing bht hashes. */ > +static bool verity_is_bht_populated(struct verity_io *io) > +{ > + struct verity_config *vc = io->target->private; > + u64 block; > + > + for (block = io->block; block < io->block + io->count; ++block) > + if (!verity_tree_is_populated(&vc->bht, block)) > + return false; > + > + return true; > +} > + > +/* verity_dec_pending manages the lifetime of all verity_io structs. > + * Non-bug error handling is centralized through this interface and > + * all passage from workqueue to workqueue. 
> + */ > +static void verity_dec_pending(struct verity_io *io) > +{ > + if (!atomic_dec_and_test(&io->pending)) > + goto done; > + > + if (unlikely(io->error)) > + goto io_error; > + > + /* I/Os that were pending may now be ready */ > + if (verity_is_bht_populated(io)) { > + INIT_DELAYED_WORK(&io->work, kverityd_verify); > + queue_delayed_work(kveritydq, &io->work, 0); > + } else { > + INIT_DELAYED_WORK(&io->work, kverityd_io); > + queue_delayed_work(kverityd_ioq, &io->work, HZ/10); > + } > + > +done: > + return; > + > +io_error: > + verity_return_bio_to_caller(io); > +} > + > +/* Walks the data set and computes the hash of the data read from the > + * untrusted source device. The computed hash is then passed to verity-tree > + * for verification. > + */ > +static int verity_verify(struct verity_config *vc, > + struct verity_io *io) > +{ > + unsigned int block_size = vc->bht.block_size; > + struct bio *bio = io->bio; > + u64 block = io->block; > + unsigned int idx; > + int r; > + > + for (idx = bio->bi_idx; idx < bio->bi_vcnt; idx++) { > + struct bio_vec *bv = bio_iovec_idx(bio, idx); > + unsigned int offset = bv->bv_offset; > + unsigned int len = bv->bv_len; > + > + BUG_ON(offset % block_size); > + BUG_ON(len % block_size); > + > + while (len) { > + r = verity_tree_verify_block(&vc->bht, block, > + bv->bv_page, offset); > + if (r) > + goto bad_return; > + > + offset += block_size; > + len -= block_size; > + block++; > + cond_resched(); > + } > + } > + > + return 0; > + > +bad_return: > + /* verity_tree functions aren't expected to return errno friendly > + * values. They are converted here for uniformity. > + */ > + if (r > 0) { > + DMERR("Pending data for block %llu seen at verify", ULL(block)); > + r = -EBUSY; > + } else { > + DMERR_LIMIT("Block hash does not match!"); > + r = -EACCES; > + } > + return r; > +} > + > +/* Services the verify workqueue */ > +static void kverityd_verify(struct work_struct *work) > +{ > + struct delayed_work *dwork = container_of(work, struct delayed_work, > + work); > + struct verity_io *io = container_of(dwork, struct verity_io, > + work); > + struct verity_config *vc = io->target->private; > + > + io->error = verity_verify(vc, io); > + > + /* Free up the bio and tag with the return value */ > + verity_return_bio_to_caller(io); > +} > + > +/* Asynchronously called upon the completion of verity-tree I/O. The status > + * of the operation is passed back to verity-tree and the next steps are > + * decided by verity_dec_pending. > + */ > +static void kverityd_io_bht_populate_end(struct bio *bio, int error) > +{ > + struct verity_tree_entry *entry; > + struct verity_io *io; > + > + entry = (struct verity_tree_entry *) bio->bi_private; > + io = (struct verity_io *) entry->io_context; > + > + /* Tell the tree to atomically update now that we've populated > + * the given entry. > + */ > + verity_tree_read_completed(entry, error); > + > + /* Clean up for reuse when reading data to be checked */ > + bio->bi_vcnt = 0; > + bio->bi_io_vec->bv_offset = 0; > + bio->bi_io_vec->bv_len = 0; > + bio->bi_io_vec->bv_page = NULL; > + /* Restore the private data to I/O so the destructor can be shared. */ > + bio->bi_private = (void *) io; > + bio_put(bio); > + > + /* We bail but assume the tree has been marked bad. 
*/ > + if (unlikely(error)) { > + DMERR("Failed to read for sector %llu (%u)", > + ULL(io->bio->bi_sector), io->bio->bi_size); > + io->error = error; > + /* Pass through the error to verity_dec_pending below */ > + } > + /* When pending = 0, it will transition to reading real data */ > + verity_dec_pending(io); > +} > + > +/* Called by verity-tree (via verity_tree_populate), this function provides > + * the message digests to verity-tree that are stored on disk. > + */ > +static int kverityd_bht_read_callback(void *ctx, sector_t start, u8 *dst, > + sector_t count, > + struct verity_tree_entry *entry) > +{ > + struct verity_io *io = ctx; /* I/O for this batch */ > + struct verity_config *vc; > + struct bio *bio; > + > + vc = io->target->private; > + > + /* The I/O context is nested inside the entry so that we don't need one > + * io context per page read. > + */ > + entry->io_context = ctx; > + > + /* We should only get page size requests at present. */ > + verity_inc_pending(io); > + bio = verity_alloc_bioset(vc, GFP_NOIO, 1); > + if (unlikely(!bio)) { > + DMCRIT("Out of memory at bio_alloc_bioset"); > + verity_tree_read_completed(entry, -ENOMEM); > + return -ENOMEM; > + } > + bio->bi_private = (void *) entry; > + bio->bi_idx = 0; > + bio->bi_size = vc->bht.block_size; > + bio->bi_sector = vc->hash_start + start; > + bio->bi_bdev = vc->hash_dev->bdev; > + bio->bi_end_io = kverityd_io_bht_populate_end; > + bio->bi_rw = REQ_META; > + /* Only need to free the bio since the page is managed by bht */ > + bio->bi_destructor = verity_bio_destructor; > + bio->bi_vcnt = 1; > + bio->bi_io_vec->bv_offset = offset_in_page(dst); > + bio->bi_io_vec->bv_len = to_bytes(count); > + /* dst is guaranteed to be a page_pool allocation */ > + bio->bi_io_vec->bv_page = virt_to_page(dst); > + /* Track that this I/O is in use. There should be no risk of the io > + * being removed prior since this is called synchronously. > + */ > + generic_make_request(bio); > + return 0; > +} > + > +/* Submits an io request for each missing block of block hashes. > + * The last one to return will then enqueue this on the io workqueue. > + */ > +static void kverityd_io_bht_populate(struct verity_io *io) > +{ > + struct verity_config *vc = io->target->private; > + u64 block; > + > + for (block = io->block; block < io->block + io->count; ++block) { > + int ret = verity_tree_populate(&vc->bht, io, block); > + > + if (ret < 0) { > + /* verity_dec_pending will handle the error case. */ > + io->error = ret; > + break; > + } > + } > +} > + > +/* Asynchronously called upon the completion of I/O issued > + * from kverityd_src_io_read. verity_dec_pending() acts as > + * the scheduler/flow manager. > + */ > +static void kverityd_src_io_read_end(struct bio *clone, int error) > +{ > + struct verity_io *io = clone->bi_private; > + > + if (unlikely(!bio_flagged(clone, BIO_UPTODATE) && !error)) > + error = -EIO; > + > + if (unlikely(error)) { > + DMERR("Error occurred: %d (%llu, %u)", > + error, ULL(clone->bi_sector), clone->bi_size); > + io->error = error; > + } > + > + /* Release the clone which just avoids the block layer from > + * leaving offsets, etc in unexpected states. > + */ > + bio_put(clone); > + > + verity_dec_pending(io); > +} > + > +/* If not yet underway, an I/O request will be issued to the vc->dev > + * device for the data needed. It is cloned to avoid unexpected changes > + * to the original bio struct. 
> + */ > +static void kverityd_src_io_read(struct verity_io *io) > +{ > + struct bio *clone; > + > + /* Check if the read is already issued. */ > + if (io->flags & VERITY_IOFLAGS_CLONED) > + return; > + > + io->flags |= VERITY_IOFLAGS_CLONED; > + > + /* Clone the bio. The block layer may modify the bvec array. */ > + clone = verity_bio_clone(io); > + if (unlikely(!clone)) { > + io->error = -ENOMEM; > + return; > + } > + > + verity_inc_pending(io); > + > + generic_make_request(clone); > +} > + > +/* kverityd_io services the I/O workqueue. For each pass through > + * the I/O workqueue, a call to populate both the origin drive > + * data and the hash tree data is made. > + */ > +static void kverityd_io(struct work_struct *work) > +{ > + struct delayed_work *dwork = container_of(work, struct delayed_work, > + work); > + struct verity_io *io = container_of(dwork, struct verity_io, > + work); > + > + /* Issue requests asynchronously. */ > + verity_inc_pending(io); > + kverityd_src_io_read(io); > + kverityd_io_bht_populate(io); > + verity_dec_pending(io); > +} > + > +/* Paired with verity_dec_pending, the pending value in the io dictate the > + * lifetime of a request and when it is ready to be processed on the > + * workqueues. > + */ > +static void verity_inc_pending(struct verity_io *io) > +{ > + atomic_inc(&io->pending); > +} > + > +/* Block-level requests start here. */ > +static int verity_map(struct dm_target *ti, struct bio *bio, > + union map_info *map_context) > +{ > + struct verity_io *io; > + struct verity_config *vc; > + struct request_queue *r_queue; > + > + if (unlikely(!ti)) { > + DMERR("dm_target was NULL"); > + return -EIO; > + } > + > + vc = ti->private; > + r_queue = bdev_get_queue(vc->dev->bdev); > + > + if (bio_data_dir(bio) == WRITE) { > + /* If we silently drop writes, then the VFS layer will cache > + * the write and persist it in memory. While it doesn't change > + * the underlying storage, it still may be contrary to the > + * behavior expected by a verified, read-only device. > + */ > + DMWARN_LIMIT("write request received. 
rejecting with -EIO."); > + return -EIO; > + } else { > + /* Queue up the request to be verified */ > + io = verity_io_alloc(ti, bio); > + if (!io) { > + DMERR_LIMIT("Failed to allocate and init IO data"); > + return DM_MAPIO_REQUEUE; > + } > + INIT_DELAYED_WORK(&io->work, kverityd_io); > + queue_delayed_work(kverityd_ioq, &io->work, 0); > + } > + > + return DM_MAPIO_SUBMITTED; > +} > + > +/* > + * Non-block interfaces and device-mapper specific code > + */ > + > +/* > + * Verity target parameters: > + * > + * > + * > + * version: version of the hash tree on-disk format > + * dev: device to verify > + * hash_dev: device hashtree is stored on > + * hash_start: start address of hashes > + * block_size: size of a hash block > + * alg: hash algorithm > + * digest: toplevel hash of the tree > + * salt: salt > + */ > +static int verity_ctr(struct dm_target *ti, unsigned int argc, char **argv) > +{ > + struct verity_config *vc = NULL; > + const char *dev, *hash_dev, *alg, *digest, *salt; > + unsigned long hash_start, block_size, version; > + sector_t blocks; > + int ret; > + > + if (argc != 8) { > + ti->error = "Invalid argument count"; > + return -EINVAL; > + } > + > + if (strict_strtoul(argv[0], 10, &version) || > + (version != 0)) { > + ti->error = "Invalid version"; > + return -EINVAL; > + } > + dev = argv[1]; > + hash_dev = argv[2]; > + if (strict_strtoul(argv[3], 10, &hash_start)) { > + ti->error = "Invalid hash_start"; > + return -EINVAL; > + } > + if (strict_strtoul(argv[4], 10, &block_size) || > + (block_size > UINT_MAX)) { > + ti->error = "Invalid block_size"; > + return -EINVAL; > + } > + alg = argv[5]; > + digest = argv[6]; > + salt = argv[7]; > + > + /* The device mapper device should be setup read-only */ > + if ((dm_table_get_mode(ti->table) & ~FMODE_READ) != 0) { > + ti->error = "Must be created readonly."; > + return -EINVAL; > + } > + > + vc = kzalloc(sizeof(*vc), GFP_KERNEL); > + if (!vc) { > + return -EINVAL; > + } > + > + /* Calculate the blocks from the given device size */ > + vc->size = ti->len; > + blocks = to_bytes(vc->size) / block_size; > + if (verity_tree_create(&vc->bht, blocks, block_size, alg)) { > + DMERR("failed to create required bht"); > + goto bad_bht; > + } > + if (verity_tree_set_digest(&vc->bht, digest)) { > + DMERR("digest error"); > + goto bad_digest; > + } > + verity_tree_set_salt(&vc->bht, salt); > + vc->bht.read_cb = kverityd_bht_read_callback; > + > + vc->start = 0; > + /* We only ever grab the device in read-only mode. */ > + ret = dm_get_device(ti, dev, dm_table_get_mode(ti->table), &vc->dev); > + if (ret) { > + DMERR("Failed to acquire device '%s': %d", dev, ret); > + ti->error = "Device lookup failed"; > + goto bad_verity_dev; > + } > + > + if ((to_bytes(vc->start) % block_size) || > + (to_bytes(vc->size) % block_size)) { > + ti->error = "Device must be block_size divisble/aligned"; > + goto bad_hash_start; > + } > + > + vc->hash_start = (sector_t)hash_start; > + > + /* > + * Note, dev == hash_dev is okay as long as the size of > + * ti->len passed to device mapper does not include > + * the hashes. 
> + */ > + if (dm_get_device(ti, hash_dev, > + dm_table_get_mode(ti->table), &vc->hash_dev)) { > + ti->error = "Hash device lookup failed"; > + goto bad_hash_dev; > + } > + > + if (snprintf(vc->hash_alg, CRYPTO_MAX_ALG_NAME, "%s", alg) >= > + CRYPTO_MAX_ALG_NAME) { > + ti->error = "Hash algorithm name is too long"; > + goto bad_hash; > + } > + > + vc->io_pool = mempool_create_slab_pool(MIN_IOS, _verity_io_pool); > + if (!vc->io_pool) { > + ti->error = "Cannot allocate verity io mempool"; > + goto bad_slab_pool; > + } > + > + vc->bs = bioset_create(MIN_BIOS, 0); > + if (!vc->bs) { > + ti->error = "Cannot allocate verity bioset"; > + goto bad_bs; > + } > + > + ti->private = vc; > + > + return 0; > + > +bad_bs: > + mempool_destroy(vc->io_pool); > +bad_slab_pool: > +bad_hash: > + dm_put_device(ti, vc->hash_dev); > +bad_hash_dev: > +bad_hash_start: > + dm_put_device(ti, vc->dev); > +bad_bht: > +bad_digest: > +bad_verity_dev: > + kfree(vc); /* hash is not secret so no need to zero */ > + return -EINVAL; > +} > + > +static void verity_dtr(struct dm_target *ti) > +{ > + struct verity_config *vc = (struct verity_config *) ti->private; > + > + bioset_free(vc->bs); > + mempool_destroy(vc->io_pool); > + verity_tree_destroy(&vc->bht); > + dm_put_device(ti, vc->hash_dev); > + dm_put_device(ti, vc->dev); > + kfree(vc); > +} > + > +static int verity_ioctl(struct dm_target *ti, unsigned int cmd, > + unsigned long arg) > +{ > + struct verity_config *vc = (struct verity_config *) ti->private; > + struct dm_dev *dev = vc->dev; > + int r = 0; > + > + /* > + * Only pass ioctls through if the device sizes match exactly. > + */ > + if (vc->start || > + ti->len != i_size_read(dev->bdev->bd_inode) >> SECTOR_SHIFT) > + r = scsi_verify_blk_ioctl(NULL, cmd); > + > + return r ? : __blkdev_driver_ioctl(dev->bdev, dev->mode, cmd, arg); > +} > + > +static int verity_status(struct dm_target *ti, status_type_t type, > + char *result, unsigned int maxlen) > +{ > + struct verity_config *vc = (struct verity_config *) ti->private; > + unsigned int sz = 0; > + char digest[VERITY_MAX_DIGEST_SIZE * 2 + 1] = { 0 }; > + char salt[VERITY_SALT_SIZE * 2 + 1] = { 0 }; > + > + verity_tree_digest(&vc->bht, digest); > + verity_tree_salt(&vc->bht, salt); > + > + switch (type) { > + case STATUSTYPE_INFO: > + result[0] = '\0'; > + break; > + case STATUSTYPE_TABLE: > + DMEMIT("%s %s %llu %llu %s %s %s", > + vc->dev->name, > + vc->hash_dev->name, > + ULL(vc->hash_start), > + ULL(vc->bht.block_size), > + vc->hash_alg, > + digest, > + salt); > + break; > + } > + return 0; > +} > + > +static int verity_merge(struct dm_target *ti, struct bvec_merge_data *bvm, > + struct bio_vec *biovec, int max_size) > +{ > + struct verity_config *vc = ti->private; > + struct request_queue *q = bdev_get_queue(vc->dev->bdev); > + > + if (!q->merge_bvec_fn) > + return max_size; > + > + bvm->bi_bdev = vc->dev->bdev; > + bvm->bi_sector = vc->start + bvm->bi_sector - ti->begin; > + > + /* Optionally, this could just return 0 to stick to single pages. 
*/ > + return min(max_size, q->merge_bvec_fn(q, bvm, biovec)); > +} > + > +static int verity_iterate_devices(struct dm_target *ti, > + iterate_devices_callout_fn fn, void *data) > +{ > + struct verity_config *vc = ti->private; > + > + return fn(ti, vc->dev, vc->start, ti->len, data); > +} > + > +static void verity_io_hints(struct dm_target *ti, > + struct queue_limits *limits) > +{ > + struct verity_config *vc = ti->private; > + unsigned int block_size = vc->bht.block_size; > + > + limits->logical_block_size = block_size; > + limits->physical_block_size = block_size; > + blk_limits_io_min(limits, block_size); > +} > + > +static struct target_type verity_target = { > + .name = "verity", > + .version = {0, 1, 0}, > + .module = THIS_MODULE, > + .ctr = verity_ctr, > + .dtr = verity_dtr, > + .ioctl = verity_ioctl, > + .map = verity_map, > + .merge = verity_merge, > + .status = verity_status, > + .iterate_devices = verity_iterate_devices, > + .io_hints = verity_io_hints, > +}; > + > +#define VERITY_WQ_FLAGS (WQ_CPU_INTENSIVE|WQ_HIGHPRI) > + > +static int __init verity_init(void) > +{ > + int r = -ENOMEM; > + > + _verity_io_pool = KMEM_CACHE(verity_io, 0); > + if (!_verity_io_pool) { > + DMERR("failed to allocate pool verity_io"); > + goto bad_io_pool; > + } > + > + kverityd_ioq = alloc_workqueue("kverityd_io", VERITY_WQ_FLAGS, 1); > + if (!kverityd_ioq) { > + DMERR("failed to create workqueue kverityd_ioq"); > + goto bad_io_queue; > + } > + > + kveritydq = alloc_workqueue("kverityd", VERITY_WQ_FLAGS, 1); > + if (!kveritydq) { > + DMERR("failed to create workqueue kveritydq"); > + goto bad_verify_queue; > + } > + > + r = dm_register_target(&verity_target); > + if (r < 0) { > + DMERR("register failed %d", r); > + goto register_failed; > + } > + > + DMINFO("version %u.%u.%u loaded", verity_target.version[0], > + verity_target.version[1], verity_target.version[2]); > + > + return r; > + > +register_failed: > + destroy_workqueue(kveritydq); > +bad_verify_queue: > + destroy_workqueue(kverityd_ioq); > +bad_io_queue: > + kmem_cache_destroy(_verity_io_pool); > +bad_io_pool: > + return r; > +} > + > +static void __exit verity_exit(void) > +{ > + destroy_workqueue(kveritydq); > + destroy_workqueue(kverityd_ioq); > + > + dm_unregister_target(&verity_target); > + kmem_cache_destroy(_verity_io_pool); > +} > + > +module_init(verity_init); > +module_exit(verity_exit); > + > +MODULE_AUTHOR("The Chromium OS Authors "); > +MODULE_DESCRIPTION(DM_NAME " target for transparent disk integrity checking"); > +MODULE_LICENSE("GPL"); > -- > 1.7.7.3 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/