2024-02-09 16:12:23

by Sergei Shtepa

Subject: [PATCH v7 0/8] filtering and snapshots of a block devices

Hi all.

I am happy to offer an improved version of the block device filtering
mechanism (blkfilter) and module for creating snapshots of block devices
(blksnap).

The block device filtering mechanism is implemented in the block layer.
It allows attaching and detaching block device filters. Filters extend the
functionality of the block layer. See more in
Documentation/block/blkfilter.rst.

The main purpose of snapshots of block devices is to provide backups of
them. See more in Documentation/block/blksnap.rst. The tool, library and
tests for working with blksnap can be found on github.
Link: https://github.com/veeam/blksnap/tree/stable-v2.0
There is also documentation from which you can learn how to manage the
module using the library and the console tool.

Based on LK v6.8-rc3 with Christoph's patchset "clean up blk_mq_submit_bio".
Link: https://lore.kernel.org/linux-block/[email protected]/T/#t

I express my appreciation and gratitude to Christoph. Thanks to his
attention to the project, it was possible to raise the quality of the code.
I probably wouldn't have made version 7 if it wasn't for his help.
I am sure that the blksnap module will improve the quality of backup tools
for Linux.

v7 changes:
- The location of the filtering of I/O units has been changed. This made it
possible to remove the additional call bio_queue_enter().
- Remove configs BLKSNAP_DIFF_BLKDEV and BLKSNAP_CHUNK_DIFF_BIO_SYNC.
- To process the ioctl, the switch statement is used instead of a table
with functions.
- Instead of a file descriptor, the module gets a path on the file system.
This allows the kernel module to correctly open a file or block device
with exclusive access rights.
- Fixed bio leak bugs.

v6 changes:
- The difference storage has been changed.
In the previous version, the file was created only to reserve sector
ranges on a block device. The data was stored directly to the block
device in these sector ranges. Now data is saved and read through the VFS
using the vfs_iter_write() and vfs_iter_read() functions. This removes the
dependence on the filesystem and allows using, for example, tmpfs. Using an
unnamed temporary file hides it from other processes and releases it
automatically when the snapshot is closed.
However, now the module does not allow adding a block device to the
snapshot on which the difference storage is located. There is no way to
ensure the immutability of file metadata when writing data to a file.
This means that the metadata of the filesystem may change, which may
cause damage to the snapshot.
- _IOW and _IOR were mixed up - fixed.
- Protection against the use of the snapshots for block devices with
hardware inline encryption and data integrity was implemented.
Compatibility with them was not planned and has not been tested at the
moment.

v5 changes:
- Rebase for "kernel/git/axboe/linux-block.git" branch "for-6.5/block".
Link: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git/log/?h=for-6.5/block

v4 changes:
- Structures for describing the state of chunks are allocated dynamically.
This reduces memory consumption, since the struct chunk is allocated only
for those blocks for which the snapshot image state differs from the
original block device.
- The algorithm for calculating the chunk size depending on the size of the
block device has been changed. For large block devices, it is now
possible to allocate a larger number of chunks, and their size is smaller.
- For block devices, a 'filter' file has been added to /sys/block/<device>.
It displays the name of the filter that is attached to the block device.
- Fixed a problem with the lack of protection against re-adding a block
device to a snapshot.
- Fixed a bug in the algorithm of allocating the next bio for a chunk.
This problem occurred on large disks, for which a chunk consists of
at least two bios.
- The ownership mechanism of the diff_area structure has been changed.
This fixed the error of prematurely releasing the diff_area structure
when destroying the snapshot.
- Documentation corrected.
- The Sparse analyzer is passed.
- Use the __u64 type instead of pointers in the UAPI.

v3 changes:
- New block device I/O controls BLKFILTER_ATTACH and BLKFILTER_DETACH allow
attaching and detaching filters.
- New block device I/O control BLKFILTER_CTL allows sending a command to
the attached block device filter.
- The copy-on-write algorithm for processing I/O units has been optimized
and has become asynchronous.
- The snapshot image reading algorithm has been optimized and has become
asynchronous.
- Optimized the finite state machine for processing chunks.
- Fixed a tracking block size calculation bug.

v2 changes:
- Added documentation for Block Device Filtering Mechanism.
- Added documentation for Block Devices Snapshots Module (blksnap).
- The MAINTAINERS file has been updated.
- Optimized queue code for snapshot images.
- Fixed comments, log messages and code for better readability.

v1 changes:
- Forgotten "static" declarations have been added.
- The text of the comments has been corrected.
- It is possible to connect only one filter, since there are no others in
upstream.
- Do not have additional locks for attach/detach filter.
- blksnap.h moved to include/uapi/.
- #pragma once and commented code removed.
- uuid_t removed from user API.
- Removed default values for module parameters from the configuration file.
- The debugging code for tracking memory leaks has been removed.
- Simplified Makefile.
- Optimized work with large memory buffers, CBT tables are now in virtual
memory.
- The allocation code of minor numbers has been optimized.
- The implementation of the snapshot image block device has been
simplified, now it is a bio-based block device.
- Removed initialization of global variables with null values.
- Only one bio is used to copy one chunk.
- Checked on ppc64le.

Sergei Shtepa (8):
documentation: filtering and snapshots of a block devices
block: filtering of a block devices
block: header file of the blksnap module interface
block: module management interface functions
block: handling and tracking I/O units
block: difference storage implementation
block: snapshot and snapshot image block device
block: Kconfig, Makefile and MAINTAINERS files

Documentation/block/blkfilter.rst | 66 ++
Documentation/block/blksnap.rst | 351 ++++++++++
Documentation/block/index.rst | 2 +
.../userspace-api/ioctl/ioctl-number.rst | 1 +
MAINTAINERS | 17 +
block/Makefile | 3 +-
block/bdev.c | 2 +
block/blk-core.c | 26 +-
block/blk-filter.c | 257 +++++++
block/blk-mq.c | 7 +-
block/blk-mq.h | 2 +-
block/blk.h | 11 +
block/genhd.c | 10 +
block/ioctl.c | 7 +
block/partitions/core.c | 9 +
drivers/block/Kconfig | 2 +
drivers/block/Makefile | 2 +
drivers/block/blksnap/Kconfig | 12 +
drivers/block/blksnap/Makefile | 15 +
drivers/block/blksnap/cbt_map.c | 225 +++++++
drivers/block/blksnap/cbt_map.h | 90 +++
drivers/block/blksnap/chunk.c | 631 ++++++++++++++++++
drivers/block/blksnap/chunk.h | 134 ++++
drivers/block/blksnap/diff_area.c | 577 ++++++++++++++++
drivers/block/blksnap/diff_area.h | 175 +++++
drivers/block/blksnap/diff_buffer.c | 114 ++++
drivers/block/blksnap/diff_buffer.h | 37 +
drivers/block/blksnap/diff_storage.c | 290 ++++++++
drivers/block/blksnap/diff_storage.h | 103 +++
drivers/block/blksnap/event_queue.c | 81 +++
drivers/block/blksnap/event_queue.h | 64 ++
drivers/block/blksnap/main.c | 481 +++++++++++++
drivers/block/blksnap/params.h | 16 +
drivers/block/blksnap/snapimage.c | 135 ++++
drivers/block/blksnap/snapimage.h | 10 +
drivers/block/blksnap/snapshot.c | 462 +++++++++++++
drivers/block/blksnap/snapshot.h | 65 ++
drivers/block/blksnap/tracker.c | 369 ++++++++++
drivers/block/blksnap/tracker.h | 78 +++
include/linux/blk-filter.h | 72 ++
include/linux/blk_types.h | 1 +
include/linux/sched.h | 1 +
include/uapi/linux/blk-filter.h | 35 +
include/uapi/linux/blksnap.h | 384 +++++++++++
include/uapi/linux/fs.h | 3 +
45 files changed, 5430 insertions(+), 5 deletions(-)
create mode 100644 Documentation/block/blkfilter.rst
create mode 100644 Documentation/block/blksnap.rst
create mode 100644 block/blk-filter.c
create mode 100644 drivers/block/blksnap/Kconfig
create mode 100644 drivers/block/blksnap/Makefile
create mode 100644 drivers/block/blksnap/cbt_map.c
create mode 100644 drivers/block/blksnap/cbt_map.h
create mode 100644 drivers/block/blksnap/chunk.c
create mode 100644 drivers/block/blksnap/chunk.h
create mode 100644 drivers/block/blksnap/diff_area.c
create mode 100644 drivers/block/blksnap/diff_area.h
create mode 100644 drivers/block/blksnap/diff_buffer.c
create mode 100644 drivers/block/blksnap/diff_buffer.h
create mode 100644 drivers/block/blksnap/diff_storage.c
create mode 100644 drivers/block/blksnap/diff_storage.h
create mode 100644 drivers/block/blksnap/event_queue.c
create mode 100644 drivers/block/blksnap/event_queue.h
create mode 100644 drivers/block/blksnap/main.c
create mode 100644 drivers/block/blksnap/params.h
create mode 100644 drivers/block/blksnap/snapimage.c
create mode 100644 drivers/block/blksnap/snapimage.h
create mode 100644 drivers/block/blksnap/snapshot.c
create mode 100644 drivers/block/blksnap/snapshot.h
create mode 100644 drivers/block/blksnap/tracker.c
create mode 100644 drivers/block/blksnap/tracker.h
create mode 100644 include/linux/blk-filter.h
create mode 100644 include/uapi/linux/blk-filter.h
create mode 100644 include/uapi/linux/blksnap.h

--
2.34.1
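
The chunk size selection described in Documentation/block/blksnap.rst
(patch 1/8) can be modeled in user space roughly as follows. This is a
minimal Python sketch, not the module's code: the function name
choose_chunk_shift, the loop itself and the example parameter values are
assumptions made for illustration. Only the parameter names
chunk_minimum_shift, chunk_maximum_shift and chunk_maximum_count come from
the documentation.

```python
def choose_chunk_shift(device_bytes, minimum_shift, maximum_shift, maximum_count):
    """Return log2 of the chunk size for a device of device_bytes bytes.

    Assumed behavior: start from the minimum chunk size and double it
    until the chunk count fits into maximum_count or the maximum chunk
    size is reached.
    """
    shift = minimum_shift
    while shift < maximum_shift:
        chunk_count = -(-device_bytes // (1 << shift))  # ceiling division
        if chunk_count <= maximum_count:
            break
        shift += 1
    return shift


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for this demonstration.
    for size in (1 << 30, 1 << 40, 1 << 50):
        shift = choose_chunk_shift(size, 18, 26, 1 << 21)
        print(size, shift, -(-size // (1 << shift)))
```

Once the shift reaches the maximum, the loop stops even though the chunk
count is still above the limit, which matches the documented statement that
the number of chunks may then exceed chunk_maximum_count. Under the same
assumptions, the tracking_block_* parameters would follow the same shape.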



2024-02-09 16:15:14

by Sergei Shtepa

Subject: [PATCH v7 8/8] block: Kconfig, Makefile and MAINTAINERS files

Allows building the blksnap module and adds it to the kernel tree.

Signed-off-by: Sergei Shtepa <[email protected]>
---
MAINTAINERS | 17 +++++++++++++++++
drivers/block/Kconfig | 2 ++
drivers/block/Makefile | 2 ++
drivers/block/blksnap/Kconfig | 12 ++++++++++++
drivers/block/blksnap/Makefile | 15 +++++++++++++++
5 files changed, 48 insertions(+)
create mode 100644 drivers/block/blksnap/Kconfig
create mode 100644 drivers/block/blksnap/Makefile

diff --git a/MAINTAINERS b/MAINTAINERS
index 960512bec428..fc95d3e1fd66 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3608,6 +3608,23 @@ M: Jan-Simon Moeller <[email protected]>
S: Maintained
F: drivers/leds/leds-blinkm.c

+BLOCK DEVICE FILTERING MECHANISM
+M: Sergei Shtepa <[email protected]>
+L: [email protected]
+S: Supported
+F: Documentation/block/blkfilter.rst
+F: block/blk-filter.c
+F: include/linux/blk-filter.h
+F: include/uapi/linux/blk-filter.h
+
+BLOCK DEVICE SNAPSHOTS MODULE
+M: Sergei Shtepa <[email protected]>
+L: [email protected]
+S: Supported
+F: Documentation/block/blksnap.rst
+F: drivers/block/blksnap/*
+F: include/uapi/linux/blksnap.h
+
BLOCK LAYER
M: Jens Axboe <[email protected]>
L: [email protected]
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 5b9d4aaebb81..74d2d55526a3 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -404,4 +404,6 @@ config BLKDEV_UBLK_LEGACY_OPCODES

source "drivers/block/rnbd/Kconfig"

+source "drivers/block/blksnap/Kconfig"
+
endif # BLK_DEV
diff --git a/drivers/block/Makefile b/drivers/block/Makefile
index 101612cba303..9a2a9a56a247 100644
--- a/drivers/block/Makefile
+++ b/drivers/block/Makefile
@@ -40,3 +40,5 @@ obj-$(CONFIG_BLK_DEV_NULL_BLK) += null_blk/
obj-$(CONFIG_BLK_DEV_UBLK) += ublk_drv.o

swim_mod-y := swim.o swim_asm.o
+
+obj-$(CONFIG_BLKSNAP) += blksnap/
diff --git a/drivers/block/blksnap/Kconfig b/drivers/block/blksnap/Kconfig
new file mode 100644
index 000000000000..ce3e33d52c71
--- /dev/null
+++ b/drivers/block/blksnap/Kconfig
@@ -0,0 +1,12 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Snapshots of block devices configuration
+#
+
+config BLKSNAP
+ tristate "Snapshots of block devices (Experimental)"
+ help
+ Allow to create snapshots and track block changes for block devices.
+ It can be used to create backups of block devices. Snapshots are
+ temporary and are released when backup is completed. Change block
+ tracking allows to create incremental or differential backups.
diff --git a/drivers/block/blksnap/Makefile b/drivers/block/blksnap/Makefile
new file mode 100644
index 000000000000..8d528b95579a
--- /dev/null
+++ b/drivers/block/blksnap/Makefile
@@ -0,0 +1,15 @@
+# SPDX-License-Identifier: GPL-2.0
+
+blksnap-y := \
+ cbt_map.o \
+ chunk.o \
+ diff_area.o \
+ diff_buffer.o \
+ diff_storage.o \
+ event_queue.o \
+ main.o \
+ snapimage.o \
+ snapshot.o \
+ tracker.o
+
+obj-$(CONFIG_BLKSNAP) += blksnap.o
--
2.34.1
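
The change tracking scheme described in Documentation/block/blksnap.rst
(patch 1/8) can be illustrated with a small user-space model. This is a
hedged Python sketch, not the kernel's cbt_map code: the class and method
names are hypothetical, and the comparison used in changed_since() is one
reading of the documentation (a cell holds the number of the snapshot since
whose creation the block was written).

```python
import uuid

MAX_SNAPSHOT_NUMBER = 255  # one byte of the map tracks one block


class ChangeTrackerModel:
    """Toy model of the two-copy change map (names are hypothetical)."""

    def __init__(self, block_count):
        self.block_count = block_count
        self.snap_number = 0              # no snapshot taken yet
        self.generation = uuid.uuid4()    # changes on every tracker reset
        self.active = bytearray(block_count)    # tracks current writes
        self.readable = bytearray(block_count)  # history up to last snapshot

    def write(self, block):
        # A write stamps the block with the current snapshot number.
        self.active[block] = self.snap_number

    def take_snapshot(self):
        if self.snap_number == MAX_SNAPSHOT_NUMBER:
            # Counter exhausted: reset the map and start a new generation.
            self.active = bytearray(self.block_count)
            self.snap_number = 1
            self.generation = uuid.uuid4()
        else:
            self.snap_number += 1
        # The two copies are synchronized at snapshot creation.
        self.readable = bytearray(self.active)
        return self.snap_number

    def changed_since(self, base_number, base_generation):
        # After a reset the history is gone, so every block must be copied.
        if base_generation != self.generation:
            return list(range(self.block_count))
        return [i for i, n in enumerate(self.readable) if n >= base_number]
```

In this model an incremental backup asks for the blocks changed since the
previous snapshot number, a differential backup asks for the blocks changed
since the number of the last full backup, and a generation mismatch forces
a full copy.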


2024-02-09 16:15:13

by Sergei Shtepa

Subject: [PATCH v7 1/8] documentation: filtering and snapshots of a block devices

The blkfilter.rst document contains:
* Description of the purpose of the mechanism
* A little historical background on the capabilities of handling I/O
units of the Linux kernel
* Brief description of the design
* Reference to interface description

The blksnap.rst document contains:
* Description of the purpose of the block device snapshots
* Description of features
* Description of algorithms
* Recommendations about using the module from the user-space side
* Reference to module interface description

Signed-off-by: Sergei Shtepa <[email protected]>
---
Documentation/block/blkfilter.rst | 66 ++++++
Documentation/block/blksnap.rst | 351 ++++++++++++++++++++++++++++++
Documentation/block/index.rst | 2 +
3 files changed, 419 insertions(+)
create mode 100644 Documentation/block/blkfilter.rst
create mode 100644 Documentation/block/blksnap.rst

diff --git a/Documentation/block/blkfilter.rst b/Documentation/block/blkfilter.rst
new file mode 100644
index 000000000000..4e148e78f3d4
--- /dev/null
+++ b/Documentation/block/blkfilter.rst
@@ -0,0 +1,66 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================================
+Block Device Filtering Mechanism
+================================
+
+The block device filtering mechanism provides the ability to attach block
+device filters. Block device filters allow performing additional processing
+for I/O units.
+
+Introduction
+============
+
+The idea of handling I/O units on block devices is not new. Back in the
+2.6 kernel, there was an undocumented possibility of handling I/O units
+by substituting the make_request_fn() function, which belonged to the
+request_queue structure. But none of the in-tree kernel modules used this
+feature, and it was eliminated in the 5.10 kernel.
+
+The block device filtering mechanism returns the ability to handle I/O units.
+It is possible to safely attach a filter to a block device "on the fly" without
+changing the structure of the block device's stack.
+
+It supports attaching one filter to one block device, because there is only
+one filter implementation in the kernel so far.
+See Documentation/block/blksnap.rst.
+
+Design
+======
+
+The block device filtering mechanism provides registration and unregistration
+for filter operations. The struct blkfilter_operations contains a pointer to
+the callback functions for the filter. After registering the filter operations,
+the filter can be managed using block device ioctls BLKFILTER_ATTACH,
+BLKFILTER_DETACH and BLKFILTER_CTL.
+
+When the filter is attached, the callback function is called for each I/O unit
+for a block device, providing I/O unit filtering. Depending on the result of
+filtering the I/O unit, it can either be passed for subsequent processing by
+the block layer, or skipped.
+
+The filter can be implemented as a loadable module. In this case, the filter
+module cannot be unloaded while the filter is attached to at least one of the
+block devices.
+
+Interface description
+=====================
+
+The ioctl BLKFILTER_ATTACH allows user-space programs to attach a block device
+filter to a block device. The ioctl BLKFILTER_DETACH allows user-space programs
+to detach it. Both ioctls use &struct blkfilter_name. The ioctl BLKFILTER_CTL
+allows user-space programs to send a filter-specific command. It uses &struct
+blkfilter_ctl.
+
+.. kernel-doc:: include/uapi/linux/blk-filter.h
+
+To register in the system, the filter uses the &struct blkfilter_operations,
+which contains callback functions, unique filter name and module owner. When
+attaching a filter to a block device, the filter creates a &struct blkfilter.
+The pointer to the &struct blkfilter allows the filter to determine for which
+block device the callback functions are being called.
+
+.. kernel-doc:: include/linux/blk-filter.h
+
+.. kernel-doc:: block/blk-filter.c
+ :export:
diff --git a/Documentation/block/blksnap.rst b/Documentation/block/blksnap.rst
new file mode 100644
index 000000000000..679f753841d9
--- /dev/null
+++ b/Documentation/block/blksnap.rst
@@ -0,0 +1,351 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+========================================
+Block Devices Snapshots Module (blksnap)
+========================================
+
+Introduction
+============
+
+At first glance, there is no novelty in the idea of creating snapshots for
+block devices. The Linux kernel already has mechanisms for creating snapshots.
+Device Mapper includes dm-snap, which allows to create snapshots of block
+devices. BTRFS supports snapshots at the filesystem level. However, both of
+these options have specificities that do not allow to use them as a universal
+tool for creating backups.
+
+The main properties that a backup tool should have are:
+
+- Simplicity and universality of use
+- Reliability
+- Minimal consumption of system resources during backup
+- Minimal time required for recovery or replication of the entire system
+
+Taking the above properties into account, the blksnap module features:
+
+- Change tracker
+- Snapshots at the block device level
+- Dynamic allocation of space for storing differences
+- Snapshot overflow resistance
+- Coherent snapshot of multiple block devices
+
+Features
+========
+
+Change tracker
+--------------
+
+The change tracker determines which blocks were changed during the
+time between the last snapshot created and any of the previous snapshots.
+With a map of changes, it is enough to copy only the changed blocks, with no
+need to reread the entire block device. The change tracker makes it
+possible to implement both incremental and differential backups.
+Incremental backup is critical for large file repositories whose size can be
+hundreds of terabytes and whose full backup time can take more than a day.
+On such servers, the use of backup tools without a change tracker becomes
+practically impossible.
+
+Snapshot at the block device level
+----------------------------------
+
+A snapshot at the block device level allows to simplify the backup algorithm
+and reduce consumption of system resources. It also allows to perform linear
+reading of disk space directly, which allows to achieve maximum reading speed
+with minimal use of processor time. At the same time, the universality of
+creating snapshots for any block device is achieved, regardless of the file
+system located on it. The exceptions are BTRFS, ZFS and cluster file systems.
+
+Dynamic allocation of storage space for differences
+---------------------------------------------------
+
+To store differences, the module does not require pre-reserved space on a
+filesystem. The space for storing differences can be allocated in a file in any
+filesystem. In addition, the size of the difference storage can be increased
+after the snapshot is created, but only for a filesystem that supports
+fallocate. A shared difference storage for all images of snapshot block devices
+allows to optimize the use of storage space. However, there is one limitation.
+A snapshot cannot be taken from a block device on which the difference storage
+is located.
+
+Snapshot overflow resistance
+----------------------------
+
+To create images of snapshots of block devices, the module stores blocks
+of the original block device that have been changed since the snapshot
+was taken. To do this, the module handles write requests and reads blocks
+that need to be overwritten. This algorithm guarantees safety of the data
+of the original block device in the event of an overflow of the snapshot,
+and even in the case of unpredictable critical errors. If a problem occurs
+during backup, the difference storage is released, the snapshot is closed,
+no backup is created, but the server continues to work.
+
+Coherent snapshot of multiple block devices
+-------------------------------------------
+
+A snapshot is created simultaneously for all block devices for which a backup
+is being created, ensuring their coherent state.
+
+
+Algorithms
+==========
+
+Overview
+--------
+
+The blksnap module is a block-level filter. It handles all write I/O units.
+The filter is attached to the block device when the snapshot is created
+for the first time. The change tracker marks all overwritten blocks.
+Information about the history of changes on the block device is available
+while holding the snapshot. The module reads the blocks that need to be
+overwritten and stores them in the difference storage. When reading from
+a snapshot image, reading is performed either from the original device or
+from the difference storage.
+
+Change tracking
+---------------
+
+A change tracker map is created for each block device. One byte of this map
+corresponds to one block. The block size is set by the
+``tracking_block_minimum_shift`` and ``tracking_block_maximum_count``
+module parameters. The ``tracking_block_minimum_shift`` parameter limits
+the minimum block size for tracking, while ``tracking_block_maximum_count``
+defines the maximum allowed number of blocks. The size of the change tracker
+block is determined depending on the size of the block device when adding
+a tracking device, that is, when the snapshot is taken for the first time.
+The block size must be a power of two. The ``tracking_block_maximum_shift``
+module parameter allows to limit the maximum block size for tracking. If the
+block size reaches the allowable limit, the number of blocks will exceed the
+``tracking_block_maximum_count`` parameter.
+
+The byte of the change map stores a number from 0 to 255. This is the
+snapshot number, since the creation of which there have been changes in
+the block. Each time a snapshot is created, the number of the current
+snapshot is increased by one. This number is written to the cell of the
+change map when writing to the block. Thus, knowing the number of one of
+the previous snapshots and the number of the last snapshot, one can determine
+from the change map which blocks have been changed. When the number of the
+current change reaches the maximum allowed value for the map of 255, at the
+time when the next snapshot is created, the map of changes is reset to zero,
+and the number of the current snapshot is assigned the value 1. The change
+tracker is reset, and a new UUID is generated - a unique identifier of the
+snapshot generation. The snapshot generation identifier allows to identify
+that a change tracking reset has been performed.
+
+The change map has two copies. One copy is active, it tracks the current
+changes on the block device. The second copy is available for reading
+while the snapshot is being held, and contains the history up to the moment
+the snapshot is taken. Copies are synchronized at the moment of snapshot
+creation. After the snapshot is released, a second copy of the map is not
+needed, but it is not released, so as not to allocate memory for it again
+the next time the snapshot is created.
+
+Copy on write
+-------------
+
+Data is copied in blocks, or rather in chunks. The term "chunk" is used to
+avoid confusion with change tracker blocks and I/O blocks. In addition,
+the "chunk" in the blksnap module means about the same as the "chunk" in
+the dm-snap module.
+
+The size of the chunk is determined by the ``chunk_minimum_shift`` and
+``chunk_maximum_count`` module parameters. The ``chunk_minimum_shift``
+parameter limits the minimum size of the chunk, while ``chunk_maximum_count``
+defines the maximum allowed number of chunks. The size of the chunk is
+determined depending on the size of the block device at the time of taking the
+snapshot. The size of the chunk must be a power of two. The module parameter
+``chunk_maximum_shift`` allows to limit the maximum chunk size. If the chunk
+size reaches the allowable limit, the number of chunks will exceed the
+``chunk_maximum_count`` parameter.
+
+One chunk is described by the ``struct chunk`` structure. A map of structures
+is created for each block device. The structure contains all the necessary
+information to copy the chunks data from the original block device to the
+difference storage. This information allows to describe the snapshot image.
+A semaphore is located in the structure, which allows synchronization of threads
+accessing the chunk.
+
+The block layer in Linux may reorder I/O units. If a read I/O unit is sent and a
+write I/O unit is sent after it, the write can be performed first and only then
+the read. Therefore, the copy-on-write algorithm is executed synchronously.
+If the write request is handled, the execution of this I/O unit will be delayed
+until the overwritten chunks are read from the original device for later
+storing to the difference store. But if, when handling a write I/O unit, it
+turns out that the written range of sectors has already been prepared for
+storing to the difference storage, then the I/O unit is simply passed.
+
+This algorithm makes it possible to efficiently back up even systems with
+Round-Robin databases. Such databases can be overwritten several times
+during the system backup. Of course, the value of a backup of the RRD monitoring
+system data can be questioned. However, it is often a task to make a backup
+of the entire enterprise infrastructure in order to restore or replicate it
+entirely in case of problems.
+
+There is also a flaw in the algorithm. When overwriting at least one sector,
+an entire chunk is copied. Thus, a situation of rapid filling of the difference
+storage when writing data to a block device in small portions in random order
+is possible. This situation is possible in case of strong fragmentation of
+data on the filesystem. But it must be borne in mind that with such data
+fragmentation, performance of systems usually degrades greatly. So, this
+problem does not occur on real servers, although it can easily be created
+by artificial tests.
+
+Difference storage
+------------------
+
+The difference storage can be a block device or it can be a file on a
+filesystem. Using a block device allows to achieve slightly higher performance,
+but in this case, the block device is used by the kernel module exclusively.
+Usually the disk space is marked up so that there is no available free space
+for backup purposes. Using a file allows to place the difference storage on a
+filesystem.
+
+The difference storage can be expanded while the snapshot is being held,
+but only if the filesystem supports fallocate(). If the free space in the
+difference storage remains less than half of the value of the module parameter
+``diff_storage_minimum``, then the kernel module can expand the difference
+storage file within the specified limits. This limit is set when creating a
+snapshot.
+
+If free space in the difference storage runs out, an event to user land is
+generated about the overflow of the snapshot. Such a snapshot is considered
+corrupted, and read I/O units to snapshot images will be terminated with an
+error code. The difference storage stores outdated data required for snapshot
+images, so when the snapshot is overflowed, the backup process is interrupted,
+but the system maintains its operability without data loss.
+
+The difference storage has a limitation. The device cannot be added to the
+snapshot where the difference storage is located. In this case, the difference
+storage can be located in virtual memory, which consists of RAM and a swap
+partition (or file). To do this, it is enough to use a file in /dev/shm, or a
+new tmpfs filesystem can be created for this purpose. Obviously, this variant
+can be useful if the system has a lot of RAM or a large swap. The good news is
+that the modern Linux kernel allows to increase the size of the swap file "on
+the fly" without changing the system configuration.
+
+A regular file or a block device file for the difference storage must be opened
+with the O_EXCL flag. If an unnamed file with the O_TMPFILE flag is created,
+then such a file will be automatically released when the snapshot is destroyed.
+In addition, the use of an unnamed temporary file ensures that no one can open
+this file and read its contents.
+
+Performing I/O for a snapshot image
+-----------------------------------
+
+To read snapshot data, when taking a snapshot, block devices of snapshot images
+are created. The snapshot image block devices support the write operation.
+This allows to perform additional data preparation on the filesystem before
+creating a backup.
+
+To process the I/O unit, clones of the I/O unit are created, which redirect
+the I/O unit either to the original block device or to the difference storage.
+When processing of cloned I/O units is completed, the original I/O unit is
+marked as completed too.
+
+An I/O unit can be partially processed without accessing block devices if
+the I/O unit refers to a chunk that is in the queue for storing to the
+difference storage. In this case, the data is read or written in a buffer in
+memory.
+
+If, when processing the write I/O unit, it turns out that the data of the
+referred chunk has not yet been stored to the difference storage or has not
+even been read from the original device, then an I/O unit to read data from the
+original device is initiated beforehand. After the read from the original
+device completes, the data from the I/O unit partially overwrites the chunk
+buffer in memory, and the chunk is scheduled to be saved to the difference
+storage.
+
+How to use
+==========
+
+Depending on the needs and the selected license, you can choose different
+options for managing the module:
+
+- Using ioctl directly
+- Using a static C++ library
+- Using the blksnap console tool
+
+Using a BLKFILTER_CTL for block device
+--------------------------------------
+
+BLKFILTER_CTL allows sending a filter-specific command to the filter on a block
+device and getting the result of its execution. The module provides the
+``include/uapi/linux/blksnap.h`` header file with a description of the commands
+and their data structures.
+
+1. ``BLKFILTER_CTL_BLKSNAP_CBTINFO`` allows to get information from the
+ change tracker.
+2. ``BLKFILTER_CTL_BLKSNAP_CBTMAP`` reads the change tracker table. If a write
+ operation was performed for the snapshot, then the change tracker takes this
+ into account. Therefore, it is necessary to receive tracker data after write
+ operations have been completed.
+3. ``BLKFILTER_CTL_BLKSNAP_CBTDIRTY`` marks blocks as changed in the change
+ tracker table. This is necessary if post-processing is performed after the
+ backup is created, which changes the backup blocks.
+4. ``BLKFILTER_CTL_BLKSNAP_SNAPSHOTADD`` adds a block device to the snapshot.
+5. ``BLKFILTER_CTL_BLKSNAP_SNAPSHOTINFO`` allows to get the name of the snapshot
+ image block device and the presence of an error.
+
+Using ioctl
+-----------
+
+The BLKFILTER_CTL ioctl alone is not enough to fully manage the blksnap
+module. A control file ``blksnap-control`` is created to manage
+snapshots. The control commands are also described in the file
+``include/uapi/linux/blksnap.h``.
+
+1. ``BLKSNAP_IOCTL_VERSION`` gets the version number.
+2. ``BLKSNAP_IOCTL_SNAPSHOT_CREATE`` initiates a snapshot and prepares a
+ difference storage.
+3. ``BLKSNAP_IOCTL_SNAPSHOT_TAKE`` creates block devices of block device
+ snapshot images.
+4. ``BLKSNAP_IOCTL_SNAPSHOT_COLLECT`` collects all created snapshots.
+5. ``BLKSNAP_IOCTL_SNAPSHOT_WAIT_EVENT`` allows to track the status of
+ snapshots and receive events about the requirement to expand the difference
+ storage or about snapshot overflow.
+6. ``BLKSNAP_IOCTL_SNAPSHOT_DESTROY`` releases the snapshot.
+
+Static C++ library
+------------------
+
+The [#userspace_libs]_ library was created primarily to simplify creation of
+tests in C++, and it is also a good example of using the module interface.
+When creating applications, direct use of control calls is preferable.
+However, the library can be used in an application with a GPL-2+ license,
+or a library with an LGPL-2+ license can be created, with which even a
+proprietary application can be dynamically linked.
+
+blksnap console tool
+--------------------
+
+The blksnap [#userspace_tools]_ console tool controls the module from the
+command line. The tool contains detailed built-in help. To get the list of
+commands with usage descriptions, run ``blksnap --help``. The ``blksnap
+<command name> --help`` command prints detailed information about the
+parameters of each command. This option may be convenient when creating
+proprietary software, as it avoids compiling against the open source code.
+At the same time, the blksnap tool can be used for creating backup scripts.
+For example, rsync can be called to synchronize files on the filesystem of
+the mounted snapshot image and files in the archive on a filesystem that
+supports compression.
+
+Tests
+-----
+
+A set of tests was created for regression testing [#userspace_tests]_.
+Tests with simple algorithms that use the ``blksnap`` console tool to
+control the module are written in Bash. More complex testing algorithms
+are implemented in C++.
+
+References
+==========
+
+.. [#userspace_libs] https://github.com/veeam/blksnap/tree/stable-v2.0/lib
+
+.. [#userspace_tools] https://github.com/veeam/blksnap/tree/stable-v2.0/tools
+
+.. [#userspace_tests] https://github.com/veeam/blksnap/tree/stable-v2.0/tests
+
+Module interface description
+============================
+
+.. kernel-doc:: include/uapi/linux/blksnap.h
diff --git a/Documentation/block/index.rst b/Documentation/block/index.rst
index 9fea696f9daa..696ff150c6b7 100644
--- a/Documentation/block/index.rst
+++ b/Documentation/block/index.rst
@@ -10,6 +10,8 @@ Block
bfq-iosched
biovecs
blk-mq
+ blkfilter
+ blksnap
cmdline-partition
data-integrity
deadline-iosched
--
2.34.1


2024-02-09 16:19:00

by Sergei Shtepa

[permalink] [raw]
Subject: [PATCH v7 7/8] block: snapshot and snapshot image block device

The struct snapshot combines the block devices for which a snapshot is
created, the block devices of their snapshot images, and a difference
storage.

There may be several snapshots at the same time, but they should not
contain common block devices. This can be used for cases when backup is
scheduled once an hour for some block devices, and once a day for
others, and once a week for others. In this case, it is possible that
three snapshots are used at the same time.

Snapshot images of block devices provide read and write operations.
They redirect I/O units to the original block device or to the difference
storage.

Events are used to quickly notify user space of a change in the
snapshot state. For example, if an error occurs while the snapshot is
held, when reading data from the original block device or from the
difference storage, the thread polling this queue will read a message
about it.

Signed-off-by: Sergei Shtepa <[email protected]>
---
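For illustration only, here is a minimal user-space sketch of the FIFO
behaviour that event_gen() and event_wait() provide for the event queue.
It is not the kernel code: it assumes a single thread, so the spinlock and
the wait queue are left out, and all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space stand-in for struct event. */
struct demo_event {
	struct demo_event *next;
	int code;
	int data_size;
	char data[];	/* variable-size payload follows the header */
};

/* Hypothetical stand-in for struct event_queue (no locking). */
struct demo_queue {
	struct demo_event *head;
	struct demo_event *tail;
};

/* Append an event that carries a copy of @data. Returns 0 or -1. */
int demo_event_gen(struct demo_queue *q, int code,
		   const void *data, int data_size)
{
	struct demo_event *ev;

	if (data_size < 0 || (data_size > 0 && !data))
		return -1;

	ev = calloc(1, sizeof(*ev) + (size_t)data_size);
	if (!ev)
		return -1;

	ev->code = code;
	ev->data_size = data_size;
	if (data_size)
		memcpy(ev->data, data, (size_t)data_size);

	if (q->tail)
		q->tail->next = ev;
	else
		q->head = ev;
	q->tail = ev;
	return 0;
}

/* Detach the oldest event, or return NULL if the queue is empty. */
struct demo_event *demo_event_pop(struct demo_queue *q)
{
	struct demo_event *ev = q->head;

	if (!ev)
		return NULL;

	q->head = ev->next;
	if (!q->head)
		q->tail = NULL;
	ev->next = NULL;
	return ev;	/* the caller releases it with free() */
}
```

A caller would initialise a struct demo_queue to zero, call
demo_event_gen() for each state change, and drain the queue with
demo_event_pop() until it returns NULL. Events come out in the order in
which they were generated.
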
drivers/block/blksnap/event_queue.c | 81 +++++
drivers/block/blksnap/event_queue.h | 64 ++++
drivers/block/blksnap/snapimage.c | 135 ++++++++
drivers/block/blksnap/snapimage.h | 10 +
drivers/block/blksnap/snapshot.c | 462 ++++++++++++++++++++++++++++
drivers/block/blksnap/snapshot.h | 65 ++++
6 files changed, 817 insertions(+)
create mode 100644 drivers/block/blksnap/event_queue.c
create mode 100644 drivers/block/blksnap/event_queue.h
create mode 100644 drivers/block/blksnap/snapimage.c
create mode 100644 drivers/block/blksnap/snapimage.h
create mode 100644 drivers/block/blksnap/snapshot.c
create mode 100644 drivers/block/blksnap/snapshot.h

diff --git a/drivers/block/blksnap/event_queue.c b/drivers/block/blksnap/event_queue.c
new file mode 100644
index 000000000000..afa4e8511eeb
--- /dev/null
+++ b/drivers/block/blksnap/event_queue.c
@@ -0,0 +1,81 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#define pr_fmt(fmt) KBUILD_MODNAME "-event_queue: " fmt
+
+#include <linux/slab.h>
+#include <linux/sched.h>
+#include "event_queue.h"
+
+void event_queue_init(struct event_queue *event_queue)
+{
+ INIT_LIST_HEAD(&event_queue->list);
+ spin_lock_init(&event_queue->lock);
+ init_waitqueue_head(&event_queue->wq_head);
+}
+
+void event_queue_done(struct event_queue *event_queue)
+{
+ struct event *event;
+
+ spin_lock(&event_queue->lock);
+ while (!list_empty(&event_queue->list)) {
+ event = list_first_entry(&event_queue->list, struct event,
+ link);
+ list_del(&event->link);
+ event_free(event);
+ }
+ spin_unlock(&event_queue->lock);
+}
+
+int event_gen(struct event_queue *event_queue, int code,
+ const void *data, int data_size)
+{
+ struct event *event;
+
+ event = kzalloc(sizeof(struct event) + data_size + 1, GFP_KERNEL);
+ if (!event)
+ return -ENOMEM;
+
+ event->time = ktime_get();
+ event->code = code;
+ event->data_size = data_size;
+ memcpy(event->data, data, data_size);
+
+ pr_debug("Generate event: time=%lld code=%d data_size=%d\n",
+ event->time, event->code, event->data_size);
+
+ spin_lock(&event_queue->lock);
+ list_add_tail(&event->link, &event_queue->list);
+ spin_unlock(&event_queue->lock);
+
+ wake_up(&event_queue->wq_head);
+ return 0;
+}
+
+struct event *event_wait(struct event_queue *event_queue,
+ unsigned long timeout_ms)
+{
+ int ret;
+
+ ret = wait_event_interruptible_timeout(event_queue->wq_head,
+ !list_empty(&event_queue->list), msecs_to_jiffies(timeout_ms));
+ if (ret >= 0) {
+ struct event *event = ERR_PTR(-ENOENT);
+
+ spin_lock(&event_queue->lock);
+ if (!list_empty(&event_queue->list)) {
+ event = list_first_entry(&event_queue->list,
+ struct event, link);
+ list_del(&event->link);
+ }
+ spin_unlock(&event_queue->lock);
+ return event;
+ }
+ if (ret == -ERESTARTSYS) {
+ pr_debug("event waiting interrupted\n");
+ return ERR_PTR(-EINTR);
+ }
+
+ pr_err("Failed to wait event. errno=%d\n", abs(ret));
+ return ERR_PTR(ret);
+}
diff --git a/drivers/block/blksnap/event_queue.h b/drivers/block/blksnap/event_queue.h
new file mode 100644
index 000000000000..4980789ee83a
--- /dev/null
+++ b/drivers/block/blksnap/event_queue.h
@@ -0,0 +1,64 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#ifndef __BLKSNAP_EVENT_QUEUE_H
+#define __BLKSNAP_EVENT_QUEUE_H
+
+#include <linux/types.h>
+#include <linux/ktime.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/wait.h>
+
+/**
+ * struct event - An event to be passed to the user space.
+ * @link:
+ * List node that links the event into the queue.
+ * @time:
+ * Timestamp of when the event occurred.
+ * @code:
+ * Event code.
+ * @data_size:
+ * The number of bytes in the event data array.
+ * @data:
+ * An array of event data.
+ *
+ * Events can be different, so they contain different data. The size of the
+ * data array is not defined exactly, but it has limitations. The size of
+ * the event structure is limited by the PAGE_SIZE (4096 bytes).
+ */
+struct event {
+ struct list_head link;
+ ktime_t time;
+ int code;
+ int data_size;
+ char data[];
+};
+
+/**
+ * struct event_queue - A queue of &struct event.
+ * @list:
+ * Linked list for storing events.
+ * @lock:
+ * Spinlock that protects the linked list.
+ * @wq_head:
+ * Wait queue on which a user thread sleeps until an event appears in
+ * the linked list.
+ */
+struct event_queue {
+ struct list_head list;
+ spinlock_t lock;
+ struct wait_queue_head wq_head;
+};
+
+void event_queue_init(struct event_queue *event_queue);
+void event_queue_done(struct event_queue *event_queue);
+
+int event_gen(struct event_queue *event_queue, int code,
+ const void *data, int data_size);
+struct event *event_wait(struct event_queue *event_queue,
+ unsigned long timeout_ms);
+static inline void event_free(struct event *event)
+{
+ kfree(event);
+}
+#endif /* __BLKSNAP_EVENT_QUEUE_H */
diff --git a/drivers/block/blksnap/snapimage.c b/drivers/block/blksnap/snapimage.c
new file mode 100644
index 000000000000..2e87f3380cbc
--- /dev/null
+++ b/drivers/block/blksnap/snapimage.c
@@ -0,0 +1,135 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+/*
+ * Present the snapshot image as a block device.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME "-image: " fmt
+#include <linux/slab.h>
+#include <linux/cdrom.h>
+#include <linux/blk-mq.h>
+#include <linux/build_bug.h>
+#include <uapi/linux/blksnap.h>
+#include "snapimage.h"
+#include "tracker.h"
+#include "chunk.h"
+#include "cbt_map.h"
+
+/*
+ * The snapshot supports write operations. This allows for example to delete
+ * some files from the file system before backing up the volume. The data can
+ * be stored only in the difference storage. Therefore, before partially
+ * overwriting this data, it should be read from the original block device.
+ */
+static void snapimage_submit_bio(struct bio *bio)
+{
+ struct tracker *tracker = bio->bi_bdev->bd_disk->private_data;
+ struct diff_area *diff_area = tracker->diff_area;
+ unsigned int flags;
+ struct blkfilter *prev_filter;
+ bool is_success = true;
+
+ /*
+ * The diff_area can be used here without fear that it will be released.
+ * Nothing pins it explicitly, but snapimage_free() is called before
+ * diff_area_put() in tracker_release_snapshot(), so it outlives the
+ * snapshot image disk.
+ */
+ if (diff_area_is_corrupted(diff_area)) {
+ bio_io_error(bio);
+ return;
+ }
+
+ flags = memalloc_noio_save();
+ /*
+ * The change tracking table should indicate that the image block device
+ * is different from the original device. At the next snapshot, such
+ * blocks must be inevitably reread.
+ */
+ if (op_is_write(bio_op(bio)))
+ cbt_map_set_both(tracker->cbt_map, bio->bi_iter.bi_sector,
+ bio_sectors(bio));
+
+ prev_filter = current->blk_filter;
+ current->blk_filter = &tracker->filter;
+ while (bio->bi_iter.bi_size && is_success)
+ is_success = diff_area_submit_chunk(diff_area, bio);
+ current->blk_filter = prev_filter;
+
+ if (is_success)
+ bio_endio(bio);
+ else
+ bio_io_error(bio);
+
+ memalloc_noio_restore(flags);
+}
+
+static const struct block_device_operations bd_ops = {
+ .owner = THIS_MODULE,
+ .submit_bio = snapimage_submit_bio,
+};
+
+void snapimage_free(struct tracker *tracker)
+{
+ struct gendisk *disk = tracker->snap_disk;
+
+ if (!disk)
+ return;
+
+ pr_debug("Snapshot image disk %s delete\n", disk->disk_name);
+ del_gendisk(disk);
+ put_disk(disk);
+
+ tracker->snap_disk = NULL;
+}
+
+int snapimage_create(struct tracker *tracker)
+{
+ int ret = 0;
+ dev_t dev_id = tracker->dev_id;
+ struct gendisk *disk;
+
+ pr_info("Create snapshot image device for original device [%u:%u]\n",
+ MAJOR(dev_id), MINOR(dev_id));
+
+ disk = blk_alloc_disk(NUMA_NO_NODE);
+ if (!disk) {
+ pr_err("Failed to allocate disk\n");
+ return -ENOMEM;
+ }
+
+ disk->flags = GENHD_FL_NO_PART;
+ disk->fops = &bd_ops;
+ disk->private_data = tracker;
+ set_capacity(disk, tracker->cbt_map->device_capacity);
+ ret = snprintf(disk->disk_name, DISK_NAME_LEN, "%s_%d:%d",
+ BLKSNAP_IMAGE_NAME, MAJOR(dev_id), MINOR(dev_id));
+ if (ret < 0) {
+ pr_err("Unable to set disk name for snapshot image device: invalid device id [%d:%d]\n",
+ MAJOR(dev_id), MINOR(dev_id));
+ ret = -EINVAL;
+ goto fail_cleanup_disk;
+ }
+ pr_debug("Snapshot image disk name [%s]\n", disk->disk_name);
+
+ blk_queue_physical_block_size(disk->queue,
+ tracker->diff_area->physical_blksz);
+ blk_queue_logical_block_size(disk->queue,
+ tracker->diff_area->logical_blksz);
+
+ ret = add_disk(disk);
+ if (ret) {
+ pr_err("Failed to add disk [%s] for snapshot image device\n",
+ disk->disk_name);
+ goto fail_cleanup_disk;
+ }
+ tracker->snap_disk = disk;
+
+ pr_debug("Image block device [%d:%d] has been created\n",
+ disk->major, disk->first_minor);
+
+ return 0;
+
+fail_cleanup_disk:
+ put_disk(disk);
+ return ret;
+}
diff --git a/drivers/block/blksnap/snapimage.h b/drivers/block/blksnap/snapimage.h
new file mode 100644
index 000000000000..cb2df7019eb8
--- /dev/null
+++ b/drivers/block/blksnap/snapimage.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#ifndef __BLKSNAP_SNAPIMAGE_H
+#define __BLKSNAP_SNAPIMAGE_H
+
+struct tracker;
+
+void snapimage_free(struct tracker *tracker);
+int snapimage_create(struct tracker *tracker);
+#endif /* __BLKSNAP_SNAPIMAGE_H */
diff --git a/drivers/block/blksnap/snapshot.c b/drivers/block/blksnap/snapshot.c
new file mode 100644
index 000000000000..db5ff325fa58
--- /dev/null
+++ b/drivers/block/blksnap/snapshot.c
@@ -0,0 +1,462 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#define pr_fmt(fmt) KBUILD_MODNAME "-snapshot: " fmt
+
+#include <linux/slab.h>
+#include <linux/sched/mm.h>
+#include <linux/build_bug.h>
+#include <uapi/linux/blksnap.h>
+#include "snapshot.h"
+#include "tracker.h"
+#include "diff_storage.h"
+#include "diff_area.h"
+#include "snapimage.h"
+#include "cbt_map.h"
+
+static LIST_HEAD(snapshots);
+static DECLARE_RWSEM(snapshots_lock);
+
+static void snapshot_free(struct kref *kref)
+{
+ struct snapshot *snapshot = container_of(kref, struct snapshot, kref);
+
+ pr_info("Release snapshot %pUb\n", &snapshot->id);
+ while (!list_empty(&snapshot->trackers)) {
+ struct tracker *tracker;
+
+ tracker = list_first_entry(&snapshot->trackers, struct tracker,
+ link);
+ list_del_init(&tracker->link);
+ tracker_release_snapshot(tracker);
+ tracker_put(tracker);
+ }
+
+ diff_storage_put(snapshot->diff_storage);
+ snapshot->diff_storage = NULL;
+ kfree(snapshot);
+}
+
+static inline void snapshot_get(struct snapshot *snapshot)
+{
+ kref_get(&snapshot->kref);
+}
+static inline void snapshot_put(struct snapshot *snapshot)
+{
+ if (likely(snapshot))
+ kref_put(&snapshot->kref, snapshot_free);
+}
+
+static struct snapshot *snapshot_new(void)
+{
+ int ret;
+ struct snapshot *snapshot = NULL;
+
+ snapshot = kzalloc(sizeof(struct snapshot), GFP_KERNEL);
+ if (!snapshot)
+ return ERR_PTR(-ENOMEM);
+
+ snapshot->diff_storage = diff_storage_new();
+ if (!snapshot->diff_storage) {
+ ret = -ENOMEM;
+ goto fail_free_snapshot;
+ }
+
+ INIT_LIST_HEAD(&snapshot->link);
+ kref_init(&snapshot->kref);
+ uuid_gen(&snapshot->id);
+ init_rwsem(&snapshot->rw_lock);
+ snapshot->is_taken = false;
+ INIT_LIST_HEAD(&snapshot->trackers);
+
+ return snapshot;
+
+fail_free_snapshot:
+ kfree(snapshot);
+
+ return ERR_PTR(ret);
+}
+
+void __exit snapshot_done(void)
+{
+ struct snapshot *snapshot;
+
+ pr_debug("Cleanup snapshots\n");
+ do {
+ down_write(&snapshots_lock);
+ snapshot = list_first_entry_or_null(&snapshots, struct snapshot,
+ link);
+ if (snapshot)
+ list_del(&snapshot->link);
+ up_write(&snapshots_lock);
+
+ snapshot_put(snapshot);
+ } while (snapshot);
+}
+
+int snapshot_create(const char *filename, sector_t limit_sect,
+ struct blksnap_uuid *id)
+{
+ int ret;
+ struct snapshot *snapshot = NULL;
+
+ snapshot = snapshot_new();
+ if (IS_ERR(snapshot)) {
+ pr_err("Unable to create snapshot: failed to allocate snapshot structure\n");
+ return PTR_ERR(snapshot);
+ }
+
+ if (!filename) {
+ pr_err("Unable to create snapshot: difference storage file is not set\n");
+ snapshot_put(snapshot);
+ return -EINVAL;
+ }
+ ret = diff_storage_set_diff_storage(snapshot->diff_storage,
+ filename, limit_sect);
+ if (ret) {
+ pr_err("Unable to create snapshot: invalid difference storage file\n");
+ snapshot_put(snapshot);
+ return ret;
+ }
+
+ export_uuid(id->b, &snapshot->id);
+
+ down_write(&snapshots_lock);
+ list_add_tail(&snapshot->link, &snapshots);
+ up_write(&snapshots_lock);
+
+ pr_info("Snapshot %pUb was created\n", id->b);
+ return 0;
+}
+
+static struct snapshot *snapshot_get_by_id(const uuid_t *id)
+{
+ struct snapshot *snapshot = NULL;
+ struct snapshot *s;
+
+ down_read(&snapshots_lock);
+ if (list_empty(&snapshots))
+ goto out;
+
+ list_for_each_entry(s, &snapshots, link) {
+ if (uuid_equal(&s->id, id)) {
+ snapshot = s;
+ snapshot_get(snapshot);
+ break;
+ }
+ }
+out:
+ up_read(&snapshots_lock);
+ return snapshot;
+}
+
+int snapshot_add_device(const uuid_t *id, struct tracker *tracker)
+{
+ int ret = 0;
+ struct snapshot *snapshot = NULL;
+
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+ if (tracker->orig_bdev->bd_disk->queue->integrity.profile) {
+ pr_err("Blksnap is not compatible with data integrity\n");
+ /* No snapshot is held yet, so out_up must not be used. */
+ return -EPERM;
+ } else
+ pr_debug("Data integrity not found\n");
+#endif
+
+#ifdef CONFIG_BLK_INLINE_ENCRYPTION
+ if (tracker->orig_bdev->bd_disk->queue->crypto_profile) {
+ pr_err("Blksnap is not compatible with hardware inline encryption\n");
+ /* No snapshot is held yet, so out_up must not be used. */
+ return -EPERM;
+ } else
+ pr_debug("Inline encryption not found\n");
+#endif
+ snapshot = snapshot_get_by_id(id);
+ if (!snapshot)
+ return -ESRCH;
+
+ down_write(&snapshot->rw_lock);
+ if (tracker->dev_id == snapshot->diff_storage->dev_id) {
+ pr_err("The block device %d:%d is already being used as difference storage\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id));
+ ret = -EBUSY;
+ goto out_up;
+ }
+ if (!list_empty(&snapshot->trackers)) {
+ struct tracker *tr;
+
+ list_for_each_entry(tr, &snapshot->trackers, link) {
+ if (tr == tracker || tr->dev_id == tracker->dev_id) {
+ ret = -EALREADY;
+ goto out_up;
+ }
+ }
+ }
+ if (list_empty(&tracker->link)) {
+ tracker_get(tracker);
+ list_add_tail(&tracker->link, &snapshot->trackers);
+ } else
+ ret = -EBUSY;
+out_up:
+ up_write(&snapshot->rw_lock);
+
+ snapshot_put(snapshot);
+
+ return ret;
+}
+
+int snapshot_destroy(const uuid_t *id)
+{
+ struct snapshot *snapshot = NULL;
+
+ pr_info("Destroy snapshot %pUb\n", id);
+ down_write(&snapshots_lock);
+ if (!list_empty(&snapshots)) {
+ struct snapshot *s = NULL;
+
+ list_for_each_entry(s, &snapshots, link) {
+ if (uuid_equal(&s->id, id)) {
+ snapshot = s;
+ list_del(&snapshot->link);
+ break;
+ }
+ }
+ }
+ up_write(&snapshots_lock);
+
+ if (!snapshot) {
+ pr_err("Unable to destroy snapshot: cannot find snapshot by id %pUb\n",
+ id);
+ return -ENODEV;
+ }
+ snapshot_put(snapshot);
+
+ return 0;
+}
+
+static int snapshot_take_trackers(struct snapshot *snapshot)
+{
+ int ret = 0;
+ struct tracker *tracker;
+
+ down_write(&snapshot->rw_lock);
+
+ if (list_empty(&snapshot->trackers)) {
+ ret = -ENODEV;
+ goto fail;
+ }
+
+ list_for_each_entry(tracker, &snapshot->trackers, link) {
+ struct diff_area *diff_area =
+ diff_area_new(tracker, snapshot->diff_storage);
+
+ if (IS_ERR(diff_area)) {
+ ret = PTR_ERR(diff_area);
+ break;
+ }
+ tracker->diff_area = diff_area;
+ }
+ if (ret)
+ goto fail;
+
+ /*
+ * Try to flush and freeze file system on each original block device.
+ */
+ list_for_each_entry(tracker, &snapshot->trackers, link) {
+ if (bdev_freeze(tracker->diff_area->orig_bdev))
+ pr_warn("Failed to freeze device [%u:%u]\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id));
+ else {
+ pr_debug("Device [%u:%u] was frozen\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id));
+ }
+ }
+
+ /*
+ * Take snapshot - switch CBT tables and enable COW logic for each
+ * tracker.
+ */
+ list_for_each_entry(tracker, &snapshot->trackers, link) {
+ ret = tracker_take_snapshot(tracker);
+ if (ret) {
+ pr_err("Unable to take snapshot: failed to capture snapshot %pUb\n",
+ &snapshot->id);
+ break;
+ }
+ }
+
+ if (!ret)
+ snapshot->is_taken = true;
+
+ /*
+ * Thaw file systems on original block devices.
+ */
+ list_for_each_entry(tracker, &snapshot->trackers, link) {
+ if (bdev_thaw(tracker->diff_area->orig_bdev))
+ pr_warn("Failed to thaw device [%u:%u]\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id));
+ else
+ pr_debug("Device [%u:%u] was unfrozen\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id));
+ }
+fail:
+ if (ret) {
+ list_for_each_entry(tracker, &snapshot->trackers, link) {
+ if (tracker->diff_area) {
+ diff_area_put(tracker->diff_area);
+ tracker->diff_area = NULL;
+ }
+ }
+ }
+ up_write(&snapshot->rw_lock);
+ return ret;
+}
+
+/*
+ * A snapshot may already be in the corrupted state immediately after it
+ * is taken.
+ */
+static int snapshot_check_trackers(struct snapshot *snapshot)
+{
+ int ret = 0;
+ struct tracker *tracker;
+
+ down_read(&snapshot->rw_lock);
+
+ list_for_each_entry(tracker, &snapshot->trackers, link) {
+ if (unlikely(diff_area_is_corrupted(tracker->diff_area))) {
+ pr_err("Unable to create snapshot for device [%u:%u]: diff area is corrupted\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id));
+ ret = -EFAULT;
+ break;
+ }
+ }
+
+ up_read(&snapshot->rw_lock);
+
+ return ret;
+}
+
+/*
+ * Create all image block devices.
+ */
+static int snapshot_take_images(struct snapshot *snapshot)
+{
+ int ret = 0;
+ struct tracker *tracker;
+
+ down_write(&snapshot->rw_lock);
+
+ list_for_each_entry(tracker, &snapshot->trackers, link) {
+ ret = snapimage_create(tracker);
+
+ if (ret) {
+ pr_err("Failed to create snapshot image for device [%u:%u] with error=%d\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id),
+ ret);
+ break;
+ }
+ }
+
+ up_write(&snapshot->rw_lock);
+ return ret;
+}
+
+static int snapshot_release_trackers(struct snapshot *snapshot)
+{
+ int ret = 0;
+ struct tracker *tracker;
+
+ down_write(&snapshot->rw_lock);
+
+ list_for_each_entry(tracker, &snapshot->trackers, link)
+ tracker_release_snapshot(tracker);
+
+ up_write(&snapshot->rw_lock);
+ return ret;
+}
+
+int snapshot_take(const uuid_t *id)
+{
+ int ret = 0;
+ struct snapshot *snapshot;
+
+ snapshot = snapshot_get_by_id(id);
+ if (!snapshot)
+ return -ESRCH;
+
+ if (!snapshot->is_taken) {
+ ret = snapshot_take_trackers(snapshot);
+ if (!ret) {
+ ret = snapshot_check_trackers(snapshot);
+ if (!ret)
+ ret = snapshot_take_images(snapshot);
+ }
+
+ if (ret)
+ snapshot_release_trackers(snapshot);
+ } else
+ ret = -EALREADY;
+
+ if (ret)
+ pr_err("Unable to take snapshot %pUb\n", &snapshot->id);
+ else
+ pr_info("Snapshot %pUb was taken successfully\n",
+ &snapshot->id);
+
+ snapshot_put(snapshot);
+ return ret;
+}
+
+int snapshot_collect(unsigned int *pcount,
+ struct blksnap_uuid __user *id_array)
+{
+ int ret = 0;
+ int inx = 0;
+ struct snapshot *s;
+
+ pr_debug("Collect snapshots\n");
+
+ down_read(&snapshots_lock);
+ if (list_empty(&snapshots))
+ goto out;
+
+ if (!id_array) {
+ list_for_each_entry(s, &snapshots, link)
+ inx++;
+ goto out;
+ }
+
+ list_for_each_entry(s, &snapshots, link) {
+ if (inx >= *pcount) {
+ ret = -ENODATA;
+ goto out;
+ }
+
+ if (copy_to_user(id_array[inx].b, &s->id.b, sizeof(uuid_t))) {
+ pr_err("Unable to collect snapshots: failed to copy data to user buffer\n");
+ ret = -EFAULT;
+ goto out;
+ }
+ inx++;
+ }
+out:
+ up_read(&snapshots_lock);
+ *pcount = inx;
+ return ret;
+}
+
+struct event *snapshot_wait_event(const uuid_t *id, unsigned long timeout_ms)
+{
+ struct snapshot *snapshot;
+ struct event *event;
+
+ snapshot = snapshot_get_by_id(id);
+ if (!snapshot)
+ return ERR_PTR(-ESRCH);
+
+ event = event_wait(&snapshot->diff_storage->event_queue, timeout_ms);
+
+ snapshot_put(snapshot);
+ return event;
+}
diff --git a/drivers/block/blksnap/snapshot.h b/drivers/block/blksnap/snapshot.h
new file mode 100644
index 000000000000..2cacdd4a080a
--- /dev/null
+++ b/drivers/block/blksnap/snapshot.h
@@ -0,0 +1,65 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#ifndef __BLKSNAP_SNAPSHOT_H
+#define __BLKSNAP_SNAPSHOT_H
+
+#include <linux/types.h>
+#include <linux/list.h>
+#include <linux/mm.h>
+#include <linux/kref.h>
+#include <linux/uuid.h>
+#include <linux/spinlock.h>
+#include <linux/rwsem.h>
+#include <linux/fs.h>
+#include "event_queue.h"
+
+struct tracker;
+struct diff_storage;
+/**
+ * struct snapshot - Snapshot structure.
+ * @link:
+ * List node that links the snapshot into the global list of snapshots.
+ * @kref:
+ * Protects the structure from being released during the processing of
+ * an ioctl.
+ * @id:
+ * UUID of snapshot.
+ * @rw_lock:
+ * Protects the structure from being modified by different threads.
+ * @is_taken:
+ * Flag that the snapshot was taken.
+ * @diff_storage:
+ * A pointer to the difference storage of this snapshot.
+ * @trackers:
+ * List of block device trackers.
+ *
+ * A snapshot corresponds to a single backup session and provides snapshot
+ * images for multiple block devices. Several backup sessions can be performed
+ * at the same time, which means that several snapshots can exist at the same
+ * time. However, the original block device can only belong to one snapshot.
+ * Creating multiple snapshots from the same block device is not allowed.
+ */
+struct snapshot {
+ struct list_head link;
+ struct kref kref;
+ uuid_t id;
+
+ struct rw_semaphore rw_lock;
+
+ bool is_taken;
+ struct diff_storage *diff_storage;
+ struct list_head trackers;
+};
+
+void __exit snapshot_done(void);
+
+int snapshot_create(const char *filename, sector_t limit_sect,
+ struct blksnap_uuid *id);
+int snapshot_destroy(const uuid_t *id);
+int snapshot_add_device(const uuid_t *id, struct tracker *tracker);
+int snapshot_take(const uuid_t *id);
+int snapshot_collect(unsigned int *pcount,
+ struct blksnap_uuid __user *id_array);
+struct event *snapshot_wait_event(const uuid_t *id, unsigned long timeout_ms);
+
+#endif /* __BLKSNAP_SNAPSHOT_H */
--
2.34.1


2024-02-09 16:19:24

by Sergei Shtepa

[permalink] [raw]
Subject: [PATCH v7 5/8] block: handling and tracking I/O units

The struct tracker contains callback functions for handling the I/O units
of a block device. When a write request is handled, the change block
tracking (CBT) map functions are called, and the process of copying data
from the original block device to the difference storage is initiated.
Registering and unregistering the tracker is provided by the functions
blkfilter_register() and blkfilter_unregister().
The struct cbt_map stores the history of block device changes.

Signed-off-by: Sergei Shtepa <[email protected]>
---
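For illustration only, the arithmetic behind the CBT map can be sketched in
user space as follows. It mirrors what count_by_shift() and _cbt_map_set()
compute, but it is not the kernel code: the demo_* names are hypothetical,
the block size shift used in the usage note is an assumed value, and the
sketch adds a guard for a zero-length range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_SECTOR_SHIFT 9	/* 512-byte sectors */

/*
 * Number of tracking blocks of (1 << shift) bytes needed to cover
 * @capacity sectors. Assumes shift >= DEMO_SECTOR_SHIFT.
 */
uint64_t demo_count_by_shift(uint64_t capacity, unsigned int shift)
{
	uint64_t blk_size = 1ull << (shift - DEMO_SECTOR_SHIFT);

	return (capacity + blk_size - 1) / blk_size;
}

/*
 * Record @snap_number in every tracking block touched by the sector range.
 * A block keeps the highest snapshot number written to it.
 * Returns 0 on success, -1 if the range leaves the map.
 */
int demo_cbt_set(uint8_t *map, size_t blk_count, unsigned int shift,
		 uint64_t sector_start, uint64_t sector_cnt,
		 uint8_t snap_number)
{
	size_t first, last, i;

	if (sector_cnt == 0)
		return 0;	/* guard added in this sketch only */

	first = (size_t)(sector_start >> (shift - DEMO_SECTOR_SHIFT));
	last = (size_t)((sector_start + sector_cnt - 1) >>
			(shift - DEMO_SECTOR_SHIFT));

	for (i = first; i <= last; i++) {
		if (i >= blk_count)
			return -1;
		if (map[i] < snap_number)
			map[i] = snap_number;
	}
	return 0;
}
```

With an assumed shift of 16 (64 KiB tracking blocks, 128 sectors each), a
device of 1000 sectors needs 8 blocks, and a 16-sector write starting at
sector 120 marks blocks 0 and 1 because it crosses a block boundary.
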
drivers/block/blksnap/cbt_map.c | 225 +++++++++++++++++++
drivers/block/blksnap/cbt_map.h | 90 ++++++++
drivers/block/blksnap/tracker.c | 369 ++++++++++++++++++++++++++++++++
drivers/block/blksnap/tracker.h | 78 +++++++
4 files changed, 762 insertions(+)
create mode 100644 drivers/block/blksnap/cbt_map.c
create mode 100644 drivers/block/blksnap/cbt_map.h
create mode 100644 drivers/block/blksnap/tracker.c
create mode 100644 drivers/block/blksnap/tracker.h

diff --git a/drivers/block/blksnap/cbt_map.c b/drivers/block/blksnap/cbt_map.c
new file mode 100644
index 000000000000..7217398176dc
--- /dev/null
+++ b/drivers/block/blksnap/cbt_map.c
@@ -0,0 +1,225 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#define pr_fmt(fmt) KBUILD_MODNAME "-cbt_map: " fmt
+
+#include <linux/slab.h>
+#include <linux/vmalloc.h>
+#include <uapi/linux/blksnap.h>
+#include "cbt_map.h"
+#include "params.h"
+
+static inline unsigned long long count_by_shift(sector_t capacity,
+ unsigned long long shift)
+{
+ sector_t blk_size = 1ull << (shift - SECTOR_SHIFT);
+
+ return round_up(capacity, blk_size) / blk_size;
+}
+
+static void cbt_map_calculate_block_size(struct cbt_map *cbt_map)
+{
+ unsigned long long count;
+ unsigned long long shift = get_tracking_block_minimum_shift();
+
+ pr_debug("Device capacity %llu sectors\n", cbt_map->device_capacity);
+ /*
+ * The size of the tracking block is calculated based on the size of
+ * the disk so that the CBT table does not exceed a reasonable size.
+ */
+ count = count_by_shift(cbt_map->device_capacity, shift);
+ pr_debug("Blocks count %llu\n", count);
+ while (count > get_tracking_block_maximum_count()) {
+ if (shift >= get_tracking_block_maximum_shift()) {
+ pr_info("The maximum allowable CBT block size has been reached.\n");
+ break;
+ }
+ shift = shift + 1ull;
+ count = count_by_shift(cbt_map->device_capacity, shift);
+ pr_debug("Blocks count %llu\n", count);
+ }
+
+ cbt_map->blk_size_shift = shift;
+ cbt_map->blk_count = count;
+ pr_debug("The optimal CBT block size was calculated as %llu bytes\n",
+ (1ull << cbt_map->blk_size_shift));
+}
+
+static int cbt_map_allocate(struct cbt_map *cbt_map)
+{
+ int ret = 0;
+ unsigned int flags;
+ unsigned char *read_map = NULL;
+ unsigned char *write_map = NULL;
+ size_t size = cbt_map->blk_count;
+
+ if (cbt_map->read_map || cbt_map->write_map)
+ return -EINVAL;
+
+ pr_debug("Allocate CBT map of %zu blocks\n", size);
+ flags = memalloc_noio_save();
+
+ read_map = vzalloc(size);
+ if (!read_map) {
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ write_map = vzalloc(size);
+ if (!write_map) {
+ vfree(read_map);
+ ret = -ENOMEM;
+ goto out;
+ }
+
+ cbt_map->read_map = read_map;
+ cbt_map->write_map = write_map;
+
+ cbt_map->snap_number_previous = 0;
+ cbt_map->snap_number_active = 1;
+ generate_random_uuid(cbt_map->generation_id.b);
+ cbt_map->is_corrupted = false;
+out:
+ memalloc_noio_restore(flags);
+ return ret;
+}
+
+int cbt_map_reset(struct cbt_map *cbt_map, sector_t device_capacity)
+{
+ cbt_map->is_corrupted = false;
+ vfree(cbt_map->read_map);
+ cbt_map->read_map = NULL;
+ vfree(cbt_map->write_map);
+ cbt_map->write_map = NULL;
+
+ cbt_map->device_capacity = device_capacity;
+ cbt_map_calculate_block_size(cbt_map);
+
+ return cbt_map_allocate(cbt_map);
+}
+
+void cbt_map_destroy(struct cbt_map *cbt_map)
+{
+ pr_debug("CBT map destroy\n");
+
+ vfree(cbt_map->read_map);
+ vfree(cbt_map->write_map);
+ kfree(cbt_map);
+}
+
+struct cbt_map *cbt_map_create(struct block_device *bdev)
+{
+ struct cbt_map *cbt_map = NULL;
+ int ret;
+
+ pr_debug("CBT map create\n");
+
+ cbt_map = kzalloc(sizeof(struct cbt_map), GFP_KERNEL);
+ if (cbt_map == NULL)
+ return NULL;
+
+ cbt_map->device_capacity = bdev_nr_sectors(bdev);
+ cbt_map_calculate_block_size(cbt_map);
+
+ ret = cbt_map_allocate(cbt_map);
+ if (ret) {
+ pr_err("Failed to create tracker. errno=%d\n", abs(ret));
+ cbt_map_destroy(cbt_map);
+ return NULL;
+ }
+
+ spin_lock_init(&cbt_map->locker);
+ cbt_map->is_corrupted = false;
+
+ return cbt_map;
+}
+
+void cbt_map_switch(struct cbt_map *cbt_map)
+{
+ pr_debug("CBT map switch\n");
+ spin_lock(&cbt_map->locker);
+
+ cbt_map->snap_number_previous = cbt_map->snap_number_active;
+ ++cbt_map->snap_number_active;
+ if (cbt_map->snap_number_active == 256) {
+ cbt_map->snap_number_active = 1;
+
+ memset(cbt_map->write_map, 0, cbt_map->blk_count);
+
+ generate_random_uuid(cbt_map->generation_id.b);
+
+ pr_debug("CBT reset\n");
+ } else
+ memcpy(cbt_map->read_map, cbt_map->write_map,
+ cbt_map->blk_count);
+ spin_unlock(&cbt_map->locker);
+}
+
+static inline int _cbt_map_set(struct cbt_map *cbt_map, sector_t sector_start,
+ sector_t sector_cnt, u8 snap_number,
+ unsigned char *map)
+{
+ int res = 0;
+ u8 num;
+ size_t inx;
+ size_t cbt_block_first = (size_t)(
+ sector_start >> (cbt_map->blk_size_shift - SECTOR_SHIFT));
+ size_t cbt_block_last = (size_t)(
+ (sector_start + sector_cnt - 1) >>
+ (cbt_map->blk_size_shift - SECTOR_SHIFT));
+
+ for (inx = cbt_block_first; inx <= cbt_block_last; ++inx) {
+ if (unlikely(inx >= cbt_map->blk_count)) {
+ pr_err("Block index is too large\n");
+ pr_err("Block #%zu was demanded, map size %zu blocks\n",
+ inx, cbt_map->blk_count);
+ res = -EINVAL;
+ break;
+ }
+
+ num = map[inx];
+ if (num < snap_number)
+ map[inx] = snap_number;
+ }
+ return res;
+}
+
+int cbt_map_set(struct cbt_map *cbt_map, sector_t sector_start,
+ sector_t sector_cnt)
+{
+ int res;
+
+ spin_lock(&cbt_map->locker);
+ if (unlikely(cbt_map->is_corrupted)) {
+ spin_unlock(&cbt_map->locker);
+ return -EINVAL;
+ }
+ res = _cbt_map_set(cbt_map, sector_start, sector_cnt,
+ (u8)cbt_map->snap_number_active, cbt_map->write_map);
+ if (unlikely(res))
+ cbt_map->is_corrupted = true;
+
+ spin_unlock(&cbt_map->locker);
+
+ return res;
+}
+
+int cbt_map_set_both(struct cbt_map *cbt_map, sector_t sector_start,
+ sector_t sector_cnt)
+{
+ int res;
+
+ spin_lock(&cbt_map->locker);
+ if (unlikely(cbt_map->is_corrupted)) {
+ spin_unlock(&cbt_map->locker);
+ return -EINVAL;
+ }
+ res = _cbt_map_set(cbt_map, sector_start, sector_cnt,
+ (u8)cbt_map->snap_number_active, cbt_map->write_map);
+ if (!res)
+ res = _cbt_map_set(cbt_map, sector_start, sector_cnt,
+ (u8)cbt_map->snap_number_previous,
+ cbt_map->read_map);
+ spin_unlock(&cbt_map->locker);
+
+ return res;
+}
diff --git a/drivers/block/blksnap/cbt_map.h b/drivers/block/blksnap/cbt_map.h
new file mode 100644
index 000000000000..95dc17e6bcec
--- /dev/null
+++ b/drivers/block/blksnap/cbt_map.h
@@ -0,0 +1,90 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#ifndef __BLKSNAP_CBT_MAP_H
+#define __BLKSNAP_CBT_MAP_H
+
+#include <linux/kernel.h>
+#include <linux/kref.h>
+#include <linux/uuid.h>
+#include <linux/spinlock.h>
+#include <linux/blkdev.h>
+
+struct blksnap_sectors;
+
+/**
+ * struct cbt_map - The table of changes for a block device.
+ *
+ * @locker:
+ * Locking for atomic modification of structure members.
+ * @blk_size_shift:
+ * The power of 2 used to specify the change tracking block size.
+ * @blk_count:
+ * The number of change tracking blocks.
+ * @device_capacity:
+ * The actual capacity of the device.
+ * @read_map:
+ * A table of changes available for reading. This is the table that can
+ * be read after taking a snapshot.
+ * @write_map:
+ * The current table for tracking changes.
+ * @snap_number_active:
+ * The current sequential number of changes. This is the number that is
+ * written to the current table when the block data changes.
+ * @snap_number_previous:
+ * The previous sequential number of changes. This number is used to
+ * identify the blocks that were changed between the penultimate snapshot
+ * and the last snapshot.
+ * @generation_id:
+ * UUID of the generation of changes.
+ * @is_corrupted:
+ * A flag that the change tracking data is no longer reliable.
+ *
+ * The change block tracking map is a byte table. Each byte stores the
+ * sequential number of changes for one block. To determine which blocks have
+ * changed since the previous snapshot with the change number 4, it is enough
+ * to find all bytes with a number greater than 4.
+ *
+ * Since one byte is allocated to track changes in one block, the change table
+ * is created again at the 255th snapshot. At the same time, a new unique
+ * generation identifier is generated. Tracking changes is possible only for
+ * tables of the same generation.
+ *
+ * There are two tables on the change block tracking map. One is available for
+ * reading, and the other is available for writing. At the moment of taking
+ * a snapshot, the tables are synchronized. The user's process, when calling
+ * the corresponding ioctl, can read the readable table. At the same time, the
+ * change tracking mechanism continues to work with the writable table.
+ *
+ * To provide the ability to mount a snapshot image as writeable, it is
+ * possible to make changes to both of these tables simultaneously.
+ *
+ */
+struct cbt_map {
+ spinlock_t locker;
+
+ size_t blk_size_shift;
+ size_t blk_count;
+ sector_t device_capacity;
+
+ unsigned char *read_map;
+ unsigned char *write_map;
+
+ unsigned long snap_number_active;
+ unsigned long snap_number_previous;
+ uuid_t generation_id;
+
+ bool is_corrupted;
+};
+
+struct cbt_map *cbt_map_create(struct block_device *bdev);
+int cbt_map_reset(struct cbt_map *cbt_map, sector_t device_capacity);
+
+void cbt_map_destroy(struct cbt_map *cbt_map);
+
+void cbt_map_switch(struct cbt_map *cbt_map);
+int cbt_map_set(struct cbt_map *cbt_map, sector_t sector_start,
+ sector_t sector_cnt);
+int cbt_map_set_both(struct cbt_map *cbt_map, sector_t sector_start,
+ sector_t sector_cnt);
+
+#endif /* __BLKSNAP_CBT_MAP_H */
diff --git a/drivers/block/blksnap/tracker.c b/drivers/block/blksnap/tracker.c
new file mode 100644
index 000000000000..cba31b24a22d
--- /dev/null
+++ b/drivers/block/blksnap/tracker.c
@@ -0,0 +1,369 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#define pr_fmt(fmt) KBUILD_MODNAME "-tracker: " fmt
+
+#include <linux/slab.h>
+#include <linux/blk-mq.h>
+#include <linux/sched/mm.h>
+#include <linux/build_bug.h>
+#include <linux/blk-crypto.h>
+#include <uapi/linux/blksnap.h>
+#include "tracker.h"
+#include "cbt_map.h"
+#include "diff_area.h"
+#include "snapimage.h"
+#include "snapshot.h"
+
+void tracker_free(struct kref *kref)
+{
+ struct tracker *tracker = container_of(kref, struct tracker, kref);
+
+ might_sleep();
+
+ pr_debug("Free tracker for device [%u:%u]\n", MAJOR(tracker->dev_id),
+ MINOR(tracker->dev_id));
+
+ if (tracker->diff_area)
+ diff_area_put(tracker->diff_area);
+ if (tracker->cbt_map)
+ cbt_map_destroy(tracker->cbt_map);
+
+ kfree(tracker);
+}
+
+static bool tracker_submit_bio(struct bio *bio)
+{
+ struct blkfilter *flt = bio->bi_bdev->bd_filter;
+ struct tracker *tracker = container_of(flt, struct tracker, filter);
+ sector_t count = bio_sectors(bio);
+ struct bvec_iter copy_iter;
+
+ if (WARN_ON_ONCE(current->blk_filter != flt))
+ return false;
+
+ if (!op_is_write(bio_op(bio)) || !count)
+ return false;
+
+ copy_iter = bio->bi_iter;
+ if (bio_flagged(bio, BIO_REMAPPED))
+ copy_iter.bi_sector -= bio->bi_bdev->bd_start_sect;
+
+ if (cbt_map_set(tracker->cbt_map, copy_iter.bi_sector, count))
+ return false;
+
+ if (!atomic_read(&tracker->snapshot_is_taken))
+ return false;
+ /*
+ * The diff_area is not blocked from releasing now, because
+ * changing the value of the snapshot_is_taken is performed when
+ * the block device queue is frozen in tracker_release_snapshot().
+ */
+ if (diff_area_is_corrupted(tracker->diff_area))
+ return false;
+
+#ifdef CONFIG_BLK_INLINE_ENCRYPTION
+ if (bio_has_crypt_ctx(bio)) {
+ pr_err("Inline encryption is not supported\n");
+ diff_area_set_corrupted(tracker->diff_area, -EPERM);
+ return false;
+ }
+#endif
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+ if (bio->bi_integrity) {
+ pr_err("Data integrity is not supported\n");
+ diff_area_set_corrupted(tracker->diff_area, -EPERM);
+ return false;
+ }
+#endif
+ return diff_area_cow(tracker->diff_area, bio, &copy_iter);
+}
+
+static struct blkfilter *tracker_attach(struct block_device *bdev)
+{
+ struct tracker *tracker = NULL;
+ struct cbt_map *cbt_map;
+
+ pr_debug("Creating tracker for device [%u:%u]\n",
+ MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
+
+ cbt_map = cbt_map_create(bdev);
+ if (!cbt_map) {
+ pr_err("Failed to create CBT map for device [%u:%u]\n",
+ MAJOR(bdev->bd_dev), MINOR(bdev->bd_dev));
+ return ERR_PTR(-ENOMEM);
+ }
+
+ tracker = kzalloc(sizeof(struct tracker), GFP_KERNEL);
+ if (tracker == NULL) {
+ cbt_map_destroy(cbt_map);
+ return ERR_PTR(-ENOMEM);
+ }
+
+ tracker->orig_bdev = bdev;
+ mutex_init(&tracker->ctl_lock);
+ INIT_LIST_HEAD(&tracker->link);
+ kref_init(&tracker->kref);
+ tracker->dev_id = bdev->bd_dev;
+ atomic_set(&tracker->snapshot_is_taken, false);
+ tracker->cbt_map = cbt_map;
+ tracker->diff_area = NULL;
+
+ pr_debug("New tracker for device [%u:%u] was created\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id));
+
+ return &tracker->filter;
+}
+
+static void tracker_detach(struct blkfilter *flt)
+{
+ struct tracker *tracker = container_of(flt, struct tracker, filter);
+
+ pr_debug("Detach tracker from device [%u:%u]\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id));
+
+ tracker_put(tracker);
+}
+
+static int ctl_cbtinfo(struct tracker *tracker, __u8 __user *buf, __u32 *plen)
+{
+ struct cbt_map *cbt_map = tracker->cbt_map;
+ struct blksnap_cbtinfo arg;
+
+ if (!cbt_map)
+ return -ESRCH;
+
+ if (*plen < sizeof(arg))
+ return -EINVAL;
+
+ arg.device_capacity = (__u64)(cbt_map->device_capacity << SECTOR_SHIFT);
+ arg.block_size = (__u32)(1 << cbt_map->blk_size_shift);
+ arg.block_count = (__u32)cbt_map->blk_count;
+ export_uuid(arg.generation_id.b, &cbt_map->generation_id);
+ arg.changes_number = (__u8)cbt_map->snap_number_previous;
+
+ if (copy_to_user(buf, &arg, sizeof(arg)))
+ return -ENODATA;
+
+ *plen = sizeof(arg);
+ return 0;
+}
+
+static int ctl_cbtmap(struct tracker *tracker, __u8 __user *buf, __u32 *plen)
+{
+ struct cbt_map *cbt_map = tracker->cbt_map;
+ struct blksnap_cbtmap arg;
+
+ if (!cbt_map)
+ return -ESRCH;
+
+ if (unlikely(cbt_map->is_corrupted)) {
+ pr_err("CBT table was corrupted\n");
+ return -EFAULT;
+ }
+
+ if (*plen < sizeof(arg))
+ return -EINVAL;
+
+ if (copy_from_user(&arg, buf, sizeof(arg)))
+ return -ENODATA;
+
+ if (arg.offset > cbt_map->blk_count ||
+ arg.length > (cbt_map->blk_count - arg.offset))
+ return -ENODATA;
+
+ if (copy_to_user(u64_to_user_ptr(arg.buffer),
+ cbt_map->read_map + arg.offset, arg.length))
+ return -EINVAL;
+
+ *plen = 0;
+ return 0;
+}
+
+static int ctl_cbtdirty(struct tracker *tracker, __u8 __user *buf, __u32 *plen)
+{
+ struct cbt_map *cbt_map = tracker->cbt_map;
+ struct blksnap_cbtdirty arg;
+ unsigned int inx;
+
+ if (!cbt_map)
+ return -ESRCH;
+
+ if (*plen < sizeof(arg))
+ return -EINVAL;
+
+ if (copy_from_user(&arg, buf, sizeof(arg)))
+ return -ENODATA;
+
+ for (inx = 0; inx < arg.count; inx++) {
+ struct blksnap_sectors range;
+ int ret;
+
+ if (copy_from_user(&range, u64_to_user_ptr(arg.dirty_sectors +
+ inx * sizeof(range)), sizeof(range)))
+ return -ENODATA;
+
+ ret = cbt_map_set_both(cbt_map, range.offset, range.count);
+ if (ret)
+ return ret;
+ }
+ *plen = 0;
+ return 0;
+}
+
+static int ctl_snapshotadd(struct tracker *tracker,
+ __u8 __user *buf, __u32 *plen)
+{
+ struct blksnap_snapshotadd arg;
+
+ if (*plen < sizeof(arg))
+ return -EINVAL;
+
+ if (copy_from_user(&arg, buf, sizeof(arg)))
+ return -ENODATA;
+
+ *plen = 0;
+ return snapshot_add_device((uuid_t *)&arg.id, tracker);
+}
+static int ctl_snapshotinfo(struct tracker *tracker,
+ __u8 __user *buf, __u32 *plen)
+{
+ struct blksnap_snapshotinfo arg = {0};
+
+ if (*plen < sizeof(arg))
+ return -EINVAL;
+
+ if (copy_from_user(&arg, buf, sizeof(arg)))
+ return -ENODATA;
+
+ if (tracker->diff_area && diff_area_is_corrupted(tracker->diff_area))
+ arg.error_code = tracker->diff_area->error_code;
+ else
+ arg.error_code = 0;
+
+ if (tracker->snap_disk)
+ strscpy(arg.image, tracker->snap_disk->disk_name,
+ IMAGE_DISK_NAME_LEN);
+
+ if (copy_to_user(buf, &arg, sizeof(arg)))
+ return -ENODATA;
+
+ *plen = sizeof(arg);
+ return 0;
+}
+
+static int tracker_ctl(struct blkfilter *flt, const unsigned int cmd,
+ __u8 __user *buf, __u32 *plen)
+{
+ int ret = 0;
+ struct tracker *tracker = container_of(flt, struct tracker, filter);
+
+ mutex_lock(&tracker->ctl_lock);
+ switch (cmd) {
+ case BLKFILTER_CTL_BLKSNAP_CBTINFO:
+ ret = ctl_cbtinfo(tracker, buf, plen);
+ break;
+ case BLKFILTER_CTL_BLKSNAP_CBTMAP:
+ ret = ctl_cbtmap(tracker, buf, plen);
+ break;
+ case BLKFILTER_CTL_BLKSNAP_CBTDIRTY:
+ ret = ctl_cbtdirty(tracker, buf, plen);
+ break;
+ case BLKFILTER_CTL_BLKSNAP_SNAPSHOTADD:
+ ret = ctl_snapshotadd(tracker, buf, plen);
+ break;
+ case BLKFILTER_CTL_BLKSNAP_SNAPSHOTINFO:
+ ret = ctl_snapshotinfo(tracker, buf, plen);
+ break;
+ default:
+ ret = -ENOTTY;
+ }
+ mutex_unlock(&tracker->ctl_lock);
+
+ return ret;
+}
+
+static struct blkfilter_operations tracker_ops = {
+ .owner = THIS_MODULE,
+ .name = "blksnap",
+ .attach = tracker_attach,
+ .detach = tracker_detach,
+ .ctl = tracker_ctl,
+ .submit_bio = tracker_submit_bio,
+};
+
+int tracker_take_snapshot(struct tracker *tracker)
+{
+ int ret = 0;
+ bool cbt_reset_needed = false;
+ struct block_device *orig_bdev = tracker->orig_bdev;
+ sector_t capacity;
+ unsigned int current_flag;
+
+ blk_mq_freeze_queue(orig_bdev->bd_queue);
+ current_flag = memalloc_noio_save();
+
+ if (tracker->cbt_map->is_corrupted) {
+ cbt_reset_needed = true;
+ pr_warn("Corrupted CBT table detected. CBT fault\n");
+ }
+
+ capacity = bdev_nr_sectors(orig_bdev);
+ if (tracker->cbt_map->device_capacity != capacity) {
+ cbt_reset_needed = true;
+ pr_warn("Device resize detected. CBT fault\n");
+ }
+
+ if (cbt_reset_needed) {
+ ret = cbt_map_reset(tracker->cbt_map, capacity);
+ if (ret) {
+ pr_err("Failed to reset CBT map. errno=%d\n",
+ abs(ret));
+ goto out;
+ }
+ }
+
+ cbt_map_switch(tracker->cbt_map);
+ atomic_set(&tracker->snapshot_is_taken, true);
+out:
+ memalloc_noio_restore(current_flag);
+ blk_mq_unfreeze_queue(orig_bdev->bd_queue);
+
+ return ret;
+}
+
+void tracker_release_snapshot(struct tracker *tracker)
+{
+ struct diff_area *diff_area = tracker->diff_area;
+
+ if (unlikely(!diff_area))
+ return;
+
+ snapimage_free(tracker);
+
+ pr_debug("Tracker for device [%u:%u] release snapshot\n",
+ MAJOR(tracker->dev_id), MINOR(tracker->dev_id));
+
+ blk_mq_freeze_queue(tracker->orig_bdev->bd_queue);
+ atomic_set(&tracker->snapshot_is_taken, false);
+ tracker->diff_area = NULL;
+ blk_mq_unfreeze_queue(tracker->orig_bdev->bd_queue);
+
+ flush_work(&diff_area->image_io_work);
+ flush_work(&diff_area->store_queue_work);
+
+ diff_area_put(diff_area);
+}
+
+int __init tracker_init(void)
+{
+ pr_debug("Register filter '%s'\n", tracker_ops.name);
+
+ return blkfilter_register(&tracker_ops);
+}
+
+void tracker_done(void)
+{
+ pr_debug("Unregister filter '%s'\n", tracker_ops.name);
+
+ blkfilter_unregister(&tracker_ops);
+}
diff --git a/drivers/block/blksnap/tracker.h b/drivers/block/blksnap/tracker.h
new file mode 100644
index 000000000000..05ecc3c3c819
--- /dev/null
+++ b/drivers/block/blksnap/tracker.h
@@ -0,0 +1,78 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#ifndef __BLKSNAP_TRACKER_H
+#define __BLKSNAP_TRACKER_H
+
+#include <linux/blk-filter.h>
+#include <linux/kref.h>
+#include <linux/spinlock.h>
+#include <linux/list.h>
+#include <linux/rwsem.h>
+#include <linux/blkdev.h>
+#include <linux/fs.h>
+
+struct cbt_map;
+struct diff_area;
+
+/**
+ * struct tracker - Tracker for a block device.
+ *
+ * @filter:
+ * The block device filter structure.
+ * @orig_bdev:
+ * The original block device this tracker is attached to.
+ * @ctl_lock:
+ * The mutex blocks simultaneous management of the tracker from different
+ * threads.
+ * @link:
+ * List header. Allows trackers to be combined into a list in a snapshot.
+ * @kref:
+ * The reference counter controls the lifetime of the tracker.
+ * @dev_id:
+ * Original block device ID.
+ * @snapshot_is_taken:
+ * Indicates that a snapshot was taken for the device whose I/O units are
+ * handled by this tracker.
+ * @cbt_map:
+ * Pointer to a change block tracker map.
+ * @diff_area:
+ * Pointer to a difference area.
+ * @snap_disk:
+ * Snapshot image disk.
+ *
+ * The goal of the tracker is to handle I/O units. The tracker detects the range
+ * of sectors that will change and transmits them to the CBT map and to the
+ * difference area.
+ */
+struct tracker {
+ struct blkfilter filter;
+ struct block_device *orig_bdev;
+ struct mutex ctl_lock;
+ struct list_head link;
+ struct kref kref;
+ dev_t dev_id;
+
+ atomic_t snapshot_is_taken;
+
+ struct cbt_map *cbt_map;
+ struct diff_area *diff_area;
+ struct gendisk *snap_disk;
+};
+
+int __init tracker_init(void);
+void tracker_done(void);
+
+void tracker_free(struct kref *kref);
+static inline void tracker_put(struct tracker *tracker)
+{
+ if (likely(tracker))
+ kref_put(&tracker->kref, tracker_free);
+}
+static inline void tracker_get(struct tracker *tracker)
+{
+ kref_get(&tracker->kref);
+}
+int tracker_take_snapshot(struct tracker *tracker);
+void tracker_release_snapshot(struct tracker *tracker);
+
+#endif /* __BLKSNAP_TRACKER_H */
--
2.34.1
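
For readers following the change-tracking logic in cbt_map.c and the
description in cbt_map.h, the byte-table scheme can be modelled in user
space roughly as below. This is only a sketch under stated assumptions: the
names cbt_model, cbt_model_set(), cbt_model_switch() and
cbt_model_changed_since() are hypothetical, locking and the generation UUID
are left out, and unlike _cbt_map_set() the model rejects an out-of-range
request before marking any block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SHIFT 9

/* Hypothetical user-space model of struct cbt_map; not the kernel code. */
struct cbt_model {
	unsigned int blk_size_shift;	/* log2 of the tracking block size, bytes */
	size_t blk_count;		/* number of tracking blocks */
	uint8_t *read_map;		/* table published at the last snapshot */
	uint8_t *write_map;		/* table updated on every write */
	unsigned int snap_active;	/* number stamped into write_map */
	unsigned int snap_previous;	/* number of the last snapshot taken */
};

/* Stamp every tracking block touched by sectors [start, start + cnt). */
static int cbt_model_set(struct cbt_model *m, uint64_t start, uint64_t cnt)
{
	unsigned int shift;
	size_t first, last, i;

	assert(m->blk_size_shift >= SECTOR_SHIFT);
	if (cnt == 0)
		return 0;	/* the tracker filters out empty bios earlier */

	shift = m->blk_size_shift - SECTOR_SHIFT;
	first = (size_t)(start >> shift);
	last = (size_t)((start + cnt - 1) >> shift);
	if (last >= m->blk_count)
		return -1;	/* model simplification: reject up front */

	for (i = first; i <= last; i++)
		if (m->write_map[i] < m->snap_active)
			m->write_map[i] = (uint8_t)m->snap_active;
	return 0;
}

/* Snapshot time: publish the write table and advance the change number. */
static void cbt_model_switch(struct cbt_model *m)
{
	m->snap_previous = m->snap_active;
	m->snap_active++;
	if (m->snap_active == 256) {
		/* One byte per block: start a new generation from scratch. */
		m->snap_active = 1;
		memset(m->write_map, 0, m->blk_count);
	} else {
		memcpy(m->read_map, m->write_map, m->blk_count);
	}
}

/* Count blocks in the published table changed after snapshot 'since'. */
static size_t cbt_model_changed_since(const struct cbt_model *m, uint8_t since)
{
	size_t i, n = 0;

	for (i = 0; i < m->blk_count; i++)
		if (m->read_map[i] > since)
			n++;
	return n;
}
```

With 64 KiB tracking blocks (blk_size_shift = 16), a one-sector write at
sector 0 and a 256-sector write at sector 128 mark three blocks. After the
first switch, a backup tool asking for everything changed since number 0
would see those three blocks in the readable table.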


2024-02-09 16:19:36

by Sergei Shtepa

[permalink] [raw]
Subject: [PATCH v7 4/8] block: module management interface functions

Add the callbacks for loading and unloading the module and implement the
module management interface functions. The module parameters and other
mandatory declarations for the kernel module are also defined.

Signed-off-by: Sergei Shtepa <[email protected]>
---
drivers/block/blksnap/main.c | 481 +++++++++++++++++++++++++++++++++
drivers/block/blksnap/params.h | 16 ++
2 files changed, 497 insertions(+)
create mode 100644 drivers/block/blksnap/main.c
create mode 100644 drivers/block/blksnap/params.h
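
As a reading aid for the parameter fix-ups that parameters_init() in this
patch applies to diff_storage_minimum, the two steps (raise the value to at
least two minimum-size chunks, then align it down to a chunk boundary) can
be modelled in user space. This is a sketch only: the helper name
fixup_diff_storage_minimum() is hypothetical, and it assumes
chunk_minimum_shift has already been raised to at least SECTOR_SHIFT.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/*
 * Hypothetical helper modelling the diff_storage_minimum fix-up.
 * Both the argument and the result are in 512-byte sectors.
 */
static uint32_t fixup_diff_storage_minimum(uint32_t minimum_sectors,
					   unsigned int chunk_minimum_shift)
{
	uint32_t chunk_sectors;

	assert(chunk_minimum_shift >= SECTOR_SHIFT);
	assert(chunk_minimum_shift - SECTOR_SHIFT < 31);

	/* Size of the smallest chunk, in sectors. */
	chunk_sectors = 1u << (chunk_minimum_shift - SECTOR_SHIFT);

	/* The storage must hold at least two minimum-size chunks. */
	if (minimum_sectors < chunk_sectors * 2)
		minimum_sectors = chunk_sectors * 2;

	/* Align down to a whole number of minimum-size chunks. */
	minimum_sectors &= ~(chunk_sectors - 1);

	return minimum_sectors;
}
```

With the default chunk_minimum_shift of 18 (512 sectors per chunk), the
default value 2097152 is left unchanged, while a request of 100 sectors
would be raised to 1024 and a request of 2047 sectors aligned down to 1536.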

diff --git a/drivers/block/blksnap/main.c b/drivers/block/blksnap/main.c
new file mode 100644
index 000000000000..5d4504d00c71
--- /dev/null
+++ b/drivers/block/blksnap/main.c
@@ -0,0 +1,481 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/miscdevice.h>
+#include <linux/build_bug.h>
+#include <uapi/linux/blksnap.h>
+#include "snapimage.h"
+#include "snapshot.h"
+#include "tracker.h"
+#include "chunk.h"
+#include "params.h"
+
+/*
+ * The power of 2 for minimum tracking block size.
+ *
+ * If we make the tracking block size small, we will get detailed information
+ * about the changes, but the size of the change tracker table will be too
+ * large, which will lead to inefficient memory usage.
+ */
+static unsigned int tracking_block_minimum_shift = 16;
+
+/*
+ * The maximum number of tracking blocks.
+ *
+ * A table is created in RAM to store information about the status of all
+ * tracking blocks. So, if the size of the tracking block is small, then the
+ * size of the table turns out to be large and memory is consumed inefficiently.
+ * As the size of the block device grows, the size of the tracking block
+ * should also grow. For this purpose, the maximum number of tracking blocks
+ * is limited.
+ */
+static unsigned int tracking_block_maximum_count = 2097152;
+
+/*
+ * The power of 2 for maximum tracking block size.
+ *
+ * On very large capacity disks, the block size may be too large. To prevent
+ * this, the maximum block size is limited. If the limit on the maximum block
+ * size has been reached, then the number of blocks may exceed the
+ * &tracking_block_maximum_count.
+ */
+static unsigned int tracking_block_maximum_shift = 26;
+
+/*
+ * The power of 2 for minimum chunk size.
+ *
+ * The size of the chunk depends on how much data will be copied to the
+ * difference storage when at least one sector of the block device is changed.
+ * If the size is small, then small I/O units will be generated, which will
+ * reduce performance. Too large a chunk size will lead to inefficient use of
+ * the difference storage.
+ */
+static unsigned int chunk_minimum_shift = 18;
+
+/*
+ * The power of 2 for maximum number of chunks.
+ *
+ * A table is created in RAM to store information about the state of the chunks.
+ * So, if the size of the chunk is small, then the size of the table turns out
+ * to be large and memory is consumed inefficiently. As the size of the block
+ * device grows, the size of the chunk should also grow. For this purpose, the
+ * maximum number of chunks is set.
+ *
+ * The table expands dynamically when new chunks are allocated. Therefore,
+ * memory consumption also depends on the intensity of writing to the block
+ * device under the snapshot.
+ */
+static unsigned int chunk_maximum_count_shift = 40;
+
+/*
+ * The power of 2 for maximum chunk size.
+ *
+ * On very large capacity disks, the chunk size may be too large. To prevent
+ * this, the maximum chunk size is limited. If the limit on the maximum chunk
+ * size has been reached, then the number of chunks may exceed the limit set by
+ * &chunk_maximum_count_shift.
+ */
+static unsigned int chunk_maximum_shift = 26;
+
+/*
+ * The maximum number of chunks in queue.
+ *
+ * The chunk is not immediately stored to the difference storage. The chunks
+ * are put in a store queue. The store queue makes it possible to postpone
+ * storing a chunk's data to the difference storage and perform it later in
+ * the worker thread.
+ */
+static unsigned int chunk_maximum_in_queue = 16;
+
+/*
+ * The size of the pool of preallocated difference buffers.
+ *
+ * A buffer can be allocated for each chunk. After use, this buffer is not
+ * released immediately, but is sent to the pool of free buffers. However, if
+ * there are too many free buffers in the pool, then these free buffers will
+ * be released immediately.
+ */
+static unsigned int free_diff_buffer_pool_size = 128;
+
+/*
+ * The minimum allowable size of the difference storage in sectors.
+ *
+ * The difference storage is a part of the disk space allocated for storing
+ * snapshot data. If the free space in difference storage is less than half of
+ * this value, then the process of increasing the size of the difference storage
+ * file will begin. The size of the difference storage file is increased in
+ * portions, the size of which is determined by this value.
+ */
+static unsigned int diff_storage_minimum = 2097152;
+
+#define VERSION_STR "2.0.0.0"
+static const struct blksnap_version version = {
+ .major = 2,
+ .minor = 0,
+ .revision = 0,
+ .build = 0,
+};
+
+unsigned int get_tracking_block_minimum_shift(void)
+{
+ return tracking_block_minimum_shift;
+}
+
+unsigned int get_tracking_block_maximum_shift(void)
+{
+ return tracking_block_maximum_shift;
+}
+
+unsigned int get_tracking_block_maximum_count(void)
+{
+ return tracking_block_maximum_count;
+}
+
+unsigned int get_chunk_minimum_shift(void)
+{
+ return chunk_minimum_shift;
+}
+
+unsigned int get_chunk_maximum_shift(void)
+{
+ return chunk_maximum_shift;
+}
+
+unsigned long get_chunk_maximum_count(void)
+{
+ /*
+ * The XArray is used to store chunks, and 'unsigned long' is used as the
+ * chunk number parameter. So, the number of chunks cannot exceed
+ * ULONG_MAX.
+ */
+ if ((chunk_maximum_count_shift >> 3) < sizeof(unsigned long))
+ return (1ul << chunk_maximum_count_shift);
+ return ULONG_MAX;
+}
+
+unsigned int get_chunk_maximum_in_queue(void)
+{
+ return chunk_maximum_in_queue;
+}
+
+unsigned int get_free_diff_buffer_pool_size(void)
+{
+ return free_diff_buffer_pool_size;
+}
+
+sector_t get_diff_storage_minimum(void)
+{
+ return (sector_t)diff_storage_minimum;
+}
+
+static int ioctl_version(struct blksnap_version __user *user_version)
+{
+ if (copy_to_user(user_version, &version, sizeof(version))) {
+ pr_err("Unable to get version: invalid user buffer\n");
+ return -ENODATA;
+ }
+
+ return 0;
+}
+
+static_assert(sizeof(uuid_t) == sizeof(struct blksnap_uuid),
+ "Invalid size of struct blksnap_uuid.");
+
+static int ioctl_snapshot_create(struct blksnap_snapshot_create __user *uarg)
+{
+ struct blksnap_snapshot_create karg;
+ char *fname;
+ int ret;
+
+ if (copy_from_user(&karg, uarg, sizeof(karg))) {
+ pr_err("Unable to create snapshot: invalid user buffer\n");
+ return -ENODATA;
+ }
+ fname = strndup_user((const char __user *)karg.diff_storage_filename,
+ PATH_MAX);
+ if (IS_ERR(fname))
+ return PTR_ERR(fname);
+
+ ret = snapshot_create(fname, karg.diff_storage_limit_sect, &karg.id);
+ kfree(fname);
+ if (ret)
+ return ret;
+
+ if (copy_to_user(uarg, &karg, sizeof(karg))) {
+ pr_err("Unable to create snapshot: invalid user buffer\n");
+ return -ENODATA;
+ }
+
+ return 0;
+}
+
+static int ioctl_snapshot_destroy(struct blksnap_uuid __user *user_id)
+{
+ uuid_t kernel_id;
+
+ if (copy_from_user(kernel_id.b, user_id->b, sizeof(uuid_t))) {
+ pr_err("Unable to destroy snapshot: invalid user buffer\n");
+ return -ENODATA;
+ }
+
+ return snapshot_destroy(&kernel_id);
+}
+
+static int ioctl_snapshot_take(struct blksnap_uuid __user *user_id)
+{
+ uuid_t kernel_id;
+
+ if (copy_from_user(kernel_id.b, user_id->b, sizeof(uuid_t))) {
+ pr_err("Unable to take snapshot: invalid user buffer\n");
+ return -ENODATA;
+ }
+
+ return snapshot_take(&kernel_id);
+}
+
+static int ioctl_snapshot_collect(struct blksnap_snapshot_collect __user *uarg)
+{
+ int ret;
+ struct blksnap_snapshot_collect karg;
+
+ if (copy_from_user(&karg, uarg, sizeof(karg))) {
+ pr_err("Unable to collect available snapshots: invalid user buffer\n");
+ return -ENODATA;
+ }
+
+ ret = snapshot_collect(&karg.count, u64_to_user_ptr(karg.ids));
+
+ if (copy_to_user(uarg, &karg, sizeof(karg))) {
+ pr_err("Unable to collect available snapshots: invalid user buffer\n");
+ return -ENODATA;
+ }
+
+ return ret;
+}
+
+static_assert(sizeof(struct blksnap_snapshot_event) == 4096,
+ "The size of struct blksnap_snapshot_event should be equal to the size of the page.");
+
+static int ioctl_snapshot_wait_event(struct blksnap_snapshot_event __user *uarg)
+{
+ int ret = 0;
+ struct blksnap_snapshot_event *karg;
+ struct event *ev;
+
+ karg = kzalloc(sizeof(struct blksnap_snapshot_event), GFP_KERNEL);
+ if (!karg)
+ return -ENOMEM;
+
+ /* Copy only snapshot ID and timeout */
+ if (copy_from_user(karg, uarg, sizeof(uuid_t) + sizeof(__u32))) {
+ pr_err("Unable to get snapshot event. Invalid user buffer\n");
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ev = snapshot_wait_event((uuid_t *)karg->id.b, karg->timeout_ms);
+ if (IS_ERR(ev)) {
+ ret = PTR_ERR(ev);
+ goto out;
+ }
+
+ pr_debug("Received event=%lld code=%d data_size=%d\n", ev->time,
+ ev->code, ev->data_size);
+ karg->code = ev->code;
+ karg->time_label = ev->time;
+
+ if (ev->data_size > sizeof(karg->data)) {
+ pr_err("Event size %d is too big\n", ev->data_size);
+ ret = -ENOSPC;
+ /* If we can't copy all the data, we copy only part of it. */
+ }
+ memcpy(karg->data, ev->data, min_t(size_t, ev->data_size, sizeof(karg->data)));
+ event_free(ev);
+
+ if (copy_to_user(uarg, karg, sizeof(struct blksnap_snapshot_event))) {
+ pr_err("Unable to get snapshot event. Invalid user buffer\n");
+ ret = -EINVAL;
+ }
+out:
+ kfree(karg);
+
+ return ret;
+}
+
+static long blksnap_ctrl_unlocked_ioctl(struct file *filp, unsigned int cmd,
+ unsigned long arg)
+{
+ void *argp = (void __user *)arg;
+
+ switch (cmd) {
+ case IOCTL_BLKSNAP_VERSION:
+ return ioctl_version(argp);
+ case IOCTL_BLKSNAP_SNAPSHOT_CREATE:
+ return ioctl_snapshot_create(argp);
+ case IOCTL_BLKSNAP_SNAPSHOT_DESTROY:
+ return ioctl_snapshot_destroy(argp);
+ case IOCTL_BLKSNAP_SNAPSHOT_TAKE:
+ return ioctl_snapshot_take(argp);
+ case IOCTL_BLKSNAP_SNAPSHOT_COLLECT:
+ return ioctl_snapshot_collect(argp);
+ case IOCTL_BLKSNAP_SNAPSHOT_WAIT_EVENT:
+ return ioctl_snapshot_wait_event(argp);
+ default:
+ return -ENOTTY;
+ }
+
+}
+
+static const struct file_operations blksnap_ctrl_fops = {
+ .owner = THIS_MODULE,
+ .unlocked_ioctl = blksnap_ctrl_unlocked_ioctl,
+};
+
+static struct miscdevice blksnap_ctrl_misc = {
+ .minor = MISC_DYNAMIC_MINOR,
+ .name = BLKSNAP_CTL,
+ .fops = &blksnap_ctrl_fops,
+};
+
+static inline sector_t chunk_minimum_sectors(void)
+{
+ return (1ull << (chunk_minimum_shift - SECTOR_SHIFT));
+}
+
+static int __init parameters_init(void)
+{
+ pr_debug("tracking_block_minimum_shift: %d\n",
+ tracking_block_minimum_shift);
+ pr_debug("tracking_block_maximum_shift: %d\n",
+ tracking_block_maximum_shift);
+ pr_debug("tracking_block_maximum_count: %d\n",
+ tracking_block_maximum_count);
+
+ pr_debug("chunk_minimum_shift: %d\n", chunk_minimum_shift);
+ pr_debug("chunk_maximum_shift: %d\n", chunk_maximum_shift);
+ pr_debug("chunk_maximum_count_shift: %u\n", chunk_maximum_count_shift);
+
+ pr_debug("chunk_maximum_in_queue: %d\n", chunk_maximum_in_queue);
+ pr_debug("free_diff_buffer_pool_size: %d\n",
+ free_diff_buffer_pool_size);
+ pr_debug("diff_storage_minimum: %d\n", diff_storage_minimum);
+
+ if (tracking_block_maximum_shift < tracking_block_minimum_shift) {
+ tracking_block_maximum_shift = tracking_block_minimum_shift;
+ pr_warn("fixed tracking_block_maximum_shift: %d\n",
+ tracking_block_maximum_shift);
+ }
+
+ if (chunk_minimum_shift < PAGE_SHIFT) {
+ chunk_minimum_shift = PAGE_SHIFT;
+ pr_warn("fixed chunk_minimum_shift: %d\n",
+ chunk_minimum_shift);
+ }
+ if (chunk_maximum_shift < chunk_minimum_shift) {
+ chunk_maximum_shift = chunk_minimum_shift;
+ pr_warn("fixed chunk_maximum_shift: %d\n",
+ chunk_maximum_shift);
+ }
+ if (diff_storage_minimum < (chunk_minimum_sectors() * 2)) {
+ diff_storage_minimum = chunk_minimum_sectors() * 2;
+ pr_warn("fixed diff_storage_minimum: %d\n",
+ diff_storage_minimum);
+ }
+ if (diff_storage_minimum & (chunk_minimum_sectors() - 1)) {
+ diff_storage_minimum &= ~(chunk_minimum_sectors() - 1);
+ pr_warn("fixed diff_storage_minimum: %d\n",
+ diff_storage_minimum);
+ }
+
+ return 0;
+}
+
+static int __init blksnap_init(void)
+{
+ int ret;
+
+ pr_debug("Loading\n");
+ pr_debug("Version: %s\n", VERSION_STR);
+
+ ret = parameters_init();
+ if (ret)
+ return ret;
+
+ ret = chunk_init();
+ if (ret)
+ goto fail_chunk_init;
+
+ ret = tracker_init();
+ if (ret)
+ goto fail_tracker_init;
+
+ ret = misc_register(&blksnap_ctrl_misc);
+ if (ret)
+ goto fail_misc_register;
+
+ return 0;
+
+fail_misc_register:
+ tracker_done();
+fail_tracker_init:
+ chunk_done();
+fail_chunk_init:
+
+ return ret;
+}
+
+static void __exit blksnap_exit(void)
+{
+ pr_debug("Unloading module\n");
+
+ misc_deregister(&blksnap_ctrl_misc);
+
+ chunk_done();
+ snapshot_done();
+ tracker_done();
+
+ pr_debug("Module was unloaded\n");
+}
+
+module_init(blksnap_init);
+module_exit(blksnap_exit);
+
+module_param_named(tracking_block_minimum_shift, tracking_block_minimum_shift,
+ uint, 0644);
+MODULE_PARM_DESC(tracking_block_minimum_shift,
+ "The power of 2 for minimum tracking block size");
+module_param_named(tracking_block_maximum_count, tracking_block_maximum_count,
+ uint, 0644);
+MODULE_PARM_DESC(tracking_block_maximum_count,
+ "The maximum number of tracking blocks");
+module_param_named(tracking_block_maximum_shift, tracking_block_maximum_shift,
+ uint, 0644);
+MODULE_PARM_DESC(tracking_block_maximum_shift,
+ "The power of 2 for maximum tracking block size");
+module_param_named(chunk_minimum_shift, chunk_minimum_shift, uint, 0644);
+MODULE_PARM_DESC(chunk_minimum_shift,
+ "The power of 2 for minimum chunk size");
+module_param_named(chunk_maximum_count_shift, chunk_maximum_count_shift,
+ uint, 0644);
+MODULE_PARM_DESC(chunk_maximum_count_shift,
+ "The power of 2 for maximum number of chunks");
+module_param_named(chunk_maximum_shift, chunk_maximum_shift, uint, 0644);
+MODULE_PARM_DESC(chunk_maximum_shift,
+ "The power of 2 for maximum snapshot chunk size");
+module_param_named(chunk_maximum_in_queue, chunk_maximum_in_queue, uint, 0644);
+MODULE_PARM_DESC(chunk_maximum_in_queue,
+ "The maximum number of chunks in store queue");
+module_param_named(free_diff_buffer_pool_size, free_diff_buffer_pool_size,
+ uint, 0644);
+MODULE_PARM_DESC(free_diff_buffer_pool_size,
+ "The size of the pool of preallocated difference buffers");
+module_param_named(diff_storage_minimum, diff_storage_minimum, uint, 0644);
+MODULE_PARM_DESC(diff_storage_minimum,
+ "The minimum allowable size of the difference storage in sectors");
+
+MODULE_DESCRIPTION("Block Device Snapshots Module");
+MODULE_VERSION(VERSION_STR);
+MODULE_AUTHOR("Veeam Software Group GmbH");
+MODULE_LICENSE("GPL");
diff --git a/drivers/block/blksnap/params.h b/drivers/block/blksnap/params.h
new file mode 100644
index 000000000000..3ec4cce4de39
--- /dev/null
+++ b/drivers/block/blksnap/params.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/* Copyright (C) 2023 Veeam Software Group GmbH */
+#ifndef __BLKSNAP_PARAMS_H
+#define __BLKSNAP_PARAMS_H
+
+unsigned int get_tracking_block_minimum_shift(void);
+unsigned int get_tracking_block_maximum_shift(void);
+unsigned int get_tracking_block_maximum_count(void);
+unsigned int get_chunk_minimum_shift(void);
+unsigned int get_chunk_maximum_shift(void);
+unsigned long get_chunk_maximum_count(void);
+unsigned int get_chunk_maximum_in_queue(void);
+unsigned int get_free_diff_buffer_pool_size(void);
+sector_t get_diff_storage_minimum(void);
+
+#endif /* __BLKSNAP_PARAMS_H */
--
2.34.1
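
To illustrate how the three tracking_block_* parameters described in main.c
interact, here is a small user-space sketch of one plausible selection
rule: start from the minimum block size and double it while the block count
exceeds the limit, stopping at the maximum block size. The function that
actually consumes these parameters, cbt_map_calculate_block_size(), is not
part of this patch, so the helpers blocks_for_shift() and
pick_tracking_shift() are hypothetical and may differ from it in detail.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9

/* Number of tracking blocks needed to cover the device, rounded up. */
static uint64_t blocks_for_shift(uint64_t capacity_sectors, unsigned int shift)
{
	uint64_t blk_sectors = 1ull << (shift - SECTOR_SHIFT);

	return (capacity_sectors + blk_sectors - 1) / blk_sectors;
}

/*
 * Hypothetical selection rule: grow the block size from min_shift until
 * the table has at most max_count entries or max_shift is reached.
 */
static unsigned int pick_tracking_shift(uint64_t capacity_sectors,
					unsigned int min_shift,
					unsigned int max_shift,
					uint64_t max_count)
{
	unsigned int shift = min_shift;

	assert(min_shift >= SECTOR_SHIFT && min_shift <= max_shift);

	while (shift < max_shift &&
	       blocks_for_shift(capacity_sectors, shift) > max_count)
		shift++;

	return shift;
}
```

Under this rule and the default values (16, 26 and 2097152), a 1 GiB device
keeps 64 KiB blocks, a 1 TiB device ends up with 512 KiB blocks and exactly
2097152 table entries, and a 1 PiB device is capped at 64 MiB blocks, so its
block count exceeds tracking_block_maximum_count, as the comment on
tracking_block_maximum_shift warns.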