2013-04-08 10:09:58

by Gabriel de Perthuis

Subject: [Pull request] bcache data offset

Hello,
These patches update the bcache superblock format so that backing device
data can be at an arbitrary offset from the start of the backing device;
this helps convert partitions or logical volumes to bcache in-place, and
<https://github.com/g2p/blocks> has been updated to use the new format.

The kernel half is on top of bcache-for-upstream; the bcache-tools half
is on top of the development version of bcache-tools.
They can be pulled from:
- https://github.com/g2p/linux/tree/bcache-for-upstream
- https://github.com/g2p/bcache-tools/tree/enable-data-offset
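As a rough illustration of what an arbitrary data offset means for I/O, the sketch below maps a sector on the bcache device to a sector on the backing device and derives the exposed capacity. It is a userspace sketch, not the kernel code: `struct backing_dev_view`, `exposed_capacity` and `to_backing_sector` are hypothetical names chosen for this example; only `data_start_sector` and `BDEV_DATA_START_DEFAULT` come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Data start of the original backing-device format, in 512-byte sectors. */
#define BDEV_DATA_START_DEFAULT 16

/* Hypothetical, simplified view of a backing device (not a kernel struct). */
struct backing_dev_view {
	uint64_t nr_sectors;        /* total size of the backing device */
	uint64_t data_start_sector; /* first sector holding cached-device data */
};

/* Sectors exposed by the bcache device: everything after the data offset. */
static uint64_t exposed_capacity(const struct backing_dev_view *dev)
{
	assert(dev->data_start_sector <= dev->nr_sectors);
	return dev->nr_sectors - dev->data_start_sector;
}

/* Map a sector on the bcache device to the matching backing-device sector. */
static uint64_t to_backing_sector(const struct backing_dev_view *dev,
				  uint64_t sector)
{
	assert(sector < exposed_capacity(dev));
	return sector + dev->data_start_sector;
}
```

With the old format the offset is always `BDEV_DATA_START_DEFAULT`; with the new format it is whatever the superblock records.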


2013-04-08 10:11:18

by Gabriel de Perthuis

Subject: [PATCH] bcache: Take data offset from the bdev superblock.

Add a new superblock version, and consolidate related defines.

Signed-off-by: Gabriel de Perthuis <[email protected]>
---
drivers/md/bcache/bcache.h | 23 ++++++++++++++++++-----
drivers/md/bcache/request.c | 2 +-
drivers/md/bcache/super.c | 21 ++++++++++++++++-----
3 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/md/bcache/bcache.h b/drivers/md/bcache/bcache.h
index f057235..8a110e6 100644
--- a/drivers/md/bcache/bcache.h
+++ b/drivers/md/bcache/bcache.h
@@ -223,11 +223,17 @@ struct bkey {
#define BKEY_PADDED(key) \
union { struct bkey key; uint64_t key ## _pad[BKEY_PAD]; }

-/* Version 1: Backing device
+/* Version 0: Cache device
+ * Version 1: Backing device
* Version 2: Seed pointer into btree node checksum
- * Version 3: New UUID format
+ * Version 3: Cache device with new UUID format
+ * Version 4: Backing device with data offset
*/
-#define BCACHE_SB_VERSION 3
+#define BCACHE_SB_VERSION_CDEV 0
+#define BCACHE_SB_VERSION_BDEV 1
+#define BCACHE_SB_VERSION_CDEV_WITH_UUID 3
+#define BCACHE_SB_VERSION_BDEV_WITH_OFFSET 4
+#define BCACHE_SB_MAX_VERSION 4

#define SB_SECTOR 8
#define SB_SIZE 4096
@@ -236,13 +242,12 @@ struct bkey {
/* SB_JOURNAL_BUCKETS must be divisible by BITS_PER_LONG */
#define MAX_CACHES_PER_SET 8

-#define BDEV_DATA_START 16 /* sectors */
+#define BDEV_DATA_START_DEFAULT 16 /* sectors */

struct cache_sb {
uint64_t csum;
uint64_t offset; /* sector where this sb was written */
uint64_t version;
-#define CACHE_BACKING_DEV 1

uint8_t magic[16];

@@ -485,6 +490,7 @@ struct cached_dev {
* where it's at.
*/
sector_t last_read;
+ sector_t data_start_sector;

/* Number of writeback bios in flight */
atomic_t in_flight;
@@ -861,6 +867,13 @@ static inline bool key_merging_disabled(struct cache_set *c)
#endif
}

+
+static inline bool SB_IS_BDEV(const struct cache_sb *sb) {
+ return sb->version == BCACHE_SB_VERSION_BDEV
+ || sb->version == BCACHE_SB_VERSION_BDEV_WITH_OFFSET;
+}
+
+
struct bbio {
unsigned submit_time_us;
union {
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
index 83731dc..9f74aff 100644
--- a/drivers/md/bcache/request.c
+++ b/drivers/md/bcache/request.c
@@ -1220,7 +1220,7 @@ static void cached_dev_make_request(struct request_queue *q, struct bio *bio)
part_stat_unlock();

bio->bi_bdev = dc->bdev;
- bio->bi_sector += BDEV_DATA_START;
+ bio->bi_sector += dc->data_start_sector;

if (cached_dev_get(dc)) {
s = search_alloc(bio, d);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 5fa3cd2..a409bb5 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -148,7 +148,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
goto err;

err = "Unsupported superblock version";
- if (sb->version > BCACHE_SB_VERSION)
+ if (sb->version > BCACHE_SB_MAX_VERSION)
goto err;

err = "Bad block/bucket size";
@@ -168,7 +168,7 @@ static const char *read_super(struct cache_sb *sb, struct block_device *bdev,
if (get_capacity(bdev->bd_disk) < sb->bucket_size * sb->nbuckets)
goto err;

- if (sb->version == CACHE_BACKING_DEV)
+ if (SB_IS_BDEV(sb))
goto out;

err = "Bad UUID";
@@ -286,7 +286,7 @@ void bcache_write_super(struct cache_set *c)
for_each_cache(ca, c, i) {
struct bio *bio = &ca->sb_bio;

- ca->sb.version = BCACHE_SB_VERSION;
+ ca->sb.version = BCACHE_SB_VERSION_CDEV_WITH_UUID;
ca->sb.seq = c->sb.seq;
ca->sb.last_mount = c->sb.last_mount;

@@ -1047,9 +1047,20 @@ static const char *register_bdev(struct cache_sb *sb, struct page *sb_page,
dc->bdev = bdev;
dc->bdev->bd_holder = dc;

+ err = "bad start sector";
+ if (sb->version == BCACHE_SB_VERSION_BDEV) {
+ dc->data_start_sector = BDEV_DATA_START_DEFAULT;
+ } else {
+ if (sb->keys < 1)
+ goto err;
+ dc->data_start_sector = sb->d[0];
+ if (dc->data_start_sector < BDEV_DATA_START_DEFAULT)
+ goto err;
+ }
+
g = dc->disk.disk;

- set_capacity(g, dc->bdev->bd_part->nr_sects - 16);
+ set_capacity(g, dc->bdev->bd_part->nr_sects - dc->data_start_sector);

bch_cached_dev_request_init(dc);

@@ -1802,7 +1813,7 @@ static ssize_t register_bcache(struct kobject *k, struct kobj_attribute *attr,
if (err)
goto err_close;

- if (sb->version == CACHE_BACKING_DEV) {
+ if (SB_IS_BDEV(sb)) {
struct cached_dev *dc = kzalloc(sizeof(*dc), GFP_KERNEL);

err = register_bdev(sb, sb_page, bdev, dc);
--
1.8.2.rc3.7.g35aca0e
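
The start-sector check added to register_bdev() can be exercised outside the kernel with a cut-down superblock. This is a sketch under stated assumptions: `struct mini_sb` and `bdev_data_start` are hypothetical names, the struct keeps only the fields the check reads, and, unlike the patch, the sketch rejects versions that are not backing-device versions instead of relying on the caller to have filtered them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BCACHE_SB_VERSION_BDEV             1
#define BCACHE_SB_VERSION_BDEV_WITH_OFFSET 4
#define BDEV_DATA_START_DEFAULT            16 /* sectors */

/* Hypothetical cut-down superblock: only the fields the check reads. */
struct mini_sb {
	uint64_t version;
	uint16_t keys;  /* number of valid entries in d[] */
	uint64_t d[4];  /* d[0] carries the data offset in version 4 */
};

/* Returns NULL on success, or an error string in the style of the patch. */
static const char *bdev_data_start(const struct mini_sb *sb, uint64_t *start)
{
	assert(sb != NULL && start != NULL);

	/* Old format: the data always starts at the fixed default. */
	if (sb->version == BCACHE_SB_VERSION_BDEV) {
		*start = BDEV_DATA_START_DEFAULT;
		return NULL;
	}

	/* Extra guard, not in the patch: only version 4 carries an offset. */
	if (sb->version != BCACHE_SB_VERSION_BDEV_WITH_OFFSET)
		return "not a backing device";

	/* The offset lives in d[0], so at least one key must be present,
	 * and the data must not overlap the superblock area. */
	if (sb->keys < 1 || sb->d[0] < BDEV_DATA_START_DEFAULT)
		return "bad start sector";

	*start = sb->d[0];
	return NULL;
}
```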

2013-04-08 20:50:05

by Kent Overstreet

Subject: Re: [PATCH] bcache: Take data offset from the bdev superblock.

On Mon, Apr 08, 2013 at 12:11:06PM +0200, Gabriel wrote:
> Add a new superblock version, and consolidate related defines.

So, I think BDEV_WITH_OFFSET looks ok, but what's the use case for it? I
was going to add it way back but we decided not to implement the hack we
thought we needed it for - if you or someone is going to use it I'll go
ahead and apply it.

As for BCACHE_SB_VERSION_CDEV_WITH_UUID, can you explain why you added
that? I suspect it's needed but I can't remember why I didn't add it
when I added the new UUID format (or perhaps I just forgot)


2013-04-08 21:24:12

by Gabriel de Perthuis

Subject: Re: [PATCH] bcache: Take data offset from the bdev superblock.

On Mon, 08 Apr 2013 22:49:56 CEST, Kent Overstreet wrote:
> On Mon, Apr 08, 2013 at 12:11:06PM +0200, Gabriel wrote:
>> Add a new superblock version, and consolidate related defines.
>
> So, I think BDEV_WITH_OFFSET looks ok, but what's the use case for it? I
> was going to add it way back but we decided not to implement the hack we
> thought we needed it for - if you or someone is going to use it I'll go
> ahead and apply it.

It's for converting existing devices to bcache.

https://github.com/g2p/blocks converts a partition to bcache by putting
a bcache superblock immediately before it and moving the partition start
back by exactly 1MB. The 1MB alignment is to play nice with
other partitioning tools and drives with 4k sectors.

blocks also converts logical volumes to bcache, and for that it has to
insert exactly 4MB (an LVM physical extent) before the filesystem data.

I'm already using the new format; it allowed me to get rid of some
complicated stuff that sandwiched a partition table on top of an LV so
that the original filesystem data was at the start of its container device.
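
To make the numbers concrete, here is a small sketch of the arithmetic, assuming 512-byte sectors and reading "1MB" and "4MB" as 2^20 and 4 * 2^20 bytes. `SECTOR_SIZE`, `bytes_to_sectors` and `offset_is_valid` are hypothetical names for this example; the lower bound mirrors the check against `BDEV_DATA_START_DEFAULT` in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE             512u /* bytes per sector (assumed) */
#define BDEV_DATA_START_DEFAULT 16   /* sectors, from the patch */

/* Convert a byte offset to sectors; the offset must be sector-aligned. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
	assert(bytes % SECTOR_SIZE == 0);
	return bytes / SECTOR_SIZE;
}

/* The patched kernel accepts any offset that is at least the default. */
static int offset_is_valid(uint64_t sectors)
{
	return sectors >= BDEV_DATA_START_DEFAULT;
}
```

Under those assumptions a 1MB shift corresponds to a data offset of 2048 sectors and a 4MB physical extent to 8192 sectors, both well above the 16-sector minimum.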

> As for BCACHE_SB_VERSION_CDEV_WITH_UUID, can you explain why you added
> that? I suspect it's needed but I can't remember why I didn't add it
> when I added the new UUID format (or perhaps I just forgot)

I took the name from a comment in the kernel-side bcache.h.
BCACHE_SB_VERSION_CDEV is the version make-bcache writes, and
BCACHE_SB_VERSION_CDEV_WITH_UUID is what the kernel updates it to; I
just changed the version names so that user-side and kernel-side were
more consistent, internally and with each other.
The kernel doesn't discriminate these two versions when opening, so it
should be possible to define only the latter and deprecate the other.
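
The way the patched checks sort superblock versions can be sketched as follows. `enum sb_kind` and `classify_sb_version` are hypothetical names for this illustration; the version constants are the ones from the patch. Note that, as in read_super(), any supported version that is not a backing-device version is treated as a cache device, which is why versions 0 and 3 are not told apart.

```c
#include <assert.h>
#include <stdint.h>

#define BCACHE_SB_VERSION_CDEV             0
#define BCACHE_SB_VERSION_BDEV             1
#define BCACHE_SB_VERSION_CDEV_WITH_UUID   3
#define BCACHE_SB_VERSION_BDEV_WITH_OFFSET 4
#define BCACHE_SB_MAX_VERSION              4

/* The maximum must cover the newest format the code knows about. */
static_assert(BCACHE_SB_MAX_VERSION >= BCACHE_SB_VERSION_BDEV_WITH_OFFSET,
	      "max version must include the data-offset format");

/* Hypothetical classification used only in this sketch. */
enum sb_kind { SB_KIND_CDEV, SB_KIND_BDEV, SB_KIND_UNSUPPORTED };

static enum sb_kind classify_sb_version(uint64_t version)
{
	if (version > BCACHE_SB_MAX_VERSION)
		return SB_KIND_UNSUPPORTED;

	/* Same test as SB_IS_BDEV() in the patch. */
	if (version == BCACHE_SB_VERSION_BDEV ||
	    version == BCACHE_SB_VERSION_BDEV_WITH_OFFSET)
		return SB_KIND_BDEV;

	/* Versions 0 and 3 (and 2) all land here: no distinction is made. */
	return SB_KIND_CDEV;
}
```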

2013-04-08 21:44:49

by Gabriel de Perthuis

Subject: Re: [PATCH] bcache: Take data offset from the bdev superblock.

>> As for BCACHE_SB_VERSION_CDEV_WITH_UUID, can you explain why you added
>> that? I suspect it's needed but I can't remember why I didn't add it
>> when I added the new UUID format (or perhaps I just forgot)
>
> I took the name from a comment in the kernel-side bcache.h.
> BCACHE_SB_VERSION_CDEV is the version make-bcache writes, and
> BCACHE_SB_VERSION_CDEV_WITH_UUID is what the kernel updates it to; I
> just changed the version names so that user-side and kernel-side were
> more consistent, internally and with each other.
> The kernel doesn't discriminate these two versions when opening, so it
> should be possible to define only the latter and deprecate the other.

To be clearer, I've replaced the kernel's BCACHE_SB_VERSION with
BCACHE_SB_VERSION_CDEV_WITH_UUID or BCACHE_SB_MAX_VERSION depending on
the intent.