2015-07-13 08:39:25

by Nick Wang

Subject: [Patch v2 00/10] Zero out devices instead of initial full sync

This patch set is based on drbd-8.4 859e34a9 and has been
compiled and tested against SLES12.

If this feature can be merged upstream, please
ignore v1, since v1 may cause a PingAck timeout when
zeroing out a large device.

Changes compared to v1:
1. Use drbd_device_post_work to zero out the device as a background
task, so that it does not block PingAck when zeroing out a
large device.

2. Fix a bug where the peer node's status was not updated if the peer
finished zeroing out earlier.

3. Move some functions from drbd_receiver.c to
drbd_worker.c and reorder the patch set.


The initial full sync for DRBD usually takes a long time, especially
when the network becomes the bottleneck. Simply skipping
the full sync with "--clear-bitmap" may not be the right solution
in all cases, for example when the bare device (no filesystem) is used
directly, e.g. by a database or a VM. This patch set can be used to zero out
the devices locally instead of running a full sync, so that the block
devices are consistent. This approach can be useful when network
bandwidth for syncing is lacking.

The patches add one new option "--zap-devices" to "new-current-uuid"
to zero out devices. It starts zeroing out the devices on both
sides.

Nick Wang (10):
drbd: Fix the wrong logic of move history.
drbd: Add options zap_devices to new-current-uuid
drbd: Add a function to zero out drbd backing device.
drbd: New packet P_ZERO_OUT.
drbd: Functions to notify peer node to start
zeroing out and zero out finished.
drbd: Wrapper for zeroing out device by worker.
drbd: Add flag for drbd device work.
drbd: Function to work with packet P_ZERO_OUT.
drbd: Receive zero out command from peer node.
drbd: Handle new-current-uuid --zap-devices.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]

drbd/drbd_int.h | 13 ++++++++
drbd/drbd_main.c | 51 ++++++++++++++++++++++++++--
drbd/drbd_nl.c | 21 +++++++++++-
drbd/drbd_protocol.h | 1 +
drbd/drbd_receiver.c | 69 ++++++++++++++++++++++++++++++++++++++
drbd/drbd_worker.c | 90 ++++++++++++++++++++++++++++++++++++++++++++++++++
drbd/linux/drbd_genl.h | 1 +
7 files changed, 243 insertions(+), 3 deletions(-)

--
1.8.4.5


2015-07-13 08:39:26

by Nick Wang

Subject: [Patch v2 01/10] drbd: Fix the wrong logic of moving history

The logic for moving the history UUIDs is wrong: the forward
copy overwrites later entries when there are more than two
history slots. If the history will never be extended beyond
two slots, the loop can be deleted.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_main.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
index 9871894..a3dc39e 100644
--- a/drbd/drbd_main.c
+++ b/drbd/drbd_main.c
@@ -3466,8 +3466,8 @@ void drbd_uuid_move_history(struct drbd_device *device) __must_hold(local)
{
int i;

- for (i = UI_HISTORY_START; i < UI_HISTORY_END; i++)
- device->ldev->md.uuid[i+1] = device->ldev->md.uuid[i];
+ for (i = UI_HISTORY_END; i > UI_HISTORY_START; i--)
+ device->ldev->md.uuid[i] = device->ldev->md.uuid[i-1];
}

void __drbd_uuid_set(struct drbd_device *device, int idx, u64 val) __must_hold(local)
--
1.8.4.5
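
To see why the loop direction matters, here is a minimal standalone sketch, not the DRBD code itself. HISTORY_SLOTS and both function names are hypothetical stand-ins for the history range of device->ldev->md.uuid[]; the slot count is deliberately larger than two because, per the commit message, the problem only shows up in that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slot count, chosen larger than two so the bug is visible. */
#define HISTORY_SLOTS 4

/* Forward copy, like the removed lines: h[0] is smeared over every slot. */
static void move_history_forward(uint64_t *h)
{
	size_t i;

	assert(h != NULL);
	for (i = 0; i + 1 < HISTORY_SLOTS; i++)
		h[i + 1] = h[i];
}

/* Backward copy, like the added lines: each slot is read before it is overwritten. */
static void move_history_backward(uint64_t *h)
{
	size_t i;

	assert(h != NULL);
	for (i = HISTORY_SLOTS - 1; i > 0; i--)
		h[i] = h[i - 1];
}
```

Starting from {1, 2, 3, 4}, the forward variant ends with {1, 1, 1, 1}, while the backward variant ends with {1, 1, 2, 3}: the oldest entry falls off the end and slot 0 is free to be overwritten by the caller.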

2015-07-13 08:41:56

by Nick Wang

Subject: [Patch v2 02/10] drbd: Add option zap_devices to new-current-uuid

Zero out the devices instead of running an initial
full sync. This can be useful in high-latency
network environments.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_nl.c | 9 +++++++++
drbd/linux/drbd_genl.h | 1 +
2 files changed, 10 insertions(+)

diff --git a/drbd/drbd_nl.c b/drbd/drbd_nl.c
index 691b615..1d17663 100644
--- a/drbd/drbd_nl.c
+++ b/drbd/drbd_nl.c
@@ -4017,6 +4017,7 @@ int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
struct drbd_device *device;
enum drbd_ret_code retcode;
int skip_initial_sync = 0;
+ int zero_out_devices = 0;
int err;
struct new_c_uuid_parms args;

@@ -4051,6 +4052,14 @@ int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED && args.clear_bm) {
drbd_info(device, "Preparing to skip initial sync\n");
skip_initial_sync = 1;
+ /* this is "zero out" devices to make it all zero.
+ * ignore "zero out" if both "clear_bm" and "zap_devices" set. */
+ } else if (device->state.conn == C_CONNECTED &&
+ first_peer_device(device)->connection->agreed_pro_version >= 90 &&
+ device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED &&
+ args.zap_devices) {
+ drbd_info(device, "Preparing to zero out devices, will take a long time\n");
+ zero_out_devices = 1;
} else if (device->state.conn != C_STANDALONE) {
retcode = ERR_CONNECTED;
goto out_dec;
diff --git a/drbd/linux/drbd_genl.h b/drbd/linux/drbd_genl.h
index 5db53f5..eef8d8c 100644
--- a/drbd/linux/drbd_genl.h
+++ b/drbd/linux/drbd_genl.h
@@ -240,6 +240,7 @@ GENL_struct(DRBD_NLA_START_OV_PARMS, 9, start_ov_parms,

GENL_struct(DRBD_NLA_NEW_C_UUID_PARMS, 10, new_c_uuid_parms,
__flg_field(1, DRBD_GENLA_F_MANDATORY, clear_bm)
+ __flg_field(2, DRBD_GENLA_F_MANDATORY, zap_devices)
)

GENL_struct(DRBD_NLA_TIMEOUT_PARMS, 11, timeout_parms,
--
1.8.4.5
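
The branch order in the hunk above decides which option wins when both are given. The following standalone sketch models only that selection; the struct, the enum and the function name are hypothetical, and the conditions are read off the visible part of the diff rather than taken from the complete function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum new_uuid_action {
	ACT_PLAIN_NEW_UUID,      /* standalone: only generate a new current UUID */
	ACT_SKIP_INITIAL_SYNC,   /* --clear-bitmap path */
	ACT_ZERO_OUT_DEVICES,    /* --zap-devices path */
	ACT_ERR_CONNECTED        /* connected, but no special path applies */
};

/* Hypothetical flattening of the state the real code inspects. */
struct new_uuid_request {
	bool connected;          /* conn == C_CONNECTED */
	bool standalone;         /* conn == C_STANDALONE */
	int  protocol_version;   /* agreed_pro_version */
	bool just_created;       /* UI_CURRENT == UUID_JUST_CREATED */
	bool clear_bm;
	bool zap_devices;
};

static enum new_uuid_action classify_new_uuid(const struct new_uuid_request *r)
{
	bool fresh_pair;

	assert(r != NULL);
	fresh_pair = r->connected && r->protocol_version >= 90 && r->just_created;

	if (fresh_pair && r->clear_bm)
		return ACT_SKIP_INITIAL_SYNC;   /* wins even if zap_devices is set too */
	if (fresh_pair && r->zap_devices)
		return ACT_ZERO_OUT_DEVICES;
	if (!r->standalone)
		return ACT_ERR_CONNECTED;
	return ACT_PLAIN_NEW_UUID;
}
```

With both flags set on a freshly created, connected pair, the sketch returns ACT_SKIP_INITIAL_SYNC, which matches the comment in the patch that "zero out" is ignored in that case.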

2015-07-13 08:39:40

by Nick Wang

Subject: [Patch v2 03/10] drbd: A function to zero out drbd backing device

The function can be used to zero out the whole
backing device.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_int.h | 1 +
drbd/drbd_worker.c | 21 +++++++++++++++++++++
2 files changed, 22 insertions(+)

diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index a234228..9ecf971 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -1662,6 +1662,7 @@ extern void drbd_send_acks_wf(struct work_struct *ws);
extern bool drbd_rs_c_min_rate_throttle(struct drbd_device *device);
extern bool drbd_rs_should_slow_down(struct drbd_device *device, sector_t sector,
bool throttle_if_app_is_waiting);
+extern int zero_out_local_device(struct drbd_device *device);
extern int drbd_submit_peer_request(struct drbd_device *,
struct drbd_peer_request *, const unsigned,
const int);
diff --git a/drbd/drbd_worker.c b/drbd/drbd_worker.c
index 2a15aeb..50564f5 100644
--- a/drbd/drbd_worker.c
+++ b/drbd/drbd_worker.c
@@ -1653,6 +1653,27 @@ void drbd_rs_controller_reset(struct drbd_device *device)
rcu_read_unlock();
}

+/**
+ * zero_out_local_device()
+ * @device: DRBD device.
+ *
+ * Description:
+ * Zero out drbd backing device when creating new uuid.
+ *
+**/
+int zero_out_local_device(struct drbd_device *device)
+{
+ struct block_device *bdev;
+
+ bdev = device->ldev->backing_bdev;
+ if (device->ldev->known_size != drbd_get_capacity(bdev))
+ device->ldev->known_size = drbd_get_capacity(bdev);
+
+ /* zero out the backing device */
+ return blkdev_issue_zeroout(bdev, 0,
+ device->ldev->known_size, GFP_NOIO, false);
+}
+
void start_resync_timer_fn(unsigned long data)
{
struct drbd_device *device = (struct drbd_device *) data;
--
1.8.4.5

2015-07-13 08:41:07

by Nick Wang

Subject: [Patch v2 04/10] drbd: New packet P_ZERO_OUT

Use packet P_ZERO_OUT to report the peer node's
zero-out result.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_int.h | 5 +++++
drbd/drbd_main.c | 28 ++++++++++++++++++++++++++++
drbd/drbd_protocol.h | 1 +
3 files changed, 34 insertions(+)

diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index 9ecf971..014b65e 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -622,6 +622,9 @@ enum {
RS_START, /* tell worker to start resync/OV */
RS_PROGRESS, /* tell worker that resync made significant progress */
RS_DONE, /* tell worker that resync is done */
+ /* used for zero out device */
+ ZERO_DONE, /* succeed on zero out a device */
+ ZERO_FAIL, /* fail to zero out a device */
};

struct drbd_bitmap; /* opaque for drbd_device */
@@ -1205,6 +1208,8 @@ extern int __drbd_send_protocol(struct drbd_connection *connection, enum drbd_pa
extern int drbd_send_protocol(struct drbd_connection *connection);
extern int drbd_send_uuids(struct drbd_peer_device *);
extern int drbd_send_uuids_skip_initial_sync(struct drbd_peer_device *);
+extern int drbd_send_zero_out_ok(struct drbd_peer_device *);
+extern int drbd_send_zero_out_fail(struct drbd_peer_device *);
extern void drbd_gen_and_send_sync_uuid(struct drbd_peer_device *);
extern int drbd_send_sizes(struct drbd_peer_device *peer_device, int trigger_reply, enum dds_flags flags);
extern int drbd_send_state(struct drbd_peer_device *, union drbd_state);
diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
index a3dc39e..740015e 100644
--- a/drbd/drbd_main.c
+++ b/drbd/drbd_main.c
@@ -908,6 +908,34 @@ int drbd_send_uuids_skip_initial_sync(struct drbd_peer_device *peer_device)
return _drbd_send_uuids(peer_device, 8);
}

+/**
+ * _drbd_send_zero_out_state() - Sends the zero-out status to the peer
+ * @peer_device: DRBD peer device.
+ * @status: Device zero out status (0 on success, non-zero on failure).
+ */
+static int _drbd_send_zero_out_state(struct drbd_peer_device *peer_device, unsigned int status)
+{
+ struct drbd_socket *sock;
+ struct p_state *p;
+
+ sock = &peer_device->connection->data;
+ p = drbd_prepare_command(peer_device, sock);
+ if (!p)
+ return -EIO;
+ p->state = cpu_to_be32(status);
+ return drbd_send_command(peer_device, sock, P_ZERO_OUT, sizeof(*p), NULL, 0);
+}
+
+int drbd_send_zero_out_ok(struct drbd_peer_device *peer_device)
+{
+ return _drbd_send_zero_out_state(peer_device, 0);
+}
+
+int drbd_send_zero_out_fail(struct drbd_peer_device *peer_device)
+{
+ return _drbd_send_zero_out_state(peer_device, 1);
+}
+
void drbd_print_uuids(struct drbd_device *device, const char *text)
{
if (get_ldev_if_state(device, D_NEGOTIATING)) {
diff --git a/drbd/drbd_protocol.h b/drbd/drbd_protocol.h
index 405b181..3a82442 100644
--- a/drbd/drbd_protocol.h
+++ b/drbd/drbd_protocol.h
@@ -59,6 +59,7 @@ enum drbd_packet {
/* REQ_DISCARD. We used "discard" in different contexts before,
* which is why I chose TRIM here, to disambiguate. */
P_TRIM = 0x31,
+ P_ZERO_OUT = 0x32,

P_MAY_IGNORE = 0x100, /* Flag to test if (cmd > P_MAY_IGNORE) ... */
P_MAX_OPT_CMD = 0x101,
--
1.8.4.5
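
The status travels as a single 32-bit big-endian word (cpu_to_be32 on the sender, be32_to_cpu on the receiver in a later patch). As a rough userspace illustration of that wire format, assuming nothing beyond "0 means success, non-zero means failure", the helper and enum names below are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the values passed by drbd_send_zero_out_ok() and _fail(). */
enum zero_out_status { ZERO_OUT_OK = 0, ZERO_OUT_FAILED = 1 };

/* Store v most significant byte first, independent of host byte order. */
static void put_be32(uint8_t *out, uint32_t v)
{
	assert(out != NULL);
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* Inverse of put_be32(). */
static uint32_t get_be32(const uint8_t *in)
{
	assert(in != NULL);
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}
```

A failure status therefore appears on the wire as the bytes 00 00 00 01, whatever the byte order of either node.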

2015-07-13 08:41:06

by Nick Wang

Subject: [Patch v2 05/10] drbd: Functions to notify peer node to zero out

Notify the peer node to start zeroing out its device.
Update the state of the peer node when both nodes have
finished zeroing.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_int.h | 2 ++
drbd/drbd_main.c | 19 +++++++++++++++++++
2 files changed, 21 insertions(+)

diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index 014b65e..f43f957 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -1208,6 +1208,8 @@ extern int __drbd_send_protocol(struct drbd_connection *connection, enum drbd_pa
extern int drbd_send_protocol(struct drbd_connection *connection);
extern int drbd_send_uuids(struct drbd_peer_device *);
extern int drbd_send_uuids_skip_initial_sync(struct drbd_peer_device *);
+extern int drbd_send_zero_out_start(struct drbd_peer_device *);
+extern int drbd_send_zero_out_finish(struct drbd_peer_device *);
extern int drbd_send_zero_out_ok(struct drbd_peer_device *);
extern int drbd_send_zero_out_fail(struct drbd_peer_device *);
extern void drbd_gen_and_send_sync_uuid(struct drbd_peer_device *);
diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
index 740015e..2b821cd 100644
--- a/drbd/drbd_main.c
+++ b/drbd/drbd_main.c
@@ -908,6 +908,25 @@ int drbd_send_uuids_skip_initial_sync(struct drbd_peer_device *peer_device)
return _drbd_send_uuids(peer_device, 8);
}

+
+/**
+ * drbd_send_zero_out_start() - Notify peer node to start zero out
+ * @peer_device: DRBD peer device.
+ */
+int drbd_send_zero_out_start(struct drbd_peer_device *peer_device)
+{
+ return _drbd_send_uuids(peer_device, 16);
+}
+
+/**
+ * drbd_send_zero_out_finish() - Notify both node finished zeroing out
+ * @peer_device: DRBD peer device.
+ */
+int drbd_send_zero_out_finish(struct drbd_peer_device *peer_device)
+{
+ return _drbd_send_uuids(peer_device, 32);
+}
+
/**
* _drbd_send_zero_out_state() - Sends the drbd state to the peer
* @peer_device: DRBD peer device.
--
1.8.4.5
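
The magic numbers passed to _drbd_send_uuids() in this series are single bits in the UUID flags word: 8 already means "skip initial sync", and this patch adds 16 and 32. A small sketch with named constants may make that easier to follow; the macro and function names are hypothetical and do not exist in DRBD.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as used in v2 of this series; the names are made up for illustration. */
#define UUID_FLAG_SKIP_INITIAL_SYNC UINT64_C(8)
#define UUID_FLAG_ZERO_OUT_START    UINT64_C(16)
#define UUID_FLAG_ZERO_OUT_FINISH   UINT64_C(32)

/* Each flag must occupy its own bit so they can be combined and tested freely. */
static_assert((UUID_FLAG_SKIP_INITIAL_SYNC & UUID_FLAG_ZERO_OUT_START) == 0 &&
	      (UUID_FLAG_ZERO_OUT_START & UUID_FLAG_ZERO_OUT_FINISH) == 0 &&
	      (UUID_FLAG_SKIP_INITIAL_SYNC & UUID_FLAG_ZERO_OUT_FINISH) == 0,
	      "UUID flags must not share bits");

/* True if the given flag bit is present in the received flags word. */
static bool uuid_flag_set(uint64_t flags, uint64_t flag)
{
	return (flags & flag) != 0;
}
```

The receiver can then test each bit independently, which is what the (p_uuid[UI_FLAGS] & 16) style checks later in the series do.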

2015-07-13 08:39:56

by Nick Wang

Subject: [Patch v2 06/10] drbd: Wrapper for zeroing out device by worker

Wrapper functions for drbd_device_post_work to start
zeroing out the device. Change state when both nodes
finish zeroing.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_int.h | 2 ++
drbd/drbd_worker.c | 63 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 65 insertions(+)

diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index f43f957..dd680a9 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -1670,6 +1670,8 @@ extern bool drbd_rs_c_min_rate_throttle(struct drbd_device *device);
extern bool drbd_rs_should_slow_down(struct drbd_device *device, sector_t sector,
bool throttle_if_app_is_waiting);
extern int zero_out_local_device(struct drbd_device *device);
+extern void require_zero_out_local_device(struct drbd_device *device);
+extern void receive_zero_out_local_device(struct drbd_device *device);
extern int drbd_submit_peer_request(struct drbd_device *,
struct drbd_peer_request *, const unsigned,
const int);
diff --git a/drbd/drbd_worker.c b/drbd/drbd_worker.c
index 50564f5..293aa27 100644
--- a/drbd/drbd_worker.c
+++ b/drbd/drbd_worker.c
@@ -1674,6 +1674,69 @@ int zero_out_local_device(struct drbd_device *device)
device->ldev->known_size, GFP_NOIO, false);
}

+/**
+ * require_zero_out_local_device()
+ * @device: DRBD device.
+ *
+ * Description:
+ * Start to zero out local device. Update
+ * status if peer node (secondary) finished
+ * zeroing.
+ *
+**/
+void require_zero_out_local_device(struct drbd_device *device)
+{
+ int zero_out_err = 0;
+
+ zero_out_err = zero_out_local_device(device);
+
+ if (zero_out_err) {
+ drbd_err(device, "Failed to zero out local device\n");
+ set_bit(ZERO_FAIL, &device->flags);
+ drbd_chk_io_error(device, 1, DRBD_WRITE_ERROR);
+ } else {
+ drbd_info(device, "Finished zero out local device.\n");
+
+ if (test_and_clear_bit(ZERO_DONE, &device->flags)) {
+ spin_lock_irq(&device->resource->req_lock);
+ _drbd_set_state(_NS2(device, disk, D_UP_TO_DATE,
+ pdsk, D_UP_TO_DATE), CS_VERBOSE, NULL);
+ spin_unlock_irq(&device->resource->req_lock);
+ drbd_send_zero_out_finish(first_peer_device(device));
+ } else if (test_and_clear_bit(ZERO_FAIL, &device->flags)) {
+ drbd_info(device, "Peer device has already failed on zero out\n");
+ } else {
+ /* waiting for peer device finished */
+ set_bit(ZERO_DONE, &device->flags);
+ }
+ }
+}
+
+/**
+ * receive_zero_out_local_device()
+ * @device: DRBD device.
+ *
+ * Description:
+ * Start to zero out local device.
+ * Notify peer node the zeroing result.
+ *
+**/
+void receive_zero_out_local_device(struct drbd_device *device)
+{
+ int zero_out_err = 0;
+ struct drbd_peer_device *const peer_device = first_peer_device(device);
+
+ zero_out_err = zero_out_local_device(device);
+ if (zero_out_err) {
+ drbd_err(device, "Failed to zero out local device\n");
+ drbd_send_zero_out_fail(peer_device);
+ drbd_chk_io_error(device, 1, DRBD_WRITE_ERROR);
+ } else {
+ drbd_info(device, "Finished zero out local device.\n");
+ drbd_send_zero_out_ok(peer_device);
+ }
+}
+
void start_resync_timer_fn(unsigned long data)
{
struct drbd_device *device = (struct drbd_device *) data;
--
1.8.4.5
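
The ZERO_DONE bit acts as a rendezvous: whichever of "local zeroing finished" and "peer reported success" happens second is the one that promotes both disks to UpToDate. The sketch below models that idea in userspace with C11 atomics. It is an illustration under assumptions, not the patch's implementation: all names are hypothetical, and it deliberately folds the test and the set into one atomic step, whereas the patch uses test_and_clear_bit() followed by a separate set_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct zero_rendezvous {
	atomic_flag first_arrived;  /* set by whichever side finishes first */
	atomic_bool failed;         /* set if either side reports a failure */
};

static void zero_rendezvous_init(struct zero_rendezvous *r)
{
	assert(r != NULL);
	atomic_flag_clear(&r->first_arrived);
	atomic_init(&r->failed, false);
}

/* Record that one side could not zero its device. */
static void zero_side_failed(struct zero_rendezvous *r)
{
	assert(r != NULL);
	atomic_store(&r->failed, true);
}

/*
 * Called once per side on success. Returns true only for the second
 * arrival with no failure recorded, i.e. when the caller should mark
 * both disks up to date.
 */
static bool zero_side_finished(struct zero_rendezvous *r)
{
	assert(r != NULL);
	if (!atomic_flag_test_and_set(&r->first_arrived))
		return false;   /* first to arrive: wait for the other side */
	return !atomic_load(&r->failed);
}
```

Because the flag is tested and set in a single operation, exactly one of two concurrent callers sees "already set". Whether the two-step variant in the patch can lose a completion when the worker and the receiver run at the same time seems worth checking during review.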

2015-07-13 08:39:55

by Nick Wang

Subject: [Patch v2 07/10] drbd: Flags for background drbd device work

Background drbd device work for zeroing out device.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_int.h | 3 +++
drbd/drbd_worker.c | 6 ++++++
2 files changed, 9 insertions(+)

diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index dd680a9..287ffd7 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -622,6 +622,9 @@ enum {
RS_START, /* tell worker to start resync/OV */
RS_PROGRESS, /* tell worker that resync made significant progress */
RS_DONE, /* tell worker that resync is done */
+ P_ZERO_START, /* tell worker to zero out device */
+ S_ZERO_START, /* tell worker to zero out device as requested*/
+
/* used for zero out device */
ZERO_DONE, /* succeed on zero out a device */
ZERO_FAIL, /* fail to zero out a device */
diff --git a/drbd/drbd_worker.c b/drbd/drbd_worker.c
index 293aa27..23e82c1 100644
--- a/drbd/drbd_worker.c
+++ b/drbd/drbd_worker.c
@@ -2070,6 +2070,10 @@ static void do_device_work(struct drbd_device *device, const unsigned long todo)
drbd_ldev_destroy(device);
if (test_bit(RS_START, &todo))
do_start_resync(device);
+ if (test_bit(P_ZERO_START, &todo))
+ require_zero_out_local_device(device);
+ if (test_bit(S_ZERO_START, &todo))
+ receive_zero_out_local_device(device);
}

#define DRBD_DEVICE_WORK_MASK \
@@ -2079,6 +2083,8 @@ static void do_device_work(struct drbd_device *device, const unsigned long todo)
|(1UL << RS_START) \
|(1UL << RS_PROGRESS) \
|(1UL << RS_DONE) \
+ |(1UL << P_ZERO_START) \
+ |(1UL << S_ZERO_START) \
)

static unsigned long get_work_bits(unsigned long *flags)
--
1.8.4.5
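
The pattern this patch hooks into is "post a bit, let the worker collect all pending work bits in one go". Here is a self-contained sketch of that pattern using C11 atomics; the bit names, WORK_MASK and both functions are hypothetical, and the real get_work_bits() in drbd_worker.c may be implemented differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical flag bits; only the first three are "work" for the worker. */
enum {
	WB_RS_START,
	WB_ZERO_START_LOCAL,   /* plays the role of P_ZERO_START */
	WB_ZERO_START_PEER,    /* plays the role of S_ZERO_START */
	WB_NOT_WORK,           /* some unrelated device flag */
	WB_COUNT
};

#define WORK_MASK ((1UL << WB_RS_START) | \
		   (1UL << WB_ZERO_START_LOCAL) | \
		   (1UL << WB_ZERO_START_PEER))

/* Any context may post work by setting its bit. */
static void post_work(atomic_ulong *flags, int bit)
{
	assert(flags != NULL && bit >= 0 && bit < WB_COUNT);
	atomic_fetch_or(flags, 1UL << bit);
}

/* The worker atomically takes all pending work bits and clears only those. */
static unsigned long take_work_bits(atomic_ulong *flags)
{
	assert(flags != NULL);
	return atomic_fetch_and(flags, ~WORK_MASK) & WORK_MASK;
}
```

This also shows why the new bits have to be added to DRBD_DEVICE_WORK_MASK: a bit outside the mask is never handed to do_device_work() and simply stays set.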

2015-07-13 08:41:05

by Nick Wang

Subject: [Patch v2 08/10] drbd: Function to work with packet P_ZERO_OUT

Use packet P_ZERO_OUT to update the zero-out
status of the peer node.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_receiver.c | 36 ++++++++++++++++++++++++++++++++++++
1 file changed, 36 insertions(+)

diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 5e6b149..6eae84f 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -4383,6 +4383,41 @@ static union drbd_state convert_state(union drbd_state ps)
return ms;
}

+static int receive_zero_out_state(struct drbd_connection *connection, struct packet_info *pi)
+{
+ struct drbd_peer_device *peer_device;
+ struct drbd_device *device;
+ struct p_state *p = pi->data;
+ unsigned int isfail;
+
+ peer_device = conn_peer_device(connection, pi->vnr);
+ if (!peer_device)
+ return -EIO;
+ device = peer_device->device;
+
+ isfail = be32_to_cpu(p->state);
+
+ if (isfail) {
+ drbd_info(device, "Failed to zero out peer device\n");
+ set_bit(ZERO_FAIL, &device->flags);
+ } else {
+ drbd_info(device, "Finished zero out peer device\n");
+ if (test_and_clear_bit(ZERO_DONE, &device->flags)) {
+ drbd_info(device, "Both side finished zeroing.\n");
+ spin_lock_irq(&device->resource->req_lock);
+ _drbd_set_state(_NS2(device, disk, D_UP_TO_DATE,
+ pdsk, D_UP_TO_DATE), CS_VERBOSE, NULL);
+ spin_unlock_irq(&device->resource->req_lock);
+ drbd_send_zero_out_finish(peer_device);
+ } else {
+ /* waiting for local device finished */
+ set_bit(ZERO_DONE, &device->flags);
+ }
+ }
+
+ return 0;
+}
+
static int receive_req_state(struct drbd_connection *connection, struct packet_info *pi)
{
struct drbd_peer_device *peer_device;
@@ -5008,6 +5043,7 @@ static struct data_cmd drbd_cmd_handler[] = {
[P_CONN_ST_CHG_REQ] = { 0, sizeof(struct p_req_state), receive_req_conn_state },
[P_PROTOCOL_UPDATE] = { 1, sizeof(struct p_protocol), receive_protocol },
[P_TRIM] = { 0, sizeof(struct p_trim), receive_Data },
+ [P_ZERO_OUT] = { 0, sizeof(struct p_state), receive_zero_out_state },
};

static void drbdd(struct drbd_connection *connection)
--
1.8.4.5
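
The last hunk registers the new packet in a table indexed by packet number, where each entry carries the expected payload size and a handler. A reduced, standalone sketch of that table-driven dispatch follows. The enum values 0x31 and 0x32 come from the patch; everything else (type names, the handler, the exact size check) is hypothetical and much simpler than drbd_cmd_handler[].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum packet { PKT_TRIM = 0x31, PKT_ZERO_OUT = 0x32, PKT_TABLE_SIZE };

struct packet_handler {
	size_t payload_size;                  /* expected fixed payload size */
	int (*fn)(const uint8_t *payload);    /* NULL means "not handled" */
};

/* Returns 1 if the big-endian status word is non-zero (peer failed), else 0. */
static int handle_zero_out(const uint8_t *payload)
{
	return (payload[0] | payload[1] | payload[2] | payload[3]) ? 1 : 0;
}

/* Designated initializers leave every other slot zeroed, i.e. unhandled. */
static const struct packet_handler handlers[PKT_TABLE_SIZE] = {
	[PKT_ZERO_OUT] = { 4, handle_zero_out },
};

/* Look up the handler, validate the payload size, then call it. */
static int dispatch(unsigned int cmd, const uint8_t *payload, size_t len)
{
	assert(payload != NULL);
	if (cmd >= PKT_TABLE_SIZE || handlers[cmd].fn == NULL)
		return -1;      /* unknown or unregistered packet */
	if (len != handlers[cmd].payload_size)
		return -1;      /* malformed packet */
	return handlers[cmd].fn(payload);
}
```

A node whose table has no entry for a packet number rejects it, which is one way to see why an older peer must never be sent the new packet.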

2015-07-13 08:39:57

by Nick Wang

Subject: [Patch v2 09/10] drbd: Handle zero out command from peer node

Receive P_UUID flag 16 to start zeroing out the device, and
P_UUID flag 32 after both sides have finished zeroing;
then change the state to UpToDate.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_receiver.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)

diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 6eae84f..4d6d99a 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -4317,6 +4317,15 @@ static int receive_uuids(struct drbd_connection *connection, struct packet_info
peer_device->connection->agreed_pro_version >= 90 &&
device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED &&
(p_uuid[UI_FLAGS] & 8);
+ int zero_out_devices =
+ device->state.conn == C_CONNECTED &&
+ peer_device->connection->agreed_pro_version >= 90 &&
+ device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED &&
+ (p_uuid[UI_FLAGS] & 16);
+ int zero_out_finish =
+ device->state.conn == C_CONNECTED &&
+ peer_device->connection->agreed_pro_version >= 90 &&
+ (p_uuid[UI_FLAGS] & 32);
if (skip_initial_sync) {
drbd_info(device, "Accepted new current UUID, preparing to skip initial sync\n");
drbd_bitmap_io(device, &drbd_bmio_clear_n_write,
@@ -4324,11 +4333,35 @@ static int receive_uuids(struct drbd_connection *connection, struct packet_info
BM_LOCKED_TEST_ALLOWED);
_drbd_uuid_set(device, UI_CURRENT, p_uuid[UI_CURRENT]);
_drbd_uuid_set(device, UI_BITMAP, 0);
+ spin_lock_irq(&device->resource->req_lock);
_drbd_set_state(_NS2(device, disk, D_UP_TO_DATE, pdsk, D_UP_TO_DATE),
CS_VERBOSE, NULL);
+ spin_unlock_irq(&device->resource->req_lock);
drbd_md_sync(device);
updated_uuids = 1;
}
+
+ if (zero_out_devices) {
+ drbd_info(device, "Accepted to zero out devices, will take a long time\n");
+ drbd_bitmap_io(device, &drbd_bmio_clear_n_write,
+ "clear_n_write from receive_uuids",
+ BM_LOCKED_TEST_ALLOWED);
+ _drbd_uuid_set(device, UI_CURRENT, p_uuid[UI_CURRENT]);
+ _drbd_uuid_set(device, UI_BITMAP, 0);
+ drbd_print_uuids(device, "cleared bitmap UUID for zeroing device");
+
+ drbd_device_post_work(device, S_ZERO_START);
+ updated_uuids = 1;
+ }
+
+ if (zero_out_finish) {
+ drbd_info(device, "Both side finished zero out devices.\n");
+ spin_lock_irq(&device->resource->req_lock);
+ _drbd_set_state(_NS2(device, disk, D_UP_TO_DATE, pdsk, D_UP_TO_DATE),
+ CS_VERBOSE, NULL);
+ spin_unlock_irq(&device->resource->req_lock);
+ }
+
put_ldev(device);
} else if (device->state.disk < D_INCONSISTENT &&
device->state.role == R_PRIMARY) {
--
1.8.4.5

2015-07-13 08:39:58

by Nick Wang

Subject: [Patch v2 10/10] drbd: Handle new-current-uuid --zap-devices

Handle --zap-devices: zero out the devices on both sides
instead of running an initial full sync.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_nl.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drbd/drbd_nl.c b/drbd/drbd_nl.c
index 1d17663..131f112 100644
--- a/drbd/drbd_nl.c
+++ b/drbd/drbd_nl.c
@@ -4068,7 +4068,7 @@ int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
drbd_uuid_set(device, UI_BITMAP, 0); /* Rotate UI_BITMAP to History 1, etc... */
drbd_uuid_new_current(device); /* New current, previous to UI_BITMAP */

- if (args.clear_bm) {
+ if (args.clear_bm || args.zap_devices) {
err = drbd_bitmap_io(device, &drbd_bmio_clear_n_write,
"clear_n_write from new_c_uuid", BM_LOCKED_MASK);
if (err) {
@@ -4084,6 +4084,16 @@ int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
CS_VERBOSE, NULL);
spin_unlock_irq(&device->resource->req_lock);
}
+ if (zero_out_devices) {
+ drbd_send_zero_out_start(first_peer_device(device));
+ _drbd_uuid_set(device, UI_BITMAP, 0);
+ drbd_print_uuids(device, "cleared bitmap UUID for zeroing device");
+
+ /* Clear the zero-out flag bits */
+ clear_bit(ZERO_DONE, &device->flags);
+ clear_bit(ZERO_FAIL, &device->flags);
+ drbd_device_post_work(device, P_ZERO_START);
+ }
}

drbd_md_sync(device);
--
1.8.4.5

2015-07-31 12:49:11

by Philipp Reisner

Subject: Re: [Drbd-dev] [Patch v2 00/10] Zero out devices instead of initial full sync

Hi Nick,

finally I have time to really review it:

* It uses blkdev_issue_zeroout() with the discard parameter
set to false, i.e. it will fully allocate a thinly
provisioned backing device. Please make this more generic.
Maybe two options, --zeroout-device and --discard-device,
instead of --zap-device.

* The patch set is not very complex; I am fine with having
this in a single patch.

* It introduces a new packet into the protocol without
bumping the protocol version or introducing a protocol
feature flag. Do that, and make sure to send the new
packets only when you know that the peer is recent enough
to process them.
See here for an example:
http://git.drbd.org/gitweb.cgi?p=drbd-8.4.git;a=commitdiff;h=476039f699948155d71d6f86323a3b16e6d05f0c;hp=4c7521e19c6c2c046be6547490334294b6f190e4

Best regards,
phil
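
The third review point boils down to: advertise what you support, intersect it with what the peer advertises, and only send a new packet if the corresponding bit survives. A minimal sketch of that check, with hypothetical flag names and helper functions (the commit linked above shows how DRBD itself does it):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the real names and values are up to the patch. */
#define FEATURE_TRIM     (UINT32_C(1) << 0)
#define FEATURE_ZERO_OUT (UINT32_C(1) << 1)

static_assert((FEATURE_TRIM & FEATURE_ZERO_OUT) == 0,
	      "feature flags must not share bits");

/* Both sides exchange their masks during the handshake; only common bits count. */
static uint32_t agree_features(uint32_t mine, uint32_t peers)
{
	return mine & peers;
}

/* Gate for every place that would send the new zero-out packet. */
static bool may_send_zero_out(uint32_t agreed)
{
	return (agreed & FEATURE_ZERO_OUT) != 0;
}
```

An older peer that only knows FEATURE_TRIM yields an agreed mask without FEATURE_ZERO_OUT, so the sender has to fall back to a path that does not use the new packet.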


2015-07-31 12:51:09

by Philipp Reisner

Subject: Re: [Drbd-dev] [Patch v2 02/10] drbd: Add option zap_devices to new-current-uuid

Hi Nick,

When introducing new fields, always mark them as optional. Otherwise
new kernel/module code would refuse netlink messages from older
drbd-utils:

> + __flg_field(2, DRBD_GENLA_F_MANDATORY, zap_devices)

__flg_field(2, /* OPTIONAL */ 0, zap_devices)

best regards,
Phil

2015-08-06 10:04:49

by Nick Wang

Subject: [PATCH v3 0/1] Zeroout/discard devices instead of initial full sync

This patch is based on drbd-8.4 3ae8af0b; it may conflict with
the new feature flag in branch rs-discard-granularity. It has been
compiled and tested against SLES12.

Changes compared to v2:
1. Two options for new-current-uuid: --zeroout-devices and --discard-devices.
2. Create a new feature flag FF_DISCARD.
3. Mark zeroout-devices and discard-devices as optional.
4. Merge the patch set into one patch.

The initial full sync for DRBD usually takes a long time, especially
when the network becomes the bottleneck. Simply skipping
the full sync with "--clear-bitmap" may not be the right solution
in all cases, for example when the bare device (no filesystem) is used
directly, e.g. by a database or a VM. This patch can be used to zero out
the devices locally instead of running a full sync, so that the block
devices are consistent. This approach can be useful when network
bandwidth for syncing is lacking.

The patch adds two new options, "--zeroout-devices" and
"--discard-devices", to "new-current-uuid". Either one starts
clearing the devices on both sides.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]

Nick Wang (1):
drbd: Support zeroout device instead of initial full sync

drbd/drbd_int.h | 15 +++++++
drbd/drbd_main.c | 60 +++++++++++++++++++++++++++-
drbd/drbd_nl.c | 41 +++++++++++++++++--
drbd/drbd_protocol.h | 2 +
drbd/drbd_receiver.c | 86 +++++++++++++++++++++++++++++++++++++++-
drbd/drbd_worker.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++
drbd/linux/drbd_genl.h | 2 +
7 files changed, 305 insertions(+), 6 deletions(-)

--
2.1.4

2015-08-06 10:05:01

by Nick Wang

Subject: [PATCH] drbd: Support zeroout device instead of initial full sync

Zero out the devices on both sides instead of running
an initial full sync. Useful in high-latency
network environments.

Implement --zeroout-devices and --discard-devices
for new-current-uuid.

Signed-off-by: Nick Wang <[email protected]>
CC: Philipp Reisner <[email protected]>
CC: Lars Ellenberg <[email protected]>
CC: [email protected]
CC: [email protected]
---
drbd/drbd_int.h | 15 +++++++
drbd/drbd_main.c | 60 +++++++++++++++++++++++++++-
drbd/drbd_nl.c | 41 +++++++++++++++++--
drbd/drbd_protocol.h | 2 +
drbd/drbd_receiver.c | 86 +++++++++++++++++++++++++++++++++++++++-
drbd/drbd_worker.c | 105 +++++++++++++++++++++++++++++++++++++++++++++++++
drbd/linux/drbd_genl.h | 2 +
7 files changed, 305 insertions(+), 6 deletions(-)

diff --git a/drbd/drbd_int.h b/drbd/drbd_int.h
index d1e2bc0..555b24c 100644
--- a/drbd/drbd_int.h
+++ b/drbd/drbd_int.h
@@ -621,6 +621,13 @@ enum {
RS_START, /* tell worker to start resync/OV */
RS_PROGRESS, /* tell worker that resync made significant progress */
RS_DONE, /* tell worker that resync is done */
+ P_ZERO_START, /* tell worker to zero out device */
+ S_ZERO_START, /* tell worker to zero out device as requested*/
+
+ /* used for zero out/discard device */
+ DISCARD_DISK, /* flag to discard device */
+ ZERO_DONE, /* succeed on zero out a device */
+ ZERO_FAIL, /* fail to zero out a device */
};

struct drbd_bitmap; /* opaque for drbd_device */
@@ -1204,6 +1211,11 @@ extern int __drbd_send_protocol(struct drbd_connection *connection, enum drbd_pa
extern int drbd_send_protocol(struct drbd_connection *connection);
extern int drbd_send_uuids(struct drbd_peer_device *);
extern int drbd_send_uuids_skip_initial_sync(struct drbd_peer_device *);
+extern int drbd_send_zeroout_start(struct drbd_peer_device *);
+extern int drbd_send_discard_start(struct drbd_peer_device *);
+extern int drbd_send_zeroout_finish(struct drbd_peer_device *);
+extern int drbd_send_zeroout_ok(struct drbd_peer_device *);
+extern int drbd_send_zeroout_fail(struct drbd_peer_device *);
extern void drbd_gen_and_send_sync_uuid(struct drbd_peer_device *);
extern int drbd_send_sizes(struct drbd_peer_device *peer_device, int trigger_reply, enum dds_flags flags);
extern int drbd_send_state(struct drbd_peer_device *, union drbd_state);
@@ -1661,6 +1673,9 @@ extern void drbd_send_acks_wf(struct work_struct *ws);
extern bool drbd_rs_c_min_rate_throttle(struct drbd_device *device);
extern bool drbd_rs_should_slow_down(struct drbd_device *device, sector_t sector,
bool throttle_if_app_is_waiting);
+extern int zeroout_local_device(struct drbd_device *device, bool discard);
+extern void require_zeroout_local_device(struct drbd_device *device);
+extern void receive_zeroout_local_device(struct drbd_device *device);
extern int drbd_submit_peer_request(struct drbd_device *,
struct drbd_peer_request *, const unsigned,
const int);
diff --git a/drbd/drbd_main.c b/drbd/drbd_main.c
index 31bf43f..badc719 100644
--- a/drbd/drbd_main.c
+++ b/drbd/drbd_main.c
@@ -908,6 +908,62 @@ int drbd_send_uuids_skip_initial_sync(struct drbd_peer_device *peer_device)
return _drbd_send_uuids(peer_device, 8);
}

+
+/**
+ * drbd_send_zeroout_start() - Notify peer node to zeroout device
+ * @peer_device: DRBD peer device.
+ */
+int drbd_send_zeroout_start(struct drbd_peer_device *peer_device)
+{
+ return _drbd_send_uuids(peer_device, 16);
+}
+
+/**
+ * drbd_send_discard_start() - Notify peer node to discard device
+ * @peer_device: DRBD peer device.
+ */
+int drbd_send_discard_start(struct drbd_peer_device *peer_device)
+{
+ return _drbd_send_uuids(peer_device, 32);
+}
+
+/**
+ * drbd_send_zeroout_finish() - Notify peer that both nodes finished zeroing out
+ * @peer_device: DRBD peer device.
+ */
+int drbd_send_zeroout_finish(struct drbd_peer_device *peer_device)
+{
+ return _drbd_send_uuids(peer_device, 64);
+}
+
+/**
+ * _drbd_send_zeroout_state() - Send the zero-out result to the peer
+ * @peer_device: DRBD peer device.
+ * @status: Device zero out status (0 on success, nonzero on failure).
+ */
+static int _drbd_send_zeroout_state(struct drbd_peer_device *peer_device, unsigned int status)
+{
+ struct drbd_socket *sock;
+ struct p_state *p;
+
+ sock = &peer_device->connection->data;
+ p = drbd_prepare_command(peer_device, sock);
+ if (!p)
+ return -EIO;
+ p->state = cpu_to_be32(status);
+ return drbd_send_command(peer_device, sock, P_ZERO_OUT, sizeof(*p), NULL, 0);
+}
+
+int drbd_send_zeroout_ok(struct drbd_peer_device *peer_device)
+{
+ return _drbd_send_zeroout_state(peer_device, 0);
+}
+
+int drbd_send_zeroout_fail(struct drbd_peer_device *peer_device)
+{
+ return _drbd_send_zeroout_state(peer_device, 1);
+}
+
void drbd_print_uuids(struct drbd_device *device, const char *text)
{
if (get_ldev_if_state(device, D_NEGOTIATING)) {
@@ -3466,8 +3522,8 @@ void drbd_uuid_move_history(struct drbd_device *device) __must_hold(local)
{
int i;

- for (i = UI_HISTORY_START; i < UI_HISTORY_END; i++)
- device->ldev->md.uuid[i+1] = device->ldev->md.uuid[i];
+ for (i = UI_HISTORY_END; i > UI_HISTORY_START; i--)
+ device->ldev->md.uuid[i] = device->ldev->md.uuid[i-1];
}

void __drbd_uuid_set(struct drbd_device *device, int idx, u64 val) __must_hold(local)
diff --git a/drbd/drbd_nl.c b/drbd/drbd_nl.c
index bb7e1b0..dc2bfec 100644
--- a/drbd/drbd_nl.c
+++ b/drbd/drbd_nl.c
@@ -4015,8 +4015,11 @@ int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
{
struct drbd_config_context adm_ctx;
struct drbd_device *device;
+ struct drbd_peer_device *peer_device = NULL;
enum drbd_ret_code retcode;
int skip_initial_sync = 0;
+ int zeroout_devices = 0;
+ int discard_devices = 0;
int err;
struct new_c_uuid_parms args;

@@ -4045,12 +4048,28 @@ int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
goto out;
}

+ peer_device = first_peer_device(device);
+
/* this is "skip initial sync", assume to be clean */
if (device->state.conn == C_CONNECTED &&
- first_peer_device(device)->connection->agreed_pro_version >= 90 &&
+ peer_device->connection->agreed_pro_version >= 90 &&
device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED && args.clear_bm) {
drbd_info(device, "Preparing to skip initial sync\n");
skip_initial_sync = 1;
+ /* this is "zero out/discard": make both devices read as all zeros.
+ * ignored if "clear_bm" is set together with "zeroout_devices"/"discard_devices". */
+ } else if (device->state.conn == C_CONNECTED &&
+ peer_device->connection->agreed_features & FF_DISCARD &&
+ device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED &&
+ args.zeroout_devices) {
+ drbd_info(device, "Preparing to zero out devices, will take a long time\n");
+ zeroout_devices = 1;
+ } else if (device->state.conn == C_CONNECTED &&
+ peer_device->connection->agreed_features & FF_DISCARD &&
+ device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED &&
+ args.discard_devices) {
+ drbd_info(device, "Preparing to discard devices, will take a long time\n");
+ discard_devices = 1;
} else if (device->state.conn != C_STANDALONE) {
retcode = ERR_CONNECTED;
goto out_dec;
@@ -4059,7 +4078,7 @@ int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
drbd_uuid_set(device, UI_BITMAP, 0); /* Rotate UI_BITMAP to History 1, etc... */
drbd_uuid_new_current(device); /* New current, previous to UI_BITMAP */

- if (args.clear_bm) {
+ if (args.clear_bm || args.zeroout_devices || args.discard_devices) {
err = drbd_bitmap_io(device, &drbd_bmio_clear_n_write,
"clear_n_write from new_c_uuid", BM_LOCKED_MASK);
if (err) {
@@ -4067,7 +4086,7 @@ int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
retcode = ERR_IO_MD_DISK;
}
if (skip_initial_sync) {
- drbd_send_uuids_skip_initial_sync(first_peer_device(device));
+ drbd_send_uuids_skip_initial_sync(peer_device);
_drbd_uuid_set(device, UI_BITMAP, 0);
drbd_print_uuids(device, "cleared bitmap UUID");
spin_lock_irq(&device->resource->req_lock);
@@ -4075,6 +4094,22 @@ int drbd_adm_new_c_uuid(struct sk_buff *skb, struct genl_info *info)
CS_VERBOSE, NULL);
spin_unlock_irq(&device->resource->req_lock);
}
+ if (zeroout_devices || discard_devices) {
+ if (discard_devices) {
+ drbd_send_discard_start(peer_device);
+ set_bit(DISCARD_DISK, &device->flags);
+ } else {
+ drbd_send_zeroout_start(peer_device);
+ clear_bit(DISCARD_DISK, &device->flags);
+ }
+ _drbd_uuid_set(device, UI_BITMAP, 0);
+ drbd_print_uuids(device, "cleared bitmap UUID for zeroing device");
+
+ /* Clear the zero-out result flags */
+ clear_bit(ZERO_DONE, &device->flags);
+ clear_bit(ZERO_FAIL, &device->flags);
+ drbd_device_post_work(device, P_ZERO_START);
+ }
}

drbd_md_sync(device);
diff --git a/drbd/drbd_protocol.h b/drbd/drbd_protocol.h
index 405b181..d9db0ce 100644
--- a/drbd/drbd_protocol.h
+++ b/drbd/drbd_protocol.h
@@ -59,6 +59,7 @@ enum drbd_packet {
/* REQ_DISCARD. We used "discard" in different contexts before,
* which is why I chose TRIM here, to disambiguate. */
P_TRIM = 0x31,
+ P_ZERO_OUT = 0x32,

P_MAY_IGNORE = 0x100, /* Flag to test if (cmd > P_MAY_IGNORE) ... */
P_MAX_OPT_CMD = 0x101,
@@ -161,6 +162,7 @@ struct p_block_req {
*/

#define FF_TRIM 1
+#define FF_DISCARD 4

struct p_connection_features {
u32 protocol_min;
diff --git a/drbd/drbd_receiver.c b/drbd/drbd_receiver.c
index 06e5667..ec324ef 100644
--- a/drbd/drbd_receiver.c
+++ b/drbd/drbd_receiver.c
@@ -47,7 +47,7 @@
#include "drbd_vli.h"
#include <linux/scatterlist.h>

-#define PRO_FEATURES (FF_TRIM)
+#define PRO_FEATURES (FF_TRIM | FF_DISCARD)

struct flush_work {
struct drbd_work w;
@@ -4317,6 +4317,20 @@ static int receive_uuids(struct drbd_connection *connection, struct packet_info
peer_device->connection->agreed_pro_version >= 90 &&
device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED &&
(p_uuid[UI_FLAGS] & 8);
+ int zeroout_devices =
+ device->state.conn == C_CONNECTED &&
+ peer_device->connection->agreed_pro_version >= 90 &&
+ device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED &&
+ (p_uuid[UI_FLAGS] & 16);
+ int discard_devices =
+ device->state.conn == C_CONNECTED &&
+ peer_device->connection->agreed_pro_version >= 90 &&
+ device->ldev->md.uuid[UI_CURRENT] == UUID_JUST_CREATED &&
+ (p_uuid[UI_FLAGS] & 32);
+ int zeroout_finish =
+ device->state.conn == C_CONNECTED &&
+ peer_device->connection->agreed_pro_version >= 90 &&
+ (p_uuid[UI_FLAGS] & 64);
if (skip_initial_sync) {
drbd_info(device, "Accepted new current UUID, preparing to skip initial sync\n");
drbd_bitmap_io(device, &drbd_bmio_clear_n_write,
@@ -4324,11 +4338,42 @@ static int receive_uuids(struct drbd_connection *connection, struct packet_info
BM_LOCKED_TEST_ALLOWED);
_drbd_uuid_set(device, UI_CURRENT, p_uuid[UI_CURRENT]);
_drbd_uuid_set(device, UI_BITMAP, 0);
+ spin_lock_irq(&device->resource->req_lock);
_drbd_set_state(_NS2(device, disk, D_UP_TO_DATE, pdsk, D_UP_TO_DATE),
CS_VERBOSE, NULL);
+ spin_unlock_irq(&device->resource->req_lock);
drbd_md_sync(device);
updated_uuids = 1;
}
+
+ if (zeroout_devices || discard_devices) {
+ if (discard_devices) {
+ drbd_info(device, "Accepted to discard devices, will take a long time\n");
+ set_bit(DISCARD_DISK, &device->flags);
+ } else {
+ drbd_info(device, "Accepted to zeroout devices, will take a long time\n");
+ clear_bit(DISCARD_DISK, &device->flags);
+ }
+
+ drbd_bitmap_io(device, &drbd_bmio_clear_n_write,
+ "clear_n_write from receive_uuids",
+ BM_LOCKED_TEST_ALLOWED);
+ _drbd_uuid_set(device, UI_CURRENT, p_uuid[UI_CURRENT]);
+ _drbd_uuid_set(device, UI_BITMAP, 0);
+ drbd_print_uuids(device, "cleared bitmap UUID for zeroing device");
+
+ drbd_device_post_work(device, S_ZERO_START);
+ updated_uuids = 1;
+ }
+
+ if (zeroout_finish) {
+ drbd_info(device, "Both sides finished zeroing out devices.\n");
+ spin_lock_irq(&device->resource->req_lock);
+ _drbd_set_state(_NS2(device, disk, D_UP_TO_DATE, pdsk, D_UP_TO_DATE),
+ CS_VERBOSE, NULL);
+ spin_unlock_irq(&device->resource->req_lock);
+ }
+
put_ldev(device);
} else if (device->state.disk < D_INCONSISTENT &&
device->state.role == R_PRIMARY) {
@@ -4383,6 +4428,41 @@ static union drbd_state convert_state(union drbd_state ps)
return ms;
}

+static int receive_zeroout_state(struct drbd_connection *connection, struct packet_info *pi)
+{
+ struct drbd_peer_device *peer_device;
+ struct drbd_device *device;
+ struct p_state *p = pi->data;
+ unsigned int isfail;
+
+ peer_device = conn_peer_device(connection, pi->vnr);
+ if (!peer_device)
+ return -EIO;
+ device = peer_device->device;
+
+ isfail = be32_to_cpu(p->state);
+
+ if (isfail) {
+ drbd_info(device, "Failed to zero out peer device\n");
+ set_bit(ZERO_FAIL, &device->flags);
+ } else {
+ drbd_info(device, "Finished zeroing out peer device\n");
+ if (test_and_clear_bit(ZERO_DONE, &device->flags)) {
+ drbd_info(device, "Both sides finished zeroing.\n");
+ spin_lock_irq(&device->resource->req_lock);
+ _drbd_set_state(_NS2(device, disk, D_UP_TO_DATE,
+ pdsk, D_UP_TO_DATE), CS_VERBOSE, NULL);
+ spin_unlock_irq(&device->resource->req_lock);
+ drbd_send_zeroout_finish(peer_device);
+ } else {
+ /* wait for the local device to finish */
+ set_bit(ZERO_DONE, &device->flags);
+ }
+ }
+
+ return 0;
+}
+
static int receive_req_state(struct drbd_connection *connection, struct packet_info *pi)
{
struct drbd_peer_device *peer_device;
@@ -5008,6 +5088,7 @@ static struct data_cmd drbd_cmd_handler[] = {
[P_CONN_ST_CHG_REQ] = { 0, sizeof(struct p_req_state), receive_req_conn_state },
[P_PROTOCOL_UPDATE] = { 1, sizeof(struct p_protocol), receive_protocol },
[P_TRIM] = { 0, sizeof(struct p_trim), receive_Data },
+ [P_ZERO_OUT] = { 0, sizeof(struct p_state), receive_zeroout_state },
};

static void drbdd(struct drbd_connection *connection)
@@ -5287,6 +5368,9 @@ static int drbd_do_features(struct drbd_connection *connection)
drbd_info(connection, "Agreed to%ssupport TRIM on protocol level\n",
connection->agreed_features & FF_TRIM ? " " : " not ");

+ drbd_info(connection, "Agreed to%ssupport DISCARD DEVICE on protocol level\n",
+ connection->agreed_features & FF_DISCARD ? " " : " not ");
+
return 1;

incompat:
diff --git a/drbd/drbd_worker.c b/drbd/drbd_worker.c
index 2a15aeb..a8e231f 100644
--- a/drbd/drbd_worker.c
+++ b/drbd/drbd_worker.c
@@ -1653,6 +1653,105 @@ void drbd_rs_controller_reset(struct drbd_device *device)
rcu_read_unlock();
}

+/**
+ * zeroout_local_device()
+ * @device: DRBD device.
+ * @discard: whether to discard the block range.
+ *
+ * Description:
+ * Zero out drbd backing device when creating new uuid.
+ *
+**/
+int zeroout_local_device(struct drbd_device *device, bool discard)
+{
+ struct block_device *bdev;
+
+ bdev = device->ldev->backing_bdev;
+ if (device->ldev->known_size != drbd_get_capacity(bdev))
+ device->ldev->known_size = drbd_get_capacity(bdev);
+
+ if (discard) {
+ /* zero out the backing device by discarding blocks */
+ return blkdev_issue_zeroout(bdev, 0,
+ device->ldev->known_size, GFP_NOIO, true);
+ } else {
+ /* zero out the backing device with WRITE calls */
+ return blkdev_issue_zeroout(bdev, 0,
+ device->ldev->known_size, GFP_NOIO, false);
+ }
+}
+
+/**
+ * require_zeroout_local_device()
+ * @device: DRBD device.
+ *
+ * Description:
+ * Start to zero out local device. Update
+ * status if peer node (secondary) finished
+ * zeroing.
+ *
+**/
+void require_zeroout_local_device(struct drbd_device *device)
+{
+ int zeroout_err = 0;
+
+ if (test_and_clear_bit(DISCARD_DISK, &device->flags)) {
+ zeroout_err = zeroout_local_device(device, true);
+ } else {
+ zeroout_err = zeroout_local_device(device, false);
+ }
+
+ if (zeroout_err) {
+ drbd_err(device, "Failed to zero out local device\n");
+ set_bit(ZERO_FAIL, &device->flags);
+ drbd_chk_io_error(device, 1, DRBD_WRITE_ERROR);
+ } else {
+ drbd_info(device, "Finished zero out local device.\n");
+
+ if (test_and_clear_bit(ZERO_DONE, &device->flags)) {
+ spin_lock_irq(&device->resource->req_lock);
+ _drbd_set_state(_NS2(device, disk, D_UP_TO_DATE,
+ pdsk, D_UP_TO_DATE), CS_VERBOSE, NULL);
+ spin_unlock_irq(&device->resource->req_lock);
+ drbd_send_zeroout_finish(first_peer_device(device));
+ } else if (test_and_clear_bit(ZERO_FAIL, &device->flags)) {
+ drbd_info(device, "Peer device has already failed on zero out\n");
+ } else {
+ /* wait for the peer device to finish */
+ set_bit(ZERO_DONE, &device->flags);
+ }
+ }
+}
+
+/**
+ * receive_zeroout_local_device()
+ * @device: DRBD device.
+ *
+ * Description:
+ * Start to zero out local device.
+ * Notify peer node the zeroing result.
+ *
+**/
+void receive_zeroout_local_device(struct drbd_device *device)
+{
+ int zeroout_err = 0;
+ struct drbd_peer_device *const peer_device = first_peer_device(device);
+
+ if (test_and_clear_bit(DISCARD_DISK, &device->flags)) {
+ zeroout_err = zeroout_local_device(device, true);
+ } else {
+ zeroout_err = zeroout_local_device(device, false);
+ }
+ if (zeroout_err) {
+ drbd_err(device, "Failed to zero out local device\n");
+ drbd_send_zeroout_fail(peer_device);
+ drbd_chk_io_error(device, 1, DRBD_WRITE_ERROR);
+ } else {
+ drbd_info(device, "Finished zero out local device.\n");
+ drbd_send_zeroout_ok(peer_device);
+ }
+}
+
void start_resync_timer_fn(unsigned long data)
{
struct drbd_device *device = (struct drbd_device *) data;
@@ -1986,6 +2085,10 @@ static void do_device_work(struct drbd_device *device, const unsigned long todo)
drbd_ldev_destroy(device);
if (test_bit(RS_START, &todo))
do_start_resync(device);
+ if (test_bit(P_ZERO_START, &todo))
+ require_zeroout_local_device(device);
+ if (test_bit(S_ZERO_START, &todo))
+ receive_zeroout_local_device(device);
}

#define DRBD_DEVICE_WORK_MASK \
@@ -1995,6 +2098,8 @@ static void do_device_work(struct drbd_device *device, const unsigned long todo)
|(1UL << RS_START) \
|(1UL << RS_PROGRESS) \
|(1UL << RS_DONE) \
+ |(1UL << P_ZERO_START) \
+ |(1UL << S_ZERO_START) \
)

static unsigned long get_work_bits(unsigned long *flags)
diff --git a/drbd/linux/drbd_genl.h b/drbd/linux/drbd_genl.h
index 5db53f5..5a31a5c 100644
--- a/drbd/linux/drbd_genl.h
+++ b/drbd/linux/drbd_genl.h
@@ -240,6 +240,8 @@ GENL_struct(DRBD_NLA_START_OV_PARMS, 9, start_ov_parms,

GENL_struct(DRBD_NLA_NEW_C_UUID_PARMS, 10, new_c_uuid_parms,
__flg_field(1, DRBD_GENLA_F_MANDATORY, clear_bm)
+ __flg_field(2, 0, zeroout_devices)
+ __flg_field(3, 0, discard_devices)
)

GENL_struct(DRBD_NLA_TIMEOUT_PARMS, 11, timeout_parms,
--
2.1.4
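
The drbd_uuid_move_history() hunk above changes the direction of the copy loop. A minimal userspace sketch of why the direction matters, assuming a hypothetical array of four history slots (the names HISTORY_START, HISTORY_END and NSLOTS are illustrative, not DRBD's; with only two slots both loops give the same result):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: four history slots, so the overwrite is visible. */
enum { HISTORY_START = 0, HISTORY_END = 3, NSLOTS = 4 };

/* Old loop: copies low to high, so slot i+1 is overwritten before it is read
 * and the value of the first slot ends up in every slot. */
void move_history_forward(uint64_t uuid[NSLOTS])
{
	int i;

	assert(uuid);
	for (i = HISTORY_START; i < HISTORY_END; i++)
		uuid[i + 1] = uuid[i];
}

/* Patched loop: copies high to low, so each value is read before it is
 * overwritten and every entry moves up by exactly one slot. */
void move_history_backward(uint64_t uuid[NSLOTS])
{
	int i;

	assert(uuid);
	for (i = HISTORY_END; i > HISTORY_START; i--)
		uuid[i] = uuid[i - 1];
}
```

Starting from {1, 2, 3, 4}, the forward loop leaves {1, 1, 1, 1}, while the backward loop leaves {1, 1, 2, 3}: the oldest entry is dropped and the rest are preserved.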

2015-08-18 15:10:34

by Lars Ellenberg

Subject: Re: [PATCH] drbd: Support zeroout device instead of initial full sync

On Thu, Aug 06, 2015 at 06:04:22PM +0800, Nick Wang wrote:
> Patch set for zeroing out device on both side
> instead of initial full sync. Useful for high
> latency network environment.
>
> Implement --zeroout-devices and --discard-devices
> for new-current-uuid

I still think this does not belong into the kernel at all.
I may not yet have properly explained why.

This is a lot of stuff to add to the DRBD module,
introducing write protocol incompatibility/protocol version bump.

For no good reason.

If you want to create a new DRBD device
and want to make sure the backing devices are discarded:

# blkdiscard /dev/backing
and then proceed with
# drbdadm create-md ...
as normal, and skip the initial sync as documented.

If you want to grow an existing DRBD device,
you have to grow the backend first anyway;
you can then (if necessary) run
# blkdiscard --offset $o --length $l /dev/backing
# drbdadm resize ... --assume-clean ...

No need to touch either the DRBD module, or the DRBD utils at all.
All there already.

Lars Ellenberg
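
For readers who want to script the grow case described above, here is a rough C sketch of the same discard step. It assumes Linux and uses the BLKDISCARD ioctl, which takes a {offset, length} pair in bytes; the helper names grown_range() and discard_range() are made up for this illustration and are not part of DRBD or util-linux.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARD */

/*
 * Compute the byte range that was added when a backing device grew from
 * old_bytes to new_bytes: range[0] is the offset, range[1] the length.
 * Returns 0 on success, -1 if the device did not grow.
 */
int grown_range(uint64_t old_bytes, uint64_t new_bytes, uint64_t range[2])
{
	assert(range);
	if (new_bytes <= old_bytes)
		return -1;
	range[0] = old_bytes;
	range[1] = new_bytes - old_bytes;
	return 0;
}

/*
 * Discard range[1] bytes starting at byte offset range[0] on an open
 * block device. Returns 0 on success, or -1 with errno set, for example
 * when the device does not support discard.
 */
int discard_range(int fd, const uint64_t range[2])
{
	uint64_t r[2];

	assert(range);
	r[0] = range[0];
	r[1] = range[1];
	return ioctl(fd, BLKDISCARD, r);
}
```

A caller would open the backing device for writing, call grown_range() with the old and new sizes, pass the result to discard_range(), and only then run the resize with --assume-clean.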

2015-08-21 03:25:35

by Nick Wang

Subject: Reply: Re: [PATCH] drbd: Support zeroout device instead of initial full sync

Hi Lars,

> I still think this does not belong into the kernel at all.
> I may not yet have properly explained why.
>
Thanks for the information.
I tried blkdiscard on my test machine, but unfortunately the
underlying block device I used doesn't support discard
(/sys/block/<dev>/queue/discard_max_bytes == 0).
However, I think blkdiscard could work well as a workaround
for SSDs and thinly provisioned storage.
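
The sysfs check mentioned above can be automated. The following is only a sketch: it assumes the /sys/block/<dev>/queue/discard_max_bytes file holds a single decimal number, and the function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Interpret the contents of discard_max_bytes.
 * Returns 1 if the value is nonzero (discard supported), 0 if it is zero,
 * and -1 if the text does not start with a number.
 */
int discard_supported_from_text(const char *text)
{
	char *end;
	unsigned long long v;

	assert(text);
	errno = 0;
	v = strtoull(text, &end, 10);
	if (errno != 0 || end == text)
		return -1;
	return v > 0;
}

/*
 * Read /sys/block/<dev>/queue/discard_max_bytes for a device name such
 * as "sda". Returns 1, 0 or -1 as above; -1 also covers read errors.
 */
int discard_supported(const char *dev)
{
	char path[256], buf[64];
	FILE *f;
	int n;

	assert(dev);
	n = snprintf(path, sizeof(path),
		     "/sys/block/%s/queue/discard_max_bytes", dev);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return discard_supported_from_text(buf);
}
```

A deployment script could use the result to decide between discarding and writing zeros before creating the DRBD metadata.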

> This is a lot of stuff to add to the DRBD module,
> introducing write protocol incompatibility/protocol version bump.
>
> For no good reason.
>
Yes, it is not a mandatory feature; it is more of a
"convenience patch" for some edge cases. Users can choose to
zero out or discard devices via the drbd tools. If the storage
does not support discard, it will fall back to zeroout instead of
only reporting a "not supported" error.

Implementing this in drbd also helps keep the status in sync
between nodes, which is convenient for automated deployment or
monitoring by an admin. It also leaves open the possibility of
resuming an interrupted zeroout/discard in the future.
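
To illustrate the status synchronisation mentioned above: in the patch, whichever of "local zeroing finished" and "peer reported finished" happens second is the one that sets both disks to UpToDate. Below is a simplified userspace model of that rule. It is not the patch's code: it swaps the kernel's test_and_clear_bit()/set_bit() pair for a single C11 atomic fetch-or, and the names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative flag bits modelled on the patch's ZERO_DONE / ZERO_FAIL. */
#define ZERO_DONE 1u
#define ZERO_FAIL 2u

enum zero_result { ZERO_WAITING, ZERO_BOTH_DONE, ZERO_FAILED };

/*
 * Record that one side (local or peer) has finished zeroing.
 * The first successful caller gets ZERO_WAITING; the second gets
 * ZERO_BOTH_DONE and is the one that would set both disks UpToDate.
 * Once either side has failed, every later caller gets ZERO_FAILED.
 */
enum zero_result zero_side_finished(atomic_uint *flags, int failed)
{
	unsigned int old;

	assert(flags);
	if (failed) {
		atomic_fetch_or(flags, ZERO_FAIL);
		return ZERO_FAILED;
	}
	/* Set ZERO_DONE and learn the previous flags in one atomic step. */
	old = atomic_fetch_or(flags, ZERO_DONE);
	if (old & ZERO_FAIL)
		return ZERO_FAILED;
	return (old & ZERO_DONE) ? ZERO_BOTH_DONE : ZERO_WAITING;
}
```

The flags have to be reset to 0 before each run, as the patch does with clear_bit() in drbd_adm_new_c_uuid(). The single fetch-or is a deliberate design choice in this sketch: it leaves no window between testing the flag and setting it if the two completion events arrive on different threads.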

I understand your concern and will respect any decision. Many thanks
for your and Philipp's help and review :)

Best regards,
Nick