2015-11-25 11:09:15

by Philipp Reisner

Subject: [PATCH 00/38] DRBD update

Hi Jens,

please pull these patches into your for-4.5/drivers branch.

This huge patch set updates the in-tree DRBD to what we have out of tree.
All of this has been extensively tested and is in production use by
LINBIT's customers.

Andreas' patches backport some DRBD-9 interface functionality to ease
the later migration of the user base to DRBD-9. These patches touch
the most lines in the series.

Lars and others did the maintenance and bug-fixing work.

PS1: I sent all of this to LKML for review in July and August;
sorry for the late pull request.

PS2: Alternatively, you can pull it from here:
The following changes since commit 1ec218373b8ebda821aec00bb156a9c94fad9cd4:

Linux 4.4-rc2 (2015-11-22 16:45:59 -0800)

are available in the git repository at:

http://git.drbd.org/linux-drbd.git/ for-4.5

for you to fetch changes up to a97f4c8180c7ddb09eeadf1994aa12f14db52fa5:

drbd: fix error path during resize (2015-11-25 10:45:04 +0100)

Best,
Phil

Andreas Gruenbacher (7):
drbd: De-inline drbd_should_do_remote() and
drbd_should_send_out_of_sync()
drbd: Get rid of some first_peer_device() calls
drbd: Move enum write_ordering_e to drbd.h
drbd: drbd_adm_attach(): Add missing drbd_resync_after_changed()
drbd: Fix locking across all resources
drbd: Backport the "events2" command
drbd: Backport the "status" command

Lars Ellenberg (22):
drbd: Fix spurious disk-timeout
drbd: drop remnants of connector -- we don't use it anymore in drbd
8.4
drbd: drbdsetup detach of an unresponsive local disk should not block
IO "forever"
drbd: also bump UUIDs if a diskless primary connects
drbd: add comment why we want to first call local-io-error, then send
state
drbd: drbd_panic_after_delayed_completion_of_aborted_request()
drbd: improve network timeout detection
drbd: fix NULL deref in remember_new_state
drbd: fix refcount error during detach of an already failed disk
drbd: prevent NULL pointer deref when resuming diskless primary
drbd: debugfs: expose ed_data_gen_id
drbd: use resource name in workqueue
drbd: avoid redefinition of BITS_PER_PAGE
drbd: use bitmap_weight() helper, don't open code
drbd: fix spurious alert level printk
drbd: fix queue limit setup for discard
drbd: make drbd known to lsblk: use bd_link_disk_holder
drbd: don't block forever in disconnect during resync if
fencing=r-a-stonith
drbd: fix "endless" transfer log walk in protocol A
drbd: separate out __al_write_transaction helper function
drbd: avoid potential deadlock during handshake
drbd: fix error path during resize

Markus Elfring (1):
drbd: Deletion of an unnecessary check before the function call
"lc_destroy"

Oleg Drokin (1):
drbd: fix memory leak in drbd_adm_resize

Philipp Reisner (5):
drbd: Remove pointless check
drbd: Replace 0 with the more meaningful GFP_NOWAIT
drbd: Rename asender to ack_receiver
drbd: Create a dedicated workqueue for sending acks on the control
connection
drbd: make suspend_io() / resume_io() must be thread and recursion
safe

Roland Kammerer (2):
MAINTAINERS: Updated information for DRBD DRIVER
lru_cache: Converted lc_seq_printf_status to return void

MAINTAINERS | 11 +-
drivers/block/drbd/drbd_actlog.c | 323 ++++----
drivers/block/drbd/drbd_bitmap.c | 22 +-
drivers/block/drbd/drbd_debugfs.c | 10 +
drivers/block/drbd/drbd_int.h | 111 ++-
drivers/block/drbd/drbd_main.c | 74 +-
drivers/block/drbd/drbd_nl.c | 1361 +++++++++++++++++++++++++++++---
drivers/block/drbd/drbd_proc.c | 6 +-
drivers/block/drbd/drbd_protocol.h | 2 +-
drivers/block/drbd/drbd_receiver.c | 254 +++---
drivers/block/drbd/drbd_req.c | 147 +++-
drivers/block/drbd/drbd_req.h | 17 +-
drivers/block/drbd/drbd_state.c | 428 +++++++++-
drivers/block/drbd/drbd_state.h | 6 +-
drivers/block/drbd/drbd_state_change.h | 63 ++
drivers/block/drbd/drbd_worker.c | 105 +--
include/linux/drbd.h | 26 +-
include/linux/drbd_genl.h | 149 ++++
include/linux/idr.h | 14 +
include/linux/lru_cache.h | 2 +-
lib/lru_cache.c | 4 +-
21 files changed, 2543 insertions(+), 592 deletions(-)
create mode 100644 drivers/block/drbd/drbd_state_change.h

--
1.9.1


2015-11-25 11:06:55

by Philipp Reisner

Subject: [PATCH 01/38] MAINTAINERS: Updated information for DRBD DRIVER

From: Roland Kammerer <[email protected]>

- Changed obsolete 'P' entries to 'M' entries.
- Removed the user-related mailing list.
- Updated the git repos to their current locations.

Signed-off-by: Roland Kammerer <[email protected]>
---
MAINTAINERS | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 050d0e7..2ea2604 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3592,13 +3592,12 @@ F: drivers/scsi/dpt*
F: drivers/scsi/dpt/

DRBD DRIVER
-P: Philipp Reisner
-P: Lars Ellenberg
-M: [email protected]
-L: [email protected]
+M: Philipp Reisner <[email protected]>
+M: Lars Ellenberg <[email protected]>
+L: [email protected]
W: http://www.drbd.org
-T: git git://git.drbd.org/linux-2.6-drbd.git drbd
-T: git git://git.drbd.org/drbd-8.3.git
+T: git git://git.linbit.com/linux-drbd.git
+T: git git://git.linbit.com/drbd-8.4.git
S: Supported
F: drivers/block/drbd/
F: lib/lru_cache.c
--
1.9.1

2015-11-25 11:01:41

by Philipp Reisner

Subject: [PATCH 02/38] drbd: Remove pointless check

In drbd-8.4 there is always a single connection per resource,
and there is always exactly one peer_device for a device, so
peer_device cannot be NULL here.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index e80cbef..a1a01cc 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1478,7 +1478,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
device = adm_ctx.device;
mutex_lock(&adm_ctx.resource->adm_mutex);
peer_device = first_peer_device(device);
- connection = peer_device ? peer_device->connection : NULL;
+ connection = peer_device->connection;
conn_reconfig_start(connection);

/* if you want to reconfigure, please tear down first */
--
1.9.1

2015-11-25 11:12:19

by Philipp Reisner

Subject: [PATCH 03/38] drbd: De-inline drbd_should_do_remote() and drbd_should_send_out_of_sync()

From: Andreas Gruenbacher <[email protected]>

There is no need for these two to be inline functions. In addition,
drbd_should_send_out_of_sync() is only used in a single place anyway.
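A minimal userspace sketch of the two predicates being moved, assuming
trimmed-down copies of the connection-state and disk-state enums. Only
the relative order of the enum members matters for the range checks; the
sketch_* names are hypothetical, and the real functions take a
union drbd_dev_state rather than two plain enum values.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down copies of the DRBD state enums. Only the
 * relative ORDER matters for the range checks below; the real enums
 * contain many more members. */
enum sketch_conns {
	C_STANDALONE,
	C_CONNECTED,
	C_WF_BITMAP_S,
	C_WF_BITMAP_T,
	C_SYNC_SOURCE,
	C_SYNC_TARGET,
	C_AHEAD,
	C_BEHIND,
};

enum sketch_disk_state {
	D_DISKLESS,
	D_INCONSISTENT,
	D_OUTDATED,
	D_UNKNOWN,
	D_CONSISTENT,
	D_UP_TO_DATE,
};

/* Mirrors drbd_should_do_remote(): send the write to the peer if its
 * disk is up to date, or if it is at least inconsistent and the
 * connection state lies in [C_WF_BITMAP_T, C_AHEAD). */
static bool sketch_should_do_remote(enum sketch_disk_state pdsk,
				    enum sketch_conns conn)
{
	return pdsk == D_UP_TO_DATE ||
	       (pdsk >= D_INCONSISTENT &&
		conn >= C_WF_BITMAP_T &&
		conn < C_AHEAD);
}

/* Mirrors drbd_should_send_out_of_sync(): in these states the peer is
 * only told that the block is out of sync instead of receiving data. */
static bool sketch_should_send_out_of_sync(enum sketch_conns conn)
{
	return conn == C_AHEAD || conn == C_WF_BITMAP_S;
}
```

Note that the two predicates are disjoint for an inconsistent peer:
C_AHEAD and C_WF_BITMAP_S both fall outside the [C_WF_BITMAP_T, C_AHEAD)
window.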

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_req.c | 18 ++++++++++++++++++
drivers/block/drbd/drbd_req.h | 17 +----------------
2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 3ae2c00..55fca68 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1095,6 +1095,24 @@ static bool do_remote_read(struct drbd_request *req)
return false;
}

+bool drbd_should_do_remote(union drbd_dev_state s)
+{
+ return s.pdsk == D_UP_TO_DATE ||
+ (s.pdsk >= D_INCONSISTENT &&
+ s.conn >= C_WF_BITMAP_T &&
+ s.conn < C_AHEAD);
+ /* Before proto 96 that was >= CONNECTED instead of >= C_WF_BITMAP_T.
+ That is equivalent since before 96 IO was frozen in the C_WF_BITMAP*
+ states. */
+}
+
+static bool drbd_should_send_out_of_sync(union drbd_dev_state s)
+{
+ return s.conn == C_AHEAD || s.conn == C_WF_BITMAP_S;
+ /* pdsk = D_INCONSISTENT as a consequence. Protocol 96 check not necessary
+ since we enter state C_AHEAD only if proto >= 96 */
+}
+
/* returns number of connections (== 1, for drbd 8.4)
* expected to actually write this data,
* which does NOT include those that we are L_AHEAD for. */
diff --git a/drivers/block/drbd/drbd_req.h b/drivers/block/drbd/drbd_req.h
index 9f6a040..bb2ef78 100644
--- a/drivers/block/drbd/drbd_req.h
+++ b/drivers/block/drbd/drbd_req.h
@@ -331,21 +331,6 @@ static inline int req_mod(struct drbd_request *req,
return rv;
}

-static inline bool drbd_should_do_remote(union drbd_dev_state s)
-{
- return s.pdsk == D_UP_TO_DATE ||
- (s.pdsk >= D_INCONSISTENT &&
- s.conn >= C_WF_BITMAP_T &&
- s.conn < C_AHEAD);
- /* Before proto 96 that was >= CONNECTED instead of >= C_WF_BITMAP_T.
- That is equivalent since before 96 IO was frozen in the C_WF_BITMAP*
- states. */
-}
-static inline bool drbd_should_send_out_of_sync(union drbd_dev_state s)
-{
- return s.conn == C_AHEAD || s.conn == C_WF_BITMAP_S;
- /* pdsk = D_INCONSISTENT as a consequence. Protocol 96 check not necessary
- since we enter state C_AHEAD only if proto >= 96 */
-}
+extern bool drbd_should_do_remote(union drbd_dev_state);

#endif
--
1.9.1

2015-11-25 11:02:32

by Philipp Reisner

Subject: [PATCH 04/38] drbd: Get rid of some first_peer_device() calls

From: Andreas Gruenbacher <[email protected]>

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_receiver.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index b4b5680..5bb71e5 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1380,7 +1380,7 @@ int drbd_submit_peer_request(struct drbd_device *device,
if (peer_req->flags & EE_IS_TRIM_USE_ZEROOUT) {
/* wait for all pending IO completions, before we start
* zeroing things out. */
- conn_wait_active_ee_empty(first_peer_device(device)->connection);
+ conn_wait_active_ee_empty(peer_req->peer_device->connection);
/* add it to the active list now,
* so we can find it to present it in debugfs */
peer_req->submit_jif = jiffies;
@@ -1966,7 +1966,7 @@ static int e_end_block(struct drbd_work *w, int cancel)
} else
D_ASSERT(device, drbd_interval_empty(&peer_req->i));

- drbd_may_finish_epoch(first_peer_device(device)->connection, peer_req->epoch, EV_PUT + (cancel ? EV_CLEANUP : 0));
+ drbd_may_finish_epoch(peer_device->connection, peer_req->epoch, EV_PUT + (cancel ? EV_CLEANUP : 0));

return err;
}
@@ -2098,7 +2098,7 @@ static int wait_for_and_update_peer_seq(struct drbd_peer_device *peer_device, co
}

rcu_read_lock();
- tp = rcu_dereference(first_peer_device(device)->connection->net_conf)->two_primaries;
+ tp = rcu_dereference(peer_device->connection->net_conf)->two_primaries;
rcu_read_unlock();

if (!tp)
@@ -2364,7 +2364,7 @@ static int receive_Data(struct drbd_connection *connection, struct packet_info *
if (dp_flags & DP_SEND_RECEIVE_ACK) {
/* I really don't like it that the receiver thread
* sends on the msock, but anyways */
- drbd_send_ack(first_peer_device(device), P_RECV_ACK, peer_req);
+ drbd_send_ack(peer_device, P_RECV_ACK, peer_req);
}

if (tp) {
--
1.9.1

2015-11-25 11:13:43

by Philipp Reisner

Subject: [PATCH 05/38] drbd: Move enum write_ordering_e to drbd.h

From: Andreas Gruenbacher <[email protected]>

Also change the enum values to all-capital letters.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 6 ------
drivers/block/drbd/drbd_main.c | 2 +-
drivers/block/drbd/drbd_nl.c | 4 ++--
drivers/block/drbd/drbd_proc.c | 6 +++---
drivers/block/drbd/drbd_receiver.c | 28 ++++++++++++++--------------
include/linux/drbd.h | 7 +++++++
6 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index e66d453..47d4b02 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -632,12 +632,6 @@ struct bm_io_work {
void (*done)(struct drbd_device *device, int rv);
};

-enum write_ordering_e {
- WO_none,
- WO_drain_io,
- WO_bdev_flush,
-};
-
struct fifo_buffer {
unsigned int head_index;
unsigned int size;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 74d97f4..3ee4a44 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2590,7 +2590,7 @@ struct drbd_resource *drbd_create_resource(const char *name)
kref_init(&resource->kref);
idr_init(&resource->devices);
INIT_LIST_HEAD(&resource->connections);
- resource->write_ordering = WO_bdev_flush;
+ resource->write_ordering = WO_BDEV_FLUSH;
list_add_tail_rcu(&resource->resources, &drbd_resources);
mutex_init(&resource->conf_update);
mutex_init(&resource->adm_mutex);
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index a1a01cc..dfc1799 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1418,7 +1418,7 @@ int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info)
set_bit(MD_NO_FUA, &device->flags);

if (write_ordering_changed(old_disk_conf, new_disk_conf))
- drbd_bump_write_ordering(device->resource, NULL, WO_bdev_flush);
+ drbd_bump_write_ordering(device->resource, NULL, WO_BDEV_FLUSH);

drbd_md_sync(device);

@@ -1727,7 +1727,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
new_disk_conf = NULL;
new_plan = NULL;

- drbd_bump_write_ordering(device->resource, device->ldev, WO_bdev_flush);
+ drbd_bump_write_ordering(device->resource, device->ldev, WO_BDEV_FLUSH);

if (drbd_md_test_flag(device->ldev, MDF_CRASHED_PRIMARY))
set_bit(CRASHED_PRIMARY, &device->flags);
diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index 3b10fa6..6537b25 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -245,9 +245,9 @@ static int drbd_seq_show(struct seq_file *seq, void *v)
char wp;

static char write_ordering_chars[] = {
- [WO_none] = 'n',
- [WO_drain_io] = 'd',
- [WO_bdev_flush] = 'f',
+ [WO_NONE] = 'n',
+ [WO_DRAIN_IO] = 'd',
+ [WO_BDEV_FLUSH] = 'f',
};

seq_printf(seq, "version: " REL_VERSION " (api:%d/proto:%d-%d)\n%s\n",
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 5bb71e5..bf38b95 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1178,7 +1178,7 @@ static void drbd_flush(struct drbd_connection *connection)
struct drbd_peer_device *peer_device;
int vnr;

- if (connection->resource->write_ordering >= WO_bdev_flush) {
+ if (connection->resource->write_ordering >= WO_BDEV_FLUSH) {
rcu_read_lock();
idr_for_each_entry(&connection->peer_devices, peer_device, vnr) {
struct drbd_device *device = peer_device->device;
@@ -1203,7 +1203,7 @@ static void drbd_flush(struct drbd_connection *connection)
/* would rather check on EOPNOTSUPP, but that is not reliable.
* don't try again for ANY return value != 0
* if (rv == -EOPNOTSUPP) */
- drbd_bump_write_ordering(connection->resource, NULL, WO_drain_io);
+ drbd_bump_write_ordering(connection->resource, NULL, WO_DRAIN_IO);
}
put_ldev(device);
kref_put(&device->kref, drbd_destroy_device);
@@ -1299,10 +1299,10 @@ max_allowed_wo(struct drbd_backing_dev *bdev, enum write_ordering_e wo)

dc = rcu_dereference(bdev->disk_conf);

- if (wo == WO_bdev_flush && !dc->disk_flushes)
- wo = WO_drain_io;
- if (wo == WO_drain_io && !dc->disk_drain)
- wo = WO_none;
+ if (wo == WO_BDEV_FLUSH && !dc->disk_flushes)
+ wo = WO_DRAIN_IO;
+ if (wo == WO_DRAIN_IO && !dc->disk_drain)
+ wo = WO_NONE;

return wo;
}
@@ -1319,13 +1319,13 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
enum write_ordering_e pwo;
int vnr;
static char *write_ordering_str[] = {
- [WO_none] = "none",
- [WO_drain_io] = "drain",
- [WO_bdev_flush] = "flush",
+ [WO_NONE] = "none",
+ [WO_DRAIN_IO] = "drain",
+ [WO_BDEV_FLUSH] = "flush",
};

pwo = resource->write_ordering;
- if (wo != WO_bdev_flush)
+ if (wo != WO_BDEV_FLUSH)
wo = min(pwo, wo);
rcu_read_lock();
idr_for_each_entry(&resource->devices, device, vnr) {
@@ -1343,7 +1343,7 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
rcu_read_unlock();

resource->write_ordering = wo;
- if (pwo != resource->write_ordering || wo == WO_bdev_flush)
+ if (pwo != resource->write_ordering || wo == WO_BDEV_FLUSH)
drbd_info(resource, "Method to ensure write ordering: %s\n", write_ordering_str[resource->write_ordering]);
}

@@ -1533,7 +1533,7 @@ static int receive_Barrier(struct drbd_connection *connection, struct packet_inf
* Therefore we must send the barrier_ack after the barrier request was
* completed. */
switch (connection->resource->write_ordering) {
- case WO_none:
+ case WO_NONE:
if (rv == FE_RECYCLED)
return 0;

@@ -1546,8 +1546,8 @@ static int receive_Barrier(struct drbd_connection *connection, struct packet_inf
drbd_warn(connection, "Allocation of an epoch failed, slowing down\n");
/* Fall through */

- case WO_bdev_flush:
- case WO_drain_io:
+ case WO_BDEV_FLUSH:
+ case WO_DRAIN_IO:
conn_wait_active_ee_empty(connection);
drbd_flush(connection);

diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 8723f2a..15a1472 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -357,6 +357,13 @@ enum drbd_timeout_flag {

#define UUID_JUST_CREATED ((__u64)4)

+enum write_ordering_e {
+ WO_NONE,
+ WO_DRAIN_IO,
+ WO_BDEV_FLUSH,
+ WO_BIO_BARRIER
+};
+
/* magic numbers used in meta data and network packets */
#define DRBD_MAGIC 0x83740267
#define DRBD_MAGIC_BIG 0x835a
--
1.9.1
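
The patch above only renames the constants, but its hunks show how they
are used: max_allowed_wo() steps down from WO_BDEV_FLUSH to WO_DRAIN_IO
to WO_NONE according to the disk options, and drbd_bump_write_ordering()
only lowers the method unless WO_BDEV_FLUSH is requested. A userspace
sketch of that logic follows. struct sketch_disk_conf and the sketch_*
functions are hypothetical, and applying each attached device's
configuration in turn is an assumption based on the idr_for_each_entry()
loop over resource->devices; WO_BIO_BARRIER is not used on these paths.

```c
#include <stdbool.h>

/* Same member order as the enum moved to include/linux/drbd.h. */
enum write_ordering_e {
	WO_NONE,
	WO_DRAIN_IO,
	WO_BDEV_FLUSH,
	WO_BIO_BARRIER
};

/* Hypothetical stand-in for the disk_flushes / disk_drain options. */
struct sketch_disk_conf {
	bool disk_flushes;
	bool disk_drain;
};

/* Like max_allowed_wo(): degrade flush -> drain -> none depending on
 * what this device's configuration permits. */
static enum write_ordering_e
sketch_max_allowed_wo(const struct sketch_disk_conf *dc,
		      enum write_ordering_e wo)
{
	if (wo == WO_BDEV_FLUSH && !dc->disk_flushes)
		wo = WO_DRAIN_IO;
	if (wo == WO_DRAIN_IO && !dc->disk_drain)
		wo = WO_NONE;
	return wo;
}

/* Like drbd_bump_write_ordering(): any request other than WO_BDEV_FLUSH
 * can only lower the current method (min(pwo, wo)); the result is then
 * capped by every device's configuration. */
static enum write_ordering_e
sketch_bump_write_ordering(enum write_ordering_e current_wo,
			   enum write_ordering_e wo,
			   const struct sketch_disk_conf *confs, int n)
{
	if (wo != WO_BDEV_FLUSH && current_wo < wo)
		wo = current_wo;
	for (int i = 0; i < n; i++)
		wo = sketch_max_allowed_wo(&confs[i], wo);
	return wo;
}
```

In this sketch, requesting WO_BDEV_FLUSH is the only way to move back up
to flushing, which matches the "if (wo != WO_BDEV_FLUSH) wo = min(pwo,
wo);" line in the drbd_receiver.c hunk; a single device without
disk_flushes is enough to pull the whole resource down to draining.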

2015-11-25 11:02:15

by Philipp Reisner

Subject: [PATCH 06/38] drbd: drbd_adm_attach(): Add missing drbd_resync_after_changed()

From: Andreas Gruenbacher <[email protected]>

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index dfc1799..94e380f 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1541,9 +1541,8 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)

write_lock_irq(&global_state_lock);
retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after);
- write_unlock_irq(&global_state_lock);
if (retcode != NO_ERROR)
- goto fail;
+ goto fail_unlock;

rcu_read_lock();
nc = rcu_dereference(connection->net_conf);
@@ -1551,7 +1550,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (new_disk_conf->fencing == FP_STONITH && nc->wire_protocol == DRBD_PROT_A) {
rcu_read_unlock();
retcode = ERR_STONITH_AND_PROT_A;
- goto fail;
+ goto fail_unlock;
}
}
rcu_read_unlock();
@@ -1562,7 +1561,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->backing_dev,
PTR_ERR(bdev));
retcode = ERR_OPEN_DISK;
- goto fail;
+ goto fail_unlock;
}
nbc->backing_bdev = bdev;

@@ -1582,7 +1581,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->meta_dev,
PTR_ERR(bdev));
retcode = ERR_OPEN_MD_DISK;
- goto fail;
+ goto fail_unlock;
}
nbc->md_bdev = bdev;

@@ -1590,7 +1589,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
(new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_INTERNAL ||
new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_FLEX_INT)) {
retcode = ERR_MD_IDX_INVALID;
- goto fail;
+ goto fail_unlock;
}

resync_lru = lc_create("resync", drbd_bm_ext_cache,
@@ -1598,14 +1597,14 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
offsetof(struct bm_extent, lce));
if (!resync_lru) {
retcode = ERR_NOMEM;
- goto fail;
+ goto fail_unlock;
}

/* Read our meta data super block early.
* This also sets other on-disk offsets. */
retcode = drbd_md_read(device, nbc);
if (retcode != NO_ERROR)
- goto fail;
+ goto fail_unlock;

if (new_disk_conf->al_extents < DRBD_AL_EXTENTS_MIN)
new_disk_conf->al_extents = DRBD_AL_EXTENTS_MIN;
@@ -1617,7 +1616,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
(unsigned long long) drbd_get_max_capacity(nbc),
(unsigned long long) new_disk_conf->disk_size);
retcode = ERR_DISK_TOO_SMALL;
- goto fail;
+ goto fail_unlock;
}

if (new_disk_conf->meta_dev_idx < 0) {
@@ -1634,7 +1633,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_warn(device, "refusing attach: md-device too small, "
"at least %llu sectors needed for this meta-disk type\n",
(unsigned long long) min_md_device_sectors);
- goto fail;
+ goto fail_unlock;
}

/* Make sure the new disk is big enough
@@ -1642,7 +1641,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (drbd_get_max_capacity(nbc) <
drbd_get_capacity(device->this_bdev)) {
retcode = ERR_DISK_TOO_SMALL;
- goto fail;
+ goto fail_unlock;
}

nbc->known_size = drbd_get_capacity(nbc->backing_bdev);
@@ -1672,7 +1671,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
retcode = rv; /* FIXME: Type mismatch. */
drbd_resume_io(device);
if (rv < SS_SUCCESS)
- goto fail;
+ goto fail_unlock;

if (!get_ldev_if_state(device, D_ATTACHING))
goto force_diskless;
@@ -1727,6 +1726,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
new_disk_conf = NULL;
new_plan = NULL;

+ drbd_resync_after_changed(device);
drbd_bump_write_ordering(device->resource, device->ldev, WO_BDEV_FLUSH);

if (drbd_md_test_flag(device->ldev, MDF_CRASHED_PRIMARY))
@@ -1850,6 +1850,8 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (rv < SS_SUCCESS)
goto force_diskless_dec;

+ write_unlock(&global_state_lock);
+
mod_timer(&device->request_timer, jiffies + HZ);

if (device->state.role == R_PRIMARY)
@@ -1872,6 +1874,8 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
force_diskless:
drbd_force_state(device, NS(disk, D_DISKLESS));
drbd_md_sync(device);
+ fail_unlock:
+ write_unlock_irq(&global_state_lock);
fail:
conn_reconfig_done(connection);
if (nbc) {
--
1.9.1

2015-11-25 11:09:40

by Philipp Reisner

Subject: [PATCH 07/38] drbd: Fix locking across all resources

From: Andreas Gruenbacher <[email protected]>

Instead of using a rwlock for synchronizing state changes across
resources, take the request locks of all resources for global state
changes. Use resources_mutex to serialize global state changes.

This means that taking the request lock of a resource is now enough to
prevent changes to that resource. (Previously, a read lock on the
global state lock was needed as well.)
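
The locking scheme described above can be modeled in userspace roughly
as follows. This is a hedged sketch, not the kernel code: pthread mutexes
stand in for the req_lock spinlocks, and the local_irq_disable() and
spin_lock_nested() lockdep-subclass handling of lock_all_resources() is
omitted. All sketch_* names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model: a mutex stands in for each resource's req_lock. */
struct sketch_resource {
	pthread_mutex_t req_lock;
	int state;
};

/* Serializes global state changes, like resources_mutex. */
static pthread_mutex_t sketch_resources_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Take the global mutex, then every per-resource lock in list order.
 * Only the holder of the mutex ever holds more than one req_lock, so
 * there is no lock-order inversion between multi-lock holders. */
static void sketch_lock_all(struct sketch_resource *res, size_t n)
{
	pthread_mutex_lock(&sketch_resources_mutex);
	for (size_t i = 0; i < n; i++)
		pthread_mutex_lock(&res[i].req_lock);
}

static void sketch_unlock_all(struct sketch_resource *res, size_t n)
{
	for (size_t i = n; i > 0; i--)
		pthread_mutex_unlock(&res[i - 1].req_lock);
	pthread_mutex_unlock(&sketch_resources_mutex);
}

/* A global change updates all resources while holding every lock. */
static void sketch_global_change(struct sketch_resource *res, size_t n,
				 int new_state)
{
	sketch_lock_all(res, n);
	for (size_t i = 0; i < n; i++)
		res[i].state = new_state;
	sketch_unlock_all(res, n);
}

/* A reader of one resource now needs only that resource's lock. */
static int sketch_read_state(struct sketch_resource *r)
{
	int s;

	pthread_mutex_lock(&r->req_lock);
	s = r->state;
	pthread_mutex_unlock(&r->req_lock);
	return s;
}
```

The design trade-off is that global changes become more expensive (every
resource lock is taken), while the common per-resource path no longer
touches a shared rwlock at all.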

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 18 ++-------
drivers/block/drbd/drbd_main.c | 24 +++++++++++-
drivers/block/drbd/drbd_nl.c | 45 +++++++++++----------
drivers/block/drbd/drbd_state.c | 14 +++----
drivers/block/drbd/drbd_state.h | 6 +--
drivers/block/drbd/drbd_worker.c | 85 ++++++++++++++++++----------------------
6 files changed, 99 insertions(+), 93 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 47d4b02..2c9ee22 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -292,6 +292,9 @@ struct drbd_device_work {

extern int drbd_wait_misc(struct drbd_device *, struct drbd_interval *);

+extern void lock_all_resources(void);
+extern void unlock_all_resources(void);
+
struct drbd_request {
struct drbd_work w;
struct drbd_device *device;
@@ -1418,7 +1421,7 @@ extern struct bio_set *drbd_md_io_bio_set;
/* to allocate from that set */
extern struct bio *bio_alloc_drbd(gfp_t gfp_mask);

-extern rwlock_t global_state_lock;
+extern struct mutex resources_mutex;

extern int conn_lowest_minor(struct drbd_connection *connection);
extern enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsigned int minor);
@@ -1688,19 +1691,6 @@ static inline int drbd_peer_req_has_active_page(struct drbd_peer_request *peer_r
return 0;
}

-static inline enum drbd_state_rv
-_drbd_set_state(struct drbd_device *device, union drbd_state ns,
- enum chg_state_flags flags, struct completion *done)
-{
- enum drbd_state_rv rv;
-
- read_lock(&global_state_lock);
- rv = __drbd_set_state(device, ns, flags, done);
- read_unlock(&global_state_lock);
-
- return rv;
-}
-
static inline union drbd_state drbd_read_state(struct drbd_device *device)
{
struct drbd_resource *resource = device->resource;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 3ee4a44..f66294d 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -117,6 +117,7 @@ module_param_string(usermode_helper, usermode_helper, sizeof(usermode_helper), 0
*/
struct idr drbd_devices;
struct list_head drbd_resources;
+struct mutex resources_mutex;

struct kmem_cache *drbd_request_cache;
struct kmem_cache *drbd_ee_cache; /* peer requests */
@@ -2923,7 +2924,7 @@ static int __init drbd_init(void)
drbd_proc = NULL; /* play safe for drbd_cleanup */
idr_init(&drbd_devices);

- rwlock_init(&global_state_lock);
+ mutex_init(&resources_mutex);
INIT_LIST_HEAD(&drbd_resources);

err = drbd_genl_register();
@@ -3746,6 +3747,27 @@ int drbd_wait_misc(struct drbd_device *device, struct drbd_interval *i)
return 0;
}

+void lock_all_resources(void)
+{
+ struct drbd_resource *resource;
+ int __maybe_unused i = 0;
+
+ mutex_lock(&resources_mutex);
+ local_irq_disable();
+ for_each_resource(resource, &drbd_resources)
+ spin_lock_nested(&resource->req_lock, i++);
+}
+
+void unlock_all_resources(void)
+{
+ struct drbd_resource *resource;
+
+ for_each_resource(resource, &drbd_resources)
+ spin_unlock(&resource->req_lock);
+ local_irq_enable();
+ mutex_unlock(&resources_mutex);
+}
+
#ifdef CONFIG_DRBD_FAULT_INJECTION
/* Fault insertion support including random number generator shamelessly
* stolen from kernel/rcutorture.c */
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 94e380f..d37c509 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1389,13 +1389,13 @@ int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info)
goto fail_unlock;
}

- write_lock_irq(&global_state_lock);
+ lock_all_resources();
retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after);
if (retcode == NO_ERROR) {
rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf);
drbd_resync_after_changed(device);
}
- write_unlock_irq(&global_state_lock);
+ unlock_all_resources();

if (retcode != NO_ERROR)
goto fail_unlock;
@@ -1539,18 +1539,13 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
goto fail;
}

- write_lock_irq(&global_state_lock);
- retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after);
- if (retcode != NO_ERROR)
- goto fail_unlock;
-
rcu_read_lock();
nc = rcu_dereference(connection->net_conf);
if (nc) {
if (new_disk_conf->fencing == FP_STONITH && nc->wire_protocol == DRBD_PROT_A) {
rcu_read_unlock();
retcode = ERR_STONITH_AND_PROT_A;
- goto fail_unlock;
+ goto fail;
}
}
rcu_read_unlock();
@@ -1561,7 +1556,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->backing_dev,
PTR_ERR(bdev));
retcode = ERR_OPEN_DISK;
- goto fail_unlock;
+ goto fail;
}
nbc->backing_bdev = bdev;

@@ -1581,7 +1576,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->meta_dev,
PTR_ERR(bdev));
retcode = ERR_OPEN_MD_DISK;
- goto fail_unlock;
+ goto fail;
}
nbc->md_bdev = bdev;

@@ -1589,7 +1584,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
(new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_INTERNAL ||
new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_FLEX_INT)) {
retcode = ERR_MD_IDX_INVALID;
- goto fail_unlock;
+ goto fail;
}

resync_lru = lc_create("resync", drbd_bm_ext_cache,
@@ -1597,14 +1592,14 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
offsetof(struct bm_extent, lce));
if (!resync_lru) {
retcode = ERR_NOMEM;
- goto fail_unlock;
+ goto fail;
}

/* Read our meta data super block early.
* This also sets other on-disk offsets. */
retcode = drbd_md_read(device, nbc);
if (retcode != NO_ERROR)
- goto fail_unlock;
+ goto fail;

if (new_disk_conf->al_extents < DRBD_AL_EXTENTS_MIN)
new_disk_conf->al_extents = DRBD_AL_EXTENTS_MIN;
@@ -1616,7 +1611,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
(unsigned long long) drbd_get_max_capacity(nbc),
(unsigned long long) new_disk_conf->disk_size);
retcode = ERR_DISK_TOO_SMALL;
- goto fail_unlock;
+ goto fail;
}

if (new_disk_conf->meta_dev_idx < 0) {
@@ -1633,7 +1628,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_warn(device, "refusing attach: md-device too small, "
"at least %llu sectors needed for this meta-disk type\n",
(unsigned long long) min_md_device_sectors);
- goto fail_unlock;
+ goto fail;
}

/* Make sure the new disk is big enough
@@ -1641,7 +1636,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (drbd_get_max_capacity(nbc) <
drbd_get_capacity(device->this_bdev)) {
retcode = ERR_DISK_TOO_SMALL;
- goto fail_unlock;
+ goto fail;
}

nbc->known_size = drbd_get_capacity(nbc->backing_bdev);
@@ -1671,7 +1666,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
retcode = rv; /* FIXME: Type mismatch. */
drbd_resume_io(device);
if (rv < SS_SUCCESS)
- goto fail_unlock;
+ goto fail;

if (!get_ldev_if_state(device, D_ATTACHING))
goto force_diskless;
@@ -1706,6 +1701,13 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
goto force_diskless_dec;
}

+ lock_all_resources();
+ retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after);
+ if (retcode != NO_ERROR) {
+ unlock_all_resources();
+ goto force_diskless_dec;
+ }
+
/* Reset the "barriers don't work" bits here, then force meta data to
* be written, to ensure we determine if barriers are supported. */
if (new_disk_conf->md_flushes)
@@ -1728,6 +1730,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)

drbd_resync_after_changed(device);
drbd_bump_write_ordering(device->resource, device->ldev, WO_BDEV_FLUSH);
+ unlock_all_resources();

if (drbd_md_test_flag(device->ldev, MDF_CRASHED_PRIMARY))
set_bit(CRASHED_PRIMARY, &device->flags);
@@ -1850,8 +1853,6 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (rv < SS_SUCCESS)
goto force_diskless_dec;

- write_unlock(&global_state_lock);
-
mod_timer(&device->request_timer, jiffies + HZ);

if (device->state.role == R_PRIMARY)
@@ -1874,8 +1875,6 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
force_diskless:
drbd_force_state(device, NS(disk, D_DISKLESS));
drbd_md_sync(device);
- fail_unlock:
- write_unlock_irq(&global_state_lock);
fail:
conn_reconfig_done(connection);
if (nbc) {
@@ -3453,8 +3452,10 @@ int drbd_adm_new_resource(struct sk_buff *skb, struct genl_info *info)
}

/* not yet safe for genl_family.parallel_ops */
+ mutex_lock(&resources_mutex);
if (!conn_create(adm_ctx.resource_name, &res_opts))
retcode = ERR_NOMEM;
+ mutex_unlock(&resources_mutex);
out:
drbd_adm_finish(&adm_ctx, info, retcode);
return 0;
@@ -3545,7 +3546,9 @@ static int adm_del_resource(struct drbd_resource *resource)
if (!idr_is_empty(&resource->devices))
return ERR_RES_IN_USE;

+ mutex_lock(&resources_mutex);
list_del_rcu(&resource->resources);
+ mutex_unlock(&resources_mutex);
/* Make sure all threads have actually stopped: state handling only
* does drbd_thread_stop_nowait(). */
list_for_each_entry(connection, &resource->connections, connections)
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 2d7dd26..535ae47 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -937,7 +937,7 @@ void drbd_resume_al(struct drbd_device *device)
drbd_info(device, "Resumed AL updates\n");
}

-/* helper for __drbd_set_state */
+/* helper for _drbd_set_state */
static void set_ov_position(struct drbd_device *device, enum drbd_conns cs)
{
if (first_peer_device(device)->connection->agreed_pro_version < 90)
@@ -965,17 +965,17 @@ static void set_ov_position(struct drbd_device *device, enum drbd_conns cs)
}

/**
- * __drbd_set_state() - Set a new DRBD state
+ * _drbd_set_state() - Set a new DRBD state
* @device: DRBD device.
* @ns: new state.
* @flags: Flags
* @done: Optional completion, that will get completed after the after_state_ch() finished
*
- * Caller needs to hold req_lock, and global_state_lock. Do not call directly.
+ * Caller needs to hold req_lock. Do not call directly.
*/
enum drbd_state_rv
-__drbd_set_state(struct drbd_device *device, union drbd_state ns,
- enum chg_state_flags flags, struct completion *done)
+_drbd_set_state(struct drbd_device *device, union drbd_state ns,
+ enum chg_state_flags flags, struct completion *done)
{
struct drbd_peer_device *peer_device = first_peer_device(device);
struct drbd_connection *connection = peer_device ? peer_device->connection : NULL;
@@ -1444,7 +1444,7 @@ static void after_state_ch(struct drbd_device *device, union drbd_state os,
if (os.disk != D_FAILED && ns.disk == D_FAILED) {
enum drbd_io_error_p eh = EP_PASS_ON;
int was_io_error = 0;
- /* corresponding get_ldev was in __drbd_set_state, to serialize
+ /* corresponding get_ldev was in _drbd_set_state, to serialize
* our cleanup here with the transition to D_DISKLESS.
* But is is still not save to dreference ldev here, since
* we might come from an failed Attach before ldev was set. */
@@ -1759,7 +1759,7 @@ conn_set_state(struct drbd_connection *connection, union drbd_state mask, union
if (flags & CS_IGN_OUTD_FAIL && ns.disk == D_OUTDATED && os.disk < D_OUTDATED)
ns.disk = os.disk;

- rv = __drbd_set_state(device, ns, flags, NULL);
+ rv = _drbd_set_state(device, ns, flags, NULL);
if (rv < SS_SUCCESS)
BUG();

diff --git a/drivers/block/drbd/drbd_state.h b/drivers/block/drbd/drbd_state.h
index 7f53c40..bd98953 100644
--- a/drivers/block/drbd/drbd_state.h
+++ b/drivers/block/drbd/drbd_state.h
@@ -122,9 +122,9 @@ extern enum drbd_state_rv
_drbd_request_state_holding_state_mutex(struct drbd_device *, union drbd_state,
union drbd_state, enum chg_state_flags);

-extern enum drbd_state_rv __drbd_set_state(struct drbd_device *, union drbd_state,
- enum chg_state_flags,
- struct completion *done);
+extern enum drbd_state_rv _drbd_set_state(struct drbd_device *, union drbd_state,
+ enum chg_state_flags,
+ struct completion *done);
extern void print_st_err(struct drbd_device *, union drbd_state,
union drbd_state, int);

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 5578c14..3b3d980 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -55,13 +55,6 @@ static int make_resync_request(struct drbd_device *, int);
*
*/

-
-/* About the global_state_lock
- Each state transition on an device holds a read lock. In case we have
- to evaluate the resync after dependencies, we grab a write lock, because
- we need stable states on all devices for that. */
-rwlock_t global_state_lock;
-
/* used for synchronous meta data and bitmap IO
* submitted by drbd_md_sync_page_io()
*/
@@ -1456,70 +1449,73 @@ static int _drbd_may_sync_now(struct drbd_device *device)
}

/**
- * _drbd_pause_after() - Pause resync on all devices that may not resync now
+ * drbd_pause_after() - Pause resync on all devices that may not resync now
* @device: DRBD device.
*
* Called from process context only (admin command and after_state_ch).
*/
-static int _drbd_pause_after(struct drbd_device *device)
+static bool drbd_pause_after(struct drbd_device *device)
{
+ bool changed = false;
struct drbd_device *odev;
- int i, rv = 0;
+ int i;

rcu_read_lock();
idr_for_each_entry(&drbd_devices, odev, i) {
if (odev->state.conn == C_STANDALONE && odev->state.disk == D_DISKLESS)
continue;
- if (!_drbd_may_sync_now(odev))
- rv |= (__drbd_set_state(_NS(odev, aftr_isp, 1), CS_HARD, NULL)
- != SS_NOTHING_TO_DO);
+ if (!_drbd_may_sync_now(odev) &&
+ _drbd_set_state(_NS(odev, aftr_isp, 1),
+ CS_HARD, NULL) != SS_NOTHING_TO_DO)
+ changed = true;
}
rcu_read_unlock();

- return rv;
+ return changed;
}

/**
- * _drbd_resume_next() - Resume resync on all devices that may resync now
+ * drbd_resume_next() - Resume resync on all devices that may resync now
* @device: DRBD device.
*
* Called from process context only (admin command and worker).
*/
-static int _drbd_resume_next(struct drbd_device *device)
+static bool drbd_resume_next(struct drbd_device *device)
{
+ bool changed = false;
struct drbd_device *odev;
- int i, rv = 0;
+ int i;

rcu_read_lock();
idr_for_each_entry(&drbd_devices, odev, i) {
if (odev->state.conn == C_STANDALONE && odev->state.disk == D_DISKLESS)
continue;
if (odev->state.aftr_isp) {
- if (_drbd_may_sync_now(odev))
- rv |= (__drbd_set_state(_NS(odev, aftr_isp, 0),
- CS_HARD, NULL)
- != SS_NOTHING_TO_DO) ;
+ if (_drbd_may_sync_now(odev) &&
+ _drbd_set_state(_NS(odev, aftr_isp, 0),
+ CS_HARD, NULL) != SS_NOTHING_TO_DO)
+ changed = true;
}
}
rcu_read_unlock();
- return rv;
+ return changed;
}

void resume_next_sg(struct drbd_device *device)
{
- write_lock_irq(&global_state_lock);
- _drbd_resume_next(device);
- write_unlock_irq(&global_state_lock);
+ lock_all_resources();
+ drbd_resume_next(device);
+ unlock_all_resources();
}

void suspend_other_sg(struct drbd_device *device)
{
- write_lock_irq(&global_state_lock);
- _drbd_pause_after(device);
- write_unlock_irq(&global_state_lock);
+ lock_all_resources();
+ drbd_pause_after(device);
+ unlock_all_resources();
}

-/* caller must hold global_state_lock */
+/* caller must lock_all_resources() */
enum drbd_ret_code drbd_resync_after_valid(struct drbd_device *device, int o_minor)
{
struct drbd_device *odev;
@@ -1557,15 +1553,15 @@ enum drbd_ret_code drbd_resync_after_valid(struct drbd_device *device, int o_min
}
}

-/* caller must hold global_state_lock */
+/* caller must lock_all_resources() */
void drbd_resync_after_changed(struct drbd_device *device)
{
- int changes;
+ int changed;

do {
- changes = _drbd_pause_after(device);
- changes |= _drbd_resume_next(device);
- } while (changes);
+ changed = drbd_pause_after(device);
+ changed |= drbd_resume_next(device);
+ } while (changed);
}

void drbd_rs_controller_reset(struct drbd_device *device)
@@ -1685,19 +1681,14 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
} else {
mutex_lock(device->state_mutex);
}
- clear_bit(B_RS_H_DONE, &device->flags);

- /* req_lock: serialize with drbd_send_and_submit() and others
- * global_state_lock: for stable sync-after dependencies */
- spin_lock_irq(&device->resource->req_lock);
- write_lock(&global_state_lock);
+ lock_all_resources();
+ clear_bit(B_RS_H_DONE, &device->flags);
/* Did some connection breakage or IO error race with us? */
if (device->state.conn < C_CONNECTED
|| !get_ldev_if_state(device, D_NEGOTIATING)) {
- write_unlock(&global_state_lock);
- spin_unlock_irq(&device->resource->req_lock);
- mutex_unlock(device->state_mutex);
- return;
+ unlock_all_resources();
+ goto out;
}

ns = drbd_read_state(device);
@@ -1711,7 +1702,7 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
else /* side == C_SYNC_SOURCE */
ns.pdsk = D_INCONSISTENT;

- r = __drbd_set_state(device, ns, CS_VERBOSE, NULL);
+ r = _drbd_set_state(device, ns, CS_VERBOSE, NULL);
ns = drbd_read_state(device);

if (ns.conn < C_CONNECTED)
@@ -1732,7 +1723,7 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
device->rs_mark_left[i] = tw;
device->rs_mark_time[i] = now;
}
- _drbd_pause_after(device);
+ drbd_pause_after(device);
/* Forget potentially stale cached per resync extent bit-counts.
* Open coded drbd_rs_cancel_all(device), we already have IRQs
* disabled, and know the disk state is ok. */
@@ -1742,8 +1733,7 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
device->resync_wenr = LC_FREE;
spin_unlock(&device->al_lock);
}
- write_unlock(&global_state_lock);
- spin_unlock_irq(&device->resource->req_lock);
+ unlock_all_resources();

if (r == SS_SUCCESS) {
wake_up(&device->al_wait); /* for lc_reset() above */
@@ -1807,6 +1797,7 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
drbd_md_sync(device);
}
put_ldev(device);
+out:
mutex_unlock(device->state_mutex);
}

--
1.9.1
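In the patch above, drbd_resync_after_changed() runs a fixed-point loop. Pausing one device can make another device further down its resync-after chain ineligible, and resuming one can release the next. So drbd_pause_after() and drbd_resume_next() are repeated until neither of them reports a change. The following user-space model is a minimal sketch of that idea under stated assumptions; it is not DRBD code:

- The `struct dev` type and its fields are hypothetical.
- The may_sync_now() rule is simplified from _drbd_may_sync_now(): a device must wait if any device on its resync-after chain is resyncing or is itself paused.
- The chain is assumed to be acyclic, which drbd_resync_after_valid() is meant to guarantee.
- Locking is omitted. In the patch, it is provided by lock_all_resources().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified device model for illustration only. */
struct dev {
	int after;      /* index of the resync-after dependency, or -1 */
	bool in_resync; /* a resync is needed (running or paused) */
	bool aftr_isp;  /* paused because of the resync-after dependency */
};

/* May devs[i] resync now? Walk its resync-after chain (assumed acyclic). */
static bool may_sync_now(const struct dev *devs, int i)
{
	for (int o = devs[i].after; o != -1; o = devs[o].after)
		if (devs[o].in_resync || devs[o].aftr_isp)
			return false;
	return true;
}

/* Pause every device that may not resync now; report whether anything changed. */
static bool pause_after(struct dev *devs, size_t n)
{
	bool changed = false;

	for (size_t i = 0; i < n; i++)
		if (!may_sync_now(devs, (int)i) && !devs[i].aftr_isp) {
			devs[i].aftr_isp = true;
			changed = true;
		}
	return changed;
}

/* Resume every paused device that may resync now; report whether anything changed. */
static bool resume_next(struct dev *devs, size_t n)
{
	bool changed = false;

	for (size_t i = 0; i < n; i++)
		if (devs[i].aftr_isp && may_sync_now(devs, (int)i)) {
			devs[i].aftr_isp = false;
			changed = true;
		}
	return changed;
}

/* Repeat both passes until neither changes anything (a fixed point). */
static void resync_after_changed(struct dev *devs, size_t n)
{
	bool changed;

	do {
		changed = pause_after(devs, n);
		changed |= resume_next(devs, n);
	} while (changed);
}
```

Consider a chain 0 <- 1 <- 2 in which all three devices need a resync. Devices 1 and 2 end up paused. Once device 0 finishes, only device 1 is released, because device 2 still waits for device 1.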

2015-11-25 11:02:26

by Philipp Reisner

Subject: [PATCH 08/38] drbd: Backport the "events2" command

From: Andreas Gruenbacher <[email protected]>

The events2 command originates from drbd-9 development. It reports
more information, but requires an incompatible change in the output
format. Therefore the previous events command continues to exist, and
the new, improved events2 command becomes available alongside it.
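Related events2 notifications are sent as a batch. An example is a connection together with its peer devices, as created in drbd_adm_connect(). Every message in the batch except the last carries NOTIFY_CONTINUES. The sender achieves this with the idiom `flags = (count--) ? NOTIFY_CONTINUES : 0`. The sketch below shows both sides of this convention in plain C:

- Producer side: tagging a batch so that all messages but the last carry the flag.
- Consumer side: counting complete batches.
- Recovering the base type: applying `type & ~NOTIFY_FLAGS`, as the notify_*_state() functions do.

The enum values are an assumption meant to mirror enum drbd_notification_type in include/linux/drbd.h. The helper names tag_batch(), count_batches(), and base_type() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Values assumed to mirror enum drbd_notification_type in include/linux/drbd.h. */
enum drbd_notification_type {
	NOTIFY_EXISTS,
	NOTIFY_CREATE,
	NOTIFY_CHANGE,
	NOTIFY_DESTROY,
	NOTIFY_CALL,
	NOTIFY_RESPONSE,

	NOTIFY_CONTINUES = 0x8000,
	NOTIFY_FLAGS = NOTIFY_CONTINUES,
};

/* Producer side: tag n events of type base so all but the last carry
 * NOTIFY_CONTINUES, using the same post-decrement idiom as drbd_adm_connect(). */
static void tag_batch(unsigned int *types, unsigned int base, unsigned int n)
{
	unsigned int remaining = n - 1; /* events that follow the current one */

	for (unsigned int i = 0; i < n; i++)
		types[i] = base | ((remaining--) ? NOTIFY_CONTINUES : 0);
}

/* Consumer side: an event without NOTIFY_CONTINUES completes a batch. */
static size_t count_batches(const unsigned int *types, size_t n)
{
	size_t batches = 0;

	for (size_t i = 0; i < n; i++)
		if (!(types[i] & NOTIFY_CONTINUES))
			batches++;
	return batches;
}

/* Strip the flag bits to recover the notification type itself. */
static unsigned int base_type(unsigned int type)
{
	return type & ~(unsigned int)NOTIFY_FLAGS;
}
```

A listener can therefore apply a group of related changes atomically: it buffers events until one arrives without NOTIFY_CONTINUES.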

This prepares the user base for a later switch to the complete
drbd-9 code base.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 45 +++
drivers/block/drbd/drbd_nl.c | 625 ++++++++++++++++++++++++++++++++-
drivers/block/drbd/drbd_receiver.c | 6 -
drivers/block/drbd/drbd_state.c | 424 +++++++++++++++++++++-
drivers/block/drbd/drbd_state_change.h | 63 ++++
include/linux/drbd.h | 16 +
include/linux/drbd_genl.h | 114 ++++++
7 files changed, 1281 insertions(+), 12 deletions(-)
create mode 100644 drivers/block/drbd/drbd_state_change.h
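In the drbd_nl.c changes, get_initial_state() dumps the current state of each resource to a new listener as NOTIFY_EXISTS events. For each resource it sends the following, a count computed by notifications_for_state_change():

- one resource event,
- one event per connection,
- one event per device,
- one event per (device, connection) pair.

It then maps a running counter n onto those ranges in that order. The sketch below reproduces only that index arithmetic. The function names and the OBJ_* kinds are hypothetical, and none of the netlink handling is modeled.

```c
#include <assert.h>

/* Hypothetical labels for the object an initial-state event describes. */
enum obj_kind { OBJ_RESOURCE, OBJ_CONNECTION, OBJ_DEVICE, OBJ_PEER_DEVICE, OBJ_NONE };

/* Events per resource, as in notifications_for_state_change(). */
static unsigned int notifications(unsigned int n_connections, unsigned int n_devices)
{
	return 1 + n_connections + n_devices + n_devices * n_connections;
}

/* Map running index n to an object in the order get_initial_state() uses:
 * resource, connections, devices, peer devices. *idx gets the index within
 * that group. */
static enum obj_kind classify(unsigned int n, unsigned int n_connections,
			      unsigned int n_devices, unsigned int *idx)
{
	*idx = 0;
	if (n < 1)
		return OBJ_RESOURCE;
	n--;
	if (n < n_connections) {
		*idx = n;
		return OBJ_CONNECTION;
	}
	n -= n_connections;
	if (n < n_devices) {
		*idx = n;
		return OBJ_DEVICE;
	}
	n -= n_devices;
	if (n < n_devices * n_connections) {
		*idx = n;
		return OBJ_PEER_DEVICE;
	}
	return OBJ_NONE;
}
```

Take a DRBD 8.4 resource with one connection and two volumes. It yields 1 + 1 + 2 + 2 = 6 events. Indices 4 and 5 describe the two peer devices.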

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 2c9ee22..965aae0 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -667,6 +667,8 @@ enum {
DEVICE_WORK_PENDING, /* tell worker that some device has pending work */
};

+enum which_state { NOW, OLD = NOW, NEW };
+
struct drbd_resource {
char *name;
#ifdef CONFIG_DEBUG_FS
@@ -785,6 +787,17 @@ struct drbd_connection {
} send;
};

+static inline bool has_net_conf(struct drbd_connection *connection)
+{
+ bool has_net_conf;
+
+ rcu_read_lock();
+ has_net_conf = rcu_dereference(connection->net_conf);
+ rcu_read_unlock();
+
+ return has_net_conf;
+}
+
void __update_timing_details(
struct drbd_thread_timing_details *tdp,
unsigned int *cb_nr,
@@ -1017,6 +1030,12 @@ static inline struct drbd_peer_device *first_peer_device(struct drbd_device *dev
return list_first_entry_or_null(&device->peer_devices, struct drbd_peer_device, peer_devices);
}

+static inline struct drbd_peer_device *
+conn_peer_device(struct drbd_connection *connection, int volume_number)
+{
+ return idr_find(&connection->peer_devices, volume_number);
+}
+
#define for_each_resource(resource, _resources) \
list_for_each_entry(resource, _resources, resources)

@@ -1451,6 +1470,9 @@ extern int is_valid_ar_handle(struct drbd_request *, sector_t);


/* drbd_nl.c */
+
+extern struct mutex notification_mutex;
+
extern void drbd_suspend_io(struct drbd_device *device);
extern void drbd_resume_io(struct drbd_device *device);
extern char *ppsize(char *buf, unsigned long long size);
@@ -1665,6 +1687,29 @@ struct sib_info {
};
void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib);

+extern void notify_resource_state(struct sk_buff *,
+ unsigned int,
+ struct drbd_resource *,
+ struct resource_info *,
+ enum drbd_notification_type);
+extern void notify_device_state(struct sk_buff *,
+ unsigned int,
+ struct drbd_device *,
+ struct device_info *,
+ enum drbd_notification_type);
+extern void notify_connection_state(struct sk_buff *,
+ unsigned int,
+ struct drbd_connection *,
+ struct connection_info *,
+ enum drbd_notification_type);
+extern void notify_peer_device_state(struct sk_buff *,
+ unsigned int,
+ struct drbd_peer_device *,
+ struct peer_device_info *,
+ enum drbd_notification_type);
+extern void notify_helper(enum drbd_notification_type, struct drbd_device *,
+ struct drbd_connection *, const char *, int);
+
/*
* inline helper functions
*************************/
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index d37c509..aa805cd 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -36,6 +36,7 @@
#include "drbd_int.h"
#include "drbd_protocol.h"
#include "drbd_req.h"
+#include "drbd_state_change.h"
#include <asm/unaligned.h>
#include <linux/drbd_limits.h>
#include <linux/kthread.h>
@@ -75,11 +76,17 @@ int drbd_adm_get_status(struct sk_buff *skb, struct genl_info *info);
int drbd_adm_get_timeout_type(struct sk_buff *skb, struct genl_info *info);
/* .dumpit */
int drbd_adm_get_status_all(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb);

#include <linux/drbd_genl_api.h>
#include "drbd_nla.h"
#include <linux/genl_magic_func.h>

+static atomic_t drbd_genl_seq = ATOMIC_INIT(2); /* two. */
+static atomic_t notify_genl_seq = ATOMIC_INIT(2); /* two. */
+
+DEFINE_MUTEX(notification_mutex);
+
/* used blkdev_get_by_path, to claim our meta data device(s) */
static char *drbd_m_holder = "Hands off! this is DRBD's meta data device.";

@@ -349,6 +356,7 @@ int drbd_khelper(struct drbd_device *device, char *cmd)
sib.sib_reason = SIB_HELPER_PRE;
sib.helper_name = cmd;
drbd_bcast_event(device, &sib);
+ notify_helper(NOTIFY_CALL, device, connection, cmd, 0);
ret = call_usermodehelper(usermode_helper, argv, envp, UMH_WAIT_PROC);
if (ret)
drbd_warn(device, "helper command: %s %s %s exit code %u (0x%x)\n",
@@ -361,6 +369,7 @@ int drbd_khelper(struct drbd_device *device, char *cmd)
sib.sib_reason = SIB_HELPER_POST;
sib.helper_exit_code = ret;
drbd_bcast_event(device, &sib);
+ notify_helper(NOTIFY_RESPONSE, device, connection, cmd, ret);

if (current == connection->worker.task)
clear_bit(CALLBACK_PENDING, &connection->flags);
@@ -388,6 +397,7 @@ static int conn_khelper(struct drbd_connection *connection, char *cmd)

drbd_info(connection, "helper command: %s %s %s\n", usermode_helper, cmd, resource_name);
/* TODO: conn_bcast_event() ?? */
+ notify_helper(NOTIFY_CALL, NULL, connection, cmd, 0);

ret = call_usermodehelper(usermode_helper, argv, envp, UMH_WAIT_PROC);
if (ret)
@@ -399,6 +409,7 @@ static int conn_khelper(struct drbd_connection *connection, char *cmd)
usermode_helper, cmd, resource_name,
(ret >> 8) & 0xff, ret);
/* TODO: conn_bcast_event() ?? */
+ notify_helper(NOTIFY_RESPONSE, NULL, connection, cmd, ret);

if (ret < 0) /* Ignore any ERRNOs we got. */
ret = 0;
@@ -2248,8 +2259,31 @@ int drbd_adm_net_opts(struct sk_buff *skb, struct genl_info *info)
return 0;
}

+static void connection_to_info(struct connection_info *info,
+ struct drbd_connection *connection)
+{
+ info->conn_connection_state = connection->cstate;
+ info->conn_role = conn_highest_peer(connection);
+}
+
+static void peer_device_to_info(struct peer_device_info *info,
+ struct drbd_peer_device *peer_device)
+{
+ struct drbd_device *device = peer_device->device;
+
+ info->peer_repl_state =
+ max_t(enum drbd_conns, C_WF_REPORT_PARAMS, device->state.conn);
+ info->peer_disk_state = device->state.pdsk;
+ info->peer_resync_susp_user = device->state.user_isp;
+ info->peer_resync_susp_peer = device->state.peer_isp;
+ info->peer_resync_susp_dependency = device->state.aftr_isp;
+}
+
int drbd_adm_connect(struct sk_buff *skb, struct genl_info *info)
{
+ struct connection_info connection_info;
+ enum drbd_notification_type flags;
+ unsigned int peer_devices = 0;
struct drbd_config_context adm_ctx;
struct drbd_peer_device *peer_device;
struct net_conf *old_net_conf, *new_net_conf = NULL;
@@ -2350,6 +2384,22 @@ int drbd_adm_connect(struct sk_buff *skb, struct genl_info *info)
connection->peer_addr_len = nla_len(adm_ctx.peer_addr);
memcpy(&connection->peer_addr, nla_data(adm_ctx.peer_addr), connection->peer_addr_len);

+ idr_for_each_entry(&connection->peer_devices, peer_device, i) {
+ peer_devices++;
+ }
+
+ connection_to_info(&connection_info, connection);
+ flags = (peer_devices--) ? NOTIFY_CONTINUES : 0;
+ mutex_lock(&notification_mutex);
+ notify_connection_state(NULL, 0, connection, &connection_info, NOTIFY_CREATE | flags);
+ idr_for_each_entry(&connection->peer_devices, peer_device, i) {
+ struct peer_device_info peer_device_info;
+
+ peer_device_to_info(&peer_device_info, peer_device);
+ flags = (peer_devices--) ? NOTIFY_CONTINUES : 0;
+ notify_peer_device_state(NULL, 0, peer_device, &peer_device_info, NOTIFY_CREATE | flags);
+ }
+ mutex_unlock(&notification_mutex);
mutex_unlock(&adm_ctx.resource->conf_update);

rcu_read_lock();
@@ -2431,6 +2481,8 @@ static enum drbd_state_rv conn_try_disconnect(struct drbd_connection *connection
drbd_err(connection,
"unexpected rv2=%d in conn_try_disconnect()\n",
rv2);
+ /* Unlike in DRBD 9, the state engine has generated
+ * NOTIFY_DESTROY events before clearing connection->net_conf. */
}
return rv;
}
@@ -3417,8 +3469,18 @@ drbd_check_resource_name(struct drbd_config_context *adm_ctx)
return NO_ERROR;
}

+static void resource_to_info(struct resource_info *info,
+ struct drbd_resource *resource)
+{
+ info->res_role = conn_highest_role(first_connection(resource));
+ info->res_susp = resource->susp;
+ info->res_susp_nod = resource->susp_nod;
+ info->res_susp_fen = resource->susp_fen;
+}
+
int drbd_adm_new_resource(struct sk_buff *skb, struct genl_info *info)
{
+ struct drbd_connection *connection;
struct drbd_config_context adm_ctx;
enum drbd_ret_code retcode;
struct res_opts res_opts;
@@ -3453,14 +3515,32 @@ int drbd_adm_new_resource(struct sk_buff *skb, struct genl_info *info)

/* not yet safe for genl_family.parallel_ops */
mutex_lock(&resources_mutex);
- if (!conn_create(adm_ctx.resource_name, &res_opts))
- retcode = ERR_NOMEM;
+ connection = conn_create(adm_ctx.resource_name, &res_opts);
mutex_unlock(&resources_mutex);
+
+ if (connection) {
+ struct resource_info resource_info;
+
+ mutex_lock(&notification_mutex);
+ resource_to_info(&resource_info, connection->resource);
+ notify_resource_state(NULL, 0, connection->resource,
+ &resource_info, NOTIFY_CREATE);
+ mutex_unlock(&notification_mutex);
+ } else
+ retcode = ERR_NOMEM;
+
out:
drbd_adm_finish(&adm_ctx, info, retcode);
return 0;
}

+static void device_to_info(struct device_info *info,
+ struct drbd_device *device)
+{
+ info->dev_disk_state = device->state.disk;
+}
+
+
int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info)
{
struct drbd_config_context adm_ctx;
@@ -3495,6 +3575,36 @@ int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info)

mutex_lock(&adm_ctx.resource->adm_mutex);
retcode = drbd_create_device(&adm_ctx, dh->minor);
+ if (retcode == NO_ERROR) {
+ struct drbd_device *device;
+ struct drbd_peer_device *peer_device;
+ struct device_info info;
+ unsigned int peer_devices = 0;
+ enum drbd_notification_type flags;
+
+ device = minor_to_device(dh->minor);
+ for_each_peer_device(peer_device, device) {
+ if (!has_net_conf(peer_device->connection))
+ continue;
+ peer_devices++;
+ }
+
+ device_to_info(&info, device);
+ mutex_lock(&notification_mutex);
+ flags = (peer_devices--) ? NOTIFY_CONTINUES : 0;
+ notify_device_state(NULL, 0, device, &info, NOTIFY_CREATE | flags);
+ for_each_peer_device(peer_device, device) {
+ struct peer_device_info peer_device_info;
+
+ if (!has_net_conf(peer_device->connection))
+ continue;
+ peer_device_to_info(&peer_device_info, peer_device);
+ flags = (peer_devices--) ? NOTIFY_CONTINUES : 0;
+ notify_peer_device_state(NULL, 0, peer_device, &peer_device_info,
+ NOTIFY_CREATE | flags);
+ }
+ mutex_unlock(&notification_mutex);
+ }
mutex_unlock(&adm_ctx.resource->adm_mutex);
out:
drbd_adm_finish(&adm_ctx, info, retcode);
@@ -3503,13 +3613,35 @@ out:

static enum drbd_ret_code adm_del_minor(struct drbd_device *device)
{
+ struct drbd_peer_device *peer_device;
+
if (device->state.disk == D_DISKLESS &&
/* no need to be device->state.conn == C_STANDALONE &&
* we may want to delete a minor from a live replication group.
*/
device->state.role == R_SECONDARY) {
+ struct drbd_connection *connection =
+ first_connection(device->resource);
+
_drbd_request_state(device, NS(conn, C_WF_REPORT_PARAMS),
CS_VERBOSE + CS_WAIT_COMPLETE);
+
+ /* If the state engine hasn't stopped the sender thread yet, we
+ * need to flush the sender work queue before generating the
+ * DESTROY events here. */
+ if (get_t_state(&connection->worker) == RUNNING)
+ drbd_flush_workqueue(&connection->sender_work);
+
+ mutex_lock(&notification_mutex);
+ for_each_peer_device(peer_device, device) {
+ if (!has_net_conf(peer_device->connection))
+ continue;
+ notify_peer_device_state(NULL, 0, peer_device, NULL,
+ NOTIFY_DESTROY | NOTIFY_CONTINUES);
+ }
+ notify_device_state(NULL, 0, device, NULL, NOTIFY_DESTROY);
+ mutex_unlock(&notification_mutex);
+
drbd_delete_device(device);
return NO_ERROR;
} else
@@ -3546,6 +3678,13 @@ static int adm_del_resource(struct drbd_resource *resource)
if (!idr_is_empty(&resource->devices))
return ERR_RES_IN_USE;

+ /* The state engine has stopped the sender thread, so we don't
+ * need to flush the sender work queue before generating the
+ * DESTROY event here. */
+ mutex_lock(&notification_mutex);
+ notify_resource_state(NULL, 0, resource, NULL, NOTIFY_DESTROY);
+ mutex_unlock(&notification_mutex);
+
mutex_lock(&resources_mutex);
list_del_rcu(&resource->resources);
mutex_unlock(&resources_mutex);
@@ -3644,7 +3783,6 @@ finish:

void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib)
{
- static atomic_t drbd_genl_seq = ATOMIC_INIT(2); /* two. */
struct sk_buff *msg;
struct drbd_genlmsghdr *d_out;
unsigned seq;
@@ -3679,3 +3817,484 @@ failed:
"Event seq:%u sib_reason:%u\n",
err, seq, sib->sib_reason);
}
+
+static void device_to_statistics(struct device_statistics *s,
+ struct drbd_device *device)
+{
+ memset(s, 0, sizeof(*s));
+ s->dev_upper_blocked = !may_inc_ap_bio(device);
+ if (get_ldev(device)) {
+ struct drbd_md *md = &device->ldev->md;
+ u64 *history_uuids = (u64 *)s->history_uuids;
+ struct request_queue *q;
+ int n;
+
+ spin_lock_irq(&md->uuid_lock);
+ s->dev_current_uuid = md->uuid[UI_CURRENT];
+ BUILD_BUG_ON(sizeof(s->history_uuids) < UI_HISTORY_END - UI_HISTORY_START + 1);
+ for (n = 0; n < UI_HISTORY_END - UI_HISTORY_START + 1; n++)
+ history_uuids[n] = md->uuid[UI_HISTORY_START + n];
+ for (; n < HISTORY_UUIDS; n++)
+ history_uuids[n] = 0;
+ s->history_uuids_len = HISTORY_UUIDS;
+ spin_unlock_irq(&md->uuid_lock);
+
+ s->dev_disk_flags = md->flags;
+ q = bdev_get_queue(device->ldev->backing_bdev);
+ s->dev_lower_blocked =
+ bdi_congested(&q->backing_dev_info,
+ (1 << WB_async_congested) |
+ (1 << WB_sync_congested));
+ put_ldev(device);
+ }
+ s->dev_size = drbd_get_capacity(device->this_bdev);
+ s->dev_read = device->read_cnt;
+ s->dev_write = device->writ_cnt;
+ s->dev_al_writes = device->al_writ_cnt;
+ s->dev_bm_writes = device->bm_writ_cnt;
+ s->dev_upper_pending = atomic_read(&device->ap_bio_cnt);
+ s->dev_lower_pending = atomic_read(&device->local_cnt);
+ s->dev_al_suspended = test_bit(AL_SUSPENDED, &device->flags);
+ s->dev_exposed_data_uuid = device->ed_uuid;
+}
+
+enum mdf_peer_flag {
+ MDF_PEER_CONNECTED = 1 << 0,
+ MDF_PEER_OUTDATED = 1 << 1,
+ MDF_PEER_FENCING = 1 << 2,
+ MDF_PEER_FULL_SYNC = 1 << 3,
+};
+
+static void peer_device_to_statistics(struct peer_device_statistics *s,
+ struct drbd_peer_device *peer_device)
+{
+ struct drbd_device *device = peer_device->device;
+
+ memset(s, 0, sizeof(*s));
+ s->peer_dev_received = device->recv_cnt;
+ s->peer_dev_sent = device->send_cnt;
+ s->peer_dev_pending = atomic_read(&device->ap_pending_cnt) +
+ atomic_read(&device->rs_pending_cnt);
+ s->peer_dev_unacked = atomic_read(&device->unacked_cnt);
+ s->peer_dev_out_of_sync = drbd_bm_total_weight(device) << (BM_BLOCK_SHIFT - 9);
+ s->peer_dev_resync_failed = device->rs_failed << (BM_BLOCK_SHIFT - 9);
+ if (get_ldev(device)) {
+ struct drbd_md *md = &device->ldev->md;
+
+ spin_lock_irq(&md->uuid_lock);
+ s->peer_dev_bitmap_uuid = md->uuid[UI_BITMAP];
+ spin_unlock_irq(&md->uuid_lock);
+ s->peer_dev_flags =
+ (drbd_md_test_flag(device->ldev, MDF_CONNECTED_IND) ?
+ MDF_PEER_CONNECTED : 0) +
+ (drbd_md_test_flag(device->ldev, MDF_CONSISTENT) &&
+ !drbd_md_test_flag(device->ldev, MDF_WAS_UP_TO_DATE) ?
+ MDF_PEER_OUTDATED : 0) +
+ /* FIXME: MDF_PEER_FENCING? */
+ (drbd_md_test_flag(device->ldev, MDF_FULL_SYNC) ?
+ MDF_PEER_FULL_SYNC : 0);
+ put_ldev(device);
+ }
+}
+
+static int nla_put_notification_header(struct sk_buff *msg,
+ enum drbd_notification_type type)
+{
+ struct drbd_notification_header nh = {
+ .nh_type = type,
+ };
+
+ return drbd_notification_header_to_skb(msg, &nh, true);
+}
+
+void notify_resource_state(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_resource *resource,
+ struct resource_info *resource_info,
+ enum drbd_notification_type type)
+{
+ struct resource_statistics resource_statistics;
+ struct drbd_genlmsghdr *dh;
+ bool multicast = false;
+ int err;
+
+ if (!skb) {
+ seq = atomic_inc_return(&notify_genl_seq);
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto failed;
+ multicast = true;
+ }
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_RESOURCE_STATE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_drbd_cfg_context(skb, resource, NULL, NULL) ||
+ nla_put_notification_header(skb, type) ||
+ ((type & ~NOTIFY_FLAGS) != NOTIFY_DESTROY &&
+ resource_info_to_skb(skb, resource_info, true)))
+ goto nla_put_failure;
+ resource_statistics.res_stat_write_ordering = resource->write_ordering;
+ err = resource_statistics_to_skb(skb, &resource_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto nla_put_failure;
+ genlmsg_end(skb, dh);
+ if (multicast) {
+ err = drbd_genl_multicast_events(skb, 0);
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto failed;
+ }
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+failed:
+ drbd_err(resource, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+void notify_device_state(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_device *device,
+ struct device_info *device_info,
+ enum drbd_notification_type type)
+{
+ struct device_statistics device_statistics;
+ struct drbd_genlmsghdr *dh;
+ bool multicast = false;
+ int err;
+
+ if (!skb) {
+ seq = atomic_inc_return(&notify_genl_seq);
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto failed;
+ multicast = true;
+ }
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_DEVICE_STATE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = device->minor;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_drbd_cfg_context(skb, device->resource, NULL, device) ||
+ nla_put_notification_header(skb, type) ||
+ ((type & ~NOTIFY_FLAGS) != NOTIFY_DESTROY &&
+ device_info_to_skb(skb, device_info, true)))
+ goto nla_put_failure;
+ device_to_statistics(&device_statistics, device);
+ device_statistics_to_skb(skb, &device_statistics, !capable(CAP_SYS_ADMIN));
+ genlmsg_end(skb, dh);
+ if (multicast) {
+ err = drbd_genl_multicast_events(skb, 0);
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto failed;
+ }
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+failed:
+ drbd_err(device, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+void notify_connection_state(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_connection *connection,
+ struct connection_info *connection_info,
+ enum drbd_notification_type type)
+{
+ struct connection_statistics connection_statistics;
+ struct drbd_genlmsghdr *dh;
+ bool multicast = false;
+ int err;
+
+ if (!skb) {
+ seq = atomic_inc_return(&notify_genl_seq);
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto failed;
+ multicast = true;
+ }
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_CONNECTION_STATE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_drbd_cfg_context(skb, connection->resource, connection, NULL) ||
+ nla_put_notification_header(skb, type) ||
+ ((type & ~NOTIFY_FLAGS) != NOTIFY_DESTROY &&
+ connection_info_to_skb(skb, connection_info, true)))
+ goto nla_put_failure;
+ connection_statistics.conn_congested = test_bit(NET_CONGESTED, &connection->flags);
+ connection_statistics_to_skb(skb, &connection_statistics, !capable(CAP_SYS_ADMIN));
+ genlmsg_end(skb, dh);
+ if (multicast) {
+ err = drbd_genl_multicast_events(skb, 0);
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto failed;
+ }
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+failed:
+ drbd_err(connection, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+void notify_peer_device_state(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_peer_device *peer_device,
+ struct peer_device_info *peer_device_info,
+ enum drbd_notification_type type)
+{
+ struct peer_device_statistics peer_device_statistics;
+ struct drbd_resource *resource = peer_device->device->resource;
+ struct drbd_genlmsghdr *dh;
+ bool multicast = false;
+ int err;
+
+ if (!skb) {
+ seq = atomic_inc_return(&notify_genl_seq);
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto failed;
+ multicast = true;
+ }
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_PEER_DEVICE_STATE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_drbd_cfg_context(skb, resource, peer_device->connection, peer_device->device) ||
+ nla_put_notification_header(skb, type) ||
+ ((type & ~NOTIFY_FLAGS) != NOTIFY_DESTROY &&
+ peer_device_info_to_skb(skb, peer_device_info, true)))
+ goto nla_put_failure;
+ peer_device_to_statistics(&peer_device_statistics, peer_device);
+ peer_device_statistics_to_skb(skb, &peer_device_statistics, !capable(CAP_SYS_ADMIN));
+ genlmsg_end(skb, dh);
+ if (multicast) {
+ err = drbd_genl_multicast_events(skb, 0);
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto failed;
+ }
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+failed:
+ drbd_err(peer_device, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+void notify_helper(enum drbd_notification_type type,
+ struct drbd_device *device, struct drbd_connection *connection,
+ const char *name, int status)
+{
+ struct drbd_resource *resource = device ? device->resource : connection->resource;
+ struct drbd_helper_info helper_info;
+ unsigned int seq = atomic_inc_return(&notify_genl_seq);
+ struct sk_buff *skb = NULL;
+ struct drbd_genlmsghdr *dh;
+ int err;
+
+ strlcpy(helper_info.helper_name, name, sizeof(helper_info.helper_name));
+ helper_info.helper_name_len = min(strlen(name), sizeof(helper_info.helper_name));
+ helper_info.helper_status = status;
+
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto fail;
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_HELPER);
+ if (!dh)
+ goto fail;
+ dh->minor = device ? device->minor : -1;
+ dh->ret_code = NO_ERROR;
+ mutex_lock(&notification_mutex);
+ if (nla_put_drbd_cfg_context(skb, resource, connection, device) ||
+ nla_put_notification_header(skb, type) ||
+ drbd_helper_info_to_skb(skb, &helper_info, true))
+ goto unlock_fail;
+ genlmsg_end(skb, dh);
+ err = drbd_genl_multicast_events(skb, 0);
+ skb = NULL;
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto unlock_fail;
+ mutex_unlock(&notification_mutex);
+ return;
+
+unlock_fail:
+ mutex_unlock(&notification_mutex);
+fail:
+ nlmsg_free(skb);
+ drbd_err(resource, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+static void notify_initial_state_done(struct sk_buff *skb, unsigned int seq)
+{
+ struct drbd_genlmsghdr *dh;
+ int err;
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_INITIAL_STATE_DONE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_notification_header(skb, NOTIFY_EXISTS))
+ goto nla_put_failure;
+ genlmsg_end(skb, dh);
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+ pr_err("Error %d sending event. Event seq:%u\n", err, seq);
+}
+
+static void free_state_changes(struct list_head *list)
+{
+ while (!list_empty(list)) {
+ struct drbd_state_change *state_change =
+ list_first_entry(list, struct drbd_state_change, list);
+ list_del(&state_change->list);
+ forget_state_change(state_change);
+ }
+}
+
+static unsigned int notifications_for_state_change(struct drbd_state_change *state_change)
+{
+ return 1 +
+ state_change->n_connections +
+ state_change->n_devices +
+ state_change->n_devices * state_change->n_connections;
+}
+
+static int get_initial_state(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct drbd_state_change *state_change = (struct drbd_state_change *)cb->args[0];
+ unsigned int seq = cb->args[2];
+ unsigned int n;
+ enum drbd_notification_type flags = 0;
+
+ /* There is no need for taking notification_mutex here: it doesn't
+ matter if the initial state events mix with later state change
+ events; we can always tell the events apart by the NOTIFY_EXISTS
+ flag. */
+
+ cb->args[5]--;
+ if (cb->args[5] == 1) {
+ notify_initial_state_done(skb, seq);
+ goto out;
+ }
+ n = cb->args[4]++;
+ if (cb->args[4] < cb->args[3])
+ flags |= NOTIFY_CONTINUES;
+ if (n < 1) {
+ notify_resource_state_change(skb, seq, state_change->resource,
+ NOTIFY_EXISTS | flags);
+ goto next;
+ }
+ n--;
+ if (n < state_change->n_connections) {
+ notify_connection_state_change(skb, seq, &state_change->connections[n],
+ NOTIFY_EXISTS | flags);
+ goto next;
+ }
+ n -= state_change->n_connections;
+ if (n < state_change->n_devices) {
+ notify_device_state_change(skb, seq, &state_change->devices[n],
+ NOTIFY_EXISTS | flags);
+ goto next;
+ }
+ n -= state_change->n_devices;
+ if (n < state_change->n_devices * state_change->n_connections) {
+ notify_peer_device_state_change(skb, seq, &state_change->peer_devices[n],
+ NOTIFY_EXISTS | flags);
+ goto next;
+ }
+
+next:
+ if (cb->args[4] == cb->args[3]) {
+ struct drbd_state_change *next_state_change =
+ list_entry(state_change->list.next,
+ struct drbd_state_change, list);
+ cb->args[0] = (long)next_state_change;
+ cb->args[3] = notifications_for_state_change(next_state_change);
+ cb->args[4] = 0;
+ }
+out:
+ return skb->len;
+}
+
+int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct drbd_resource *resource;
+ LIST_HEAD(head);
+
+ if (cb->args[5] >= 1) {
+ if (cb->args[5] > 1)
+ return get_initial_state(skb, cb);
+ if (cb->args[0]) {
+ struct drbd_state_change *state_change =
+ (struct drbd_state_change *)cb->args[0];
+
+ /* connect list to head */
+ list_add(&head, &state_change->list);
+ free_state_changes(&head);
+ }
+ return 0;
+ }
+
+ cb->args[5] = 2; /* number of iterations */
+ mutex_lock(&resources_mutex);
+ for_each_resource(resource, &drbd_resources) {
+ struct drbd_state_change *state_change;
+
+ state_change = remember_old_state(resource, GFP_KERNEL);
+ if (!state_change) {
+ if (!list_empty(&head))
+ free_state_changes(&head);
+ mutex_unlock(&resources_mutex);
+ return -ENOMEM;
+ }
+ copy_old_to_new_state_change(state_change);
+ list_add_tail(&state_change->list, &head);
+ cb->args[5] += notifications_for_state_change(state_change);
+ }
+ mutex_unlock(&resources_mutex);
+
+ if (!list_empty(&head)) {
+ struct drbd_state_change *state_change =
+ list_entry(head.next, struct drbd_state_change, list);
+ cb->args[0] = (long)state_change;
+ cb->args[3] = notifications_for_state_change(state_change);
+ list_del(&head); /* detach list from head */
+ }
+
+ cb->args[2] = cb->nlh->nlmsg_seq;
+ return get_initial_state(skb, cb);
+}
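drbd_adm_get_initial_state() is a netlink dump callback. It is re-entered once per notification and keeps its position in cb->args[]. args[0] points at the current resource's state change. args[3] and args[4] hold that resource's notification count and how many have been sent so far. args[5] counts down the remaining calls: when it reaches 1 the callback emits DRBD_INITIAL_STATE_DONE, and the following call frees the list and ends the dump. The userspace sketch below replays that bookkeeping with plain counters, so the event order and NOTIFY_CONTINUES flags can be checked. mock_state_change, event and dump_initial_state() are hypothetical names and are not part of DRBD. Only the count formula is taken from notifications_for_state_change().

```c
#include <assert.h>

#define NOTIFY_EXISTS    0
#define NOTIFY_CONTINUES 0x8000

/* Hypothetical stand-in for struct drbd_state_change: only the counts matter. */
struct mock_state_change {
	unsigned int n_devices;
	unsigned int n_connections;
};

enum event_kind { EV_RESOURCE, EV_CONNECTION, EV_DEVICE, EV_PEER_DEVICE, EV_DONE };

struct event {
	enum event_kind kind;
	unsigned int index;   /* position within its kind */
	unsigned int flags;
};

/* Same formula as notifications_for_state_change() in the patch. */
static unsigned int notifications_for(const struct mock_state_change *sc)
{
	return 1 + sc->n_connections + sc->n_devices +
	       sc->n_devices * sc->n_connections;
}

/* Replays the cb->args[] bookkeeping; one loop iteration == one dumpit call. */
static unsigned int dump_initial_state(const struct mock_state_change *sc,
				       unsigned int n_sc, struct event *out)
{
	unsigned int remaining = 2;  /* cb->args[5] */
	unsigned int cur = 0;        /* which state change cb->args[0] points at */
	unsigned int total = 0;      /* cb->args[3] */
	unsigned int pos = 0;        /* cb->args[4] */
	unsigned int count = 0, i;

	for (i = 0; i < n_sc; i++)
		remaining += notifications_for(&sc[i]);
	if (n_sc)
		total = notifications_for(&sc[0]);

	for (;;) {
		struct event ev = { EV_DONE, 0, 0 };
		unsigned int n;

		if (--remaining == 1) {       /* DRBD_INITIAL_STATE_DONE */
			out[count++] = ev;
			return count;
		}
		assert(cur < n_sc);
		n = pos++;
		ev.flags = NOTIFY_EXISTS | (pos < total ? NOTIFY_CONTINUES : 0);
		if (n == 0) {                                   /* the resource itself */
			ev.kind = EV_RESOURCE;
		} else if (--n < sc[cur].n_connections) {       /* then connections */
			ev.kind = EV_CONNECTION;
			ev.index = n;
		} else if ((n -= sc[cur].n_connections) < sc[cur].n_devices) {
			ev.kind = EV_DEVICE;                    /* then devices */
			ev.index = n;
		} else {                                        /* then peer devices */
			ev.kind = EV_PEER_DEVICE;
			ev.index = n - sc[cur].n_devices;
		}
		out[count++] = ev;
		if (pos == total && cur + 1 < n_sc) {  /* advance to next resource */
			cur++;
			total = notifications_for(&sc[cur]);
			pos = 0;
		}
	}
}
```

Every resource yields one batch in which only the last event lacks NOTIFY_CONTINUES. With no resources at all, the dump consists of the DONE marker alone.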
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index bf38b95..61b73c7 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1508,12 +1508,6 @@ static void conn_wait_active_ee_empty(struct drbd_connection *connection)
rcu_read_unlock();
}

-static struct drbd_peer_device *
-conn_peer_device(struct drbd_connection *connection, int volume_number)
-{
- return idr_find(&connection->peer_devices, volume_number);
-}
-
static int receive_Barrier(struct drbd_connection *connection, struct packet_info *pi)
{
int rv;
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 535ae47..bc4b45b 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -29,6 +29,7 @@
#include "drbd_int.h"
#include "drbd_protocol.h"
#include "drbd_req.h"
+#include "drbd_state_change.h"

struct after_state_chg_work {
struct drbd_work w;
@@ -37,6 +38,7 @@ struct after_state_chg_work {
union drbd_state ns;
enum chg_state_flags flags;
struct completion *done;
+ struct drbd_state_change *state_change;
};

enum sanitize_state_warnings {
@@ -48,9 +50,266 @@ enum sanitize_state_warnings {
IMPLICITLY_UPGRADED_PDSK,
};

+static void count_objects(struct drbd_resource *resource,
+ unsigned int *n_devices,
+ unsigned int *n_connections)
+{
+ struct drbd_device *device;
+ struct drbd_connection *connection;
+ int vnr;
+
+ *n_devices = 0;
+ *n_connections = 0;
+
+ idr_for_each_entry(&resource->devices, device, vnr)
+ (*n_devices)++;
+ for_each_connection(connection, resource) {
+ if (!has_net_conf(connection))
+ continue;
+ (*n_connections)++;
+ }
+}
+
+static struct drbd_state_change *alloc_state_change(unsigned int n_devices, unsigned int n_connections, gfp_t gfp)
+{
+ struct drbd_state_change *state_change;
+ unsigned int size, n;
+
+ size = sizeof(struct drbd_state_change) +
+ n_devices * sizeof(struct drbd_device_state_change) +
+ n_connections * sizeof(struct drbd_connection_state_change) +
+ n_devices * n_connections * sizeof(struct drbd_peer_device_state_change);
+ state_change = kmalloc(size, gfp);
+ if (!state_change)
+ return NULL;
+ state_change->n_devices = n_devices;
+ state_change->n_connections = n_connections;
+ state_change->devices = (void *)(state_change + 1);
+ state_change->connections = (void *)&state_change->devices[n_devices];
+ state_change->peer_devices = (void *)&state_change->connections[n_connections];
+ state_change->resource->resource = NULL;
+ for (n = 0; n < n_devices; n++)
+ state_change->devices[n].device = NULL;
+ for (n = 0; n < n_connections; n++)
+ state_change->connections[n].connection = NULL;
+ return state_change;
+}
+
+struct drbd_state_change *remember_old_state(struct drbd_resource *resource, gfp_t gfp)
+{
+ struct drbd_state_change *state_change;
+ struct drbd_device *device;
+ unsigned int n_devices;
+ struct drbd_connection *connection;
+ unsigned int n_connections;
+ int vnr;
+
+ struct drbd_device_state_change *device_state_change;
+ struct drbd_peer_device_state_change *peer_device_state_change;
+ struct drbd_connection_state_change *connection_state_change;
+
+retry:
+ rcu_read_lock();
+ count_objects(resource, &n_devices, &n_connections);
+ rcu_read_unlock();
+ state_change = alloc_state_change(n_devices, n_connections, gfp);
+ if (!state_change)
+ return NULL;
+
+ rcu_read_lock();
+ count_objects(resource, &n_devices, &n_connections);
+ if (n_devices != state_change->n_devices ||
+ n_connections != state_change->n_connections) {
+ kfree(state_change);
+ rcu_read_unlock();
+ goto retry;
+ }
+
+ kref_get(&resource->kref);
+ state_change->resource->resource = resource;
+ state_change->resource->role[OLD] =
+ conn_highest_role(first_connection(resource));
+ state_change->resource->susp[OLD] = resource->susp;
+ state_change->resource->susp_nod[OLD] = resource->susp_nod;
+ state_change->resource->susp_fen[OLD] = resource->susp_fen;
+
+ device_state_change = state_change->devices;
+ peer_device_state_change = state_change->peer_devices;
+ idr_for_each_entry(&resource->devices, device, vnr) {
+ kref_get(&device->kref);
+ device_state_change->device = device;
+ device_state_change->disk_state[OLD] = device->state.disk;
+
+ /* The peer_devices for each device have to be enumerated in
+ the order of the connections. We may not use for_each_peer_device() here. */
+ for_each_connection(connection, resource) {
+ struct drbd_peer_device *peer_device;
+
+ if (!has_net_conf(connection))
+ continue;
+ peer_device = conn_peer_device(connection, device->vnr);
+ peer_device_state_change->peer_device = peer_device;
+ peer_device_state_change->disk_state[OLD] =
+ device->state.pdsk;
+ peer_device_state_change->repl_state[OLD] =
+ max_t(enum drbd_conns,
+ C_WF_REPORT_PARAMS, device->state.conn);
+ peer_device_state_change->resync_susp_user[OLD] =
+ device->state.user_isp;
+ peer_device_state_change->resync_susp_peer[OLD] =
+ device->state.peer_isp;
+ peer_device_state_change->resync_susp_dependency[OLD] =
+ device->state.aftr_isp;
+ peer_device_state_change++;
+ }
+ device_state_change++;
+ }
+
+ connection_state_change = state_change->connections;
+ for_each_connection(connection, resource) {
+ if (!has_net_conf(connection))
+ continue;
+ kref_get(&connection->kref);
+ connection_state_change->connection = connection;
+ connection_state_change->cstate[OLD] =
+ connection->cstate;
+ connection_state_change->peer_role[OLD] =
+ conn_highest_peer(connection);
+ connection_state_change++;
+ }
+ rcu_read_unlock();
+
+ return state_change;
+}
+
+static void remember_new_state(struct drbd_state_change *state_change)
+{
+ struct drbd_resource_state_change *resource_state_change;
+ struct drbd_resource *resource;
+ unsigned int n;
+
+ if (!state_change)
+ return;
+
+ resource_state_change = &state_change->resource[0];
+ resource = resource_state_change->resource;
+
+ resource_state_change->role[NEW] =
+ conn_highest_role(first_connection(resource));
+ resource_state_change->susp[NEW] = resource->susp;
+ resource_state_change->susp_nod[NEW] = resource->susp_nod;
+ resource_state_change->susp_fen[NEW] = resource->susp_fen;
+
+ for (n = 0; n < state_change->n_devices; n++) {
+ struct drbd_device_state_change *device_state_change =
+ &state_change->devices[n];
+ struct drbd_device *device = device_state_change->device;
+
+ device_state_change->disk_state[NEW] = device->state.disk;
+ }
+
+ for (n = 0; n < state_change->n_connections; n++) {
+ struct drbd_connection_state_change *connection_state_change =
+ &state_change->connections[n];
+ struct drbd_connection *connection =
+ connection_state_change->connection;
+
+ connection_state_change->cstate[NEW] = connection->cstate;
+ connection_state_change->peer_role[NEW] =
+ conn_highest_peer(connection);
+ }
+
+ for (n = 0; n < state_change->n_devices * state_change->n_connections; n++) {
+ struct drbd_peer_device_state_change *peer_device_state_change =
+ &state_change->peer_devices[n];
+ struct drbd_device *device =
+ peer_device_state_change->peer_device->device;
+ union drbd_dev_state state = device->state;
+
+ peer_device_state_change->disk_state[NEW] = state.pdsk;
+ peer_device_state_change->repl_state[NEW] =
+ max_t(enum drbd_conns, C_WF_REPORT_PARAMS, state.conn);
+ peer_device_state_change->resync_susp_user[NEW] =
+ state.user_isp;
+ peer_device_state_change->resync_susp_peer[NEW] =
+ state.peer_isp;
+ peer_device_state_change->resync_susp_dependency[NEW] =
+ state.aftr_isp;
+ }
+}
+
+void copy_old_to_new_state_change(struct drbd_state_change *state_change)
+{
+ struct drbd_resource_state_change *resource_state_change = &state_change->resource[0];
+ unsigned int n_device, n_connection, n_peer_device, n_peer_devices;
+
+#define OLD_TO_NEW(x) \
+ (x[NEW] = x[OLD])
+
+ OLD_TO_NEW(resource_state_change->role);
+ OLD_TO_NEW(resource_state_change->susp);
+ OLD_TO_NEW(resource_state_change->susp_nod);
+ OLD_TO_NEW(resource_state_change->susp_fen);
+
+ for (n_connection = 0; n_connection < state_change->n_connections; n_connection++) {
+ struct drbd_connection_state_change *connection_state_change =
+ &state_change->connections[n_connection];
+
+ OLD_TO_NEW(connection_state_change->peer_role);
+ OLD_TO_NEW(connection_state_change->cstate);
+ }
+
+ for (n_device = 0; n_device < state_change->n_devices; n_device++) {
+ struct drbd_device_state_change *device_state_change =
+ &state_change->devices[n_device];
+
+ OLD_TO_NEW(device_state_change->disk_state);
+ }
+
+ n_peer_devices = state_change->n_devices * state_change->n_connections;
+ for (n_peer_device = 0; n_peer_device < n_peer_devices; n_peer_device++) {
+ struct drbd_peer_device_state_change *p =
+ &state_change->peer_devices[n_peer_device];
+
+ OLD_TO_NEW(p->disk_state);
+ OLD_TO_NEW(p->repl_state);
+ OLD_TO_NEW(p->resync_susp_user);
+ OLD_TO_NEW(p->resync_susp_peer);
+ OLD_TO_NEW(p->resync_susp_dependency);
+ }
+
+#undef OLD_TO_NEW
+}
+
+void forget_state_change(struct drbd_state_change *state_change)
+{
+ unsigned int n;
+
+ if (!state_change)
+ return;
+
+ if (state_change->resource->resource)
+ kref_put(&state_change->resource->resource->kref, drbd_destroy_resource);
+ for (n = 0; n < state_change->n_devices; n++) {
+ struct drbd_device *device = state_change->devices[n].device;
+
+ if (device)
+ kref_put(&device->kref, drbd_destroy_device);
+ }
+ for (n = 0; n < state_change->n_connections; n++) {
+ struct drbd_connection *connection =
+ state_change->connections[n].connection;
+
+ if (connection)
+ kref_put(&connection->kref, drbd_destroy_connection);
+ }
+ kfree(state_change);
+}
+
static int w_after_state_ch(struct drbd_work *w, int unused);
static void after_state_ch(struct drbd_device *device, union drbd_state os,
- union drbd_state ns, enum chg_state_flags flags);
+ union drbd_state ns, enum chg_state_flags flags,
+ struct drbd_state_change *);
static enum drbd_state_rv is_valid_state(struct drbd_device *, union drbd_state);
static enum drbd_state_rv is_valid_soft_transition(union drbd_state, union drbd_state, struct drbd_connection *);
static enum drbd_state_rv is_valid_transition(union drbd_state os, union drbd_state ns);
@@ -93,6 +352,7 @@ static enum drbd_role max_role(enum drbd_role role1, enum drbd_role role2)
return R_SECONDARY;
return R_UNKNOWN;
}
+
static enum drbd_role min_role(enum drbd_role role1, enum drbd_role role2)
{
if (role1 == R_UNKNOWN || role2 == R_UNKNOWN)
@@ -983,6 +1243,7 @@ _drbd_set_state(struct drbd_device *device, union drbd_state ns,
enum drbd_state_rv rv = SS_SUCCESS;
enum sanitize_state_warnings ssw;
struct after_state_chg_work *ascw;
+ struct drbd_state_change *state_change;

os = drbd_read_state(device);

@@ -1037,6 +1298,9 @@ _drbd_set_state(struct drbd_device *device, union drbd_state ns,
if (!is_sync_state(os.conn) && is_sync_state(ns.conn))
clear_bit(RS_DONE, &device->flags);

+ /* FIXME: Have any flags been set earlier in this function already? */
+ state_change = remember_old_state(device->resource, GFP_ATOMIC);
+
/* changes to local_cnt and device flags should be visible before
* changes to state, which again should be visible before anything else
* depending on that change happens. */
@@ -1047,6 +1311,8 @@ _drbd_set_state(struct drbd_device *device, union drbd_state ns,
device->resource->susp_fen = ns.susp_fen;
smp_wmb();

+ remember_new_state(state_change);
+
/* put replicated vs not-replicated requests in seperate epochs */
if (drbd_should_do_remote((union drbd_dev_state)os.i) !=
drbd_should_do_remote((union drbd_dev_state)ns.i))
@@ -1184,6 +1450,7 @@ _drbd_set_state(struct drbd_device *device, union drbd_state ns,
ascw->w.cb = w_after_state_ch;
ascw->device = device;
ascw->done = done;
+ ascw->state_change = state_change;
drbd_queue_work(&connection->sender_work,
&ascw->w);
} else {
@@ -1199,7 +1466,8 @@ static int w_after_state_ch(struct drbd_work *w, int unused)
container_of(w, struct after_state_chg_work, w);
struct drbd_device *device = ascw->device;

- after_state_ch(device, ascw->os, ascw->ns, ascw->flags);
+ after_state_ch(device, ascw->os, ascw->ns, ascw->flags, ascw->state_change);
+ forget_state_change(ascw->state_change);
if (ascw->flags & CS_WAIT_COMPLETE)
complete(ascw->done);
kfree(ascw);
@@ -1245,6 +1513,139 @@ int drbd_bitmap_io_from_worker(struct drbd_device *device,
return rv;
}

+void notify_resource_state_change(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_resource_state_change *resource_state_change,
+ enum drbd_notification_type type)
+{
+ struct drbd_resource *resource = resource_state_change->resource;
+ struct resource_info resource_info = {
+ .res_role = resource_state_change->role[NEW],
+ .res_susp = resource_state_change->susp[NEW],
+ .res_susp_nod = resource_state_change->susp_nod[NEW],
+ .res_susp_fen = resource_state_change->susp_fen[NEW],
+ };
+
+ notify_resource_state(skb, seq, resource, &resource_info, type);
+}
+
+void notify_connection_state_change(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_connection_state_change *connection_state_change,
+ enum drbd_notification_type type)
+{
+ struct drbd_connection *connection = connection_state_change->connection;
+ struct connection_info connection_info = {
+ .conn_connection_state = connection_state_change->cstate[NEW],
+ .conn_role = connection_state_change->peer_role[NEW],
+ };
+
+ notify_connection_state(skb, seq, connection, &connection_info, type);
+}
+
+void notify_device_state_change(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_device_state_change *device_state_change,
+ enum drbd_notification_type type)
+{
+ struct drbd_device *device = device_state_change->device;
+ struct device_info device_info = {
+ .dev_disk_state = device_state_change->disk_state[NEW],
+ };
+
+ notify_device_state(skb, seq, device, &device_info, type);
+}
+
+void notify_peer_device_state_change(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_peer_device_state_change *p,
+ enum drbd_notification_type type)
+{
+ struct drbd_peer_device *peer_device = p->peer_device;
+ struct peer_device_info peer_device_info = {
+ .peer_repl_state = p->repl_state[NEW],
+ .peer_disk_state = p->disk_state[NEW],
+ .peer_resync_susp_user = p->resync_susp_user[NEW],
+ .peer_resync_susp_peer = p->resync_susp_peer[NEW],
+ .peer_resync_susp_dependency = p->resync_susp_dependency[NEW],
+ };
+
+ notify_peer_device_state(skb, seq, peer_device, &peer_device_info, type);
+}
+
+static void broadcast_state_change(struct drbd_state_change *state_change)
+{
+ struct drbd_resource_state_change *resource_state_change = &state_change->resource[0];
+ bool resource_state_has_changed;
+ unsigned int n_device, n_connection, n_peer_device, n_peer_devices;
+ void (*last_func)(struct sk_buff *, unsigned int, void *,
+ enum drbd_notification_type) = NULL;
+ void *uninitialized_var(last_arg);
+
+#define HAS_CHANGED(state) ((state)[OLD] != (state)[NEW])
+#define FINAL_STATE_CHANGE(type) \
+ ({ if (last_func) \
+ last_func(NULL, 0, last_arg, type); \
+ })
+#define REMEMBER_STATE_CHANGE(func, arg, type) \
+ ({ FINAL_STATE_CHANGE(type | NOTIFY_CONTINUES); \
+ last_func = (typeof(last_func))func; \
+ last_arg = arg; \
+ })
+
+ mutex_lock(&notification_mutex);
+
+ resource_state_has_changed =
+ HAS_CHANGED(resource_state_change->role) ||
+ HAS_CHANGED(resource_state_change->susp) ||
+ HAS_CHANGED(resource_state_change->susp_nod) ||
+ HAS_CHANGED(resource_state_change->susp_fen);
+
+ if (resource_state_has_changed)
+ REMEMBER_STATE_CHANGE(notify_resource_state_change,
+ resource_state_change, NOTIFY_CHANGE);
+
+ for (n_connection = 0; n_connection < state_change->n_connections; n_connection++) {
+ struct drbd_connection_state_change *connection_state_change =
+ &state_change->connections[n_connection];
+
+ if (HAS_CHANGED(connection_state_change->peer_role) ||
+ HAS_CHANGED(connection_state_change->cstate))
+ REMEMBER_STATE_CHANGE(notify_connection_state_change,
+ connection_state_change, NOTIFY_CHANGE);
+ }
+
+ for (n_device = 0; n_device < state_change->n_devices; n_device++) {
+ struct drbd_device_state_change *device_state_change =
+ &state_change->devices[n_device];
+
+ if (HAS_CHANGED(device_state_change->disk_state))
+ REMEMBER_STATE_CHANGE(notify_device_state_change,
+ device_state_change, NOTIFY_CHANGE);
+ }
+
+ n_peer_devices = state_change->n_devices * state_change->n_connections;
+ for (n_peer_device = 0; n_peer_device < n_peer_devices; n_peer_device++) {
+ struct drbd_peer_device_state_change *p =
+ &state_change->peer_devices[n_peer_device];
+
+ if (HAS_CHANGED(p->disk_state) ||
+ HAS_CHANGED(p->repl_state) ||
+ HAS_CHANGED(p->resync_susp_user) ||
+ HAS_CHANGED(p->resync_susp_peer) ||
+ HAS_CHANGED(p->resync_susp_dependency))
+ REMEMBER_STATE_CHANGE(notify_peer_device_state_change,
+ p, NOTIFY_CHANGE);
+ }
+
+ FINAL_STATE_CHANGE(NOTIFY_CHANGE);
+ mutex_unlock(&notification_mutex);
+
+#undef HAS_CHANGED
+#undef FINAL_STATE_CHANGE
+#undef REMEMBER_STATE_CHANGE
+}
+
/**
* after_state_ch() - Perform after state change actions that may sleep
* @device: DRBD device.
@@ -1253,13 +1654,16 @@ int drbd_bitmap_io_from_worker(struct drbd_device *device,
* @flags: Flags
*/
static void after_state_ch(struct drbd_device *device, union drbd_state os,
- union drbd_state ns, enum chg_state_flags flags)
+ union drbd_state ns, enum chg_state_flags flags,
+ struct drbd_state_change *state_change)
{
struct drbd_resource *resource = device->resource;
struct drbd_peer_device *peer_device = first_peer_device(device);
struct drbd_connection *connection = peer_device ? peer_device->connection : NULL;
struct sib_info sib;

+ broadcast_state_change(state_change);
+
sib.sib_reason = SIB_STATE_CHANGE;
sib.os = os;
sib.ns = ns;
@@ -1572,6 +1976,7 @@ struct after_conn_state_chg_work {
union drbd_state ns_max; /* new, max state, over all devices */
enum chg_state_flags flags;
struct drbd_connection *connection;
+ struct drbd_state_change *state_change;
};

static int w_after_conn_state_ch(struct drbd_work *w, int unused)
@@ -1584,6 +1989,8 @@ static int w_after_conn_state_ch(struct drbd_work *w, int unused)
struct drbd_peer_device *peer_device;
int vnr;

+ broadcast_state_change(acscw->state_change);
+ forget_state_change(acscw->state_change);
kfree(acscw);

/* Upon network configuration, we need to start the receiver */
@@ -1593,6 +2000,13 @@ static int w_after_conn_state_ch(struct drbd_work *w, int unused)
if (oc == C_DISCONNECTING && ns_max.conn == C_STANDALONE) {
struct net_conf *old_conf;

+ mutex_lock(&notification_mutex);
+ idr_for_each_entry(&connection->peer_devices, peer_device, vnr)
+ notify_peer_device_state(NULL, 0, peer_device, NULL,
+ NOTIFY_DESTROY | NOTIFY_CONTINUES);
+ notify_connection_state(NULL, 0, connection, NULL, NOTIFY_DESTROY);
+ mutex_unlock(&notification_mutex);
+
mutex_lock(&connection->resource->conf_update);
old_conf = connection->net_conf;
connection->my_addr_len = 0;
@@ -1823,6 +2237,7 @@ _conn_request_state(struct drbd_connection *connection, union drbd_state mask, u
enum drbd_conns oc = connection->cstate;
union drbd_state ns_max, ns_min, os;
bool have_mutex = false;
+ struct drbd_state_change *state_change;

if (mask.conn) {
rv = is_valid_conn_transition(oc, val.conn);
@@ -1868,10 +2283,12 @@ _conn_request_state(struct drbd_connection *connection, union drbd_state mask, u
goto abort;
}

+ state_change = remember_old_state(connection->resource, GFP_ATOMIC);
conn_old_common_state(connection, &os, &flags);
flags |= CS_DC_SUSP;
conn_set_state(connection, mask, val, &ns_min, &ns_max, flags);
conn_pr_state_change(connection, os, ns_max, flags);
+ remember_new_state(state_change);

acscw = kmalloc(sizeof(*acscw), GFP_ATOMIC);
if (acscw) {
@@ -1882,6 +2299,7 @@ _conn_request_state(struct drbd_connection *connection, union drbd_state mask, u
acscw->w.cb = w_after_conn_state_ch;
kref_get(&connection->kref);
acscw->connection = connection;
+ acscw->state_change = state_change;
drbd_queue_work(&connection->sender_work, &acscw->w);
} else {
drbd_err(connection, "Could not kmalloc an acscw\n");
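broadcast_state_change() above cannot know in advance which changed object will be the last one in a batch. It therefore holds back each notification until it finds the next one. REMEMBER_STATE_CHANGE() first flushes the pending call with NOTIFY_CONTINUES set, and FINAL_STATE_CHANGE() sends the last one without it. All of this happens under notification_mutex, so the batch is not interleaved with other events. The following is a minimal userspace sketch of that deferral. It assumes a hypothetical recording sink in place of the genetlink multicast; struct sink, record(), remember() and broadcast() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_CHANGE    2
#define NOTIFY_CONTINUES 0x8000

/* Hypothetical sink standing in for the genetlink multicast. */
struct sink {
	int ids[16];
	unsigned int flags[16];
	size_t count;
};

typedef void (*notify_fn)(struct sink *, int id, unsigned int type);

static void record(struct sink *s, int id, unsigned int type)
{
	assert(s->count < 16);
	s->ids[s->count] = id;
	s->flags[s->count] = type;
	s->count++;
}

/* The notification held back until we know whether another one follows. */
struct pending {
	notify_fn func;
	int id;
};

static void flush_pending(struct sink *s, struct pending *p, unsigned int type)
{
	if (p->func)
		p->func(s, p->id, type);
}

/* Like REMEMBER_STATE_CHANGE(): send the previous one as "more follow". */
static void remember(struct sink *s, struct pending *p, notify_fn func, int id,
		     unsigned int type)
{
	flush_pending(s, p, type | NOTIFY_CONTINUES);
	p->func = func;
	p->id = id;
}

/* Notify only the objects whose old and new state differ. */
static void broadcast(struct sink *s, const int *old_state,
		      const int *new_state, size_t n)
{
	struct pending p = { NULL, 0 };
	size_t i;

	for (i = 0; i < n; i++)
		if (old_state[i] != new_state[i])
			remember(s, &p, record, (int)i, NOTIFY_CHANGE);
	flush_pending(s, &p, NOTIFY_CHANGE);  /* like FINAL_STATE_CHANGE() */
}
```

The design buffers only a single notification instead of building a list, so it needs no allocation in the state change path.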
diff --git a/drivers/block/drbd/drbd_state_change.h b/drivers/block/drbd/drbd_state_change.h
new file mode 100644
index 0000000..9e503a1
--- /dev/null
+++ b/drivers/block/drbd/drbd_state_change.h
@@ -0,0 +1,63 @@
+#ifndef DRBD_STATE_CHANGE_H
+#define DRBD_STATE_CHANGE_H
+
+struct drbd_resource_state_change {
+ struct drbd_resource *resource;
+ enum drbd_role role[2];
+ bool susp[2];
+ bool susp_nod[2];
+ bool susp_fen[2];
+};
+
+struct drbd_device_state_change {
+ struct drbd_device *device;
+ enum drbd_disk_state disk_state[2];
+};
+
+struct drbd_connection_state_change {
+ struct drbd_connection *connection;
+ enum drbd_conns cstate[2]; /* drbd9: enum drbd_conn_state */
+ enum drbd_role peer_role[2];
+};
+
+struct drbd_peer_device_state_change {
+ struct drbd_peer_device *peer_device;
+ enum drbd_disk_state disk_state[2];
+ enum drbd_conns repl_state[2]; /* drbd9: enum drbd_repl_state */
+ bool resync_susp_user[2];
+ bool resync_susp_peer[2];
+ bool resync_susp_dependency[2];
+};
+
+struct drbd_state_change {
+ struct list_head list;
+ unsigned int n_devices;
+ unsigned int n_connections;
+ struct drbd_resource_state_change resource[1];
+ struct drbd_device_state_change *devices;
+ struct drbd_connection_state_change *connections;
+ struct drbd_peer_device_state_change *peer_devices;
+};
+
+extern struct drbd_state_change *remember_old_state(struct drbd_resource *, gfp_t);
+extern void copy_old_to_new_state_change(struct drbd_state_change *);
+extern void forget_state_change(struct drbd_state_change *);
+
+extern void notify_resource_state_change(struct sk_buff *,
+ unsigned int,
+ struct drbd_resource_state_change *,
+ enum drbd_notification_type type);
+extern void notify_connection_state_change(struct sk_buff *,
+ unsigned int,
+ struct drbd_connection_state_change *,
+ enum drbd_notification_type type);
+extern void notify_device_state_change(struct sk_buff *,
+ unsigned int,
+ struct drbd_device_state_change *,
+ enum drbd_notification_type type);
+extern void notify_peer_device_state_change(struct sk_buff *,
+ unsigned int,
+ struct drbd_peer_device_state_change *,
+ enum drbd_notification_type type);
+
+#endif /* DRBD_STATE_CHANGE_H */
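alloc_state_change() sizes a single kmalloc() for the drbd_state_change header plus its three arrays. It then points devices, connections and peer_devices into the tail of that buffer, so forget_state_change() can release everything with one kfree() after dropping the kref references. remember_old_state() counts the objects under rcu_read_lock(), allocates outside the RCU read section, and then recounts. If the numbers changed in between, it retries. The sketch below reproduces only the layout and the device-major indexing of peer_devices[], in userspace. The simplified structs and peer_at() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Simplified stand-ins for the structures in drbd_state_change.h. */
struct dev_change  { void *device;      int disk_state[2]; };
struct conn_change { void *connection;  int cstate[2]; };
struct peer_change { void *peer_device; int repl_state[2]; bool susp[2]; };

struct state_change {
	unsigned int n_devices, n_connections;
	struct dev_change  *devices;
	struct conn_change *connections;
	struct peer_change *peer_devices;  /* n_devices * n_connections entries */
};

/* One allocation: the header, then the three arrays back to back. */
static struct state_change *alloc_change(unsigned int n_dev, unsigned int n_conn)
{
	size_t size = sizeof(struct state_change) +
		      n_dev * sizeof(struct dev_change) +
		      n_conn * sizeof(struct conn_change) +
		      (size_t)n_dev * n_conn * sizeof(struct peer_change);
	struct state_change *sc = malloc(size);

	if (!sc)
		return NULL;
	sc->n_devices = n_dev;
	sc->n_connections = n_conn;
	sc->devices = (void *)(sc + 1);
	sc->connections = (void *)&sc->devices[n_dev];
	sc->peer_devices = (void *)&sc->connections[n_conn];
	return sc;  /* a single free(sc) releases everything */
}

/* remember_old_state() fills peer_devices[] with devices in the outer loop
 * and connections in the inner loop, so the lookup is device-major. */
static struct peer_change *peer_at(struct state_change *sc,
				   unsigned int dev, unsigned int conn)
{
	assert(dev < sc->n_devices && conn < sc->n_connections);
	return &sc->peer_devices[dev * sc->n_connections + conn];
}
```

Every struct here contains a pointer, so each array starts pointer-aligned when placed directly after the previous one. The kernel structures share that property.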
diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 15a1472..2c44d7e 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -339,6 +339,8 @@ enum drbd_state_rv {
#define MDF_AL_CLEAN (1 << 7)
#define MDF_AL_DISABLED (1 << 8)

+#define MAX_PEERS 32
+
enum drbd_uuid_index {
UI_CURRENT,
UI_BITMAP,
@@ -349,12 +351,26 @@ enum drbd_uuid_index {
UI_EXTENDED_SIZE /* Everything. */
};

+#define HISTORY_UUIDS MAX_PEERS
+
enum drbd_timeout_flag {
UT_DEFAULT = 0,
UT_DEGRADED = 1,
UT_PEER_OUTDATED = 2,
};

+enum drbd_notification_type {
+ NOTIFY_EXISTS,
+ NOTIFY_CREATE,
+ NOTIFY_CHANGE,
+ NOTIFY_DESTROY,
+ NOTIFY_CALL,
+ NOTIFY_RESPONSE,
+
+ NOTIFY_CONTINUES = 0x8000,
+ NOTIFY_FLAGS = NOTIFY_CONTINUES,
+};
+
#define UUID_JUST_CREATED ((__u64)4)

enum write_ordering_e {
diff --git a/include/linux/drbd_genl.h b/include/linux/drbd_genl.h
index 7b131ed..90304f8 100644
--- a/include/linux/drbd_genl.h
+++ b/include/linux/drbd_genl.h
@@ -250,6 +250,76 @@ GENL_struct(DRBD_NLA_DETACH_PARMS, 13, detach_parms,
__flg_field(1, DRBD_GENLA_F_MANDATORY, force_detach)
)

+GENL_struct(DRBD_NLA_RESOURCE_INFO, 15, resource_info,
+ __u32_field(1, 0, res_role)
+ __flg_field(2, 0, res_susp)
+ __flg_field(3, 0, res_susp_nod)
+ __flg_field(4, 0, res_susp_fen)
+ /* __flg_field(5, 0, res_weak) */
+)
+
+GENL_struct(DRBD_NLA_DEVICE_INFO, 16, device_info,
+ __u32_field(1, 0, dev_disk_state)
+)
+
+GENL_struct(DRBD_NLA_CONNECTION_INFO, 17, connection_info,
+ __u32_field(1, 0, conn_connection_state)
+ __u32_field(2, 0, conn_role)
+)
+
+GENL_struct(DRBD_NLA_PEER_DEVICE_INFO, 18, peer_device_info,
+ __u32_field(1, 0, peer_repl_state)
+ __u32_field(2, 0, peer_disk_state)
+ __u32_field(3, 0, peer_resync_susp_user)
+ __u32_field(4, 0, peer_resync_susp_peer)
+ __u32_field(5, 0, peer_resync_susp_dependency)
+)
+
+GENL_struct(DRBD_NLA_RESOURCE_STATISTICS, 19, resource_statistics,
+ __u32_field(1, 0, res_stat_write_ordering)
+)
+
+GENL_struct(DRBD_NLA_DEVICE_STATISTICS, 20, device_statistics,
+ __u64_field(1, 0, dev_size) /* (sectors) */
+ __u64_field(2, 0, dev_read) /* (sectors) */
+ __u64_field(3, 0, dev_write) /* (sectors) */
+ __u64_field(4, 0, dev_al_writes) /* activity log writes (count) */
+ __u64_field(5, 0, dev_bm_writes) /* bitmap writes (count) */
+ __u32_field(6, 0, dev_upper_pending) /* application requests in progress */
+ __u32_field(7, 0, dev_lower_pending) /* backing device requests in progress */
+ __flg_field(8, 0, dev_upper_blocked)
+ __flg_field(9, 0, dev_lower_blocked)
+ __flg_field(10, 0, dev_al_suspended) /* activity log suspended */
+ __u64_field(11, 0, dev_exposed_data_uuid)
+ __u64_field(12, 0, dev_current_uuid)
+ __u32_field(13, 0, dev_disk_flags)
+ __bin_field(14, 0, history_uuids, HISTORY_UUIDS * sizeof(__u64))
+)
+
+GENL_struct(DRBD_NLA_CONNECTION_STATISTICS, 21, connection_statistics,
+ __flg_field(1, 0, conn_congested)
+)
+
+GENL_struct(DRBD_NLA_PEER_DEVICE_STATISTICS, 22, peer_device_statistics,
+ __u64_field(1, 0, peer_dev_received) /* sectors */
+ __u64_field(2, 0, peer_dev_sent) /* sectors */
+ __u32_field(3, 0, peer_dev_pending) /* number of requests */
+ __u32_field(4, 0, peer_dev_unacked) /* number of requests */
+ __u64_field(5, 0, peer_dev_out_of_sync) /* sectors */
+ __u64_field(6, 0, peer_dev_resync_failed) /* sectors */
+ __u64_field(7, 0, peer_dev_bitmap_uuid)
+ __u32_field(9, 0, peer_dev_flags)
+)
+
+GENL_struct(DRBD_NLA_NOTIFICATION_HEADER, 23, drbd_notification_header,
+ __u32_field(1, DRBD_GENLA_F_MANDATORY, nh_type)
+)
+
+GENL_struct(DRBD_NLA_HELPER, 24, drbd_helper_info,
+ __str_field(1, DRBD_GENLA_F_MANDATORY, helper_name, 32)
+ __u32_field(2, DRBD_GENLA_F_MANDATORY, helper_status)
+)
+
/*
* Notifications and commands (genlmsghdr->cmd)
*/
@@ -382,3 +452,47 @@ GENL_op(DRBD_ADM_GET_TIMEOUT_TYPE, 26, GENL_doit(drbd_adm_get_timeout_type),
GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED))
GENL_op(DRBD_ADM_DOWN, 27, GENL_doit(drbd_adm_down),
GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_RESOURCE_STATE, 34, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_RESOURCE_INFO, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_RESOURCE_STATISTICS, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_DEVICE_STATE, 35, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_DEVICE_INFO, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_DEVICE_STATISTICS, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_CONNECTION_STATE, 36, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_CONNECTION_INFO, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_CONNECTION_STATISTICS, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_PEER_DEVICE_STATE, 37, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_PEER_DEVICE_INFO, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_PEER_DEVICE_STATISTICS, DRBD_F_REQUIRED))
+
+GENL_op(
+ DRBD_ADM_GET_INITIAL_STATE, 38,
+ GENL_op_init(
+ .dumpit = drbd_adm_get_initial_state,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY))
+
+GENL_notification(
+ DRBD_HELPER, 40, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_HELPER, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_INITIAL_STATE_DONE, 41, events,
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED))
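The nh_type field of DRBD_NLA_NOTIFICATION_HEADER carries the type passed to nla_put_notification_header(). That type combines one of the NOTIFY_* event kinds from include/linux/drbd.h with the NOTIFY_CONTINUES bit, which tells a listener that more events of the same batch follow. A listener might separate the two as sketched below. notify_kind(), notify_continues() and count_batches() are hypothetical helpers for illustration. They are not functions from DRBD or drbd-utils.

```c
#include <assert.h>
#include <stdbool.h>

/* Values as added to include/linux/drbd.h by this series. */
enum drbd_notification_type {
	NOTIFY_EXISTS,
	NOTIFY_CREATE,
	NOTIFY_CHANGE,
	NOTIFY_DESTROY,
	NOTIFY_CALL,
	NOTIFY_RESPONSE,

	NOTIFY_CONTINUES = 0x8000,
	NOTIFY_FLAGS = NOTIFY_CONTINUES,
};

/* The event kind with all flag bits masked off. */
static enum drbd_notification_type notify_kind(unsigned int nh_type)
{
	return (enum drbd_notification_type)(nh_type & ~(unsigned int)NOTIFY_FLAGS);
}

/* True if further events of the same batch follow. */
static bool notify_continues(unsigned int nh_type)
{
	return (nh_type & NOTIFY_CONTINUES) != 0;
}

/* A batch ends at the first event without NOTIFY_CONTINUES. */
static unsigned int count_batches(const unsigned int *types, unsigned int n)
{
	unsigned int i, batches = 0;

	for (i = 0; i < n; i++)
		if (!notify_continues(types[i]))
			batches++;
	return batches;
}
```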
--
1.9.1

2015-11-25 11:06:53

by Philipp Reisner

Subject: [PATCH 09/38] drbd: Backport the "status" command

From: Andreas Gruenbacher <[email protected]>

The status command originates from the drbd9 code base. While we keep
the status information available in /proc/drbd for now, this commit
allows the user base to migrate their monitoring infrastructure
gracefully to the new status reporting interface.

In drbd9 no status information is exposed through /proc/drbd.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 566 +++++++++++++++++++++++++++++++++++++------
include/linux/drbd_genl.h | 35 +++
include/linux/idr.h | 14 ++
3 files changed, 536 insertions(+), 79 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index aa805cd..1eb10e2 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -76,6 +76,13 @@ int drbd_adm_get_status(struct sk_buff *skb, struct genl_info *info);
int drbd_adm_get_timeout_type(struct sk_buff *skb, struct genl_info *info);
/* .dumpit */
int drbd_adm_get_status_all(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_resources(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_devices(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_devices_done(struct netlink_callback *cb);
+int drbd_adm_dump_connections(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_connections_done(struct netlink_callback *cb);
+int drbd_adm_dump_peer_devices(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_peer_devices_done(struct netlink_callback *cb);
int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb);

#include <linux/drbd_genl_api.h>
@@ -2965,6 +2972,486 @@ nla_put_failure:
}

/*
+ * The generic netlink dump callbacks are called outside the genl_lock(), so
+ * they cannot use the simple attribute parsing code which uses global
+ * attribute tables.
+ */
+static struct nlattr *find_cfg_context_attr(const struct nlmsghdr *nlh, int attr)
+{
+ const unsigned hdrlen = GENL_HDRLEN + GENL_MAGIC_FAMILY_HDRSZ;
+ const int maxtype = ARRAY_SIZE(drbd_cfg_context_nl_policy) - 1;
+ struct nlattr *nla;
+
+ nla = nla_find(nlmsg_attrdata(nlh, hdrlen), nlmsg_attrlen(nlh, hdrlen),
+ DRBD_NLA_CFG_CONTEXT);
+ if (!nla)
+ return NULL;
+ return drbd_nla_find_nested(maxtype, nla, __nla_type(attr));
+}
+
+static void resource_to_info(struct resource_info *, struct drbd_resource *);
+
+int drbd_adm_dump_resources(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct drbd_genlmsghdr *dh;
+ struct drbd_resource *resource;
+ struct resource_info resource_info;
+ struct resource_statistics resource_statistics;
+ int err;
+
+ rcu_read_lock();
+ if (cb->args[0]) {
+ for_each_resource_rcu(resource, &drbd_resources)
+ if (resource == (struct drbd_resource *)cb->args[0])
+ goto found_resource;
+ err = 0; /* resource was probably deleted */
+ goto out;
+ }
+ resource = list_entry(&drbd_resources,
+ struct drbd_resource, resources);
+
+found_resource:
+ list_for_each_entry_continue_rcu(resource, &drbd_resources, resources) {
+ goto put_result;
+ }
+ err = 0;
+ goto out;
+
+put_result:
+ dh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, &drbd_genl_family,
+ NLM_F_MULTI, DRBD_ADM_GET_RESOURCES);
+ err = -ENOMEM;
+ if (!dh)
+ goto out;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ err = nla_put_drbd_cfg_context(skb, resource, NULL, NULL);
+ if (err)
+ goto out;
+ err = res_opts_to_skb(skb, &resource->res_opts, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ resource_to_info(&resource_info, resource);
+ err = resource_info_to_skb(skb, &resource_info, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ resource_statistics.res_stat_write_ordering = resource->write_ordering;
+ err = resource_statistics_to_skb(skb, &resource_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ cb->args[0] = (long)resource;
+ genlmsg_end(skb, dh);
+ err = 0;
+
+out:
+ rcu_read_unlock();
+ if (err)
+ return err;
+ return skb->len;
+}
+
+static void device_to_statistics(struct device_statistics *s,
+ struct drbd_device *device)
+{
+ memset(s, 0, sizeof(*s));
+ s->dev_upper_blocked = !may_inc_ap_bio(device);
+ if (get_ldev(device)) {
+ struct drbd_md *md = &device->ldev->md;
+ u64 *history_uuids = (u64 *)s->history_uuids;
+ struct request_queue *q;
+ int n;
+
+ spin_lock_irq(&md->uuid_lock);
+ s->dev_current_uuid = md->uuid[UI_CURRENT];
+ BUILD_BUG_ON(sizeof(s->history_uuids) < UI_HISTORY_END - UI_HISTORY_START + 1);
+ for (n = 0; n < UI_HISTORY_END - UI_HISTORY_START + 1; n++)
+ history_uuids[n] = md->uuid[UI_HISTORY_START + n];
+ for (; n < HISTORY_UUIDS; n++)
+ history_uuids[n] = 0;
+ s->history_uuids_len = HISTORY_UUIDS;
+ spin_unlock_irq(&md->uuid_lock);
+
+ s->dev_disk_flags = md->flags;
+ q = bdev_get_queue(device->ldev->backing_bdev);
+ s->dev_lower_blocked =
+ bdi_congested(&q->backing_dev_info,
+ (1 << WB_async_congested) |
+ (1 << WB_sync_congested));
+ put_ldev(device);
+ }
+ s->dev_size = drbd_get_capacity(device->this_bdev);
+ s->dev_read = device->read_cnt;
+ s->dev_write = device->writ_cnt;
+ s->dev_al_writes = device->al_writ_cnt;
+ s->dev_bm_writes = device->bm_writ_cnt;
+ s->dev_upper_pending = atomic_read(&device->ap_bio_cnt);
+ s->dev_lower_pending = atomic_read(&device->local_cnt);
+ s->dev_al_suspended = test_bit(AL_SUSPENDED, &device->flags);
+ s->dev_exposed_data_uuid = device->ed_uuid;
+}
+
+static int put_resource_in_arg0(struct netlink_callback *cb, int holder_nr)
+{
+ if (cb->args[0]) {
+ struct drbd_resource *resource =
+ (struct drbd_resource *)cb->args[0];
+ kref_put(&resource->kref, drbd_destroy_resource);
+ }
+
+ return 0;
+}
+
+int drbd_adm_dump_devices_done(struct netlink_callback *cb) {
+ return put_resource_in_arg0(cb, 7);
+}
+
+static void device_to_info(struct device_info *, struct drbd_device *);
+
+int drbd_adm_dump_devices(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct nlattr *resource_filter;
+ struct drbd_resource *resource;
+ struct drbd_device *uninitialized_var(device);
+ int minor, err, retcode;
+ struct drbd_genlmsghdr *dh;
+ struct device_info device_info;
+ struct device_statistics device_statistics;
+ struct idr *idr_to_search;
+
+ resource = (struct drbd_resource *)cb->args[0];
+ if (!cb->args[0] && !cb->args[1]) {
+ resource_filter = find_cfg_context_attr(cb->nlh, T_ctx_resource_name);
+ if (resource_filter) {
+ retcode = ERR_RES_NOT_KNOWN;
+ resource = drbd_find_resource(nla_data(resource_filter));
+ if (!resource)
+ goto put_result;
+ cb->args[0] = (long)resource;
+ }
+ }
+
+ rcu_read_lock();
+ minor = cb->args[1];
+ idr_to_search = resource ? &resource->devices : &drbd_devices;
+ device = idr_get_next(idr_to_search, &minor);
+ if (!device) {
+ err = 0;
+ goto out;
+ }
+ idr_for_each_entry_continue(idr_to_search, device, minor) {
+ retcode = NO_ERROR;
+ goto put_result; /* only one iteration */
+ }
+ err = 0;
+ goto out; /* no more devices */
+
+put_result:
+ dh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, &drbd_genl_family,
+ NLM_F_MULTI, DRBD_ADM_GET_DEVICES);
+ err = -ENOMEM;
+ if (!dh)
+ goto out;
+ dh->ret_code = retcode;
+ dh->minor = -1U;
+ if (retcode == NO_ERROR) {
+ dh->minor = device->minor;
+ err = nla_put_drbd_cfg_context(skb, device->resource, NULL, device);
+ if (err)
+ goto out;
+ if (get_ldev(device)) {
+ struct disk_conf *disk_conf =
+ rcu_dereference(device->ldev->disk_conf);
+
+ err = disk_conf_to_skb(skb, disk_conf, !capable(CAP_SYS_ADMIN));
+ put_ldev(device);
+ if (err)
+ goto out;
+ }
+ device_to_info(&device_info, device);
+ err = device_info_to_skb(skb, &device_info, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+
+ device_to_statistics(&device_statistics, device);
+ err = device_statistics_to_skb(skb, &device_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ cb->args[1] = minor + 1;
+ }
+ genlmsg_end(skb, dh);
+ err = 0;
+
+out:
+ rcu_read_unlock();
+ if (err)
+ return err;
+ return skb->len;
+}
+
+int drbd_adm_dump_connections_done(struct netlink_callback *cb)
+{
+ return put_resource_in_arg0(cb, 6);
+}
+
+enum { SINGLE_RESOURCE, ITERATE_RESOURCES };
+
+int drbd_adm_dump_connections(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct nlattr *resource_filter;
+ struct drbd_resource *resource = NULL, *next_resource;
+ struct drbd_connection *uninitialized_var(connection);
+ int err = 0, retcode;
+ struct drbd_genlmsghdr *dh;
+ struct connection_info connection_info;
+ struct connection_statistics connection_statistics;
+
+ rcu_read_lock();
+ resource = (struct drbd_resource *)cb->args[0];
+ if (!cb->args[0]) {
+ resource_filter = find_cfg_context_attr(cb->nlh, T_ctx_resource_name);
+ if (resource_filter) {
+ retcode = ERR_RES_NOT_KNOWN;
+ resource = drbd_find_resource(nla_data(resource_filter));
+ if (!resource)
+ goto put_result;
+ cb->args[0] = (long)resource;
+ cb->args[1] = SINGLE_RESOURCE;
+ }
+ }
+ if (!resource) {
+ if (list_empty(&drbd_resources))
+ goto out;
+ resource = list_first_entry(&drbd_resources, struct drbd_resource, resources);
+ kref_get(&resource->kref);
+ cb->args[0] = (long)resource;
+ cb->args[1] = ITERATE_RESOURCES;
+ }
+
+ next_resource:
+ rcu_read_unlock();
+ mutex_lock(&resource->conf_update);
+ rcu_read_lock();
+ if (cb->args[2]) {
+ for_each_connection_rcu(connection, resource)
+ if (connection == (struct drbd_connection *)cb->args[2])
+ goto found_connection;
+ /* connection was probably deleted */
+ goto no_more_connections;
+ }
+ connection = list_entry(&resource->connections, struct drbd_connection, connections);
+
+found_connection:
+ list_for_each_entry_continue_rcu(connection, &resource->connections, connections) {
+ if (!has_net_conf(connection))
+ continue;
+ retcode = NO_ERROR;
+ goto put_result; /* only one iteration */
+ }
+
+no_more_connections:
+ if (cb->args[1] == ITERATE_RESOURCES) {
+ for_each_resource_rcu(next_resource, &drbd_resources) {
+ if (next_resource == resource)
+ goto found_resource;
+ }
+ /* resource was probably deleted */
+ }
+ goto out;
+
+found_resource:
+ list_for_each_entry_continue_rcu(next_resource, &drbd_resources, resources) {
+ mutex_unlock(&resource->conf_update);
+ kref_put(&resource->kref, drbd_destroy_resource);
+ resource = next_resource;
+ kref_get(&resource->kref);
+ cb->args[0] = (long)resource;
+ cb->args[2] = 0;
+ goto next_resource;
+ }
+ goto out; /* no more resources */
+
+put_result:
+ dh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, &drbd_genl_family,
+ NLM_F_MULTI, DRBD_ADM_GET_CONNECTIONS);
+ err = -ENOMEM;
+ if (!dh)
+ goto out;
+ dh->ret_code = retcode;
+ dh->minor = -1U;
+ if (retcode == NO_ERROR) {
+ struct net_conf *net_conf;
+
+ err = nla_put_drbd_cfg_context(skb, resource, connection, NULL);
+ if (err)
+ goto out;
+ net_conf = rcu_dereference(connection->net_conf);
+ if (net_conf) {
+ err = net_conf_to_skb(skb, net_conf, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ }
+ connection_to_info(&connection_info, connection);
+ err = connection_info_to_skb(skb, &connection_info, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ connection_statistics.conn_congested = test_bit(NET_CONGESTED, &connection->flags);
+ err = connection_statistics_to_skb(skb, &connection_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ cb->args[2] = (long)connection;
+ }
+ genlmsg_end(skb, dh);
+ err = 0;
+
+out:
+ rcu_read_unlock();
+ if (resource)
+ mutex_unlock(&resource->conf_update);
+ if (err)
+ return err;
+ return skb->len;
+}
+
+enum mdf_peer_flag {
+ MDF_PEER_CONNECTED = 1 << 0,
+ MDF_PEER_OUTDATED = 1 << 1,
+ MDF_PEER_FENCING = 1 << 2,
+ MDF_PEER_FULL_SYNC = 1 << 3,
+};
+
+static void peer_device_to_statistics(struct peer_device_statistics *s,
+ struct drbd_peer_device *peer_device)
+{
+ struct drbd_device *device = peer_device->device;
+
+ memset(s, 0, sizeof(*s));
+ s->peer_dev_received = device->recv_cnt;
+ s->peer_dev_sent = device->send_cnt;
+ s->peer_dev_pending = atomic_read(&device->ap_pending_cnt) +
+ atomic_read(&device->rs_pending_cnt);
+ s->peer_dev_unacked = atomic_read(&device->unacked_cnt);
+ s->peer_dev_out_of_sync = drbd_bm_total_weight(device) << (BM_BLOCK_SHIFT - 9);
+ s->peer_dev_resync_failed = device->rs_failed << (BM_BLOCK_SHIFT - 9);
+ if (get_ldev(device)) {
+ struct drbd_md *md = &device->ldev->md;
+
+ spin_lock_irq(&md->uuid_lock);
+ s->peer_dev_bitmap_uuid = md->uuid[UI_BITMAP];
+ spin_unlock_irq(&md->uuid_lock);
+ s->peer_dev_flags =
+ (drbd_md_test_flag(device->ldev, MDF_CONNECTED_IND) ?
+ MDF_PEER_CONNECTED : 0) +
+ (drbd_md_test_flag(device->ldev, MDF_CONSISTENT) &&
+ !drbd_md_test_flag(device->ldev, MDF_WAS_UP_TO_DATE) ?
+ MDF_PEER_OUTDATED : 0) +
+ /* FIXME: MDF_PEER_FENCING? */
+ (drbd_md_test_flag(device->ldev, MDF_FULL_SYNC) ?
+ MDF_PEER_FULL_SYNC : 0);
+ put_ldev(device);
+ }
+}
+
+int drbd_adm_dump_peer_devices_done(struct netlink_callback *cb)
+{
+ return put_resource_in_arg0(cb, 9);
+}
+
+int drbd_adm_dump_peer_devices(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct nlattr *resource_filter;
+ struct drbd_resource *resource;
+ struct drbd_device *uninitialized_var(device);
+ struct drbd_peer_device *peer_device = NULL;
+ int minor, err, retcode;
+ struct drbd_genlmsghdr *dh;
+ struct idr *idr_to_search;
+
+ resource = (struct drbd_resource *)cb->args[0];
+ if (!cb->args[0] && !cb->args[1]) {
+ resource_filter = find_cfg_context_attr(cb->nlh, T_ctx_resource_name);
+ if (resource_filter) {
+ retcode = ERR_RES_NOT_KNOWN;
+ resource = drbd_find_resource(nla_data(resource_filter));
+ if (!resource)
+ goto put_result;
+ }
+ cb->args[0] = (long)resource;
+ }
+
+ rcu_read_lock();
+ minor = cb->args[1];
+ idr_to_search = resource ? &resource->devices : &drbd_devices;
+ device = idr_find(idr_to_search, minor);
+ if (!device) {
+next_device:
+ minor++;
+ cb->args[2] = 0;
+ device = idr_get_next(idr_to_search, &minor);
+ if (!device) {
+ err = 0;
+ goto out;
+ }
+ }
+ if (cb->args[2]) {
+ for_each_peer_device(peer_device, device)
+ if (peer_device == (struct drbd_peer_device *)cb->args[2])
+ goto found_peer_device;
+ /* peer device was probably deleted */
+ goto next_device;
+ }
+ /* Make peer_device point to the list head (not the first entry). */
+ peer_device = list_entry(&device->peer_devices, struct drbd_peer_device, peer_devices);
+
+found_peer_device:
+ list_for_each_entry_continue_rcu(peer_device, &device->peer_devices, peer_devices) {
+ if (!has_net_conf(peer_device->connection))
+ continue;
+ retcode = NO_ERROR;
+ goto put_result; /* only one iteration */
+ }
+ goto next_device;
+
+put_result:
+ dh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, &drbd_genl_family,
+ NLM_F_MULTI, DRBD_ADM_GET_PEER_DEVICES);
+ err = -ENOMEM;
+ if (!dh)
+ goto out;
+ dh->ret_code = retcode;
+ dh->minor = -1U;
+ if (retcode == NO_ERROR) {
+ struct peer_device_info peer_device_info;
+ struct peer_device_statistics peer_device_statistics;
+
+ dh->minor = minor;
+ err = nla_put_drbd_cfg_context(skb, device->resource, peer_device->connection, device);
+ if (err)
+ goto out;
+ peer_device_to_info(&peer_device_info, peer_device);
+ err = peer_device_info_to_skb(skb, &peer_device_info, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ peer_device_to_statistics(&peer_device_statistics, peer_device);
+ err = peer_device_statistics_to_skb(skb, &peer_device_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ cb->args[1] = minor;
+ cb->args[2] = (long)peer_device;
+ }
+ genlmsg_end(skb, dh);
+ err = 0;
+
+out:
+ rcu_read_unlock();
+ if (err)
+ return err;
+ return skb->len;
+}
+/*
* Return the connection of @resource if @resource has exactly one connection.
*/
static struct drbd_connection *the_only_connection(struct drbd_resource *resource)
@@ -3818,85 +4305,6 @@ failed:
err, seq, sib->sib_reason);
}

-static void device_to_statistics(struct device_statistics *s,
- struct drbd_device *device)
-{
- memset(s, 0, sizeof(*s));
- s->dev_upper_blocked = !may_inc_ap_bio(device);
- if (get_ldev(device)) {
- struct drbd_md *md = &device->ldev->md;
- u64 *history_uuids = (u64 *)s->history_uuids;
- struct request_queue *q;
- int n;
-
- spin_lock_irq(&md->uuid_lock);
- s->dev_current_uuid = md->uuid[UI_CURRENT];
- BUILD_BUG_ON(sizeof(s->history_uuids) < UI_HISTORY_END - UI_HISTORY_START + 1);
- for (n = 0; n < UI_HISTORY_END - UI_HISTORY_START + 1; n++)
- history_uuids[n] = md->uuid[UI_HISTORY_START + n];
- for (; n < HISTORY_UUIDS; n++)
- history_uuids[n] = 0;
- s->history_uuids_len = HISTORY_UUIDS;
- spin_unlock_irq(&md->uuid_lock);
-
- s->dev_disk_flags = md->flags;
- q = bdev_get_queue(device->ldev->backing_bdev);
- s->dev_lower_blocked =
- bdi_congested(&q->backing_dev_info,
- (1 << WB_async_congested) |
- (1 << WB_sync_congested));
- put_ldev(device);
- }
- s->dev_size = drbd_get_capacity(device->this_bdev);
- s->dev_read = device->read_cnt;
- s->dev_write = device->writ_cnt;
- s->dev_al_writes = device->al_writ_cnt;
- s->dev_bm_writes = device->bm_writ_cnt;
- s->dev_upper_pending = atomic_read(&device->ap_bio_cnt);
- s->dev_lower_pending = atomic_read(&device->local_cnt);
- s->dev_al_suspended = test_bit(AL_SUSPENDED, &device->flags);
- s->dev_exposed_data_uuid = device->ed_uuid;
-}
-
-enum mdf_peer_flag {
- MDF_PEER_CONNECTED = 1 << 0,
- MDF_PEER_OUTDATED = 1 << 1,
- MDF_PEER_FENCING = 1 << 2,
- MDF_PEER_FULL_SYNC = 1 << 3,
-};
-
-static void peer_device_to_statistics(struct peer_device_statistics *s,
- struct drbd_peer_device *peer_device)
-{
- struct drbd_device *device = peer_device->device;
-
- memset(s, 0, sizeof(*s));
- s->peer_dev_received = device->recv_cnt;
- s->peer_dev_sent = device->send_cnt;
- s->peer_dev_pending = atomic_read(&device->ap_pending_cnt) +
- atomic_read(&device->rs_pending_cnt);
- s->peer_dev_unacked = atomic_read(&device->unacked_cnt);
- s->peer_dev_out_of_sync = drbd_bm_total_weight(device) << (BM_BLOCK_SHIFT - 9);
- s->peer_dev_resync_failed = device->rs_failed << (BM_BLOCK_SHIFT - 9);
- if (get_ldev(device)) {
- struct drbd_md *md = &device->ldev->md;
-
- spin_lock_irq(&md->uuid_lock);
- s->peer_dev_bitmap_uuid = md->uuid[UI_BITMAP];
- spin_unlock_irq(&md->uuid_lock);
- s->peer_dev_flags =
- (drbd_md_test_flag(device->ldev, MDF_CONNECTED_IND) ?
- MDF_PEER_CONNECTED : 0) +
- (drbd_md_test_flag(device->ldev, MDF_CONSISTENT) &&
- !drbd_md_test_flag(device->ldev, MDF_WAS_UP_TO_DATE) ?
- MDF_PEER_OUTDATED : 0) +
- /* FIXME: MDF_PEER_FENCING? */
- (drbd_md_test_flag(device->ldev, MDF_FULL_SYNC) ?
- MDF_PEER_FULL_SYNC : 0);
- put_ldev(device);
- }
-}
-
static int nla_put_notification_header(struct sk_buff *msg,
enum drbd_notification_type type)
{
diff --git a/include/linux/drbd_genl.h b/include/linux/drbd_genl.h
index 90304f8..2d0e5ad 100644
--- a/include/linux/drbd_genl.h
+++ b/include/linux/drbd_genl.h
@@ -453,6 +453,41 @@ GENL_op(DRBD_ADM_GET_TIMEOUT_TYPE, 26, GENL_doit(drbd_adm_get_timeout_type),
GENL_op(DRBD_ADM_DOWN, 27, GENL_doit(drbd_adm_down),
GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED))

+GENL_op(DRBD_ADM_GET_RESOURCES, 30,
+ GENL_op_init(
+ .dumpit = drbd_adm_dump_resources,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_RESOURCE_INFO, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_RESOURCE_STATISTICS, DRBD_GENLA_F_MANDATORY))
+
+GENL_op(DRBD_ADM_GET_DEVICES, 31,
+ GENL_op_init(
+ .dumpit = drbd_adm_dump_devices,
+ .done = drbd_adm_dump_devices_done,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_DEVICE_INFO, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_DEVICE_STATISTICS, DRBD_GENLA_F_MANDATORY))
+
+GENL_op(DRBD_ADM_GET_CONNECTIONS, 32,
+ GENL_op_init(
+ .dumpit = drbd_adm_dump_connections,
+ .done = drbd_adm_dump_connections_done,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_CONNECTION_INFO, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_CONNECTION_STATISTICS, DRBD_GENLA_F_MANDATORY))
+
+GENL_op(DRBD_ADM_GET_PEER_DEVICES, 33,
+ GENL_op_init(
+ .dumpit = drbd_adm_dump_peer_devices,
+ .done = drbd_adm_dump_peer_devices_done,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_PEER_DEVICE_INFO, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_PEER_DEVICE_STATISTICS, DRBD_GENLA_F_MANDATORY))
+
GENL_notification(
DRBD_RESOURCE_STATE, 34, events,
GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 013fd9b..083d61e 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -135,6 +135,20 @@ static inline void *idr_find(struct idr *idr, int id)
#define idr_for_each_entry(idp, entry, id) \
for (id = 0; ((entry) = idr_get_next(idp, &(id))) != NULL; ++id)

+/**
+ * idr_for_each_entry_continue - continue iteration over an idr's elements of a given type
+ * @idp: idr handle
+ * @entry: the type * to use as cursor
+ * @id: id entry's key
+ *
+ * Continue to iterate over list of given type, continuing after
+ * the current position.
+ */
+#define idr_for_each_entry_continue(idp, entry, id) \
+ for ((entry) = idr_get_next((idp), &(id)); \
+ entry; \
+ ++id, (entry) = idr_get_next((idp), &(id)))
+
/*
* IDA - IDR based id allocator, use when translation from id to
* pointer isn't necessary.
--
1.9.1

2015-11-25 11:01:17

by Philipp Reisner

Subject: [PATCH 10/38] drbd: Deletion of an unnecessary check before the function call "lc_destroy"

From: Markus Elfring <[email protected]>

The lc_destroy() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Roland Kammerer <[email protected]>

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 1eb10e2..b87fb31 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1133,8 +1133,7 @@ static int drbd_check_al_size(struct drbd_device *device, struct disk_conf *dc)
lc_destroy(n);
return -EBUSY;
} else {
- if (t)
- lc_destroy(t);
+ lc_destroy(t);
}
drbd_md_mark_dirty(device); /* we changed device->act_log->nr_elemens */
return 0;
--
1.9.1

2015-11-25 11:02:37

by Philipp Reisner

Subject: [PATCH 11/38] drbd: Replace 0 with the more meaningful GFP_NOWAIT

GFP_NOWAIT has a value of 0, i.e., functionality is not changed.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index b87fb31..af78f09 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -4289,7 +4289,7 @@ void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib)
if (nla_put_status_info(msg, device, sib))
goto nla_put_failure;
genlmsg_end(msg, d_out);
- err = drbd_genl_multicast_events(msg, 0);
+ err = drbd_genl_multicast_events(msg, GFP_NOWAIT);
/* msg has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4351,7 +4351,7 @@ void notify_resource_state(struct sk_buff *skb,
goto nla_put_failure;
genlmsg_end(skb, dh);
if (multicast) {
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4400,7 +4400,7 @@ void notify_device_state(struct sk_buff *skb,
device_statistics_to_skb(skb, &device_statistics, !capable(CAP_SYS_ADMIN));
genlmsg_end(skb, dh);
if (multicast) {
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4449,7 +4449,7 @@ void notify_connection_state(struct sk_buff *skb,
connection_statistics_to_skb(skb, &connection_statistics, !capable(CAP_SYS_ADMIN));
genlmsg_end(skb, dh);
if (multicast) {
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4499,7 +4499,7 @@ void notify_peer_device_state(struct sk_buff *skb,
peer_device_statistics_to_skb(skb, &peer_device_statistics, !capable(CAP_SYS_ADMIN));
genlmsg_end(skb, dh);
if (multicast) {
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4545,7 +4545,7 @@ void notify_helper(enum drbd_notification_type type,
drbd_helper_info_to_skb(skb, &helper_info, true))
goto unlock_fail;
genlmsg_end(skb, dh);
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
skb = NULL;
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
--
1.9.1

2015-11-25 11:02:22

by Philipp Reisner

Subject: [PATCH 12/38] drbd: Fix spurious disk-timeout

From: Lars Ellenberg <[email protected]>

(You should not use disk-timeout anyway,
see the man page for why...)

We add incoming requests to the tail of a ring list.
On local completion, requests are removed from that list.
The timer looks only at the head of that ring list,
so it is supposed to see only the oldest request.
All of this is protected by a spinlock.

The request object is created with timestamps zeroed out.
The timestamp was only filled in just before the actual submit,
but to actually submit the request, we need to give up the spinlock.

If you are unlucky and there is no older request still pending, the
timer looks at a new request whose timestamp is still zero (before it
was even submitted), and 0 + timeout is most likely older than "now".

It is better to assign the timestamp right when we put the
request object on said ring list.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_req.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 55fca68..7660f6e 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1167,7 +1167,6 @@ drbd_submit_req_private_bio(struct drbd_request *req)
* stable storage, and this is a WRITE, we may not even submit
* this bio. */
if (get_ldev(device)) {
- req->pre_submit_jif = jiffies;
if (drbd_insert_fault(device,
rw == WRITE ? DRBD_FAULT_DT_WR
: rw == READ ? DRBD_FAULT_DT_RD
@@ -1311,6 +1310,7 @@ static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request
&device->pending_master_completion[rw == WRITE]);
if (req->private_bio) {
/* needs to be marked within the same spinlock */
+ req->pre_submit_jif = jiffies;
list_add_tail(&req->req_pending_local,
&device->pending_completion[rw == WRITE]);
_req_mod(req, TO_BE_SUBMITTED);
--
1.9.1

2015-11-25 11:09:38

by Philipp Reisner

Subject: [PATCH 13/38] drbd: drop remnants of connector -- we don't use it anymore in drbd 8.4

From: Lars Ellenberg <[email protected]>

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
include/linux/drbd.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 2c44d7e..392fc0e 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -25,7 +25,6 @@
*/
#ifndef DRBD_H
#define DRBD_H
-#include <linux/connector.h>
#include <asm/types.h>

#ifdef __KERNEL__
--
1.9.1

2015-11-25 11:12:59

by Philipp Reisner

Subject: [PATCH 14/38] drbd: drbdsetup detach of an unresponsive local disk should not block IO "forever"

From: Lars Ellenberg <[email protected]>

When detaching, we make sure no application IO is in flight
by internally suspending IO, then trigger the state change,
wait for the result, and finally internally resume IO again.

Once we have triggered the state change to "Failed",
we expect it to change from Failed to Diskless.
(To avoid races, we actually wait for it to leave "Failed".)

On an unresponsive local IO backend, this may never happen.
Don't let a "hung" detach block IO "forever"; instead, resume IO
before waiting for the state change to Diskless.

We may well be able to continue IO to and from a healthy peer.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index af78f09..331b378 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1929,9 +1929,9 @@ static int adm_detach(struct drbd_device *device, int force)
retcode = drbd_request_state(device, NS(disk, D_FAILED));
drbd_md_put_buffer(device);
/* D_FAILED will transition to DISKLESS. */
+ drbd_resume_io(device);
ret = wait_event_interruptible(device->misc_wait,
device->state.disk != D_FAILED);
- drbd_resume_io(device);
if ((int)retcode == (int)SS_IS_DISKLESS)
retcode = SS_NOTHING_TO_DO;
if (ret)
--
1.9.1

2015-11-25 11:00:27

by Philipp Reisner

Subject: [PATCH 15/38] drbd: also bump UUIDs if a diskless primary connects

From: Lars Ellenberg <[email protected]>

If for some reason the primary loses its disk *and* the replication link
before it is able to communicate the disk loss (probably blocking IO in
the meantime), and later is able to re-establish the connection, the
peer needs to bump its UUIDs just as it does when the primary only loses
its disk and is able to communicate this in time.

Otherwise, a later re-attach of the disk on the primary may start a
resync in the "wrong" direction.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index bc4b45b..06afd4d 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -1781,7 +1781,7 @@ static void after_state_ch(struct drbd_device *device, union drbd_state os,
}

if (ns.pdsk < D_INCONSISTENT && get_ldev(device)) {
- if (os.peer == R_SECONDARY && ns.peer == R_PRIMARY &&
+ if (os.peer != R_PRIMARY && ns.peer == R_PRIMARY &&
device->ldev->md.uuid[UI_BITMAP] == 0 && ns.disk >= D_UP_TO_DATE) {
drbd_uuid_new_current(device);
drbd_send_uuids(peer_device);
--
1.9.1

2015-11-25 11:11:04

by Philipp Reisner

Subject: [PATCH 16/38] drbd: add comment why we want to first call local-io-error, then send state

From: Lars Ellenberg <[email protected]>

Even though we really want to get the state information about our bad
disk to the peer as soon as possible, it is useful to first call the
local-io-error handler.

People may choose to hard-reset the box from there.
If that looks and behaves exactly like a "regular node crash", without
the data generation UUIDs being bumped on the peer in between, it is
easier to deal with.

If you intend to return from the local-io-error handler, you should
return as quickly as possible to avoid triggering other timeouts.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_state.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 06afd4d..a4e4505 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -1859,6 +1859,10 @@ static void after_state_ch(struct drbd_device *device, union drbd_state os,

was_io_error = test_and_clear_bit(WAS_IO_ERROR, &device->flags);

+ /* Intentionally call this handler first, before drbd_send_state().
+ * See: 2932204 drbd: call local-io-error handler early
+ * People may chose to hard-reset the box from this handler.
+ * It is useful if this looks like a "regular node crash". */
if (was_io_error && eh == EP_CALL_HELPER)
drbd_khelper(device, "local-io-error");

--
1.9.1

2015-11-25 11:00:59

by Philipp Reisner

Subject: [PATCH 17/38] drbd: drbd_panic_after_delayed_completion_of_aborted_request()

From: Lars Ellenberg <[email protected]>

The only way to make DRBD intentionally call panic() is to set a disk
timeout, have it trigger, "abort" some requests and complete them to
upper layers, and then have the backend IO subsystem later complete
those same requests successfully anyway.

As the attached IO pages may have been recycled for other purposes in
the meantime, this would cause unexpected random memory changes.
To prevent corruption, we would rather panic in that case.

Make it obvious from stack traces that this was the case by introducing
drbd_panic_after_delayed_completion_of_aborted_request().
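
Below is a minimal model of the decision in drbd_request_endio() that this patch touches. A late completion of a locally aborted request is always logged, but only a successful one leads to the panic. The enum and struct names are hypothetical, and a return value stands in for the actual panic() call so that the logic can be exercised in userspace.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of a backend completion (instead of calling panic()). */
enum late_completion_action {
	NORMAL_COMPLETION,	/* request was never aborted */
	LOG_ONLY,		/* aborted earlier, backend reports an error */
	MUST_PANIC,		/* aborted earlier, backend reports success */
};

struct model_request {
	bool aborted;		/* disk-timeout fired, already completed upward */
};

static enum late_completion_action
on_backend_completion(const struct model_request *req, int bi_error)
{
	if (!req->aborted)
		return NORMAL_COMPLETION;
	/* The attached pages may already be reused; a successful completion
	 * means the backend did perform IO on them after all. */
	return bi_error ? LOG_ONLY : MUST_PANIC;
}
```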

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_worker.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 3b3d980..9c89ebe 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -188,6 +188,12 @@ void drbd_peer_request_endio(struct bio *bio)
}
}

+void drbd_panic_after_delayed_completion_of_aborted_request(struct drbd_device *device)
+{
+ panic("drbd%u %s/%u potential random memory corruption caused by delayed completion of aborted local request\n",
+ device->minor, device->resource->name, device->vnr);
+}
+
/* read, readA or write requests on R_PRIMARY coming from drbd_make_request
*/
void drbd_request_endio(struct bio *bio)
@@ -231,7 +237,7 @@ void drbd_request_endio(struct bio *bio)
drbd_emerg(device, "delayed completion of aborted local request; disk-timeout may be too aggressive\n");

if (!bio->bi_error)
- panic("possible random memory corruption caused by delayed completion of aborted local request\n");
+ drbd_panic_after_delayed_completion_of_aborted_request(device);
}

/* to avoid recursion in __req_mod */
--
1.9.1

2015-11-25 11:06:49

by Philipp Reisner

Subject: [PATCH 18/38] drbd: improve network timeout detection

From: Lars Ellenberg <[email protected]>

Don't blame the peer for being unresponsive
if we have not even asked the question yet.
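
The checks in net_timeout_reached() below run in a fixed order, and each check can end the evaluation early. This is a userspace sketch of that order. It assumes simplified structs: model_conn, model_req and model_net_timeout_reached() are hypothetical, and ack_pending stands in for RQ_NET_PENDING. The jiffies macros copy the wrap-safe definitions from include/linux/jiffies.h, without their type checks.

```c
#include <assert.h>
#include <stdbool.h>

/* Wrap-safe jiffies comparisons, as in include/linux/jiffies.h. */
#define time_after(a, b)	((long)((b) - (a)) < 0)
#define time_after_eq(a, b)	((long)((a) - (b)) >= 0)
#define time_before_eq(a, b)	time_after_eq(b, a)
#define time_in_range(a, b, c)	(time_after_eq(a, b) && time_before_eq(a, c))

struct model_conn {
	unsigned long last_reconnect_jif;
	unsigned long last_sent_barrier_jif;
	unsigned int current_epoch_nr;
};

struct model_req {
	unsigned long pre_send_jif;
	unsigned int epoch;
	bool ack_pending;	/* stands in for RQ_NET_PENDING */
};

static bool model_net_timeout_reached(const struct model_req *req,
				      const struct model_conn *conn,
				      unsigned long now, unsigned long ent)
{
	if (!time_after(now, req->pre_send_jif + ent))
		return false;	/* request is not old enough yet */
	if (time_in_range(now, conn->last_reconnect_jif,
			  conn->last_reconnect_jif + ent))
		return false;	/* connection is younger than the timeout */
	if (req->ack_pending)
		return true;	/* peer never acknowledged the request */
	if (req->epoch == conn->current_epoch_nr)
		return false;	/* we have not sent the closing barrier yet */
	/* Only blame the peer if the last barrier we sent is old enough. */
	return time_after(now, conn->last_sent_barrier_jif + ent);
}
```

A request that is still waiting for its epoch's barrier is therefore never counted as a peer timeout. Only a barrier that was actually sent and then went unanswered for ent jiffies counts against the peer.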

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 2 +
drivers/block/drbd/drbd_req.c | 123 ++++++++++++++++++++++++++++++---------
drivers/block/drbd/drbd_worker.c | 2 +
3 files changed, 100 insertions(+), 27 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 965aae0..1d00f2e 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -773,6 +773,8 @@ struct drbd_connection {
struct drbd_thread_timing_details r_timing_details[DRBD_THREAD_DETAILS_HIST];

struct {
+ unsigned long last_sent_barrier_jif;
+
/* whether this sender thread
* has processed a single write yet. */
bool seen_any_write_yet;
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 7660f6e..3add7c5 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1531,6 +1531,78 @@ blk_qc_t drbd_make_request(struct request_queue *q, struct bio *bio)
return BLK_QC_T_NONE;
}

+static bool net_timeout_reached(struct drbd_request *net_req,
+ struct drbd_connection *connection,
+ unsigned long now, unsigned long ent,
+ unsigned int ko_count, unsigned int timeout)
+{
+ struct drbd_device *device = net_req->device;
+
+ if (!time_after(now, net_req->pre_send_jif + ent))
+ return false;
+
+ if (time_in_range(now, connection->last_reconnect_jif, connection->last_reconnect_jif + ent))
+ return false;
+
+ if (net_req->rq_state & RQ_NET_PENDING) {
+ drbd_warn(device, "Remote failed to finish a request within %ums > ko-count (%u) * timeout (%u * 0.1s)\n",
+ jiffies_to_msecs(now - net_req->pre_send_jif), ko_count, timeout);
+ return true;
+ }
+
+ /* We received an ACK already (or are using protocol A),
+ * but are waiting for the epoch closing barrier ack.
+ * Check if we sent the barrier already. We should not blame the peer
+ * for being unresponsive, if we did not even ask it yet. */
+ if (net_req->epoch == connection->send.current_epoch_nr) {
+ drbd_warn(device,
+ "We did not send a P_BARRIER for %ums > ko-count (%u) * timeout (%u * 0.1s); drbd kernel thread blocked?\n",
+ jiffies_to_msecs(now - net_req->pre_send_jif), ko_count, timeout);
+ return false;
+ }
+
+ /* Worst case: we may have been blocked for whatever reason, then
+ * suddenly are able to send a lot of requests (and epoch separating
+ * barriers) in quick succession.
+ * The timestamp of the net_req may be much too old and not correspond
+ * to the sending time of the relevant unack'ed barrier packet, so
+ * would trigger a spurious timeout. The latest barrier packet may
+ * have a too recent timestamp to trigger the timeout, potentially miss
+ * a timeout. Right now we don't have a place to conveniently store
+ * these timestamps.
+ * But in this particular situation, the application requests are still
+ * completed to upper layers, DRBD should still "feel" responsive.
+ * No need yet to kill this connection, it may still recover.
+ * If not, eventually we will have queued enough into the network for
+ * us to block. From that point of view, the timestamp of the last sent
+ * barrier packet is relevant enough.
+ */
+ if (time_after(now, connection->send.last_sent_barrier_jif + ent)) {
+ drbd_warn(device, "Remote failed to answer a P_BARRIER (sent at %lu jif; now=%lu jif) within %ums > ko-count (%u) * timeout (%u * 0.1s)\n",
+ connection->send.last_sent_barrier_jif, now,
+ jiffies_to_msecs(now - connection->send.last_sent_barrier_jif), ko_count, timeout);
+ return true;
+ }
+ return false;
+}
+
+/* A request is considered timed out, if
+ * - we have some effective timeout from the configuration,
+ * with some state restrictions applied,
+ * - the oldest request is waiting for a response from the network
+ * resp. the local disk,
+ * - the oldest request is in fact older than the effective timeout,
+ * - the connection was established (resp. disk was attached)
+ * for longer than the timeout already.
+ * Note that for 32bit jiffies and very stable connections/disks,
+ * we may have a wrap around, which is catched by
+ * !time_in_range(now, last_..._jif, last_..._jif + timeout).
+ *
+ * Side effect: once per 32bit wrap-around interval, which means every
+ * ~198 days with 250 HZ, we have a window where the timeout would need
+ * to expire twice (worst case) to become effective. Good enough.
+ */
+
void request_timer_fn(unsigned long data)
{
struct drbd_device *device = (struct drbd_device *) data;
@@ -1540,11 +1612,14 @@ void request_timer_fn(unsigned long data)
unsigned long oldest_submit_jif;
unsigned long ent = 0, dt = 0, et, nt; /* effective timeout = ko_count * timeout */
unsigned long now;
+ unsigned int ko_count = 0, timeout = 0;

rcu_read_lock();
nc = rcu_dereference(connection->net_conf);
- if (nc && device->state.conn >= C_WF_REPORT_PARAMS)
- ent = nc->timeout * HZ/10 * nc->ko_count;
+ if (nc && device->state.conn >= C_WF_REPORT_PARAMS) {
+ ko_count = nc->ko_count;
+ timeout = nc->timeout;
+ }

if (get_ldev(device)) { /* implicit state.disk >= D_INCONSISTENT */
dt = rcu_dereference(device->ldev->disk_conf)->disk_timeout * HZ / 10;
@@ -1552,6 +1627,8 @@ void request_timer_fn(unsigned long data)
}
rcu_read_unlock();

+
+ ent = timeout * HZ/10 * ko_count;
et = min_not_zero(dt, ent);

if (!et)
@@ -1563,11 +1640,22 @@ void request_timer_fn(unsigned long data)
spin_lock_irq(&device->resource->req_lock);
req_read = list_first_entry_or_null(&device->pending_completion[0], struct drbd_request, req_pending_local);
req_write = list_first_entry_or_null(&device->pending_completion[1], struct drbd_request, req_pending_local);
- req_peer = connection->req_not_net_done;
+
/* maybe the oldest request waiting for the peer is in fact still
- * blocking in tcp sendmsg */
- if (!req_peer && connection->req_next && connection->req_next->pre_send_jif)
- req_peer = connection->req_next;
+ * blocking in tcp sendmsg. That's ok, though, that's handled via the
+ * socket send timeout, requesting a ping, and bumping ko-count in
+ * we_should_drop_the_connection().
+ */
+
+ /* check the oldest request we did successfully sent,
+ * but which is still waiting for an ACK. */
+ req_peer = connection->req_ack_pending;
+
+ /* if we don't have such request (e.g. protocoll A)
+ * check the oldest requests which is still waiting on its epoch
+ * closing barrier ack. */
+ if (!req_peer)
+ req_peer = connection->req_not_net_done;

/* evaluate the oldest peer request only in one timer! */
if (req_peer && req_peer->device != device)
@@ -1584,28 +1672,9 @@ void request_timer_fn(unsigned long data)
: req_write ? req_write->pre_submit_jif
: req_read ? req_read->pre_submit_jif : now;

- /* The request is considered timed out, if
- * - we have some effective timeout from the configuration,
- * with above state restrictions applied,
- * - the oldest request is waiting for a response from the network
- * resp. the local disk,
- * - the oldest request is in fact older than the effective timeout,
- * - the connection was established (resp. disk was attached)
- * for longer than the timeout already.
- * Note that for 32bit jiffies and very stable connections/disks,
- * we may have a wrap around, which is catched by
- * !time_in_range(now, last_..._jif, last_..._jif + timeout).
- *
- * Side effect: once per 32bit wrap-around interval, which means every
- * ~198 days with 250 HZ, we have a window where the timeout would need
- * to expire twice (worst case) to become effective. Good enough.
- */
- if (ent && req_peer &&
- time_after(now, req_peer->pre_send_jif + ent) &&
- !time_in_range(now, connection->last_reconnect_jif, connection->last_reconnect_jif + ent)) {
- drbd_warn(device, "Remote failed to finish a request within ko-count * timeout\n");
+ if (ent && req_peer && net_timeout_reached(req_peer, connection, now, ent, ko_count, timeout))
_conn_request_state(connection, NS(conn, C_TIMEOUT), CS_VERBOSE | CS_HARD);
- }
+
if (dt && oldest_submit_jif != now &&
time_after(now, oldest_submit_jif + dt) &&
!time_in_range(now, device->last_reattach_jif, device->last_reattach_jif + dt)) {
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 9c89ebe..8bbabe3 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -1290,6 +1290,7 @@ static int drbd_send_barrier(struct drbd_connection *connection)
p->barrier = connection->send.current_epoch_nr;
p->pad = 0;
connection->send.current_epoch_writes = 0;
+ connection->send.last_sent_barrier_jif = jiffies;

return conn_send_command(connection, sock, P_BARRIER, sizeof(*p), NULL, 0);
}
@@ -1314,6 +1315,7 @@ static void re_init_if_first_write(struct drbd_connection *connection, unsigned
connection->send.seen_any_write_yet = true;
connection->send.current_epoch_nr = epoch;
connection->send.current_epoch_writes = 0;
+ connection->send.last_sent_barrier_jif = jiffies;
}
}

--
1.9.1

2015-11-25 11:06:52

by Philipp Reisner

Subject: [PATCH 19/38] drbd: fix NULL deref in remember_new_state

From: Lars Ellenberg <[email protected]>

The recent (not yet released) backport of the extended state broadcasts
to support the "events2" subcommand of drbdsetup had some glitches.

remember_old_state() would first count all connections with a
net_conf != NULL, then allocate a suitable array, then populate that
array with all connections found to have net_conf != NULL.

This races with the state change to C_STANDALONE,
and the NULL assignment there.

remember_new_state() then iterates over said connection array,
assuming that it would be fully populated.

But rcu_read_lock() only makes sure that the object a pointer points to,
if any, won't go away. It does not make the pointer itself immutable.

In fact there is no need to "filter" connections based on whether or not
they have a currently valid configuration. Just always record them; if
they don't have a config, that's fine, there will be no change then.
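
After this fix, the counting pass and the filling pass use the same unfiltered rule while the caller holds the lock, so the allocated array is always fully populated. Below is a minimal sketch of that pattern. It assumes hypothetical types (model_connection, model_state_change) in place of the DRBD structures, and a connection whose net_conf is NULL is simply recorded as it is.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical connection; net_conf may be set to NULL at any time
 * (e.g. on the transition to C_STANDALONE). */
struct model_connection {
	const void *net_conf;
	int cstate;
};

struct model_conn_change {
	struct model_connection *connection;
	int cstate_old;
};

struct model_state_change {
	size_t n_connections;
	struct model_conn_change connections[];	/* flexible array member */
};

/* Every connection is recorded, with or without a net_conf, so the
 * number filled in always equals the number allocated. The caller is
 * assumed to hold the lock that keeps the connection list stable. */
static struct model_state_change *
model_remember_old_state(struct model_connection *conns, size_t n)
{
	struct model_state_change *sc;

	sc = malloc(sizeof(*sc) + n * sizeof(sc->connections[0]));
	if (!sc)
		return NULL;
	sc->n_connections = n;
	for (size_t i = 0; i < n; i++) {
		sc->connections[i].connection = &conns[i];
		sc->connections[i].cstate_old = conns[i].cstate;
	}
	return sc;
}
```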

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_state.c | 46 +++++++++++++----------------------------
1 file changed, 14 insertions(+), 32 deletions(-)

diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index a4e4505..f022e99 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -63,11 +63,8 @@ static void count_objects(struct drbd_resource *resource,

idr_for_each_entry(&resource->devices, device, vnr)
(*n_devices)++;
- for_each_connection(connection, resource) {
- if (!has_net_conf(connection))
- continue;
+ for_each_connection(connection, resource)
(*n_connections)++;
- }
}

static struct drbd_state_change *alloc_state_change(unsigned int n_devices, unsigned int n_connections, gfp_t gfp)
@@ -108,23 +105,13 @@ struct drbd_state_change *remember_old_state(struct drbd_resource *resource, gfp
struct drbd_peer_device_state_change *peer_device_state_change;
struct drbd_connection_state_change *connection_state_change;

-retry:
- rcu_read_lock();
+ /* Caller holds req_lock spinlock.
+ * No state, no device IDR, no connections lists can change. */
count_objects(resource, &n_devices, &n_connections);
- rcu_read_unlock();
state_change = alloc_state_change(n_devices, n_connections, gfp);
if (!state_change)
return NULL;

- rcu_read_lock();
- count_objects(resource, &n_devices, &n_connections);
- if (n_devices != state_change->n_devices ||
- n_connections != state_change->n_connections) {
- kfree(state_change);
- rcu_read_unlock();
- goto retry;
- }
-
kref_get(&resource->kref);
state_change->resource->resource = resource;
state_change->resource->role[OLD] =
@@ -133,6 +120,17 @@ retry:
state_change->resource->susp_nod[OLD] = resource->susp_nod;
state_change->resource->susp_fen[OLD] = resource->susp_fen;

+ connection_state_change = state_change->connections;
+ for_each_connection(connection, resource) {
+ kref_get(&connection->kref);
+ connection_state_change->connection = connection;
+ connection_state_change->cstate[OLD] =
+ connection->cstate;
+ connection_state_change->peer_role[OLD] =
+ conn_highest_peer(connection);
+ connection_state_change++;
+ }
+
device_state_change = state_change->devices;
peer_device_state_change = state_change->peer_devices;
idr_for_each_entry(&resource->devices, device, vnr) {
@@ -145,8 +143,6 @@ retry:
for_each_connection(connection, resource) {
struct drbd_peer_device *peer_device;

- if (!has_net_conf(connection))
- continue;
peer_device = conn_peer_device(connection, device->vnr);
peer_device_state_change->peer_device = peer_device;
peer_device_state_change->disk_state[OLD] =
@@ -165,20 +161,6 @@ retry:
device_state_change++;
}

- connection_state_change = state_change->connections;
- for_each_connection(connection, resource) {
- if (!has_net_conf(connection))
- continue;
- kref_get(&connection->kref);
- connection_state_change->connection = connection;
- connection_state_change->cstate[OLD] =
- connection->cstate;
- connection_state_change->peer_role[OLD] =
- conn_highest_peer(connection);
- connection_state_change++;
- }
- rcu_read_unlock();
-
return state_change;
}

--
1.9.1

2015-11-25 11:02:20

by Philipp Reisner

Subject: [PATCH 20/38] drbd: fix refcount error during detach of an already failed disk

From: Lars Ellenberg <[email protected]>

A D_FAILED disk transitions as quickly as possible to
D_DISKLESS. But in the "unresponsive local disk" case,
there remains a time window where an administrative detach command could
find the disk already failed, but some internal meta data IO against the
unresponsive local disk still pending.

In that case, drbd_md_get_buffer() will return NULL.
Don't unconditionally call drbd_md_put_buffer(), or it will cause a
refcount imbalance and prevent any further re-attach on this volume
(until it is deleted and re-created).
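
The fix applies a common rule: release only what you actually acquired. Below is a minimal userspace sketch of that rule. It assumes a hypothetical buffer guard (model_md, model_md_get_buffer(), model_md_put_buffer(), model_detach()) whose get returns NULL once the disk has failed. It is a simplification, not the real drbd_md_get_buffer() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical meta-data buffer guard. */
struct model_md {
	int refs;
	bool disk_failed;
	char page[512];
};

static void *model_md_get_buffer(struct model_md *md)
{
	if (md->disk_failed)
		return NULL;	/* nothing acquired, nothing to release */
	md->refs++;
	return md->page;
}

static void model_md_put_buffer(struct model_md *md)
{
	assert(md->refs > 0);	/* an unmatched put is the bug fixed here */
	md->refs--;
}

/* Returns true if a state change was requested, false for "nothing to do". */
static bool model_detach(struct model_md *md)
{
	void *buffer = model_md_get_buffer(md);

	if (!buffer)
		return false;	/* already <= D_FAILED: do not put */
	/* ... request the D_FAILED state here ... */
	model_md_put_buffer(md);
	return true;
}
```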

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 331b378..79dc3d4 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1915,6 +1915,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
static int adm_detach(struct drbd_device *device, int force)
{
enum drbd_state_rv retcode;
+ void *buffer;
int ret;

if (force) {
@@ -1925,9 +1926,12 @@ static int adm_detach(struct drbd_device *device, int force)
}

drbd_suspend_io(device); /* so no-one is stuck in drbd_al_begin_io */
- drbd_md_get_buffer(device, __func__); /* make sure there is no in-flight meta-data IO */
- retcode = drbd_request_state(device, NS(disk, D_FAILED));
- drbd_md_put_buffer(device);
+ buffer = drbd_md_get_buffer(device, __func__); /* make sure there is no in-flight meta-data IO */
+ if (buffer) {
+ retcode = drbd_request_state(device, NS(disk, D_FAILED));
+ drbd_md_put_buffer(device);
+ } else /* already <= D_FAILED */
+ retcode = SS_NOTHING_TO_DO;
/* D_FAILED will transition to DISKLESS. */
drbd_resume_io(device);
ret = wait_event_interruptible(device->misc_wait,
--
1.9.1

2015-11-25 11:06:50

by Philipp Reisner

Subject: [PATCH 21/38] drbd: Rename asender to ack_receiver

This prepares for the next patch, in which sending on the meta (or
control) socket is moved to a dedicated workqueue.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 6 +++---
drivers/block/drbd/drbd_main.c | 10 +++++-----
drivers/block/drbd/drbd_receiver.c | 6 +++---
3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 1d00f2e..dee6297 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -754,7 +754,7 @@ struct drbd_connection {
unsigned long last_reconnect_jif;
struct drbd_thread receiver;
struct drbd_thread worker;
- struct drbd_thread asender;
+ struct drbd_thread ack_receiver;

/* cached pointers,
* so we can look up the oldest pending requests more quickly.
@@ -1557,7 +1557,7 @@ extern void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req);

/* drbd_receiver.c */
extern int drbd_receiver(struct drbd_thread *thi);
-extern int drbd_asender(struct drbd_thread *thi);
+extern int drbd_ack_receiver(struct drbd_thread *thi);
extern bool drbd_rs_c_min_rate_throttle(struct drbd_device *device);
extern bool drbd_rs_should_slow_down(struct drbd_device *device, sector_t sector,
bool throttle_if_app_is_waiting);
@@ -1971,7 +1971,7 @@ extern void drbd_flush_workqueue(struct drbd_work_queue *work_queue);
static inline void wake_asender(struct drbd_connection *connection)
{
if (test_bit(SIGNAL_ASENDER, &connection->flags))
- force_sig(DRBD_SIG, connection->asender.task);
+ force_sig(DRBD_SIG, connection->ack_receiver.task);
}

static inline void request_ping(struct drbd_connection *connection)
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index f66294d..445f2c8 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1436,8 +1436,8 @@ static int we_should_drop_the_connection(struct drbd_connection *connection, str
/* long elapsed = (long)(jiffies - device->last_received); */

drop_it = connection->meta.socket == sock
- || !connection->asender.task
- || get_t_state(&connection->asender) != RUNNING
+ || !connection->ack_receiver.task
+ || get_t_state(&connection->ack_receiver) != RUNNING
|| connection->cstate < C_WF_REPORT_PARAMS;

if (drop_it)
@@ -2564,7 +2564,7 @@ int set_resource_options(struct drbd_resource *resource, struct res_opts *res_op
cpumask_copy(resource->cpu_mask, new_cpu_mask);
for_each_connection_rcu(connection, resource) {
connection->receiver.reset_cpu_mask = 1;
- connection->asender.reset_cpu_mask = 1;
+ connection->ack_receiver.reset_cpu_mask = 1;
connection->worker.reset_cpu_mask = 1;
}
}
@@ -2653,8 +2653,8 @@ struct drbd_connection *conn_create(const char *name, struct res_opts *res_opts)
connection->receiver.connection = connection;
drbd_thread_init(resource, &connection->worker, drbd_worker, "worker");
connection->worker.connection = connection;
- drbd_thread_init(resource, &connection->asender, drbd_asender, "asender");
- connection->asender.connection = connection;
+ drbd_thread_init(resource, &connection->ack_receiver, drbd_ack_receiver, "ack_recv");
+ connection->ack_receiver.connection = connection;

kref_init(&connection->kref);

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 61b73c7..eed4ae9 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1099,7 +1099,7 @@ randomize:
return 0;
}

- drbd_thread_start(&connection->asender);
+ drbd_thread_start(&connection->ack_receiver);

mutex_lock(&connection->resource->conf_update);
/* The discard_my_data flag is a single-shot modifier to the next
@@ -4656,7 +4656,7 @@ static void conn_disconnect(struct drbd_connection *connection)
conn_request_state(connection, NS(conn, C_NETWORK_FAILURE), CS_HARD);

/* asender does not clean up anything. it must not interfere, either */
- drbd_thread_stop(&connection->asender);
+ drbd_thread_stop(&connection->ack_receiver);
drbd_free_sock(connection);

rcu_read_lock();
@@ -5487,7 +5487,7 @@ static struct asender_cmd asender_tbl[] = {
[P_RETRY_WRITE] = { sizeof(struct p_block_ack), got_BlockAck },
};

-int drbd_asender(struct drbd_thread *thi)
+int drbd_ack_receiver(struct drbd_thread *thi)
{
struct drbd_connection *connection = thi->connection;
struct asender_cmd *cmd = NULL;
--
1.9.1

2015-11-25 11:12:17

by Philipp Reisner

Subject: [PATCH 22/38] drbd: Create a dedicated workqueue for sending acks on the control connection

The intention is to reduce CPU utilization. Recent measurements
revealed that the current performance bottleneck is CPU utilization
on the receiving node: the asender thread became CPU limited.

One of the main points is to eliminate the idr_for_each_entry() loop
from the ack-sending code path.

The one exception is sending back ping_acks. These stay
in the ack_receiver thread; otherwise the logic would become too
complicated for no added value.
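
Because each peer device now owns its own send_acks_work item, a completion only queues that one item and no longer scans every device of the connection. Queueing an item that is still pending is a no-op (queue_work() then returns false), so a burst of completions collapses into a single handler run. Below is a single-threaded model of that coalescing. All names are hypothetical. The model only illustrates the pending-bit behaviour; it is not the kernel workqueue implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one work item and its PENDING bit. */
struct model_work {
	bool pending;
	int runs;		/* how often the handler ran */
	int acks_sent;		/* acks flushed by the handler */
};

/* Like queue_work(): returns false if the item is already pending. */
static bool model_queue_work(struct model_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* Stands in for the ack-sending handler: the pending bit is cleared
 * before the work runs, then one run flushes every completion that is
 * queued so far (the done_ee list in DRBD). */
static void model_run_work(struct model_work *w, int *done_count)
{
	w->pending = false;
	w->runs++;
	w->acks_sent += *done_count;
	*done_count = 0;
}
```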

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 27 ++---
drivers/block/drbd/drbd_main.c | 10 +-
drivers/block/drbd/drbd_nl.c | 4 +-
drivers/block/drbd/drbd_protocol.h | 2 +-
drivers/block/drbd/drbd_receiver.c | 203 +++++++++++++++++++++----------------
drivers/block/drbd/drbd_req.c | 2 +-
drivers/block/drbd/drbd_worker.c | 8 +-
7 files changed, 141 insertions(+), 115 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index dee6297..3efaf18 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -77,13 +77,6 @@ extern int fault_devs;
extern char usermode_helper[];


-/* I don't remember why XCPU ...
- * This is used to wake the asender,
- * and to interrupt sending the sending task
- * on disconnect.
- */
-#define DRBD_SIG SIGXCPU
-
/* This is used to stop/restart our threads.
* Cannot use SIGTERM nor SIGKILL, since these
* are sent out by init on runlevel changes
@@ -647,8 +640,7 @@ extern struct fifo_buffer *fifo_alloc(int fifo_size);
enum {
NET_CONGESTED, /* The data socket is congested */
RESOLVE_CONFLICTS, /* Set on one node, cleared on the peer! */
- SEND_PING, /* whether asender should send a ping asap */
- SIGNAL_ASENDER, /* whether asender wants to be interrupted */
+ SEND_PING,
GOT_PING_ACK, /* set when we receive a ping_ack packet, ping_wait gets woken */
CONN_WD_ST_CHG_REQ, /* A cluster wide state change on the connection is active */
CONN_WD_ST_CHG_OKAY,
@@ -755,6 +747,7 @@ struct drbd_connection {
struct drbd_thread receiver;
struct drbd_thread worker;
struct drbd_thread ack_receiver;
+ struct workqueue_struct *ack_sender;

/* cached pointers,
* so we can look up the oldest pending requests more quickly.
@@ -823,6 +816,7 @@ struct drbd_peer_device {
struct list_head peer_devices;
struct drbd_device *device;
struct drbd_connection *connection;
+ struct work_struct send_acks_work;
#ifdef CONFIG_DEBUG_FS
struct dentry *debugfs_peer_dev;
#endif
@@ -1558,6 +1552,8 @@ extern void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req);
/* drbd_receiver.c */
extern int drbd_receiver(struct drbd_thread *thi);
extern int drbd_ack_receiver(struct drbd_thread *thi);
+extern void drbd_send_ping_wf(struct work_struct *ws);
+extern void drbd_send_acks_wf(struct work_struct *ws);
extern bool drbd_rs_c_min_rate_throttle(struct drbd_device *device);
extern bool drbd_rs_should_slow_down(struct drbd_device *device, sector_t sector,
bool throttle_if_app_is_waiting);
@@ -1968,16 +1964,21 @@ drbd_device_post_work(struct drbd_device *device, int work_bit)

extern void drbd_flush_workqueue(struct drbd_work_queue *work_queue);

-static inline void wake_asender(struct drbd_connection *connection)
+/* To get the ack_receiver out of the blocking network stack,
+ * so it can change its sk_rcvtimeo from idle- to ping-timeout,
+ * and send a ping, we need to send a signal.
+ * Which signal we send is irrelevant. */
+static inline void wake_ack_receiver(struct drbd_connection *connection)
{
- if (test_bit(SIGNAL_ASENDER, &connection->flags))
- force_sig(DRBD_SIG, connection->ack_receiver.task);
+ struct task_struct *task = connection->ack_receiver.task;
+ if (task && get_t_state(&connection->ack_receiver) == RUNNING)
+ force_sig(SIGXCPU, task);
}

static inline void request_ping(struct drbd_connection *connection)
{
set_bit(SEND_PING, &connection->flags);
- wake_asender(connection);
+ wake_ack_receiver(connection);
}

extern void *conn_prepare_command(struct drbd_connection *, struct drbd_socket *);
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 445f2c8..938bca2 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1794,15 +1794,6 @@ int drbd_send(struct drbd_connection *connection, struct socket *sock,
drbd_update_congested(connection);
}
do {
- /* STRANGE
- * tcp_sendmsg does _not_ use its size parameter at all ?
- *
- * -EAGAIN on timeout, -EINTR on signal.
- */
-/* THINK
- * do we need to block DRBD_SIG if sock == &meta.socket ??
- * otherwise wake_asender() might interrupt some send_*Ack !
- */
rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
if (rv == -EAGAIN) {
if (we_should_drop_the_connection(connection, sock))
@@ -2821,6 +2812,7 @@ enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsig
goto out_idr_remove_from_resource;
}
kref_get(&connection->kref);
+ INIT_WORK(&peer_device->send_acks_work, drbd_send_acks_wf);
}

if (init_submitter(device)) {
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 79dc3d4..f35cefb 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1258,8 +1258,8 @@ static void conn_reconfig_done(struct drbd_connection *connection)
connection->cstate == C_STANDALONE;
spin_unlock_irq(&connection->resource->req_lock);
if (stop_threads) {
- /* asender is implicitly stopped by receiver
- * in conn_disconnect() */
+ /* ack_receiver thread and ack_sender workqueue are implicitly
+ * stopped by receiver in conn_disconnect() */
drbd_thread_stop(&connection->receiver);
drbd_thread_stop(&connection->worker);
}
diff --git a/drivers/block/drbd/drbd_protocol.h b/drivers/block/drbd/drbd_protocol.h
index 2da9104a..ef92453 100644
--- a/drivers/block/drbd/drbd_protocol.h
+++ b/drivers/block/drbd/drbd_protocol.h
@@ -23,7 +23,7 @@ enum drbd_packet {
P_AUTH_RESPONSE = 0x11,
P_STATE_CHG_REQ = 0x12,

- /* asender (meta socket */
+ /* (meta socket) */
P_PING = 0x13,
P_PING_ACK = 0x14,
P_RECV_ACK = 0x15, /* Used in protocol B */
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index eed4ae9..ea54341 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -215,7 +215,7 @@ static void reclaim_finished_net_peer_reqs(struct drbd_device *device,
}
}

-static void drbd_kick_lo_and_reclaim_net(struct drbd_device *device)
+static void drbd_reclaim_net_peer_reqs(struct drbd_device *device)
{
LIST_HEAD(reclaimed);
struct drbd_peer_request *peer_req, *t;
@@ -223,11 +223,30 @@ static void drbd_kick_lo_and_reclaim_net(struct drbd_device *device)
spin_lock_irq(&device->resource->req_lock);
reclaim_finished_net_peer_reqs(device, &reclaimed);
spin_unlock_irq(&device->resource->req_lock);
-
list_for_each_entry_safe(peer_req, t, &reclaimed, w.list)
drbd_free_net_peer_req(device, peer_req);
}

+static void conn_reclaim_net_peer_reqs(struct drbd_connection *connection)
+{
+ struct drbd_peer_device *peer_device;
+ int vnr;
+
+ rcu_read_lock();
+ idr_for_each_entry(&connection->peer_devices, peer_device, vnr) {
+ struct drbd_device *device = peer_device->device;
+ if (!atomic_read(&device->pp_in_use_by_net))
+ continue;
+
+ kref_get(&device->kref);
+ rcu_read_unlock();
+ drbd_reclaim_net_peer_reqs(device);
+ kref_put(&device->kref, drbd_destroy_device);
+ rcu_read_lock();
+ }
+ rcu_read_unlock();
+}
+
/**
* drbd_alloc_pages() - Returns @number pages, retries forever (or until signalled)
* @device: DRBD device.
@@ -265,10 +284,15 @@ struct page *drbd_alloc_pages(struct drbd_peer_device *peer_device, unsigned int
if (atomic_read(&device->pp_in_use) < mxb)
page = __drbd_alloc_pages(device, number);

+ /* Try to keep the fast path fast, but occasionally we need
+ * to reclaim the pages we lended to the network stack. */
+ if (page && atomic_read(&device->pp_in_use_by_net) > 512)
+ drbd_reclaim_net_peer_reqs(device);
+
while (page == NULL) {
prepare_to_wait(&drbd_pp_wait, &wait, TASK_INTERRUPTIBLE);

- drbd_kick_lo_and_reclaim_net(device);
+ drbd_reclaim_net_peer_reqs(device);

if (atomic_read(&device->pp_in_use) < mxb) {
page = __drbd_alloc_pages(device, number);
@@ -1100,6 +1124,11 @@ randomize:
}

drbd_thread_start(&connection->ack_receiver);
+ connection->ack_sender = create_singlethread_workqueue("drbd_ack_sender");
+ if (!connection->ack_sender) {
+ drbd_err(connection, "Failed to create workqueue ack_sender\n");
+ return 0;
+ }

mutex_lock(&connection->resource->conf_update);
/* The discard_my_data flag is a single-shot modifier to the next
@@ -1746,7 +1775,7 @@ static int recv_dless_read(struct drbd_peer_device *peer_device, struct drbd_req
}

/*
- * e_end_resync_block() is called in asender context via
+ * e_end_resync_block() is called in ack_sender context via
* drbd_finish_peer_reqs().
*/
static int e_end_resync_block(struct drbd_work *w, int unused)
@@ -1920,7 +1949,7 @@ static void restart_conflicting_writes(struct drbd_device *device,
}

/*
- * e_end_block() is called in asender context via drbd_finish_peer_reqs().
+ * e_end_block() is called in ack_sender context via drbd_finish_peer_reqs().
*/
static int e_end_block(struct drbd_work *w, int cancel)
{
@@ -2211,7 +2240,7 @@ static int handle_write_conflicts(struct drbd_device *device,
peer_req->w.cb = superseded ? e_send_superseded :
e_send_retry_write;
list_add_tail(&peer_req->w.list, &device->done_ee);
- wake_asender(connection);
+ queue_work(connection->ack_sender, &peer_req->peer_device->send_acks_work);

err = -ENOENT;
goto out;
@@ -4050,7 +4079,7 @@ static int receive_state(struct drbd_connection *connection, struct packet_info
os = ns = drbd_read_state(device);
spin_unlock_irq(&device->resource->req_lock);

- /* If some other part of the code (asender thread, timeout)
+ /* If some other part of the code (ack_receiver thread, timeout)
* already decided to close the connection again,
* we must not "re-establish" it here. */
if (os.conn <= C_TEAR_DOWN)
@@ -4655,8 +4684,12 @@ static void conn_disconnect(struct drbd_connection *connection)
*/
conn_request_state(connection, NS(conn, C_NETWORK_FAILURE), CS_HARD);

- /* asender does not clean up anything. it must not interfere, either */
+ /* ack_receiver does not clean up anything. it must not interfere, either */
drbd_thread_stop(&connection->ack_receiver);
+ if (connection->ack_sender) {
+ destroy_workqueue(connection->ack_sender);
+ connection->ack_sender = NULL;
+ }
drbd_free_sock(connection);

rcu_read_lock();
@@ -5425,49 +5458,39 @@ static int got_skip(struct drbd_connection *connection, struct packet_info *pi)
return 0;
}

-static int connection_finish_peer_reqs(struct drbd_connection *connection)
+struct meta_sock_cmd {
+ size_t pkt_size;
+ int (*fn)(struct drbd_connection *connection, struct packet_info *);
+};
+
+static void set_rcvtimeo(struct drbd_connection *connection, bool ping_timeout)
{
- struct drbd_peer_device *peer_device;
- int vnr, not_empty = 0;
+ long t;
+ struct net_conf *nc;

- do {
- clear_bit(SIGNAL_ASENDER, &connection->flags);
- flush_signals(current);
+ rcu_read_lock();
+ nc = rcu_dereference(connection->net_conf);
+ t = ping_timeout ? nc->ping_timeo : nc->ping_int;
+ rcu_read_unlock();

- rcu_read_lock();
- idr_for_each_entry(&connection->peer_devices, peer_device, vnr) {
- struct drbd_device *device = peer_device->device;
- kref_get(&device->kref);
- rcu_read_unlock();
- if (drbd_finish_peer_reqs(device)) {
- kref_put(&device->kref, drbd_destroy_device);
- return 1;
- }
- kref_put(&device->kref, drbd_destroy_device);
- rcu_read_lock();
- }
- set_bit(SIGNAL_ASENDER, &connection->flags);
+ t *= HZ;
+ if (ping_timeout)
+ t /= 10;

- spin_lock_irq(&connection->resource->req_lock);
- idr_for_each_entry(&connection->peer_devices, peer_device, vnr) {
- struct drbd_device *device = peer_device->device;
- not_empty = !list_empty(&device->done_ee);
- if (not_empty)
- break;
- }
- spin_unlock_irq(&connection->resource->req_lock);
- rcu_read_unlock();
- } while (not_empty);
+ connection->meta.socket->sk->sk_rcvtimeo = t;
+}

- return 0;
+static void set_ping_timeout(struct drbd_connection *connection)
+{
+ set_rcvtimeo(connection, 1);
}

-struct asender_cmd {
- size_t pkt_size;
- int (*fn)(struct drbd_connection *connection, struct packet_info *);
-};
+static void set_idle_timeout(struct drbd_connection *connection)
+{
+ set_rcvtimeo(connection, 0);
+}

-static struct asender_cmd asender_tbl[] = {
+static struct meta_sock_cmd ack_receiver_tbl[] = {
[P_PING] = { 0, got_Ping },
[P_PING_ACK] = { 0, got_PingAck },
[P_RECV_ACK] = { sizeof(struct p_block_ack), got_BlockAck },
@@ -5490,61 +5513,37 @@ static struct asender_cmd asender_tbl[] = {
int drbd_ack_receiver(struct drbd_thread *thi)
{
struct drbd_connection *connection = thi->connection;
- struct asender_cmd *cmd = NULL;
+ struct meta_sock_cmd *cmd = NULL;
struct packet_info pi;
+ unsigned long pre_recv_jif;
int rv;
void *buf = connection->meta.rbuf;
int received = 0;
unsigned int header_size = drbd_header_size(connection);
int expect = header_size;
bool ping_timeout_active = false;
- struct net_conf *nc;
- int ping_timeo, tcp_cork, ping_int;
struct sched_param param = { .sched_priority = 2 };

rv = sched_setscheduler(current, SCHED_RR, &param);
if (rv < 0)
- drbd_err(connection, "drbd_asender: ERROR set priority, ret=%d\n", rv);
+ drbd_err(connection, "drbd_ack_receiver: ERROR set priority, ret=%d\n", rv);

while (get_t_state(thi) == RUNNING) {
drbd_thread_current_set_cpu(thi);

- rcu_read_lock();
- nc = rcu_dereference(connection->net_conf);
- ping_timeo = nc->ping_timeo;
- tcp_cork = nc->tcp_cork;
- ping_int = nc->ping_int;
- rcu_read_unlock();
+ conn_reclaim_net_peer_reqs(connection);

if (test_and_clear_bit(SEND_PING, &connection->flags)) {
if (drbd_send_ping(connection)) {
drbd_err(connection, "drbd_send_ping has failed\n");
goto reconnect;
}
- connection->meta.socket->sk->sk_rcvtimeo = ping_timeo * HZ / 10;
+ set_ping_timeout(connection);
ping_timeout_active = true;
}

- /* TODO: conditionally cork; it may hurt latency if we cork without
- much to send */
- if (tcp_cork)
- drbd_tcp_cork(connection->meta.socket);
- if (connection_finish_peer_reqs(connection)) {
- drbd_err(connection, "connection_finish_peer_reqs() failed\n");
- goto reconnect;
- }
- /* but unconditionally uncork unless disabled */
- if (tcp_cork)
- drbd_tcp_uncork(connection->meta.socket);
-
- /* short circuit, recv_msg would return EINTR anyways. */
- if (signal_pending(current))
- continue;
-
+ pre_recv_jif = jiffies;
rv = drbd_recv_short(connection->meta.socket, buf, expect-received, 0);
- clear_bit(SIGNAL_ASENDER, &connection->flags);
-
- flush_signals(current);

/* Note:
* -EINTR (on meta) we got a signal
@@ -5556,7 +5555,6 @@ int drbd_ack_receiver(struct drbd_thread *thi)
* rv < expected: "woken" by signal during receive
* rv == 0 : "connection shut down by peer"
*/
-received_more:
if (likely(rv > 0)) {
received += rv;
buf += rv;
@@ -5578,8 +5576,7 @@ received_more:
} else if (rv == -EAGAIN) {
/* If the data socket received something meanwhile,
* that is good enough: peer is still alive. */
- if (time_after(connection->last_received,
- jiffies - connection->meta.socket->sk->sk_rcvtimeo))
+ if (time_after(connection->last_received, pre_recv_jif))
continue;
if (ping_timeout_active) {
drbd_err(connection, "PingAck did not arrive in time.\n");
@@ -5588,6 +5585,10 @@ received_more:
set_bit(SEND_PING, &connection->flags);
continue;
} else if (rv == -EINTR) {
+ /* maybe drbd_thread_stop(): the while condition will notice.
+ * maybe woken for send_ping: we'll send a ping above,
+ * and change the rcvtimeo */
+ flush_signals(current);
continue;
} else {
drbd_err(connection, "sock_recvmsg returned %d\n", rv);
@@ -5597,8 +5598,8 @@ received_more:
if (received == expect && cmd == NULL) {
if (decode_header(connection, connection->meta.rbuf, &pi))
goto reconnect;
- cmd = &asender_tbl[pi.cmd];
- if (pi.cmd >= ARRAY_SIZE(asender_tbl) || !cmd->fn) {
+ cmd = &ack_receiver_tbl[pi.cmd];
+ if (pi.cmd >= ARRAY_SIZE(ack_receiver_tbl) || !cmd->fn) {
drbd_err(connection, "Unexpected meta packet %s (0x%04x)\n",
cmdname(pi.cmd), pi.cmd);
goto disconnect;
@@ -5621,9 +5622,8 @@ received_more:

connection->last_received = jiffies;

- if (cmd == &asender_tbl[P_PING_ACK]) {
- /* restore idle timeout */
- connection->meta.socket->sk->sk_rcvtimeo = ping_int * HZ;
+ if (cmd == &ack_receiver_tbl[P_PING_ACK]) {
+ set_idle_timeout(connection);
ping_timeout_active = false;
}

@@ -5632,11 +5632,6 @@ received_more:
expect = header_size;
cmd = NULL;
}
- if (test_bit(SEND_PING, &connection->flags))
- continue;
- rv = drbd_recv_short(connection->meta.socket, buf, expect-received, MSG_DONTWAIT);
- if (rv > 0)
- goto received_more;
}

if (0) {
@@ -5648,9 +5643,41 @@ reconnect:
disconnect:
conn_request_state(connection, NS(conn, C_DISCONNECTING), CS_HARD);
}
- clear_bit(SIGNAL_ASENDER, &connection->flags);

- drbd_info(connection, "asender terminated\n");
+ drbd_info(connection, "ack_receiver terminated\n");

return 0;
}
+
+void drbd_send_acks_wf(struct work_struct *ws)
+{
+ struct drbd_peer_device *peer_device =
+ container_of(ws, struct drbd_peer_device, send_acks_work);
+ struct drbd_connection *connection = peer_device->connection;
+ struct drbd_device *device = peer_device->device;
+ struct net_conf *nc;
+ int tcp_cork, err;
+
+ rcu_read_lock();
+ nc = rcu_dereference(connection->net_conf);
+ tcp_cork = nc->tcp_cork;
+ rcu_read_unlock();
+
+ if (tcp_cork)
+ drbd_tcp_cork(connection->meta.socket);
+
+ err = drbd_finish_peer_reqs(device);
+ kref_put(&device->kref, drbd_destroy_device);
+ /* get is in drbd_endio_write_sec_final(). That is necessary to keep the
+ struct work_struct send_acks_work alive, which is in the peer_device object */
+
+ if (err) {
+ conn_request_state(connection, NS(conn, C_NETWORK_FAILURE), CS_HARD);
+ return;
+ }
+
+ if (tcp_cork)
+ drbd_tcp_uncork(connection->meta.socket);
+
+ return;
+}
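
The new set_rcvtimeo() helper converts the configured values to jiffies. Judging by the division by 10, ping_timeo is kept in tenths of a second, while ping_int is in whole seconds. On -EAGAIN, the liveness check now compares last_received against a timestamp taken just before the receive call, instead of deriving one from the current rcvtimeo. The userspace sketch below mirrors both computations. HZ is passed in as a parameter (an assumption for illustration), and time_after_model() uses the same signed-difference idea as the kernel's time_after(), so the comparison survives a jiffies wraparound.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of set_rcvtimeo(): ping_timeo in tenths of a second,
 * ping_int in seconds; hz stands in for the kernel's HZ. */
static long rcvtimeo_jiffies(long ping_timeo, long ping_int, long hz,
                             bool ping_timeout)
{
    long t = ping_timeout ? ping_timeo : ping_int;

    t *= hz;
    if (ping_timeout)
        t /= 10;              /* tenths of a second -> jiffies */
    return t;
}

/* Same idea as time_after(a, b): true if a is later than b, even if the
 * counter wrapped in between. Relies on the usual two's-complement
 * conversion to long, as GCC and Clang implement it. */
static bool time_after_model(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* On -EAGAIN: did the data socket see traffic since we started waiting? */
static bool peer_seen_alive(unsigned long last_received,
                            unsigned long pre_recv_jif)
{
    return time_after_model(last_received, pre_recv_jif);
}
```
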
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 3add7c5..7907fb5 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -453,7 +453,7 @@ static void mod_rq_state(struct drbd_request *req, struct bio_and_error *m,
kref_get(&req->kref); /* wait for the DONE */

if (!(s & RQ_NET_SENT) && (set & RQ_NET_SENT)) {
- /* potentially already completed in the asender thread */
+ /* potentially already completed in the ack_receiver thread */
if (!(s & RQ_NET_DONE)) {
atomic_add(req->i.size >> 9, &device->ap_in_flight);
set_if_null_req_not_net_done(peer_device, req);
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 8bbabe3..2f29bf3 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -113,6 +113,7 @@ void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req) __releases(l
unsigned long flags = 0;
struct drbd_peer_device *peer_device = peer_req->peer_device;
struct drbd_device *device = peer_device->device;
+ struct drbd_connection *connection = peer_device->connection;
struct drbd_interval i;
int do_wake;
u64 block_id;
@@ -145,6 +146,12 @@ void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req) __releases(l
* ((peer_req->flags & (EE_WAS_ERROR|EE_IS_TRIM)) == EE_WAS_ERROR) */
if (peer_req->flags & EE_WAS_ERROR)
__drbd_chk_io_error(device, DRBD_WRITE_ERROR);
+
+ if (connection->cstate >= C_WF_REPORT_PARAMS) {
+ kref_get(&device->kref); /* put is in drbd_send_acks_wf() */
+ if (!queue_work(connection->ack_sender, &peer_device->send_acks_work))
+ kref_put(&device->kref, drbd_destroy_device);
+ }
spin_unlock_irqrestore(&device->resource->req_lock, flags);

if (block_id == ID_SYNCER)
@@ -156,7 +163,6 @@ void drbd_endio_write_sec_final(struct drbd_peer_request *peer_req) __releases(l
if (do_al_complete_io)
drbd_al_complete_io(device, &i);

- wake_asender(peer_device->connection);
put_ldev(device);
}

--
1.9.1
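
In drbd_endio_write_sec_final(), a device reference is taken before queue_work(). If queue_work() returns false, that reference is dropped again, because the already-pending send_acks_work will drop exactly one reference when drbd_send_acks_wf() runs. The userspace model below shows that invariant: each queued instance of the work item holds one reference. All struct and function names are hypothetical. The pending flag stands in for the workqueue's internal pending bit, which is cleared before the work function runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types, not DRBD's. */
struct work_model {
    bool pending;                     /* like the work item's pending bit */
};

struct device_model {
    int refcount;                     /* like device->kref */
    struct work_model send_acks_work;
};

/* Like queue_work(): returns false if the item is already queued. */
static bool queue_work_model(struct work_model *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Write completion path, mirroring the drbd_endio_write_sec_final() hunk. */
static void endio_write_complete(struct device_model *d)
{
    d->refcount++;                               /* kref_get() */
    if (!queue_work_model(&d->send_acks_work))
        d->refcount--;                           /* kref_put(): already queued */
}

/* Worker, mirroring drbd_send_acks_wf(). Returns false if nothing was queued. */
static bool run_send_acks(struct device_model *d)
{
    if (!d->send_acks_work.pending)
        return false;
    d->send_acks_work.pending = false;  /* cleared before the function body runs */
    /* ... send acks for all completed peer requests here ... */
    d->refcount--;                      /* kref_put() */
    return true;
}
```

Two completions that arrive before the worker runs leave only one extra reference, which that single worker run then drops.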

2015-11-25 11:13:42

by Philipp Reisner

Subject: [PATCH 23/38] drbd: prevent NULL pointer deref when resuming diskless primary

From: Lars Ellenberg <[email protected]>

In a multiple-error scenario, we may end up with a "frozen" Primary
that has no access to any data (no local disk, no replication link).

If we then resume-io, we try to generate a new data generation id,
which will fail if there is no longer a local disk.

Double-check that local data is available before doing so;
this prevents the NULL pointer deref.

If we are diskless in this situation, turn the resume-io
into the first stage of a "forced down" by bumping the "effective" data
gen id. This prevents a later attach or connect to the former data
set unless the device is first demoted (deconfigured).
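
The two resume-io paths can be modeled in userspace as follows. This is a greatly simplified, hypothetical sketch. The struct stands in for the device state, has_local_disk stands in for get_ldev_if_state() succeeding, and random_u64() replaces get_random_bytes(). The real drbd_uuid_new_current() does considerably more than assign a new id. The re-roll on collision is an addition for the model; the patch does not do this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified model of the state touched by resume-io. */
struct dev_model {
    bool has_local_disk;      /* stands in for get_ldev_if_state() succeeding */
    bool new_cur_uuid;        /* the NEW_CUR_UUID flag */
    uint64_t current_uuid;    /* current data generation id */
    uint64_t ed_uuid;         /* "effective" data generation id */
};

/* Not cryptographic; stands in for get_random_bytes(). */
static uint64_t random_u64(void)
{
    return ((uint64_t)rand() << 42) ^ ((uint64_t)rand() << 21) ^ (uint64_t)rand();
}

/* Returns true if IO was resumed without access to any data. */
static bool resume_io(struct dev_model *d)
{
    bool forced_down = false;

    if (d->new_cur_uuid) {
        if (d->has_local_disk) {
            /* roughly like drbd_uuid_new_current(): start a new generation */
            d->current_uuid = random_u64();
            d->ed_uuid = d->current_uuid;
        } else {
            /* no data: bump only the effective id, so no existing data set
             * matches it any more (model-only re-roll on collision) */
            uint64_t old = d->ed_uuid;
            do
                d->ed_uuid = random_u64();
            while (d->ed_uuid == old);
            forced_down = true;
        }
        d->new_cur_uuid = false;
    }
    return forced_down;
}

/* A later attach/connect only makes sense if the data set matches. */
static bool data_matches(const struct dev_model *d, uint64_t data_uuid)
{
    return d->ed_uuid == data_uuid;
}
```
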

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 25 ++++++++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index f35cefb..5e4adff 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2920,7 +2920,30 @@ int drbd_adm_resume_io(struct sk_buff *skb, struct genl_info *info)
mutex_lock(&adm_ctx.resource->adm_mutex);
device = adm_ctx.device;
if (test_bit(NEW_CUR_UUID, &device->flags)) {
- drbd_uuid_new_current(device);
+ if (get_ldev_if_state(device, D_ATTACHING)) {
+ drbd_uuid_new_current(device);
+ put_ldev(device);
+ } else {
+ /* This is effectively a multi-stage "forced down".
+ * The NEW_CUR_UUID bit is supposedly only set, if we
+ * lost the replication connection, and are configured
+ * to freeze IO and wait for some fence-peer handler.
+ * So we still don't have a replication connection.
+ * And now we don't have a local disk either. After
+ * resume, we will fail all pending and new IO, because
+ * we don't have any data anymore. Which means we will
+ * eventually be able to terminate all users of this
+ * device, and then take it down. By bumping the
+ * "effective" data uuid, we make sure that you really
+ * need to tear down before you reconfigure, we will
+ * the refuse to re-connect or re-attach (because no
+ * matching real data uuid exists).
+ */
+ u64 val;
+ get_random_bytes(&val, sizeof(u64));
+ drbd_set_ed_uuid(device, val);
+ drbd_warn(device, "Resumed without access to data; please tear down before attempting to re-configure.\n");
+ }
clear_bit(NEW_CUR_UUID, &device->flags);
}
drbd_suspend_io(device);
--
1.9.1

2015-11-25 11:11:02

by Philipp Reisner

Subject: [PATCH 24/38] drbd: debugfs: expose ed_data_gen_id

From: Lars Ellenberg <[email protected]>

The effective data generation ID may be useful when debugging
scenarios involving diskless states.
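
The new debugfs file prints the id with seq_printf(m, "0x%016llX\n", ...). A tool reading that file could parse the format back with strtoull(). The helpers below are a hypothetical sketch that round-trips the format in userspace. They do not touch debugfs, and the file's exact location is deliberately left out here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Format an id the same way device_ed_gen_id_show() does. */
static void format_gen_id(char *buf, size_t len, uint64_t id)
{
    snprintf(buf, len, "0x%016llX\n", (unsigned long long)id);
}

/* Parse one line as produced above; returns false on malformed input. */
static bool parse_gen_id(const char *s, uint64_t *out)
{
    char *end;
    unsigned long long v;

    if (strncmp(s, "0x", 2) != 0)
        return false;
    errno = 0;
    v = strtoull(s + 2, &end, 16);
    if (errno != 0 || end == s + 2 || (*end != '\n' && *end != '\0'))
        return false;
    *out = (uint64_t)v;
    return true;
}
```
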

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_debugfs.c | 10 ++++++++++
drivers/block/drbd/drbd_int.h | 1 +
2 files changed, 11 insertions(+)

diff --git a/drivers/block/drbd/drbd_debugfs.c b/drivers/block/drbd/drbd_debugfs.c
index 6b88a35..96a0107 100644
--- a/drivers/block/drbd/drbd_debugfs.c
+++ b/drivers/block/drbd/drbd_debugfs.c
@@ -771,6 +771,13 @@ static int device_data_gen_id_show(struct seq_file *m, void *ignored)
return 0;
}

+static int device_ed_gen_id_show(struct seq_file *m, void *ignored)
+{
+ struct drbd_device *device = m->private;
+ seq_printf(m, "0x%016llX\n", (unsigned long long)device->ed_uuid);
+ return 0;
+}
+
#define drbd_debugfs_device_attr(name) \
static int device_ ## name ## _open(struct inode *inode, struct file *file) \
{ \
@@ -796,6 +803,7 @@ drbd_debugfs_device_attr(oldest_requests)
drbd_debugfs_device_attr(act_log_extents)
drbd_debugfs_device_attr(resync_extents)
drbd_debugfs_device_attr(data_gen_id)
+drbd_debugfs_device_attr(ed_gen_id)

void drbd_debugfs_device_add(struct drbd_device *device)
{
@@ -839,6 +847,7 @@ void drbd_debugfs_device_add(struct drbd_device *device)
DCF(act_log_extents);
DCF(resync_extents);
DCF(data_gen_id);
+ DCF(ed_gen_id);
#undef DCF
return;

@@ -854,6 +863,7 @@ void drbd_debugfs_device_cleanup(struct drbd_device *device)
drbd_debugfs_remove(&device->debugfs_vol_act_log_extents);
drbd_debugfs_remove(&device->debugfs_vol_resync_extents);
drbd_debugfs_remove(&device->debugfs_vol_data_gen_id);
+ drbd_debugfs_remove(&device->debugfs_vol_ed_gen_id);
drbd_debugfs_remove(&device->debugfs_vol);
}

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 3efaf18..08f266e 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -835,6 +835,7 @@ struct drbd_device {
struct dentry *debugfs_vol_act_log_extents;
struct dentry *debugfs_vol_resync_extents;
struct dentry *debugfs_vol_data_gen_id;
+ struct dentry *debugfs_vol_ed_gen_id;
#endif

unsigned int vnr; /* volume number within the connection */
--
1.9.1

2015-11-25 11:01:39

by Philipp Reisner

Subject: [PATCH 25/38] drbd: use resource name in workqueue

From: Lars Ellenberg <[email protected]>

Since kernel 3.3, we can pass printf-style format arguments
when creating a workqueue; use them to put the resource name
into the name of the ack_sender workqueue.
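
alloc_workqueue() and alloc_ordered_workqueue() format the name into a fixed-size buffer inside the workqueue. As far as we know, that buffer was 24 bytes (WQ_NAME_LEN in kernel/workqueue.c) in kernels of this era, so long resource names are truncated. Treat that size as an assumption. The sketch below reproduces only the name formatting in userspace, using vsnprintf().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define WQ_NAME_LEN_ASSUMED 24   /* assumption: see the text above */

/* Userspace stand-in for the name formatting done when allocating a workqueue. */
static void format_wq_name(char name[WQ_NAME_LEN_ASSUMED], const char *fmt, ...)
{
    va_list args;

    va_start(args, fmt);
    vsnprintf(name, WQ_NAME_LEN_ASSUMED, fmt, args);
    va_end(args);
}
```

Under this assumption, the "drbd_as_" prefix leaves 15 characters for the resource name.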

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_main.c | 4 ++--
drivers/block/drbd/drbd_receiver.c | 5 ++++-
2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 938bca2..3a9a0f1 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2694,8 +2694,8 @@ static int init_submitter(struct drbd_device *device)
{
/* opencoded create_singlethread_workqueue(),
* to be able to say "drbd%d", ..., minor */
- device->submit.wq = alloc_workqueue("drbd%u_submit",
- WQ_UNBOUND | WQ_MEM_RECLAIM, 1, device->minor);
+ device->submit.wq =
+ alloc_ordered_workqueue("drbd%u_submit", WQ_MEM_RECLAIM, device->minor);
if (!device->submit.wq)
return -ENOMEM;

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index ea54341..1957fe8 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1124,7 +1124,10 @@ randomize:
}

drbd_thread_start(&connection->ack_receiver);
- connection->ack_sender = create_singlethread_workqueue("drbd_ack_sender");
+ /* opencoded create_singlethread_workqueue(),
+ * to be able to use format string arguments */
+ connection->ack_sender =
+ alloc_ordered_workqueue("drbd_as_%s", WQ_MEM_RECLAIM, connection->resource->name);
if (!connection->ack_sender) {
drbd_err(connection, "Failed to create workqueue ack_sender\n");
return 0;
--
1.9.1

2015-11-25 11:02:35

by Philipp Reisner

Subject: [PATCH 26/38] drbd: avoid redefinition of BITS_PER_PAGE

From: Lars Ellenberg <[email protected]>

Apparently we now implicitly get definitions for BITS_PER_PAGE and
BITS_PER_PAGE_MASK from pid_namespace.h.

Instead of renaming our defines, I chose to define them only if they
are not yet defined, and to double-check the value if they are.
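
The define-or-verify pattern can be tried outside the kernel. In the sketch below, PAGE_SHIFT is assumed to be 12 (4 KiB pages), and an earlier definition simulates the one that now arrives via pid_namespace.h. If that simulated definition is changed to a different value, for example (1UL << PAGE_SHIFT), compilation stops at the #error.

```c
#include <assert.h>

#ifndef PAGE_SHIFT
#define PAGE_SHIFT 12             /* assumption: 4 KiB pages */
#endif

/* Simulate a header that already provided these definitions. */
#define BITS_PER_PAGE (1UL << (PAGE_SHIFT + 3))
#define BITS_PER_PAGE_MASK (BITS_PER_PAGE - 1)

/* The pattern from the patch: define only if missing, otherwise verify. */
#ifndef BITS_PER_PAGE
#define BITS_PER_PAGE (1UL << (PAGE_SHIFT + 3))
#define BITS_PER_PAGE_MASK (BITS_PER_PAGE - 1)
#else
# if BITS_PER_PAGE != (1UL << (PAGE_SHIFT + 3))
#  error "ambiguous BITS_PER_PAGE"
# endif
#endif
```
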

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_bitmap.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 9462d27..8bdc34d 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -479,8 +479,14 @@ void drbd_bm_cleanup(struct drbd_device *device)
* this masks out the remaining bits.
* Returns the number of bits cleared.
*/
+#ifndef BITS_PER_PAGE
#define BITS_PER_PAGE (1UL << (PAGE_SHIFT + 3))
#define BITS_PER_PAGE_MASK (BITS_PER_PAGE - 1)
+#else
+# if BITS_PER_PAGE != (1UL << (PAGE_SHIFT + 3))
+# error "ambiguous BITS_PER_PAGE"
+# endif
+#endif
#define BITS_PER_LONG_MASK (BITS_PER_LONG - 1)
static int bm_clear_surplus(struct drbd_bitmap *b)
{
--
1.9.1

2015-11-25 11:06:48

by Philipp Reisner

Subject: [PATCH 27/38] drbd: use bitmap_weight() helper, don't open code

From: Lars Ellenberg <[email protected]>

Suggested by Akinobu Mita <[email protected]>
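
The patch replaces per-word hweight_long() loops with bitmap_weight(map, nbits), which counts the set bits among the first nbits bits. A userspace sketch of that equivalence follows. bitmap_weight_model() is a hypothetical stand-in built on the GCC/Clang builtin __builtin_popcountl(), not the kernel's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_MODEL (sizeof(unsigned long) * CHAR_BIT)

/* Old style: sum the population count word by word. */
static unsigned long weight_by_words(const unsigned long *map, size_t words)
{
    unsigned long bits = 0;
    size_t i;

    for (i = 0; i < words; i++)
        bits += (unsigned long)__builtin_popcountl(map[i]);
    return bits;
}

/* New style: like bitmap_weight(), count set bits among the first nbits. */
static unsigned long bitmap_weight_model(const unsigned long *map, size_t nbits)
{
    size_t full = nbits / BITS_PER_LONG_MODEL;
    size_t rest = nbits % BITS_PER_LONG_MODEL;
    unsigned long bits = weight_by_words(map, full);

    if (rest)   /* only the low 'rest' bits of the next word count */
        bits += (unsigned long)__builtin_popcountl(map[full] & ((1UL << rest) - 1));
    return bits;
}
```

Because bitmap_weight() counts whole words when nbits is a multiple of BITS_PER_LONG, bm_count_bits() can pass last_word * BITS_PER_LONG for the full words of the last page and still mask the partial last word separately.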

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_bitmap.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 8bdc34d..0dabc9b 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -24,7 +24,7 @@

#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt

-#include <linux/bitops.h>
+#include <linux/bitmap.h>
#include <linux/vmalloc.h>
#include <linux/string.h>
#include <linux/drbd.h>
@@ -565,21 +565,19 @@ static unsigned long bm_count_bits(struct drbd_bitmap *b)
unsigned long *p_addr;
unsigned long bits = 0;
unsigned long mask = (1UL << (b->bm_bits & BITS_PER_LONG_MASK)) -1;
- int idx, i, last_word;
+ int idx, last_word;

/* all but last page */
for (idx = 0; idx < b->bm_number_of_pages - 1; idx++) {
p_addr = __bm_map_pidx(b, idx);
- for (i = 0; i < LWPP; i++)
- bits += hweight_long(p_addr[i]);
+ bits += bitmap_weight(p_addr, BITS_PER_PAGE);
__bm_unmap(p_addr);
cond_resched();
}
/* last (or only) page */
last_word = ((b->bm_bits - 1) & BITS_PER_PAGE_MASK) >> LN2_BPL;
p_addr = __bm_map_pidx(b, idx);
- for (i = 0; i < last_word; i++)
- bits += hweight_long(p_addr[i]);
+ bits += bitmap_weight(p_addr, last_word * BITS_PER_LONG);
p_addr[last_word] &= cpu_to_lel(mask);
bits += hweight_long(p_addr[last_word]);
/* 32bit arch, may have an unused padding long */
@@ -1425,6 +1423,9 @@ static inline void bm_set_full_words_within_one_page(struct drbd_bitmap *b,
int bits;
int changed = 0;
unsigned long *paddr = kmap_atomic(b->bm_pages[page_nr]);
+
+ /* I think it is more cache line friendly to hweight_long then set to ~0UL,
+ * than to first bitmap_weight() all words, then bitmap_fill() all words */
for (i = first_word; i < last_word; i++) {
bits = hweight_long(paddr[i]);
paddr[i] = ~0UL;
@@ -1634,8 +1635,7 @@ int drbd_bm_e_weight(struct drbd_device *device, unsigned long enr)
int n = e-s;
p_addr = bm_map_pidx(b, bm_word_to_page_idx(b, s));
bm = p_addr + MLPP(s);
- while (n--)
- count += hweight_long(*bm++);
+ count += bitmap_weight(bm, n * BITS_PER_LONG);
bm_unmap(p_addr);
} else {
drbd_err(device, "start offset (%d) too large in drbd_bm_e_weight\n", s);
--
1.9.1

2015-11-25 11:00:12

by Philipp Reisner

Subject: [PATCH 28/38] drbd: fix spurious alert level printk

From: Lars Ellenberg <[email protected]>

When accessing our meta data area on disk, we double-check the
plausibility of the requested sector offsets, and are very noisy about
it if they look suspicious.

During the initial read of our "superblock" for "external" meta data,
this check triggered because the range estimate returned by
drbd_md_last_sector() was still wrong. Initially restrict the range
to the 4k superblock.
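
Setting md_size_sect = 8 before the first read limits the plausible range to 8 sectors of 512 bytes, i.e. the 4 KiB superblock. The sketch below models such a paranoia check. md_range_ok() and its parameters are hypothetical. The real check in drbd_md_sync_page_io() compares, roughly, against drbd_md_first_sector() and drbd_md_last_sector().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does an 8-sector (4 KiB) access at 'sector' stay
 * within [md_offset, md_offset + md_size_sect)? */
static bool md_range_ok(uint64_t md_offset, uint64_t md_size_sect,
                        uint64_t sector)
{
    return sector >= md_offset && sector + 8 <= md_offset + md_size_sect;
}
```
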

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_main.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 3a9a0f1..a4aa7eb 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -3270,6 +3270,10 @@ int drbd_md_read(struct drbd_device *device, struct drbd_backing_dev *bdev)
* and read it. */
bdev->md.meta_dev_idx = bdev->disk_conf->meta_dev_idx;
bdev->md.md_offset = drbd_md_ss(bdev);
+ /* Even for (flexible or indexed) external meta data,
+ * initially restrict us to the 4k superblock for now.
+ * Affects the paranoia out-of-range access check in drbd_md_sync_page_io(). */
+ bdev->md.md_size_sect = 8;

if (drbd_md_sync_page_io(device, bdev, bdev->md.md_offset, READ)) {
/* NOTE: can't do normal error processing here as this is
--
1.9.1

2015-11-25 11:09:41

by Philipp Reisner

Subject: [PATCH 29/38] drbd: fix queue limit setup for discard

From: Lars Ellenberg <[email protected]>

We cannot possibly support SECDISCARD, even if all backend devices
support it: if our peer is currently unreachable, some instance of the
data may obviously still be recoverable.

We did not set discard_granularity at all. We don't really care (yet);
we only pass discards on, so for now, set our granularity to one sector.
blk_queue_stack_limits() takes care of the rest.

If we decide we cannot support discards, do not only clear the
(not user-visible) QUEUE_FLAG_DISCARD, but also set both (user-visible)
discard_granularity and max_discard_sectors to zero, to avoid
confusion with e.g. lsblk -D.
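
The resulting decision in drbd_setup_queue_param() can be summarized as a small function. This is a hypothetical userspace model. The max_sectors parameter stands in for DRBD_MAX_DISCARD_SECTORS, whose value is not shown here, and the refinement done by limit stacking is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the discard-related queue limits. */
struct discard_limits {
    bool discard;                      /* QUEUE_FLAG_DISCARD */
    bool secdiscard;                   /* never set after this patch */
    unsigned int discard_granularity;  /* bytes */
    unsigned int max_discard_sectors;
};

static struct discard_limits decide_discard(bool backing_discard,
                                            bool connected, bool peer_trim,
                                            unsigned int max_sectors)
{
    struct discard_limits l = { false, false, 0, 0 };

    if (backing_discard && (!connected || peer_trim)) {
        l.discard = true;
        l.discard_granularity = 512;   /* one sector; stacking may raise it */
        l.max_discard_sectors = max_sectors;
    }
    /* otherwise everything stays zero, so lsblk -D reports no support */
    return l;
}
```
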

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 5e4adff..4703f1a 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1168,21 +1168,20 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
if (b) {
struct drbd_connection *connection = first_peer_device(device)->connection;

+ blk_queue_max_discard_sectors(q, DRBD_MAX_DISCARD_SECTORS);
+
if (blk_queue_discard(b) &&
(connection->cstate < C_CONNECTED || connection->agreed_features & FF_TRIM)) {
- /* For now, don't allow more than one activity log extent worth of data
- * to be discarded in one go. We may need to rework drbd_al_begin_io()
- * to allow for even larger discard ranges */
- blk_queue_max_discard_sectors(q, DRBD_MAX_DISCARD_SECTORS);
-
+ /* We don't care, stacking below should fix it for the local device.
+ * Whether or not it is a suitable granularity on the remote device
+ * is not our problem, really. If you care, you need to
+ * use devices with similar topology on all peers. */
+ q->limits.discard_granularity = 512;
queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
- /* REALLY? Is stacking secdiscard "legal"? */
- if (blk_queue_secdiscard(b))
- queue_flag_set_unlocked(QUEUE_FLAG_SECDISCARD, q);
} else {
blk_queue_max_discard_sectors(q, 0);
queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
- queue_flag_clear_unlocked(QUEUE_FLAG_SECDISCARD, q);
+ q->limits.discard_granularity = 0;
}

blk_queue_stack_limits(q, b);
@@ -1194,6 +1193,12 @@ static void drbd_setup_queue_param(struct drbd_device *device, struct drbd_backi
q->backing_dev_info.ra_pages = b->backing_dev_info.ra_pages;
}
}
+ /* To avoid confusion, if this queue does not support discard, clear
+ * max_discard_sectors, which is what lsblk -D reports to the user. */
+ if (!blk_queue_discard(q)) {
+ blk_queue_max_discard_sectors(q, 0);
+ q->limits.discard_granularity = 0;
+ }
}

void drbd_reconsider_max_bio_size(struct drbd_device *device, struct drbd_backing_dev *bdev)
--
1.9.1

2015-11-25 11:00:14

by Philipp Reisner

Subject: [PATCH 30/38] drbd: make drbd known to lsblk: use bd_link_disk_holder

From: Lars Ellenberg <[email protected]>

lsblk should be able to conveniently pick up stacking relations
between block device drivers that involve DRBD.

Even though the upstream kernel has said
"DON'T USE THIS UNLESS YOU'RE ALREADY USING IT."
since 2011, a new user (bcache) has been added since then,
which sets the precedent for us to use it as well.
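
The backing device is always linked as a holder of DRBD's disk. For the meta-data device, open_backing_devices() chooses the claim pointer and decides whether to link, both based on meta_dev_idx. The sketch models only those two decisions. The enum values are illustrative assumptions and are not necessarily the DRBD_MD_INDEX_* values in include/linux/drbd.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the DRBD_MD_INDEX_* constants (assumed values). */
enum md_index_model {
    MD_INDEX_INTERNAL = -1,
    MD_INDEX_FLEX_EXT = -2,
    MD_INDEX_FLEX_INT = -3,
};

enum claim_model { CLAIM_EXCLUSIVE_BY_DEVICE, CLAIM_SHARED_HOLDER };

struct md_open_plan {
    enum claim_model claim;  /* claim pointer passed when opening */
    bool link_holder;        /* whether bd_link_disk_holder() is called */
};

/* Mirrors the choices open_backing_devices() makes for the meta-data device. */
static struct md_open_plan plan_md_open(int meta_dev_idx)
{
    struct md_open_plan p;

    p.claim = meta_dev_idx < 0 ? CLAIM_EXCLUSIVE_BY_DEVICE : CLAIM_SHARED_HOLDER;
    /* internal meta data lives on the already-linked backing device */
    p.link_holder = meta_dev_idx != MD_INDEX_FLEX_INT &&
                    meta_dev_idx != MD_INDEX_INTERNAL;
    return p;
}
```

On the release side, drbd_backing_dev_free() applies the matching rule: it unlinks the meta-data device only if md_bdev differs from backing_bdev.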

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 2 +-
drivers/block/drbd/drbd_main.c | 16 +-----
drivers/block/drbd/drbd_nl.c | 121 ++++++++++++++++++++++++++++-----------
drivers/block/drbd/drbd_worker.c | 2 +-
include/linux/drbd.h | 2 +-
5 files changed, 91 insertions(+), 52 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 08f266e..a262653 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -1126,7 +1126,7 @@ extern int drbd_send_ov_request(struct drbd_peer_device *, sector_t sector, int
extern int drbd_send_bitmap(struct drbd_device *device);
extern void drbd_send_sr_reply(struct drbd_peer_device *, enum drbd_state_rv retcode);
extern void conn_send_sr_reply(struct drbd_connection *connection, enum drbd_state_rv retcode);
-extern void drbd_free_ldev(struct drbd_backing_dev *ldev);
+extern void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *ldev);
extern void drbd_device_cleanup(struct drbd_device *device);
void drbd_print_uuids(struct drbd_device *device, const char *text);

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a4aa7eb..136fa73 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1992,7 +1992,7 @@ void drbd_device_cleanup(struct drbd_device *device)
drbd_bm_cleanup(device);
}

- drbd_free_ldev(device->ldev);
+ drbd_backing_dev_free(device, device->ldev);
device->ldev = NULL;

clear_bit(AL_SUSPENDED, &device->flags);
@@ -2171,7 +2171,7 @@ void drbd_destroy_device(struct kref *kref)
if (device->this_bdev)
bdput(device->this_bdev);

- drbd_free_ldev(device->ldev);
+ drbd_backing_dev_free(device, device->ldev);
device->ldev = NULL;

drbd_release_all_peer_reqs(device);
@@ -2964,18 +2964,6 @@ fail:
return err;
}

-void drbd_free_ldev(struct drbd_backing_dev *ldev)
-{
- if (ldev == NULL)
- return;
-
- blkdev_put(ldev->backing_bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
- blkdev_put(ldev->md_bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
-
- kfree(ldev->disk_conf);
- kfree(ldev);
-}
-
static void drbd_free_one_sock(struct drbd_socket *ds)
{
struct socket *s;
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 4703f1a..ee34739 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1471,6 +1471,88 @@ success:
return 0;
}

+static struct block_device *open_backing_dev(struct drbd_device *device,
+ const char *bdev_path, void *claim_ptr, bool do_bd_link)
+{
+ struct block_device *bdev;
+ int err = 0;
+
+ bdev = blkdev_get_by_path(bdev_path,
+ FMODE_READ | FMODE_WRITE | FMODE_EXCL, claim_ptr);
+ if (IS_ERR(bdev)) {
+ drbd_err(device, "open(\"%s\") failed with %ld\n",
+ bdev_path, PTR_ERR(bdev));
+ return bdev;
+ }
+
+ if (!do_bd_link)
+ return bdev;
+
+ err = bd_link_disk_holder(bdev, device->vdisk);
+ if (err) {
+ blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+ drbd_err(device, "bd_link_disk_holder(\"%s\", ...) failed with %d\n",
+ bdev_path, err);
+ bdev = ERR_PTR(err);
+ }
+ return bdev;
+}
+
+static int open_backing_devices(struct drbd_device *device,
+ struct disk_conf *new_disk_conf,
+ struct drbd_backing_dev *nbc)
+{
+ struct block_device *bdev;
+
+ bdev = open_backing_dev(device, new_disk_conf->backing_dev, device, true);
+ if (IS_ERR(bdev))
+ return ERR_OPEN_DISK;
+ nbc->backing_bdev = bdev;
+
+ /*
+ * meta_dev_idx >= 0: external fixed size, possibly multiple
+ * drbd sharing one meta device. TODO in that case, paranoia
+ * check that [md_bdev, meta_dev_idx] is not yet used by some
+ * other drbd minor! (if you use drbd.conf + drbdadm, that
+ * should check it for you already; but if you don't, or
+ * someone fooled it, we need to double check here)
+ */
+ bdev = open_backing_dev(device, new_disk_conf->meta_dev,
+ /* claim ptr: device, if claimed exclusively; shared drbd_m_holder,
+ * if potentially shared with other drbd minors */
+ (new_disk_conf->meta_dev_idx < 0) ? (void*)device : (void*)drbd_m_holder,
+ /* avoid double bd_claim_by_disk() for the same (source,target) tuple,
+ * as would happen with internal metadata. */
+ (new_disk_conf->meta_dev_idx != DRBD_MD_INDEX_FLEX_INT &&
+ new_disk_conf->meta_dev_idx != DRBD_MD_INDEX_INTERNAL));
+ if (IS_ERR(bdev))
+ return ERR_OPEN_MD_DISK;
+ nbc->md_bdev = bdev;
+ return NO_ERROR;
+}
+
+static void close_backing_dev(struct drbd_device *device, struct block_device *bdev,
+ bool do_bd_unlink)
+{
+ if (!bdev)
+ return;
+ if (do_bd_unlink)
+ bd_unlink_disk_holder(bdev, device->vdisk);
+ blkdev_put(bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+}
+
+void drbd_backing_dev_free(struct drbd_device *device, struct drbd_backing_dev *ldev)
+{
+ if (ldev == NULL)
+ return;
+
+ close_backing_dev(device, ldev->md_bdev, ldev->md_bdev != ldev->backing_bdev);
+ close_backing_dev(device, ldev->backing_bdev, true);
+
+ kfree(ldev->disk_conf);
+ kfree(ldev);
+}
+
int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
{
struct drbd_config_context adm_ctx;
@@ -1484,7 +1566,6 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
sector_t min_md_device_sectors;
struct drbd_backing_dev *nbc = NULL; /* new_backing_conf */
struct disk_conf *new_disk_conf = NULL;
- struct block_device *bdev;
struct lru_cache *resync_lru = NULL;
struct fifo_buffer *new_plan = NULL;
union drbd_state ns, os;
@@ -1572,35 +1653,9 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
}
rcu_read_unlock();

- bdev = blkdev_get_by_path(new_disk_conf->backing_dev,
- FMODE_READ | FMODE_WRITE | FMODE_EXCL, device);
- if (IS_ERR(bdev)) {
- drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->backing_dev,
- PTR_ERR(bdev));
- retcode = ERR_OPEN_DISK;
- goto fail;
- }
- nbc->backing_bdev = bdev;
-
- /*
- * meta_dev_idx >= 0: external fixed size, possibly multiple
- * drbd sharing one meta device. TODO in that case, paranoia
- * check that [md_bdev, meta_dev_idx] is not yet used by some
- * other drbd minor! (if you use drbd.conf + drbdadm, that
- * should check it for you already; but if you don't, or
- * someone fooled it, we need to double check here)
- */
- bdev = blkdev_get_by_path(new_disk_conf->meta_dev,
- FMODE_READ | FMODE_WRITE | FMODE_EXCL,
- (new_disk_conf->meta_dev_idx < 0) ?
- (void *)device : (void *)drbd_m_holder);
- if (IS_ERR(bdev)) {
- drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->meta_dev,
- PTR_ERR(bdev));
- retcode = ERR_OPEN_MD_DISK;
+ retcode = open_backing_devices(device, new_disk_conf, nbc);
+ if (retcode != NO_ERROR)
goto fail;
- }
- nbc->md_bdev = bdev;

if ((nbc->backing_bdev == nbc->md_bdev) !=
(new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_INTERNAL ||
@@ -1900,12 +1955,8 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
fail:
conn_reconfig_done(connection);
if (nbc) {
- if (nbc->backing_bdev)
- blkdev_put(nbc->backing_bdev,
- FMODE_READ | FMODE_WRITE | FMODE_EXCL);
- if (nbc->md_bdev)
- blkdev_put(nbc->md_bdev,
- FMODE_READ | FMODE_WRITE | FMODE_EXCL);
+ close_backing_dev(device, nbc->md_bdev, nbc->md_bdev != nbc->backing_bdev);
+ close_backing_dev(device, nbc->backing_bdev, true);
kfree(nbc);
}
kfree(new_disk_conf);
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 2f29bf3..eff716c 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -1841,7 +1841,7 @@ static void drbd_ldev_destroy(struct drbd_device *device)
device->act_log = NULL;

__acquire(local);
- drbd_free_ldev(device->ldev);
+ drbd_backing_dev_free(device, device->ldev);
device->ldev = NULL;
__release(local);

diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 392fc0e..d6b3c99 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -51,7 +51,7 @@
#endif

extern const char *drbd_buildtag(void);
-#define REL_VERSION "8.4.5"
+#define REL_VERSION "8.4.6"
#define API_VERSION 1
#define PRO_VERSION_MIN 86
#define PRO_VERSION_MAX 101
--
1.9.1
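
A minimal userspace sketch of the holder-link bookkeeping in close_backing_dev() and drbd_backing_dev_free(). The fake_* types and counters are hypothetical stand-ins for struct block_device, bd_unlink_disk_holder() and blkdev_put(). The sketch assumes, as the md_bdev != backing_bdev test suggests, that internal meta data is opened a second time on the same device but linked as a disk holder only once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct block_device. */
struct fake_bdev {
	int holder_links; /* links made by a bd_link_disk_holder() analogue */
	int open_refs;    /* references taken by a blkdev_get_by_path() analogue */
};

/* Hypothetical stand-in for struct drbd_backing_dev. */
struct fake_backing_dev {
	struct fake_bdev *backing_bdev;
	struct fake_bdev *md_bdev;
};

static void fake_close_backing_dev(struct fake_bdev *bdev, bool do_bd_unlink)
{
	if (!bdev)
		return;
	if (do_bd_unlink)
		bdev->holder_links--; /* bd_unlink_disk_holder() analogue */
	bdev->open_refs--;            /* blkdev_put() analogue */
}

static void fake_backing_dev_free(struct fake_backing_dev *ldev)
{
	if (ldev == NULL)
		return;
	/* With internal meta data md_bdev == backing_bdev: the holder link
	 * is dropped only once, by the backing_bdev call below. */
	fake_close_backing_dev(ldev->md_bdev, ldev->md_bdev != ldev->backing_bdev);
	fake_close_backing_dev(ldev->backing_bdev, true);
}
```

Freeing an ldev with internal meta data therefore drops two open references but only one holder link; with external meta data each device loses one of each.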

2015-11-25 11:12:58

by Philipp Reisner
Subject: [PATCH 31/38] lru_cache: Converted lc_seq_printf_stats to return void

From: Roland Kammerer <[email protected]>

Fix the semantics of lc_seq_printf_stats(). It always returns 0 and
no caller uses the return value, so convert the return type to void.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
include/linux/lru_cache.h | 2 +-
lib/lru_cache.c | 4 +---
2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/lru_cache.h b/include/linux/lru_cache.h
index 4626228..04fc6e6 100644
--- a/include/linux/lru_cache.h
+++ b/include/linux/lru_cache.h
@@ -264,7 +264,7 @@ extern unsigned int lc_put(struct lru_cache *lc, struct lc_element *e);
extern void lc_committed(struct lru_cache *lc);

struct seq_file;
-extern size_t lc_seq_printf_stats(struct seq_file *seq, struct lru_cache *lc);
+extern void lc_seq_printf_stats(struct seq_file *seq, struct lru_cache *lc);

extern void lc_seq_dump_details(struct seq_file *seq, struct lru_cache *lc, char *utext,
void (*detail) (struct seq_file *, struct lc_element *));
diff --git a/lib/lru_cache.c b/lib/lru_cache.c
index 028f5d9..28ba40b 100644
--- a/lib/lru_cache.c
+++ b/lib/lru_cache.c
@@ -238,7 +238,7 @@ void lc_reset(struct lru_cache *lc)
* @seq: the seq_file to print into
* @lc: the lru cache to print statistics of
*/
-size_t lc_seq_printf_stats(struct seq_file *seq, struct lru_cache *lc)
+void lc_seq_printf_stats(struct seq_file *seq, struct lru_cache *lc)
{
/* NOTE:
* total calls to lc_get are
@@ -250,8 +250,6 @@ size_t lc_seq_printf_stats(struct seq_file *seq, struct lru_cache *lc)
seq_printf(seq, "\t%s: used:%u/%u hits:%lu misses:%lu starving:%lu locked:%lu changed:%lu\n",
lc->name, lc->used, lc->nr_elements,
lc->hits, lc->misses, lc->starving, lc->locked, lc->changed);
-
- return 0;
}

static struct hlist_head *lc_hash_slot(struct lru_cache *lc, unsigned int enr)
--
1.9.1

2015-11-25 11:02:24

by Philipp Reisner
Subject: [PATCH 32/38] drbd: don't block forever in disconnect during resync if fencing=r-a-stonith

From: Lars Ellenberg <[email protected]>

Disconnect should wait for pending bitmap IO.
But that bitmap IO may itself be waiting for pending application IO
to drain, and application IO makes no progress, because the fencing
policy suspended it in response to the disconnect.
The result is a deadlock.

The bitmap writeout in this case does not care about concurrent
application IO, so there is no point in waiting for it.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_main.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 136fa73..5b43dfb 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -3563,7 +3563,9 @@ void drbd_queue_bitmap_io(struct drbd_device *device,

spin_lock_irq(&device->resource->req_lock);
set_bit(BITMAP_IO, &device->flags);
- if (atomic_read(&device->ap_bio_cnt) == 0) {
+ /* don't wait for pending application IO if the caller indicates that
+ * application IO does not conflict anyways. */
+ if (flags == BM_LOCKED_CHANGE_ALLOWED || atomic_read(&device->ap_bio_cnt) == 0) {
if (!test_and_set_bit(BITMAP_IO_QUEUED, &device->flags))
drbd_queue_work(&first_peer_device(device)->connection->sender_work,
&device->bm_io_work.w);
--
1.9.1
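
A userspace sketch of the queueing decision changed in drbd_queue_bitmap_io(), contrasting the old and the new rule. The SKETCH_* enum values are hypothetical stand-ins for the BM_LOCKED_* flags, and the waiting and work queueing are reduced to a boolean result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real BM_LOCKED_* values live in drbd_int.h. */
enum sketch_bm_flags {
	SKETCH_BM_LOCKED_MASK,           /* bitmap fully locked during the IO */
	SKETCH_BM_LOCKED_CHANGE_ALLOWED, /* concurrent bitmap changes are fine */
};

/* Old rule: bitmap IO is queued only once application IO has drained. */
static bool queue_now_old(enum sketch_bm_flags flags, int ap_bio_cnt)
{
	(void)flags;
	return ap_bio_cnt == 0;
}

/* New rule: callers that tolerate concurrent application IO do not
 * have to wait for ap_bio_cnt to drop to zero. */
static bool queue_now_new(enum sketch_bm_flags flags, int ap_bio_cnt)
{
	return flags == SKETCH_BM_LOCKED_CHANGE_ALLOWED || ap_bio_cnt == 0;
}
```

With application IO frozen by the fencing policy, ap_bio_cnt never drops to zero, so the old rule never queues the writeout, while the new rule queues it immediately for BM_LOCKED_CHANGE_ALLOWED callers.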

2015-11-25 11:00:24

by Philipp Reisner
Subject: [PATCH 33/38] drbd: fix memory leak in drbd_adm_resize

From: Oleg Drokin <[email protected]>

new_disk_conf could be leaked if the follow-on checks fail,
so make sure to free it on error if it has not been assigned yet.

Found with smatch.

Signed-off-by: Oleg Drokin <[email protected]>
Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index ee34739..6137789 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2706,6 +2706,7 @@ int drbd_adm_resize(struct sk_buff *skb, struct genl_info *info)
mutex_unlock(&device->resource->conf_update);
synchronize_rcu();
kfree(old_disk_conf);
+ new_disk_conf = NULL;
}

ddsf = (rs.resize_force ? DDSF_FORCED : 0) | (rs.no_resync ? DDSF_NO_RESYNC : 0);
@@ -2739,6 +2740,7 @@ int drbd_adm_resize(struct sk_buff *skb, struct genl_info *info)

fail_ldev:
put_ldev(device);
+ kfree(new_disk_conf);
goto fail;
}

--
1.9.1
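
The fix follows a common ownership-transfer idiom: once new_disk_conf has been published, the local pointer is set to NULL, so the shared error path can free it unconditionally (kfree(NULL), like free(NULL), is a no-op). A userspace sketch of that idiom, with calloc()/free() standing in for kmalloc()/kfree(), a plain global standing in for the RCU-protected disk_conf pointer (no RCU here), and a hypothetical live_confs counter to make leaks visible:

```c
#include <assert.h>
#include <stdlib.h>

struct sk_conf {
	unsigned long disk_size;
};

/* Hypothetical leak counter so the test can observe ownership. */
static int live_confs;
/* Stands in for device->ldev->disk_conf (no RCU in this sketch). */
static struct sk_conf *published;

static struct sk_conf *conf_alloc(void)
{
	struct sk_conf *c = calloc(1, sizeof(*c));

	if (c)
		live_confs++;
	return c;
}

static void conf_free(struct sk_conf *c)
{
	if (!c)
		return; /* like kfree(NULL): nothing to do */
	live_confs--;
	free(c);
}

static int resize_sketch(unsigned long new_size, int early_check_fails,
			 int late_check_fails)
{
	struct sk_conf *new_conf = conf_alloc();
	struct sk_conf *old;

	if (!new_conf)
		return -1;
	new_conf->disk_size = new_size;

	if (early_check_fails)
		goto fail; /* new_conf is still ours: must be freed */

	/* driver: rcu_assign_pointer(), synchronize_rcu(), kfree(old) */
	old = published;
	published = new_conf;
	conf_free(old);
	new_conf = NULL; /* ownership transferred: error path must not free it */

	if (late_check_fails)
		goto fail;
	return 0;

fail:
	conf_free(new_conf);
	return -1;
}
```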

2015-11-25 11:02:34

by Philipp Reisner
Subject: [PATCH 34/38] drbd: fix "endless" transfer log walk in protocol A

From: Lars Ellenberg <[email protected]>

Don't remember a DRBD request as ack_pending if it is not.

In protocol A, we usually clear RQ_NET_PENDING at the same time we set
RQ_NET_SENT, so when deciding whether to remember the request as
ack_pending, mod_rq_state() needs to look at the current request state
(req->rq_state), not at the state s from before the current
modification was applied.

This should prevent advance_conn_req_ack_pending() from walking the full
transfer log just to find NULL in protocol A, which would cause serious
performance degradation with many "in-flight" requests, e.g. when
working via DRBD-proxy, or with a huge bandwidth-delay product.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_req.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 7907fb5..2255dcf 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -458,7 +458,7 @@ static void mod_rq_state(struct drbd_request *req, struct bio_and_error *m,
atomic_add(req->i.size >> 9, &device->ap_in_flight);
set_if_null_req_not_net_done(peer_device, req);
}
- if (s & RQ_NET_PENDING)
+ if (req->rq_state & RQ_NET_PENDING)
set_if_null_req_ack_pending(peer_device, req);
}

--
1.9.1
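
A small model of the condition fixed in mod_rq_state(), assuming (as the commit message describes) that protocol A clears RQ_NET_PENDING in the same modification that sets RQ_NET_SENT, while protocols B and C keep it set until the peer acknowledges. The SK_RQ_* bit values are hypothetical; the real flags are defined in drbd_req.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values standing in for RQ_NET_PENDING / RQ_NET_SENT. */
#define SK_RQ_NET_PENDING (1u << 0)
#define SK_RQ_NET_SENT    (1u << 1)

/* Like mod_rq_state(): s is the state before, the result the state after. */
static unsigned int sk_mod_state(unsigned int s, unsigned int clear,
				 unsigned int set)
{
	return (s & ~clear) | set;
}

/* Should a request that just became "sent" be remembered as ack_pending?
 * use_new_state selects the fixed check (new state) over the old one (s). */
static bool sk_remember_ack_pending(unsigned int s, unsigned int ns,
				    bool use_new_state)
{
	if ((s & SK_RQ_NET_SENT) || !(ns & SK_RQ_NET_SENT))
		return false; /* only on the transition to "sent" */
	if (use_new_state)
		return (ns & SK_RQ_NET_PENDING) != 0;
	return (s & SK_RQ_NET_PENDING) != 0;
}
```

In protocol A the old check remembers a request that no longer waits for an ack, which is what sent advance_conn_req_ack_pending() on its long walk.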

2015-11-25 11:02:28

by Philipp Reisner
Subject: [PATCH 35/38] drbd: make suspend_io() / resume_io() thread and recursion safe

Avoid prematurely resuming application IO: don't set/clear a single
bit, but increment/decrement an atomic counter.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 4 ++--
drivers/block/drbd/drbd_nl.c | 8 +++++---
drivers/block/drbd/drbd_state.c | 2 +-
3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index a262653..df3d89d 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -500,7 +500,6 @@ enum {

MD_NO_FUA, /* Users wants us to not use FUA/FLUSH on meta data dev */

- SUSPEND_IO, /* suspend application io */
BITMAP_IO, /* suspend application io;
once no more io in flight, start bitmap io */
BITMAP_IO_QUEUED, /* Started bitmap IO */
@@ -880,6 +879,7 @@ struct drbd_device {
atomic_t rs_pending_cnt; /* RS request/data packets on the wire */
atomic_t unacked_cnt; /* Need to send replies for */
atomic_t local_cnt; /* Waiting for local completion */
+ atomic_t suspend_cnt;

/* Interval tree of pending local requests */
struct rb_root read_requests;
@@ -2263,7 +2263,7 @@ static inline bool may_inc_ap_bio(struct drbd_device *device)

if (drbd_suspended(device))
return false;
- if (test_bit(SUSPEND_IO, &device->flags))
+ if (atomic_read(&device->suspend_cnt))
return false;

/* to avoid potential deadlock or bitmap corruption,
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 6137789..c7cd3df 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -865,9 +865,11 @@ char *ppsize(char *buf, unsigned long long size)
* and can be long lived.
* This changes an device->flag, is triggered by drbd internals,
* and should be short-lived. */
+/* It needs to be a counter, since multiple threads might
+ independently suspend and resume IO. */
void drbd_suspend_io(struct drbd_device *device)
{
- set_bit(SUSPEND_IO, &device->flags);
+ atomic_inc(&device->suspend_cnt);
if (drbd_suspended(device))
return;
wait_event(device->misc_wait, !atomic_read(&device->ap_bio_cnt));
@@ -875,8 +877,8 @@ void drbd_suspend_io(struct drbd_device *device)

void drbd_resume_io(struct drbd_device *device)
{
- clear_bit(SUSPEND_IO, &device->flags);
- wake_up(&device->misc_wait);
+ if (atomic_dec_and_test(&device->suspend_cnt))
+ wake_up(&device->misc_wait);
}

/**
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index f022e99..5a7ef78 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -1484,7 +1484,7 @@ int drbd_bitmap_io_from_worker(struct drbd_device *device,
D_ASSERT(device, current == first_peer_device(device)->connection->worker.task);

/* open coded non-blocking drbd_suspend_io(device); */
- set_bit(SUSPEND_IO, &device->flags);
+ atomic_inc(&device->suspend_cnt);

drbd_bm_lock(device, why, flags);
rv = io_fn(device);
--
1.9.1
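
A userspace sketch of why a counter is needed when two threads suspend and resume IO independently. It uses C11 <stdatomic.h> in place of the kernel's atomic_t, omits the wait on ap_bio_cnt and the wake_up() (the return value of new_resume() marks where the wakeup would happen), and the sk_dev type is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the two schemes; no waiting, no wakeups. */
struct sk_dev {
	bool suspend_bit;       /* old: a single SUSPEND_IO flag bit */
	atomic_int suspend_cnt; /* new: suspend_cnt counter */
};

static void sk_dev_init(struct sk_dev *d)
{
	d->suspend_bit = false;
	atomic_init(&d->suspend_cnt, 0);
}

/* Old scheme: any resume clears the bit, even if others still suspend. */
static void old_suspend(struct sk_dev *d) { d->suspend_bit = true; }
static void old_resume(struct sk_dev *d) { d->suspend_bit = false; }
static bool old_may_submit(struct sk_dev *d) { return !d->suspend_bit; }

/* New scheme: IO resumes only after the last suspender resumes. */
static void new_suspend(struct sk_dev *d)
{
	atomic_fetch_add(&d->suspend_cnt, 1);
}

/* Returns true like atomic_dec_and_test(): the counter reached zero,
 * which is where drbd_resume_io() calls wake_up(). */
static bool new_resume(struct sk_dev *d)
{
	return atomic_fetch_sub(&d->suspend_cnt, 1) == 1;
}

static bool new_may_submit(struct sk_dev *d)
{
	return atomic_load(&d->suspend_cnt) == 0;
}
```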

2015-11-25 11:02:16

by Philipp Reisner
Subject: [PATCH 36/38] drbd: separate out __al_write_transaction helper function

From: Lars Ellenberg <[email protected]>

Separate out an __al_write_transaction() helper, to be able to
"force out" an activity log transaction even if there are no
pending updates.

This will be used to relocate the on-disk activity log
if the on-disk offsets have to be changed,
without the need to empty the activity log first.

While at it, move the definitions up,
so we can drop the forward declaration of a static helper.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_actlog.c | 304 ++++++++++++++++++++-------------------
1 file changed, 156 insertions(+), 148 deletions(-)

diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index b3868e7..4b484ac 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -288,7 +288,162 @@ bool drbd_al_begin_io_prepare(struct drbd_device *device, struct drbd_interval *
return need_transaction;
}

-static int al_write_transaction(struct drbd_device *device);
+#if (PAGE_SHIFT + 3) < (AL_EXTENT_SHIFT - BM_BLOCK_SHIFT)
+/* Currently BM_BLOCK_SHIFT, BM_EXT_SHIFT and AL_EXTENT_SHIFT
+ * are still coupled, or assume too much about their relation.
+ * Code below will not work if this is violated.
+ * Will be cleaned up with some followup patch.
+ */
+# error FIXME
+#endif
+
+static unsigned int al_extent_to_bm_page(unsigned int al_enr)
+{
+ return al_enr >>
+ /* bit to page */
+ ((PAGE_SHIFT + 3) -
+ /* al extent number to bit */
+ (AL_EXTENT_SHIFT - BM_BLOCK_SHIFT));
+}
+
+static sector_t al_tr_number_to_on_disk_sector(struct drbd_device *device)
+{
+ const unsigned int stripes = device->ldev->md.al_stripes;
+ const unsigned int stripe_size_4kB = device->ldev->md.al_stripe_size_4k;
+
+ /* transaction number, modulo on-disk ring buffer wrap around */
+ unsigned int t = device->al_tr_number % (device->ldev->md.al_size_4k);
+
+ /* ... to aligned 4k on disk block */
+ t = ((t % stripes) * stripe_size_4kB) + t/stripes;
+
+ /* ... to 512 byte sector in activity log */
+ t *= 8;
+
+ /* ... plus offset to the on disk position */
+ return device->ldev->md.md_offset + device->ldev->md.al_offset + t;
+}
+
+static int __al_write_transaction(struct drbd_device *device, struct al_transaction_on_disk *buffer)
+{
+ struct lc_element *e;
+ sector_t sector;
+ int i, mx;
+ unsigned extent_nr;
+ unsigned crc = 0;
+ int err = 0;
+
+ memset(buffer, 0, sizeof(*buffer));
+ buffer->magic = cpu_to_be32(DRBD_AL_MAGIC);
+ buffer->tr_number = cpu_to_be32(device->al_tr_number);
+
+ i = 0;
+
+ /* Even though no one can start to change this list
+ * once we set the LC_LOCKED -- from drbd_al_begin_io(),
+ * lc_try_lock_for_transaction() --, someone may still
+ * be in the process of changing it. */
+ spin_lock_irq(&device->al_lock);
+ list_for_each_entry(e, &device->act_log->to_be_changed, list) {
+ if (i == AL_UPDATES_PER_TRANSACTION) {
+ i++;
+ break;
+ }
+ buffer->update_slot_nr[i] = cpu_to_be16(e->lc_index);
+ buffer->update_extent_nr[i] = cpu_to_be32(e->lc_new_number);
+ if (e->lc_number != LC_FREE)
+ drbd_bm_mark_for_writeout(device,
+ al_extent_to_bm_page(e->lc_number));
+ i++;
+ }
+ spin_unlock_irq(&device->al_lock);
+ BUG_ON(i > AL_UPDATES_PER_TRANSACTION);
+
+ buffer->n_updates = cpu_to_be16(i);
+ for ( ; i < AL_UPDATES_PER_TRANSACTION; i++) {
+ buffer->update_slot_nr[i] = cpu_to_be16(-1);
+ buffer->update_extent_nr[i] = cpu_to_be32(LC_FREE);
+ }
+
+ buffer->context_size = cpu_to_be16(device->act_log->nr_elements);
+ buffer->context_start_slot_nr = cpu_to_be16(device->al_tr_cycle);
+
+ mx = min_t(int, AL_CONTEXT_PER_TRANSACTION,
+ device->act_log->nr_elements - device->al_tr_cycle);
+ for (i = 0; i < mx; i++) {
+ unsigned idx = device->al_tr_cycle + i;
+ extent_nr = lc_element_by_index(device->act_log, idx)->lc_number;
+ buffer->context[i] = cpu_to_be32(extent_nr);
+ }
+ for (; i < AL_CONTEXT_PER_TRANSACTION; i++)
+ buffer->context[i] = cpu_to_be32(LC_FREE);
+
+ device->al_tr_cycle += AL_CONTEXT_PER_TRANSACTION;
+ if (device->al_tr_cycle >= device->act_log->nr_elements)
+ device->al_tr_cycle = 0;
+
+ sector = al_tr_number_to_on_disk_sector(device);
+
+ crc = crc32c(0, buffer, 4096);
+ buffer->crc32c = cpu_to_be32(crc);
+
+ if (drbd_bm_write_hinted(device))
+ err = -EIO;
+ else {
+ bool write_al_updates;
+ rcu_read_lock();
+ write_al_updates = rcu_dereference(device->ldev->disk_conf)->al_updates;
+ rcu_read_unlock();
+ if (write_al_updates) {
+ if (drbd_md_sync_page_io(device, device->ldev, sector, WRITE)) {
+ err = -EIO;
+ drbd_chk_io_error(device, 1, DRBD_META_IO_ERROR);
+ } else {
+ device->al_tr_number++;
+ device->al_writ_cnt++;
+ }
+ }
+ }
+
+ return err;
+}
+
+static int al_write_transaction(struct drbd_device *device)
+{
+ struct al_transaction_on_disk *buffer;
+ int err;
+
+ if (!get_ldev(device)) {
+ drbd_err(device, "disk is %s, cannot start al transaction\n",
+ drbd_disk_str(device->state.disk));
+ return -EIO;
+ }
+
+ /* The bitmap write may have failed, causing a state change. */
+ if (device->state.disk < D_INCONSISTENT) {
+ drbd_err(device,
+ "disk is %s, cannot write al transaction\n",
+ drbd_disk_str(device->state.disk));
+ put_ldev(device);
+ return -EIO;
+ }
+
+ /* protects md_io_buffer, al_tr_cycle, ... */
+ buffer = drbd_md_get_buffer(device, __func__);
+ if (!buffer) {
+ drbd_err(device, "disk failed while waiting for md_io buffer\n");
+ put_ldev(device);
+ return -ENODEV;
+ }
+
+ err = __al_write_transaction(device, buffer);
+
+ drbd_md_put_buffer(device);
+ put_ldev(device);
+
+ return err;
+}
+

void drbd_al_begin_io_commit(struct drbd_device *device)
{
@@ -420,153 +575,6 @@ void drbd_al_complete_io(struct drbd_device *device, struct drbd_interval *i)
wake_up(&device->al_wait);
}

-#if (PAGE_SHIFT + 3) < (AL_EXTENT_SHIFT - BM_BLOCK_SHIFT)
-/* Currently BM_BLOCK_SHIFT, BM_EXT_SHIFT and AL_EXTENT_SHIFT
- * are still coupled, or assume too much about their relation.
- * Code below will not work if this is violated.
- * Will be cleaned up with some followup patch.
- */
-# error FIXME
-#endif
-
-static unsigned int al_extent_to_bm_page(unsigned int al_enr)
-{
- return al_enr >>
- /* bit to page */
- ((PAGE_SHIFT + 3) -
- /* al extent number to bit */
- (AL_EXTENT_SHIFT - BM_BLOCK_SHIFT));
-}
-
-static sector_t al_tr_number_to_on_disk_sector(struct drbd_device *device)
-{
- const unsigned int stripes = device->ldev->md.al_stripes;
- const unsigned int stripe_size_4kB = device->ldev->md.al_stripe_size_4k;
-
- /* transaction number, modulo on-disk ring buffer wrap around */
- unsigned int t = device->al_tr_number % (device->ldev->md.al_size_4k);
-
- /* ... to aligned 4k on disk block */
- t = ((t % stripes) * stripe_size_4kB) + t/stripes;
-
- /* ... to 512 byte sector in activity log */
- t *= 8;
-
- /* ... plus offset to the on disk position */
- return device->ldev->md.md_offset + device->ldev->md.al_offset + t;
-}
-
-int al_write_transaction(struct drbd_device *device)
-{
- struct al_transaction_on_disk *buffer;
- struct lc_element *e;
- sector_t sector;
- int i, mx;
- unsigned extent_nr;
- unsigned crc = 0;
- int err = 0;
-
- if (!get_ldev(device)) {
- drbd_err(device, "disk is %s, cannot start al transaction\n",
- drbd_disk_str(device->state.disk));
- return -EIO;
- }
-
- /* The bitmap write may have failed, causing a state change. */
- if (device->state.disk < D_INCONSISTENT) {
- drbd_err(device,
- "disk is %s, cannot write al transaction\n",
- drbd_disk_str(device->state.disk));
- put_ldev(device);
- return -EIO;
- }
-
- /* protects md_io_buffer, al_tr_cycle, ... */
- buffer = drbd_md_get_buffer(device, __func__);
- if (!buffer) {
- drbd_err(device, "disk failed while waiting for md_io buffer\n");
- put_ldev(device);
- return -ENODEV;
- }
-
- memset(buffer, 0, sizeof(*buffer));
- buffer->magic = cpu_to_be32(DRBD_AL_MAGIC);
- buffer->tr_number = cpu_to_be32(device->al_tr_number);
-
- i = 0;
-
- /* Even though no one can start to change this list
- * once we set the LC_LOCKED -- from drbd_al_begin_io(),
- * lc_try_lock_for_transaction() --, someone may still
- * be in the process of changing it. */
- spin_lock_irq(&device->al_lock);
- list_for_each_entry(e, &device->act_log->to_be_changed, list) {
- if (i == AL_UPDATES_PER_TRANSACTION) {
- i++;
- break;
- }
- buffer->update_slot_nr[i] = cpu_to_be16(e->lc_index);
- buffer->update_extent_nr[i] = cpu_to_be32(e->lc_new_number);
- if (e->lc_number != LC_FREE)
- drbd_bm_mark_for_writeout(device,
- al_extent_to_bm_page(e->lc_number));
- i++;
- }
- spin_unlock_irq(&device->al_lock);
- BUG_ON(i > AL_UPDATES_PER_TRANSACTION);
-
- buffer->n_updates = cpu_to_be16(i);
- for ( ; i < AL_UPDATES_PER_TRANSACTION; i++) {
- buffer->update_slot_nr[i] = cpu_to_be16(-1);
- buffer->update_extent_nr[i] = cpu_to_be32(LC_FREE);
- }
-
- buffer->context_size = cpu_to_be16(device->act_log->nr_elements);
- buffer->context_start_slot_nr = cpu_to_be16(device->al_tr_cycle);
-
- mx = min_t(int, AL_CONTEXT_PER_TRANSACTION,
- device->act_log->nr_elements - device->al_tr_cycle);
- for (i = 0; i < mx; i++) {
- unsigned idx = device->al_tr_cycle + i;
- extent_nr = lc_element_by_index(device->act_log, idx)->lc_number;
- buffer->context[i] = cpu_to_be32(extent_nr);
- }
- for (; i < AL_CONTEXT_PER_TRANSACTION; i++)
- buffer->context[i] = cpu_to_be32(LC_FREE);
-
- device->al_tr_cycle += AL_CONTEXT_PER_TRANSACTION;
- if (device->al_tr_cycle >= device->act_log->nr_elements)
- device->al_tr_cycle = 0;
-
- sector = al_tr_number_to_on_disk_sector(device);
-
- crc = crc32c(0, buffer, 4096);
- buffer->crc32c = cpu_to_be32(crc);
-
- if (drbd_bm_write_hinted(device))
- err = -EIO;
- else {
- bool write_al_updates;
- rcu_read_lock();
- write_al_updates = rcu_dereference(device->ldev->disk_conf)->al_updates;
- rcu_read_unlock();
- if (write_al_updates) {
- if (drbd_md_sync_page_io(device, device->ldev, sector, WRITE)) {
- err = -EIO;
- drbd_chk_io_error(device, 1, DRBD_META_IO_ERROR);
- } else {
- device->al_tr_number++;
- device->al_writ_cnt++;
- }
- }
- }
-
- drbd_md_put_buffer(device);
- put_ldev(device);
-
- return err;
-}
-
static int _try_lc_del(struct drbd_device *device, struct lc_element *al_ext)
{
int rv;
--
1.9.1
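
The address arithmetic moved by this patch can be checked in isolation. The sketch below copies al_extent_to_bm_page() and the body of al_tr_number_to_on_disk_sector() into userspace, with assumed constants (4 KiB pages, 4 KiB bitmap granularity, 4 MiB activity log extents) and a reduced, hypothetical sk_md struct in place of struct drbd_md.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants: 4 KiB pages, 4 KiB per bitmap bit, 4 MiB AL extents. */
#define SK_PAGE_SHIFT      12
#define SK_BM_BLOCK_SHIFT  12
#define SK_AL_EXTENT_SHIFT 22

static unsigned int al_extent_to_bm_page(unsigned int al_enr)
{
	/* bits per bitmap page: 2^(PAGE_SHIFT + 3);
	 * bitmap bits per AL extent: 2^(AL_EXTENT_SHIFT - BM_BLOCK_SHIFT) */
	return al_enr >> ((SK_PAGE_SHIFT + 3) -
			  (SK_AL_EXTENT_SHIFT - SK_BM_BLOCK_SHIFT));
}

/* Reduced, hypothetical copy of the struct drbd_md fields used here. */
struct sk_md {
	uint64_t md_offset;             /* in 512-byte sectors */
	int32_t al_offset;              /* sectors, relative to md_offset */
	unsigned int al_stripes;
	unsigned int al_stripe_size_4k;
	unsigned int al_size_4k;        /* al_stripes * al_stripe_size_4k */
};

static uint64_t al_tr_number_to_on_disk_sector(const struct sk_md *md,
					       unsigned int al_tr_number)
{
	/* transaction number, modulo on-disk ring buffer wrap around */
	unsigned int t = al_tr_number % md->al_size_4k;

	/* ... to aligned 4k on-disk block, round-robin across stripes */
	t = (t % md->al_stripes) * md->al_stripe_size_4k + t / md->al_stripes;

	/* ... to 512-byte sector in the activity log */
	t *= 8;

	/* ... plus offset to the on-disk position */
	return md->md_offset + md->al_offset + t;
}
```

Under these assumptions one bitmap page covers 32 activity log extents. With 2 stripes of 4 blocks each, transaction 5 lands in 4 KiB block 6 of the ring, i.e. 48 sectors past the start of the activity log, and transaction 13 wraps around to the same block.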

2015-11-25 11:09:39

by Philipp Reisner
Subject: [PATCH 37/38] drbd: avoid potential deadlock during handshake

From: Lars Ellenberg <[email protected]>

During handshake communication, we also reconsider our device size,
using drbd_determine_dev_size(). Just in case we need to change the
offsets or layout of our on-disk metadata, we lock out application
and other meta data IO, and wait for the activity log to be "idle"
(no more referenced extents).

If this handshake happens just after a connection loss, with a fencing
policy of "resource-and-stonith", IO is frozen.

If, additionally, the activity log was "starving" (too many incoming
random writes at that point in time), it would never become idle,
because of the frozen IO, and the receiver thread, and consequently
DRBD, would lock up.

The previous logic (re-)initialized the activity log with a special
"empty" transaction block, which required the activity log to fully
drain first.

Instead, write out standard activity log transactions.
Unlike lc_try_lock(), lc_try_lock_for_transaction() does not care
about pending activity log references, which avoids the potential
deadlock.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_actlog.c | 19 +++++++++++--------
drivers/block/drbd/drbd_int.h | 2 +-
drivers/block/drbd/drbd_nl.c | 33 +++++++++++++++++++--------------
3 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index 4b484ac..10459a1 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -614,21 +614,24 @@ void drbd_al_shrink(struct drbd_device *device)
wake_up(&device->al_wait);
}

-int drbd_initialize_al(struct drbd_device *device, void *buffer)
+int drbd_al_initialize(struct drbd_device *device, void *buffer)
{
struct al_transaction_on_disk *al = buffer;
struct drbd_md *md = &device->ldev->md;
- sector_t al_base = md->md_offset + md->al_offset;
int al_size_4k = md->al_stripes * md->al_stripe_size_4k;
int i;

- memset(al, 0, 4096);
- al->magic = cpu_to_be32(DRBD_AL_MAGIC);
- al->transaction_type = cpu_to_be16(AL_TR_INITIALIZED);
- al->crc32c = cpu_to_be32(crc32c(0, al, 4096));
+ __al_write_transaction(device, al);
+ /* There may or may not have been a pending transaction. */
+ spin_lock_irq(&device->al_lock);
+ lc_committed(device->act_log);
+ spin_unlock_irq(&device->al_lock);

- for (i = 0; i < al_size_4k; i++) {
- int err = drbd_md_sync_page_io(device, device->ldev, al_base + i * 8, WRITE);
+ /* The rest of the transactions will have an empty "updates" list, and
+ * are written out only to provide the context, and to initialize the
+ * on-disk ring buffer. */
+ for (i = 1; i < al_size_4k; i++) {
+ int err = __al_write_transaction(device, al);
if (err)
return err;
}
diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index df3d89d..b6844fe 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -1667,7 +1667,7 @@ extern int __drbd_change_sync(struct drbd_device *device, sector_t sector, int s
#define drbd_rs_failed_io(device, sector, size) \
__drbd_change_sync(device, sector, size, RECORD_RS_FAILED)
extern void drbd_al_shrink(struct drbd_device *device);
-extern int drbd_initialize_al(struct drbd_device *, void *);
+extern int drbd_al_initialize(struct drbd_device *, void *);

/* drbd_nl.c */
/* state info broadcast */
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index c7cd3df..f4ca273 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -903,15 +903,14 @@ drbd_determine_dev_size(struct drbd_device *device, enum dds_flags flags, struct
int md_moved, la_size_changed;
enum determine_dev_size rv = DS_UNCHANGED;

- /* race:
- * application request passes inc_ap_bio,
- * but then cannot get an AL-reference.
- * this function later may wait on ap_bio_cnt == 0. -> deadlock.
+ /* We may change the on-disk offsets of our meta data below. Lock out
+ * anything that may cause meta data IO, to avoid acting on incomplete
+ * layout changes or scribbling over meta data that is in the process
+ * of being moved.
*
- * to avoid that:
- * Suspend IO right here.
- * still lock the act_log to not trigger ASSERTs there.
- */
+ * Move is not exactly correct, btw, currently we have all our meta
+ * data in core memory, to "move" it we just write it all out, there
+ * are no reads. */
drbd_suspend_io(device);
buffer = drbd_md_get_buffer(device, __func__); /* Lock meta-data IO */
if (!buffer) {
@@ -919,9 +918,6 @@ drbd_determine_dev_size(struct drbd_device *device, enum dds_flags flags, struct
return DS_ERROR;
}

- /* no wait necessary anymore, actually we could assert that */
- wait_event(device->al_wait, lc_try_lock(device->act_log));
-
prev_first_sect = drbd_md_first_sector(device->ldev);
prev_size = device->ldev->md.md_size_sect;
la_size_sect = device->ldev->md.la_size_sect;
@@ -997,20 +993,29 @@ drbd_determine_dev_size(struct drbd_device *device, enum dds_flags flags, struct
* Clear the timer, to avoid scary "timer expired!" messages,
* "Superblock" is written out at least twice below, anyways. */
del_timer(&device->md_sync_timer);
- drbd_al_shrink(device); /* All extents inactive. */

+ /* We won't change the "al-extents" setting, we just may need
+ * to move the on-disk location of the activity log ringbuffer.
+ * Lock for transaction is good enough, it may well be "dirty"
+ * or even "starving". */
+ wait_event(device->al_wait, lc_try_lock_for_transaction(device->act_log));
+
+ /* mark current on-disk bitmap and activity log as unreliable */
prev_flags = md->flags;
- md->flags &= ~MDF_PRIMARY_IND;
+ md->flags |= MDF_FULL_SYNC | MDF_AL_DISABLED;
drbd_md_write(device, buffer);

+ drbd_al_initialize(device, buffer);
+
drbd_info(device, "Writing the whole bitmap, %s\n",
la_size_changed && md_moved ? "size changed and md moved" :
la_size_changed ? "size changed" : "md moved");
/* next line implicitly does drbd_suspend_io()+drbd_resume_io() */
drbd_bitmap_io(device, md_moved ? &drbd_bm_write_all : &drbd_bm_write,
"size changed", BM_LOCKED_MASK);
- drbd_initialize_al(device, buffer);

+ /* on-disk bitmap and activity log is authoritative again
+ * (unless there was an IO error meanwhile...) */
md->flags = prev_flags;
drbd_md_write(device, buffer);

--
1.9.1
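
drbd_al_initialize() now writes al_size_4k consecutive transactions: one from the first __al_write_transaction() call and al_size_4k - 1 from the loop. Assuming each successful write advances al_tr_number by one (which __al_write_transaction() does when al_updates is enabled), the striping formula of al_tr_number_to_on_disk_sector() then hits every 4 KiB slot of the on-disk ring exactly once, whatever the starting transaction number. A userspace check of that property; al_slot() and initializes_whole_ring() are hypothetical helpers:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* 4 KiB slot (relative to the AL start) used by a transaction number,
 * following the striping formula of al_tr_number_to_on_disk_sector(). */
static unsigned int al_slot(unsigned int tr, unsigned int stripes,
			    unsigned int stripe_size_4k)
{
	unsigned int t = tr % (stripes * stripe_size_4k);

	return (t % stripes) * stripe_size_4k + t / stripes;
}

/* True if al_size_4k consecutive transactions starting at 'first'
 * write every on-disk slot exactly once. */
static bool initializes_whole_ring(unsigned int first, unsigned int stripes,
				   unsigned int stripe_size_4k)
{
	unsigned int size = stripes * stripe_size_4k;
	unsigned char hits[256];

	if (size == 0 || size > sizeof(hits))
		return false; /* outside what this sketch can check */
	memset(hits, 0, sizeof(hits));
	for (unsigned int i = 0; i < size; i++)
		hits[al_slot(first + i, stripes, stripe_size_4k)]++;
	for (unsigned int s = 0; s < size; s++)
		if (hits[s] != 1)
			return false;
	return true;
}
```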

2015-11-25 11:13:41

by Philipp Reisner
Subject: [PATCH 38/38] drbd: fix error path during resize

From: Lars Ellenberg <[email protected]>

In case the lower-level device size changed, but some other internal
details of the resize did not work out, drbd_determine_dev_size() would
try to restore the previous settings, trusting
drbd_md_set_sector_offsets() to "do the right thing", but overlooked
that this function may internally set the meta data base offset based
on the device size.

This could leave an incomplete on-disk meta data layout change, and
ultimately lead to data corruption (if the failure was not noticed, or
was ignored by the operator, and other things go wrong as well).

Just remember all meta data related offsets and sizes,
and restore them all on error.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 68 +++++++++++++++++++++++++-------------------
1 file changed, 38 insertions(+), 30 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index f4ca273..c055c5e 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -891,12 +891,18 @@ void drbd_resume_io(struct drbd_device *device)
enum determine_dev_size
drbd_determine_dev_size(struct drbd_device *device, enum dds_flags flags, struct resize_parms *rs) __must_hold(local)
{
- sector_t prev_first_sect, prev_size; /* previous meta location */
- sector_t la_size_sect, u_size;
+ struct md_offsets_and_sizes {
+ u64 last_agreed_sect;
+ u64 md_offset;
+ s32 al_offset;
+ s32 bm_offset;
+ u32 md_size_sect;
+
+ u32 al_stripes;
+ u32 al_stripe_size_4k;
+ } prev;
+ sector_t u_size, size;
struct drbd_md *md = &device->ldev->md;
- u32 prev_al_stripe_size_4k;
- u32 prev_al_stripes;
- sector_t size;
char ppb[10];
void *buffer;

@@ -918,16 +924,17 @@ drbd_determine_dev_size(struct drbd_device *device, enum dds_flags flags, struct
return DS_ERROR;
}

- prev_first_sect = drbd_md_first_sector(device->ldev);
- prev_size = device->ldev->md.md_size_sect;
- la_size_sect = device->ldev->md.la_size_sect;
+ /* remember current offset and sizes */
+ prev.last_agreed_sect = md->la_size_sect;
+ prev.md_offset = md->md_offset;
+ prev.al_offset = md->al_offset;
+ prev.bm_offset = md->bm_offset;
+ prev.md_size_sect = md->md_size_sect;
+ prev.al_stripes = md->al_stripes;
+ prev.al_stripe_size_4k = md->al_stripe_size_4k;

if (rs) {
/* rs is non NULL if we should change the AL layout only */
-
- prev_al_stripes = md->al_stripes;
- prev_al_stripe_size_4k = md->al_stripe_size_4k;
-
md->al_stripes = rs->al_stripes;
md->al_stripe_size_4k = rs->al_stripe_size / 4;
md->al_size_4k = (u64)rs->al_stripes * rs->al_stripe_size / 4;
@@ -940,7 +947,7 @@ drbd_determine_dev_size(struct drbd_device *device, enum dds_flags flags, struct
rcu_read_unlock();
size = drbd_new_dev_size(device, device->ldev, u_size, flags & DDSF_FORCED);

- if (size < la_size_sect) {
+ if (size < prev.last_agreed_sect) {
if (rs && u_size == 0) {
/* Remove "rs &&" later. This check should always be active, but
right now the receiver expects the permissive behavior */
@@ -961,30 +968,29 @@ drbd_determine_dev_size(struct drbd_device *device, enum dds_flags flags, struct
err = drbd_bm_resize(device, size, !(flags & DDSF_NO_RESYNC));
if (unlikely(err)) {
/* currently there is only one error: ENOMEM! */
- size = drbd_bm_capacity(device)>>1;
+ size = drbd_bm_capacity(device);
if (size == 0) {
drbd_err(device, "OUT OF MEMORY! "
"Could not allocate bitmap!\n");
} else {
drbd_err(device, "BM resizing failed. "
- "Leaving size unchanged at size = %lu KB\n",
- (unsigned long)size);
+ "Leaving size unchanged\n");
}
rv = DS_ERROR;
}
/* racy, see comments above. */
drbd_set_my_capacity(device, size);
- device->ldev->md.la_size_sect = size;
+ md->la_size_sect = size;
drbd_info(device, "size = %s (%llu KB)\n", ppsize(ppb, size>>1),
(unsigned long long)size>>1);
}
if (rv <= DS_ERROR)
goto err_out;

- la_size_changed = (la_size_sect != device->ldev->md.la_size_sect);
+ la_size_changed = (prev.last_agreed_sect != md->la_size_sect);

- md_moved = prev_first_sect != drbd_md_first_sector(device->ldev)
- || prev_size != device->ldev->md.md_size_sect;
+ md_moved = prev.md_offset != md->md_offset
+ || prev.md_size_sect != md->md_size_sect;

if (la_size_changed || md_moved || rs) {
u32 prev_flags;
@@ -1024,20 +1030,22 @@ drbd_determine_dev_size(struct drbd_device *device, enum dds_flags flags, struct
md->al_stripes, md->al_stripe_size_4k * 4);
}

- if (size > la_size_sect)
- rv = la_size_sect ? DS_GREW : DS_GREW_FROM_ZERO;
- if (size < la_size_sect)
+ if (size > prev.last_agreed_sect)
+ rv = prev.last_agreed_sect ? DS_GREW : DS_GREW_FROM_ZERO;
+ if (size < prev.last_agreed_sect)
rv = DS_SHRUNK;

if (0) {
err_out:
- if (rs) {
- md->al_stripes = prev_al_stripes;
- md->al_stripe_size_4k = prev_al_stripe_size_4k;
- md->al_size_4k = (u64)prev_al_stripes * prev_al_stripe_size_4k;
-
- drbd_md_set_sector_offsets(device, device->ldev);
- }
+ /* restore previous offset and sizes */
+ md->la_size_sect = prev.last_agreed_sect;
+ md->md_offset = prev.md_offset;
+ md->al_offset = prev.al_offset;
+ md->bm_offset = prev.bm_offset;
+ md->md_size_sect = prev.md_size_sect;
+ md->al_stripes = prev.al_stripes;
+ md->al_stripe_size_4k = prev.al_stripe_size_4k;
+ md->al_size_4k = (u64)prev.al_stripes * prev.al_stripe_size_4k;
}
lc_unlock(device->act_log);
wake_up(&device->al_wait);
--
1.9.1
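
A userspace sketch of the remember-everything/restore-everything approach. The structs are reduced, hypothetical versions of struct drbd_md and the local struct md_offsets_and_sizes, and derive_offsets() is a made-up simplification of drbd_md_set_sector_offsets() that places the meta data at the end of the backing device. It contrasts the old error path, which re-derived the offsets after the size had changed, with restoring every saved field.

```c
#include <assert.h>
#include <stdint.h>

/* Reduced, hypothetical stand-in for struct drbd_md. */
struct sk_md {
	uint64_t la_size_sect;
	uint64_t md_offset;
	int32_t al_offset;
	int32_t bm_offset;
	uint32_t md_size_sect;
	uint32_t al_stripes;
	uint32_t al_stripe_size_4k;
	uint64_t al_size_4k;
};

/* Mirrors the fields saved in struct md_offsets_and_sizes. */
struct sk_prev {
	uint64_t last_agreed_sect;
	uint64_t md_offset;
	int32_t al_offset;
	int32_t bm_offset;
	uint32_t md_size_sect;
	uint32_t al_stripes;
	uint32_t al_stripe_size_4k;
};

static void remember(const struct sk_md *md, struct sk_prev *prev)
{
	prev->last_agreed_sect = md->la_size_sect;
	prev->md_offset = md->md_offset;
	prev->al_offset = md->al_offset;
	prev->bm_offset = md->bm_offset;
	prev->md_size_sect = md->md_size_sect;
	prev->al_stripes = md->al_stripes;
	prev->al_stripe_size_4k = md->al_stripe_size_4k;
}

static void restore(struct sk_md *md, const struct sk_prev *prev)
{
	md->la_size_sect = prev->last_agreed_sect;
	md->md_offset = prev->md_offset;
	md->al_offset = prev->al_offset;
	md->bm_offset = prev->bm_offset;
	md->md_size_sect = prev->md_size_sect;
	md->al_stripes = prev->al_stripes;
	md->al_stripe_size_4k = prev->al_stripe_size_4k;
	md->al_size_4k = (uint64_t)prev->al_stripes * prev->al_stripe_size_4k;
}

/* Hypothetical simplification: meta data sits at the end of the device. */
static void derive_offsets(struct sk_md *md, uint64_t backing_size_sect)
{
	md->md_offset = backing_size_sect - md->md_size_sect;
}

/* A resize whose layout change succeeds, but a later step fails. */
static void resize_then_fail(struct sk_md *md, uint64_t new_backing_size_sect,
			     int restore_all)
{
	struct sk_prev prev;

	remember(md, &prev);
	derive_offsets(md, new_backing_size_sect);

	/* ... a later step (e.g. the bitmap resize) fails here ... */
	if (restore_all) {
		restore(md, &prev);
	} else {
		/* old error path: restore AL geometry, then re-derive the
		 * offsets, which now follow the new device size */
		md->al_stripes = prev.al_stripes;
		md->al_stripe_size_4k = prev.al_stripe_size_4k;
		derive_offsets(md, new_backing_size_sect);
	}
}
```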

2015-11-25 18:02:01

by Jens Axboe
Subject: Re: [PATCH 00/38] DRBD update

On 11/25/2015 03:53 AM, Philipp Reisner wrote:
> Hi Jens,
>
> please pull these patches into your for-4.5/drivers branch.
>
> This huge patch set updates the in-tree DRBD to what we have out of tree.
> All of this has been extensively tested and in production use by LINBIT's
> customers.
>
> Andreas' patches backport some DRBD-9 interface functionality, easing
> smooth migration of the user base to DRBD-9 later on. These patches
> add contains touch the most lines in the series.
>
> Lars and others did the maintenance and bug-fixing work.

Applied for 4.5, thanks.

--
Jens Axboe