2015-07-30 11:30:56

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 00/19] RFC DRBD updates for the 4.3 merge window

Hi,

these patches are in the end intended for the 4.3 merge window.

Apart from a few random fixes and cleanups, the biggest part of
the patch are two new userspace commands for drbd ("events" and
"status"). These two commands where developed with the drbd-9
effort. Introducing them right now into the current code base,
allows for smooth transition of the user base.


Andreas Gruenbacher (7):
drbd: De-inline drbd_should_do_remote() and
drbd_should_send_out_of_sync()
drbd: Get rid of some first_peer_device() calls
drbd: Move enum write_ordering_e to drbd.h
drbd: drbd_adm_attach(): Add missing drbd_resync_after_changed()
drbd: Fix locking across all resources
drbd: Backport the "events2" command
drbd: Backport the "status" command

Lars Ellenberg (9):
drbd: Fix spurious disk-timeout
drbd: drop remnants of connector -- we don't use it anymore in drbd
8.4
drbd: drbdsetup detach of an unresponsive local disk should not block
IO "forever"
drbd: also bump UUIDs if a diskless primary connects
drbd: add comment why we want to first call local-io-error, then send
state
drbd: drbd_panic_after_delayed_completion_of_aborted_request()
drbd: improve network timeout detection
drbd: fix NULL deref in remember_new_state
drbd: fix refcount error during detach of an already failed disk

Markus Elfring (1):
drbd: Deletion of an unnecessary check before the function call
"lc_destroy"

Philipp Reisner (2):
drbd: Remove pointless check
drbd: Replace 0 with the more meaningful GFP_NOWAIT

drivers/block/drbd/drbd_int.h | 71 ++-
drivers/block/drbd/drbd_main.c | 26 +-
drivers/block/drbd/drbd_nl.c | 1077 +++++++++++++++++++++++++++++++-
drivers/block/drbd/drbd_proc.c | 6 +-
drivers/block/drbd/drbd_receiver.c | 42 +-
drivers/block/drbd/drbd_req.c | 143 ++++-
drivers/block/drbd/drbd_req.h | 17 +-
drivers/block/drbd/drbd_state.c | 426 ++++++++++++-
drivers/block/drbd/drbd_state.h | 6 +-
drivers/block/drbd/drbd_state_change.h | 63 ++
drivers/block/drbd/drbd_worker.c | 95 ++-
include/linux/drbd.h | 24 +-
include/linux/drbd_genl.h | 149 +++++
include/linux/idr.h | 14 +
14 files changed, 1983 insertions(+), 176 deletions(-)
create mode 100644 drivers/block/drbd/drbd_state_change.h

--
1.9.1


2015-07-30 11:33:21

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 01/19] drbd: Remove pointless check

In drbd-8.4 there is always a single connection per resource,
and there is always exactly one peer_device for a device.
peer_device can not be NULL here.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 74df8cf..bdd405b 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1478,7 +1478,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
device = adm_ctx.device;
mutex_lock(&adm_ctx.resource->adm_mutex);
peer_device = first_peer_device(device);
- connection = peer_device ? peer_device->connection : NULL;
+ connection = peer_device->connection;
conn_reconfig_start(connection);

/* if you want to reconfigure, please tear down first */
--
1.9.1

2015-07-30 11:35:43

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 02/19] drbd: De-inline drbd_should_do_remote() and drbd_should_send_out_of_sync()

From: Andreas Gruenbacher <[email protected]>

There is no need to have these two as inline functions. In addition,
drbd_should_send_out_of_sync() is only used in a single place, anyway.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_req.c | 18 ++++++++++++++++++
drivers/block/drbd/drbd_req.h | 17 +----------------
2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 3907202..e9981b5 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1094,6 +1094,24 @@ static bool do_remote_read(struct drbd_request *req)
return false;
}

+bool drbd_should_do_remote(union drbd_dev_state s)
+{
+ return s.pdsk == D_UP_TO_DATE ||
+ (s.pdsk >= D_INCONSISTENT &&
+ s.conn >= C_WF_BITMAP_T &&
+ s.conn < C_AHEAD);
+ /* Before proto 96 that was >= CONNECTED instead of >= C_WF_BITMAP_T.
+ That is equivalent since before 96 IO was frozen in the C_WF_BITMAP*
+ states. */
+}
+
+static bool drbd_should_send_out_of_sync(union drbd_dev_state s)
+{
+ return s.conn == C_AHEAD || s.conn == C_WF_BITMAP_S;
+ /* pdsk = D_INCONSISTENT as a consequence. Protocol 96 check not necessary
+ since we enter state C_AHEAD only if proto >= 96 */
+}
+
/* returns number of connections (== 1, for drbd 8.4)
* expected to actually write this data,
* which does NOT include those that we are L_AHEAD for. */
diff --git a/drivers/block/drbd/drbd_req.h b/drivers/block/drbd/drbd_req.h
index 9f6a040..bb2ef78 100644
--- a/drivers/block/drbd/drbd_req.h
+++ b/drivers/block/drbd/drbd_req.h
@@ -331,21 +331,6 @@ static inline int req_mod(struct drbd_request *req,
return rv;
}

-static inline bool drbd_should_do_remote(union drbd_dev_state s)
-{
- return s.pdsk == D_UP_TO_DATE ||
- (s.pdsk >= D_INCONSISTENT &&
- s.conn >= C_WF_BITMAP_T &&
- s.conn < C_AHEAD);
- /* Before proto 96 that was >= CONNECTED instead of >= C_WF_BITMAP_T.
- That is equivalent since before 96 IO was frozen in the C_WF_BITMAP*
- states. */
-}
-static inline bool drbd_should_send_out_of_sync(union drbd_dev_state s)
-{
- return s.conn == C_AHEAD || s.conn == C_WF_BITMAP_S;
- /* pdsk = D_INCONSISTENT as a consequence. Protocol 96 check not necessary
- since we enter state C_AHEAD only if proto >= 96 */
-}
+extern bool drbd_should_do_remote(union drbd_dev_state);

#endif
--
1.9.1

2015-07-30 11:33:41

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 03/19] drbd: Get rid of some first_peer_device() calls

From: Andreas Gruenbacher <[email protected]>

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_receiver.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index c097909..e594391 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1379,7 +1379,7 @@ int drbd_submit_peer_request(struct drbd_device *device,
if (peer_req->flags & EE_IS_TRIM_USE_ZEROOUT) {
/* wait for all pending IO completions, before we start
* zeroing things out. */
- conn_wait_active_ee_empty(first_peer_device(device)->connection);
+ conn_wait_active_ee_empty(peer_req->peer_device->connection);
/* add it to the active list now,
* so we can find it to present it in debugfs */
peer_req->submit_jif = jiffies;
@@ -1965,7 +1965,7 @@ static int e_end_block(struct drbd_work *w, int cancel)
} else
D_ASSERT(device, drbd_interval_empty(&peer_req->i));

- drbd_may_finish_epoch(first_peer_device(device)->connection, peer_req->epoch, EV_PUT + (cancel ? EV_CLEANUP : 0));
+ drbd_may_finish_epoch(peer_device->connection, peer_req->epoch, EV_PUT + (cancel ? EV_CLEANUP : 0));

return err;
}
@@ -2097,7 +2097,7 @@ static int wait_for_and_update_peer_seq(struct drbd_peer_device *peer_device, co
}

rcu_read_lock();
- tp = rcu_dereference(first_peer_device(device)->connection->net_conf)->two_primaries;
+ tp = rcu_dereference(peer_device->connection->net_conf)->two_primaries;
rcu_read_unlock();

if (!tp)
@@ -2363,7 +2363,7 @@ static int receive_Data(struct drbd_connection *connection, struct packet_info *
if (dp_flags & DP_SEND_RECEIVE_ACK) {
/* I really don't like it that the receiver thread
* sends on the msock, but anyways */
- drbd_send_ack(first_peer_device(device), P_RECV_ACK, peer_req);
+ drbd_send_ack(peer_device, P_RECV_ACK, peer_req);
}

if (tp) {
--
1.9.1

2015-07-30 11:30:10

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 04/19] drbd: Move enum write_ordering_e to drbd.h

From: Andreas Gruenbacher <[email protected]>

Also change the enum values to all-capital letters.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 6 ------
drivers/block/drbd/drbd_main.c | 2 +-
drivers/block/drbd/drbd_nl.c | 4 ++--
drivers/block/drbd/drbd_proc.c | 6 +++---
drivers/block/drbd/drbd_receiver.c | 28 ++++++++++++++--------------
include/linux/drbd.h | 7 +++++++
6 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index efd19c2..443ae18 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -632,12 +632,6 @@ struct bm_io_work {
void (*done)(struct drbd_device *device, int rv);
};

-enum write_ordering_e {
- WO_none,
- WO_drain_io,
- WO_bdev_flush,
-};
-
struct fifo_buffer {
unsigned int head_index;
unsigned int size;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a151853..f21f46f 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2590,7 +2590,7 @@ struct drbd_resource *drbd_create_resource(const char *name)
kref_init(&resource->kref);
idr_init(&resource->devices);
INIT_LIST_HEAD(&resource->connections);
- resource->write_ordering = WO_bdev_flush;
+ resource->write_ordering = WO_BDEV_FLUSH;
list_add_tail_rcu(&resource->resources, &drbd_resources);
mutex_init(&resource->conf_update);
mutex_init(&resource->adm_mutex);
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index bdd405b..e1d5453 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1418,7 +1418,7 @@ int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info)
set_bit(MD_NO_FUA, &device->flags);

if (write_ordering_changed(old_disk_conf, new_disk_conf))
- drbd_bump_write_ordering(device->resource, NULL, WO_bdev_flush);
+ drbd_bump_write_ordering(device->resource, NULL, WO_BDEV_FLUSH);

drbd_md_sync(device);

@@ -1727,7 +1727,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
new_disk_conf = NULL;
new_plan = NULL;

- drbd_bump_write_ordering(device->resource, device->ldev, WO_bdev_flush);
+ drbd_bump_write_ordering(device->resource, device->ldev, WO_BDEV_FLUSH);

if (drbd_md_test_flag(device->ldev, MDF_CRASHED_PRIMARY))
set_bit(CRASHED_PRIMARY, &device->flags);
diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index 3b10fa6..6537b25 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -245,9 +245,9 @@ static int drbd_seq_show(struct seq_file *seq, void *v)
char wp;

static char write_ordering_chars[] = {
- [WO_none] = 'n',
- [WO_drain_io] = 'd',
- [WO_bdev_flush] = 'f',
+ [WO_NONE] = 'n',
+ [WO_DRAIN_IO] = 'd',
+ [WO_BDEV_FLUSH] = 'f',
};

seq_printf(seq, "version: " REL_VERSION " (api:%d/proto:%d-%d)\n%s\n",
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index e594391..85856ec 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1177,7 +1177,7 @@ static void drbd_flush(struct drbd_connection *connection)
struct drbd_peer_device *peer_device;
int vnr;

- if (connection->resource->write_ordering >= WO_bdev_flush) {
+ if (connection->resource->write_ordering >= WO_BDEV_FLUSH) {
rcu_read_lock();
idr_for_each_entry(&connection->peer_devices, peer_device, vnr) {
struct drbd_device *device = peer_device->device;
@@ -1202,7 +1202,7 @@ static void drbd_flush(struct drbd_connection *connection)
/* would rather check on EOPNOTSUPP, but that is not reliable.
* don't try again for ANY return value != 0
* if (rv == -EOPNOTSUPP) */
- drbd_bump_write_ordering(connection->resource, NULL, WO_drain_io);
+ drbd_bump_write_ordering(connection->resource, NULL, WO_DRAIN_IO);
}
put_ldev(device);
kref_put(&device->kref, drbd_destroy_device);
@@ -1298,10 +1298,10 @@ max_allowed_wo(struct drbd_backing_dev *bdev, enum write_ordering_e wo)

dc = rcu_dereference(bdev->disk_conf);

- if (wo == WO_bdev_flush && !dc->disk_flushes)
- wo = WO_drain_io;
- if (wo == WO_drain_io && !dc->disk_drain)
- wo = WO_none;
+ if (wo == WO_BDEV_FLUSH && !dc->disk_flushes)
+ wo = WO_DRAIN_IO;
+ if (wo == WO_DRAIN_IO && !dc->disk_drain)
+ wo = WO_NONE;

return wo;
}
@@ -1318,13 +1318,13 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
enum write_ordering_e pwo;
int vnr;
static char *write_ordering_str[] = {
- [WO_none] = "none",
- [WO_drain_io] = "drain",
- [WO_bdev_flush] = "flush",
+ [WO_NONE] = "none",
+ [WO_DRAIN_IO] = "drain",
+ [WO_BDEV_FLUSH] = "flush",
};

pwo = resource->write_ordering;
- if (wo != WO_bdev_flush)
+ if (wo != WO_BDEV_FLUSH)
wo = min(pwo, wo);
rcu_read_lock();
idr_for_each_entry(&resource->devices, device, vnr) {
@@ -1342,7 +1342,7 @@ void drbd_bump_write_ordering(struct drbd_resource *resource, struct drbd_backin
rcu_read_unlock();

resource->write_ordering = wo;
- if (pwo != resource->write_ordering || wo == WO_bdev_flush)
+ if (pwo != resource->write_ordering || wo == WO_BDEV_FLUSH)
drbd_info(resource, "Method to ensure write ordering: %s\n", write_ordering_str[resource->write_ordering]);
}

@@ -1532,7 +1532,7 @@ static int receive_Barrier(struct drbd_connection *connection, struct packet_inf
* Therefore we must send the barrier_ack after the barrier request was
* completed. */
switch (connection->resource->write_ordering) {
- case WO_none:
+ case WO_NONE:
if (rv == FE_RECYCLED)
return 0;

@@ -1545,8 +1545,8 @@ static int receive_Barrier(struct drbd_connection *connection, struct packet_inf
drbd_warn(connection, "Allocation of an epoch failed, slowing down\n");
/* Fall through */

- case WO_bdev_flush:
- case WO_drain_io:
+ case WO_BDEV_FLUSH:
+ case WO_DRAIN_IO:
conn_wait_active_ee_empty(connection);
drbd_flush(connection);

diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 8723f2a..15a1472 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -357,6 +357,13 @@ enum drbd_timeout_flag {

#define UUID_JUST_CREATED ((__u64)4)

+enum write_ordering_e {
+ WO_NONE,
+ WO_DRAIN_IO,
+ WO_BDEV_FLUSH,
+ WO_BIO_BARRIER
+};
+
/* magic numbers used in meta data and network packets */
#define DRBD_MAGIC 0x83740267
#define DRBD_MAGIC_BIG 0x835a
--
1.9.1

2015-07-30 11:33:37

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 05/19] drbd: drbd_adm_attach(): Add missing drbd_resync_after_changed()

From: Andreas Gruenbacher <[email protected]>

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index e1d5453..186dc9e 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1541,9 +1541,8 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)

write_lock_irq(&global_state_lock);
retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after);
- write_unlock_irq(&global_state_lock);
if (retcode != NO_ERROR)
- goto fail;
+ goto fail_unlock;

rcu_read_lock();
nc = rcu_dereference(connection->net_conf);
@@ -1551,7 +1550,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (new_disk_conf->fencing == FP_STONITH && nc->wire_protocol == DRBD_PROT_A) {
rcu_read_unlock();
retcode = ERR_STONITH_AND_PROT_A;
- goto fail;
+ goto fail_unlock;
}
}
rcu_read_unlock();
@@ -1562,7 +1561,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->backing_dev,
PTR_ERR(bdev));
retcode = ERR_OPEN_DISK;
- goto fail;
+ goto fail_unlock;
}
nbc->backing_bdev = bdev;

@@ -1582,7 +1581,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->meta_dev,
PTR_ERR(bdev));
retcode = ERR_OPEN_MD_DISK;
- goto fail;
+ goto fail_unlock;
}
nbc->md_bdev = bdev;

@@ -1590,7 +1589,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
(new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_INTERNAL ||
new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_FLEX_INT)) {
retcode = ERR_MD_IDX_INVALID;
- goto fail;
+ goto fail_unlock;
}

resync_lru = lc_create("resync", drbd_bm_ext_cache,
@@ -1598,14 +1597,14 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
offsetof(struct bm_extent, lce));
if (!resync_lru) {
retcode = ERR_NOMEM;
- goto fail;
+ goto fail_unlock;
}

/* Read our meta data super block early.
* This also sets other on-disk offsets. */
retcode = drbd_md_read(device, nbc);
if (retcode != NO_ERROR)
- goto fail;
+ goto fail_unlock;

if (new_disk_conf->al_extents < DRBD_AL_EXTENTS_MIN)
new_disk_conf->al_extents = DRBD_AL_EXTENTS_MIN;
@@ -1617,7 +1616,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
(unsigned long long) drbd_get_max_capacity(nbc),
(unsigned long long) new_disk_conf->disk_size);
retcode = ERR_DISK_TOO_SMALL;
- goto fail;
+ goto fail_unlock;
}

if (new_disk_conf->meta_dev_idx < 0) {
@@ -1634,7 +1633,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_warn(device, "refusing attach: md-device too small, "
"at least %llu sectors needed for this meta-disk type\n",
(unsigned long long) min_md_device_sectors);
- goto fail;
+ goto fail_unlock;
}

/* Make sure the new disk is big enough
@@ -1642,7 +1641,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (drbd_get_max_capacity(nbc) <
drbd_get_capacity(device->this_bdev)) {
retcode = ERR_DISK_TOO_SMALL;
- goto fail;
+ goto fail_unlock;
}

nbc->known_size = drbd_get_capacity(nbc->backing_bdev);
@@ -1672,7 +1671,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
retcode = rv; /* FIXME: Type mismatch. */
drbd_resume_io(device);
if (rv < SS_SUCCESS)
- goto fail;
+ goto fail_unlock;

if (!get_ldev_if_state(device, D_ATTACHING))
goto force_diskless;
@@ -1727,6 +1726,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
new_disk_conf = NULL;
new_plan = NULL;

+ drbd_resync_after_changed(device);
drbd_bump_write_ordering(device->resource, device->ldev, WO_BDEV_FLUSH);

if (drbd_md_test_flag(device->ldev, MDF_CRASHED_PRIMARY))
@@ -1850,6 +1850,8 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (rv < SS_SUCCESS)
goto force_diskless_dec;

+ write_unlock(&global_state_lock);
+
mod_timer(&device->request_timer, jiffies + HZ);

if (device->state.role == R_PRIMARY)
@@ -1872,6 +1874,8 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
force_diskless:
drbd_force_state(device, NS(disk, D_DISKLESS));
drbd_md_sync(device);
+ fail_unlock:
+ write_unlock_irq(&global_state_lock);
fail:
conn_reconfig_done(connection);
if (nbc) {
--
1.9.1

2015-07-30 11:30:55

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 06/19] drbd: Fix locking across all resources

From: Andreas Gruenbacher <[email protected]>

Instead of using a rwlock for synchronizing state changes across
resources, take the request locks of all resources for global state
changes. Use resources_mutex to serialize global state changes.

This means that taking the request lock of a resource is now enough to
prevent changes of that resource. (Previously, a read lock on the
global state lock was needed as well.)

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 18 ++-------
drivers/block/drbd/drbd_main.c | 24 +++++++++++-
drivers/block/drbd/drbd_nl.c | 45 +++++++++++----------
drivers/block/drbd/drbd_state.c | 14 +++----
drivers/block/drbd/drbd_state.h | 6 +--
drivers/block/drbd/drbd_worker.c | 85 ++++++++++++++++++----------------------
6 files changed, 99 insertions(+), 93 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 443ae18..a990b8e 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -292,6 +292,9 @@ struct drbd_device_work {

extern int drbd_wait_misc(struct drbd_device *, struct drbd_interval *);

+extern void lock_all_resources(void);
+extern void unlock_all_resources(void);
+
struct drbd_request {
struct drbd_work w;
struct drbd_device *device;
@@ -1418,7 +1421,7 @@ extern struct bio_set *drbd_md_io_bio_set;
/* to allocate from that set */
extern struct bio *bio_alloc_drbd(gfp_t gfp_mask);

-extern rwlock_t global_state_lock;
+extern struct mutex resources_mutex;

extern int conn_lowest_minor(struct drbd_connection *connection);
extern enum drbd_ret_code drbd_create_device(struct drbd_config_context *adm_ctx, unsigned int minor);
@@ -1688,19 +1691,6 @@ static inline int drbd_peer_req_has_active_page(struct drbd_peer_request *peer_r
return 0;
}

-static inline enum drbd_state_rv
-_drbd_set_state(struct drbd_device *device, union drbd_state ns,
- enum chg_state_flags flags, struct completion *done)
-{
- enum drbd_state_rv rv;
-
- read_lock(&global_state_lock);
- rv = __drbd_set_state(device, ns, flags, done);
- read_unlock(&global_state_lock);
-
- return rv;
-}
-
static inline union drbd_state drbd_read_state(struct drbd_device *device)
{
struct drbd_resource *resource = device->resource;
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index f21f46f..81a8679 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -117,6 +117,7 @@ module_param_string(usermode_helper, usermode_helper, sizeof(usermode_helper), 0
*/
struct idr drbd_devices;
struct list_head drbd_resources;
+struct mutex resources_mutex;

struct kmem_cache *drbd_request_cache;
struct kmem_cache *drbd_ee_cache; /* peer requests */
@@ -2924,7 +2925,7 @@ static int __init drbd_init(void)
drbd_proc = NULL; /* play safe for drbd_cleanup */
idr_init(&drbd_devices);

- rwlock_init(&global_state_lock);
+ mutex_init(&resources_mutex);
INIT_LIST_HEAD(&drbd_resources);

err = drbd_genl_register();
@@ -3747,6 +3748,27 @@ int drbd_wait_misc(struct drbd_device *device, struct drbd_interval *i)
return 0;
}

+void lock_all_resources(void)
+{
+ struct drbd_resource *resource;
+ int __maybe_unused i = 0;
+
+ mutex_lock(&resources_mutex);
+ local_irq_disable();
+ for_each_resource(resource, &drbd_resources)
+ spin_lock_nested(&resource->req_lock, i++);
+}
+
+void unlock_all_resources(void)
+{
+ struct drbd_resource *resource;
+
+ for_each_resource(resource, &drbd_resources)
+ spin_unlock(&resource->req_lock);
+ local_irq_enable();
+ mutex_unlock(&resources_mutex);
+}
+
#ifdef CONFIG_DRBD_FAULT_INJECTION
/* Fault insertion support including random number generator shamelessly
* stolen from kernel/rcutorture.c */
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 186dc9e..7bca1bc 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1389,13 +1389,13 @@ int drbd_adm_disk_opts(struct sk_buff *skb, struct genl_info *info)
goto fail_unlock;
}

- write_lock_irq(&global_state_lock);
+ lock_all_resources();
retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after);
if (retcode == NO_ERROR) {
rcu_assign_pointer(device->ldev->disk_conf, new_disk_conf);
drbd_resync_after_changed(device);
}
- write_unlock_irq(&global_state_lock);
+ unlock_all_resources();

if (retcode != NO_ERROR)
goto fail_unlock;
@@ -1539,18 +1539,13 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
goto fail;
}

- write_lock_irq(&global_state_lock);
- retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after);
- if (retcode != NO_ERROR)
- goto fail_unlock;
-
rcu_read_lock();
nc = rcu_dereference(connection->net_conf);
if (nc) {
if (new_disk_conf->fencing == FP_STONITH && nc->wire_protocol == DRBD_PROT_A) {
rcu_read_unlock();
retcode = ERR_STONITH_AND_PROT_A;
- goto fail_unlock;
+ goto fail;
}
}
rcu_read_unlock();
@@ -1561,7 +1556,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->backing_dev,
PTR_ERR(bdev));
retcode = ERR_OPEN_DISK;
- goto fail_unlock;
+ goto fail;
}
nbc->backing_bdev = bdev;

@@ -1581,7 +1576,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_err(device, "open(\"%s\") failed with %ld\n", new_disk_conf->meta_dev,
PTR_ERR(bdev));
retcode = ERR_OPEN_MD_DISK;
- goto fail_unlock;
+ goto fail;
}
nbc->md_bdev = bdev;

@@ -1589,7 +1584,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
(new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_INTERNAL ||
new_disk_conf->meta_dev_idx == DRBD_MD_INDEX_FLEX_INT)) {
retcode = ERR_MD_IDX_INVALID;
- goto fail_unlock;
+ goto fail;
}

resync_lru = lc_create("resync", drbd_bm_ext_cache,
@@ -1597,14 +1592,14 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
offsetof(struct bm_extent, lce));
if (!resync_lru) {
retcode = ERR_NOMEM;
- goto fail_unlock;
+ goto fail;
}

/* Read our meta data super block early.
* This also sets other on-disk offsets. */
retcode = drbd_md_read(device, nbc);
if (retcode != NO_ERROR)
- goto fail_unlock;
+ goto fail;

if (new_disk_conf->al_extents < DRBD_AL_EXTENTS_MIN)
new_disk_conf->al_extents = DRBD_AL_EXTENTS_MIN;
@@ -1616,7 +1611,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
(unsigned long long) drbd_get_max_capacity(nbc),
(unsigned long long) new_disk_conf->disk_size);
retcode = ERR_DISK_TOO_SMALL;
- goto fail_unlock;
+ goto fail;
}

if (new_disk_conf->meta_dev_idx < 0) {
@@ -1633,7 +1628,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
drbd_warn(device, "refusing attach: md-device too small, "
"at least %llu sectors needed for this meta-disk type\n",
(unsigned long long) min_md_device_sectors);
- goto fail_unlock;
+ goto fail;
}

/* Make sure the new disk is big enough
@@ -1641,7 +1636,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (drbd_get_max_capacity(nbc) <
drbd_get_capacity(device->this_bdev)) {
retcode = ERR_DISK_TOO_SMALL;
- goto fail_unlock;
+ goto fail;
}

nbc->known_size = drbd_get_capacity(nbc->backing_bdev);
@@ -1671,7 +1666,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
retcode = rv; /* FIXME: Type mismatch. */
drbd_resume_io(device);
if (rv < SS_SUCCESS)
- goto fail_unlock;
+ goto fail;

if (!get_ldev_if_state(device, D_ATTACHING))
goto force_diskless;
@@ -1706,6 +1701,13 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
goto force_diskless_dec;
}

+ lock_all_resources();
+ retcode = drbd_resync_after_valid(device, new_disk_conf->resync_after);
+ if (retcode != NO_ERROR) {
+ unlock_all_resources();
+ goto force_diskless_dec;
+ }
+
/* Reset the "barriers don't work" bits here, then force meta data to
* be written, to ensure we determine if barriers are supported. */
if (new_disk_conf->md_flushes)
@@ -1728,6 +1730,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)

drbd_resync_after_changed(device);
drbd_bump_write_ordering(device->resource, device->ldev, WO_BDEV_FLUSH);
+ unlock_all_resources();

if (drbd_md_test_flag(device->ldev, MDF_CRASHED_PRIMARY))
set_bit(CRASHED_PRIMARY, &device->flags);
@@ -1850,8 +1853,6 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
if (rv < SS_SUCCESS)
goto force_diskless_dec;

- write_unlock(&global_state_lock);
-
mod_timer(&device->request_timer, jiffies + HZ);

if (device->state.role == R_PRIMARY)
@@ -1874,8 +1875,6 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
force_diskless:
drbd_force_state(device, NS(disk, D_DISKLESS));
drbd_md_sync(device);
- fail_unlock:
- write_unlock_irq(&global_state_lock);
fail:
conn_reconfig_done(connection);
if (nbc) {
@@ -3453,8 +3452,10 @@ int drbd_adm_new_resource(struct sk_buff *skb, struct genl_info *info)
}

/* not yet safe for genl_family.parallel_ops */
+ mutex_lock(&resources_mutex);
if (!conn_create(adm_ctx.resource_name, &res_opts))
retcode = ERR_NOMEM;
+ mutex_unlock(&resources_mutex);
out:
drbd_adm_finish(&adm_ctx, info, retcode);
return 0;
@@ -3545,7 +3546,9 @@ static int adm_del_resource(struct drbd_resource *resource)
if (!idr_is_empty(&resource->devices))
return ERR_RES_IN_USE;

+ mutex_lock(&resources_mutex);
list_del_rcu(&resource->resources);
+ mutex_unlock(&resources_mutex);
/* Make sure all threads have actually stopped: state handling only
* does drbd_thread_stop_nowait(). */
list_for_each_entry(connection, &resource->connections, connections)
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 2d7dd26..535ae47 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -937,7 +937,7 @@ void drbd_resume_al(struct drbd_device *device)
drbd_info(device, "Resumed AL updates\n");
}

-/* helper for __drbd_set_state */
+/* helper for _drbd_set_state */
static void set_ov_position(struct drbd_device *device, enum drbd_conns cs)
{
if (first_peer_device(device)->connection->agreed_pro_version < 90)
@@ -965,17 +965,17 @@ static void set_ov_position(struct drbd_device *device, enum drbd_conns cs)
}

/**
- * __drbd_set_state() - Set a new DRBD state
+ * _drbd_set_state() - Set a new DRBD state
* @device: DRBD device.
* @ns: new state.
* @flags: Flags
* @done: Optional completion, that will get completed after the after_state_ch() finished
*
- * Caller needs to hold req_lock, and global_state_lock. Do not call directly.
+ * Caller needs to hold req_lock. Do not call directly.
*/
enum drbd_state_rv
-__drbd_set_state(struct drbd_device *device, union drbd_state ns,
- enum chg_state_flags flags, struct completion *done)
+_drbd_set_state(struct drbd_device *device, union drbd_state ns,
+ enum chg_state_flags flags, struct completion *done)
{
struct drbd_peer_device *peer_device = first_peer_device(device);
struct drbd_connection *connection = peer_device ? peer_device->connection : NULL;
@@ -1444,7 +1444,7 @@ static void after_state_ch(struct drbd_device *device, union drbd_state os,
if (os.disk != D_FAILED && ns.disk == D_FAILED) {
enum drbd_io_error_p eh = EP_PASS_ON;
int was_io_error = 0;
- /* corresponding get_ldev was in __drbd_set_state, to serialize
+ /* corresponding get_ldev was in _drbd_set_state, to serialize
* our cleanup here with the transition to D_DISKLESS.
* But is is still not save to dreference ldev here, since
* we might come from an failed Attach before ldev was set. */
@@ -1759,7 +1759,7 @@ conn_set_state(struct drbd_connection *connection, union drbd_state mask, union
if (flags & CS_IGN_OUTD_FAIL && ns.disk == D_OUTDATED && os.disk < D_OUTDATED)
ns.disk = os.disk;

- rv = __drbd_set_state(device, ns, flags, NULL);
+ rv = _drbd_set_state(device, ns, flags, NULL);
if (rv < SS_SUCCESS)
BUG();

diff --git a/drivers/block/drbd/drbd_state.h b/drivers/block/drbd/drbd_state.h
index 7f53c40..bd98953 100644
--- a/drivers/block/drbd/drbd_state.h
+++ b/drivers/block/drbd/drbd_state.h
@@ -122,9 +122,9 @@ extern enum drbd_state_rv
_drbd_request_state_holding_state_mutex(struct drbd_device *, union drbd_state,
union drbd_state, enum chg_state_flags);

-extern enum drbd_state_rv __drbd_set_state(struct drbd_device *, union drbd_state,
- enum chg_state_flags,
- struct completion *done);
+extern enum drbd_state_rv _drbd_set_state(struct drbd_device *, union drbd_state,
+ enum chg_state_flags,
+ struct completion *done);
extern void print_st_err(struct drbd_device *, union drbd_state,
union drbd_state, int);

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index d0fae55..ef70e1f 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -55,13 +55,6 @@ static int make_resync_request(struct drbd_device *, int);
*
*/

-
-/* About the global_state_lock
- Each state transition on an device holds a read lock. In case we have
- to evaluate the resync after dependencies, we grab a write lock, because
- we need stable states on all devices for that. */
-rwlock_t global_state_lock;
-
/* used for synchronous meta data and bitmap IO
* submitted by drbd_md_sync_page_io()
*/
@@ -1478,70 +1471,73 @@ static int _drbd_may_sync_now(struct drbd_device *device)
}

/**
- * _drbd_pause_after() - Pause resync on all devices that may not resync now
+ * drbd_pause_after() - Pause resync on all devices that may not resync now
* @device: DRBD device.
*
* Called from process context only (admin command and after_state_ch).
*/
-static int _drbd_pause_after(struct drbd_device *device)
+static bool drbd_pause_after(struct drbd_device *device)
{
+ bool changed = false;
struct drbd_device *odev;
- int i, rv = 0;
+ int i;

rcu_read_lock();
idr_for_each_entry(&drbd_devices, odev, i) {
if (odev->state.conn == C_STANDALONE && odev->state.disk == D_DISKLESS)
continue;
- if (!_drbd_may_sync_now(odev))
- rv |= (__drbd_set_state(_NS(odev, aftr_isp, 1), CS_HARD, NULL)
- != SS_NOTHING_TO_DO);
+ if (!_drbd_may_sync_now(odev) &&
+ _drbd_set_state(_NS(odev, aftr_isp, 1),
+ CS_HARD, NULL) != SS_NOTHING_TO_DO)
+ changed = true;
}
rcu_read_unlock();

- return rv;
+ return changed;
}

/**
- * _drbd_resume_next() - Resume resync on all devices that may resync now
+ * drbd_resume_next() - Resume resync on all devices that may resync now
* @device: DRBD device.
*
* Called from process context only (admin command and worker).
*/
-static int _drbd_resume_next(struct drbd_device *device)
+static bool drbd_resume_next(struct drbd_device *device)
{
+ bool changed = false;
struct drbd_device *odev;
- int i, rv = 0;
+ int i;

rcu_read_lock();
idr_for_each_entry(&drbd_devices, odev, i) {
if (odev->state.conn == C_STANDALONE && odev->state.disk == D_DISKLESS)
continue;
if (odev->state.aftr_isp) {
- if (_drbd_may_sync_now(odev))
- rv |= (__drbd_set_state(_NS(odev, aftr_isp, 0),
- CS_HARD, NULL)
- != SS_NOTHING_TO_DO) ;
+ if (_drbd_may_sync_now(odev) &&
+ _drbd_set_state(_NS(odev, aftr_isp, 0),
+ CS_HARD, NULL) != SS_NOTHING_TO_DO)
+ changed = true;
}
}
rcu_read_unlock();
- return rv;
+ return changed;
}

void resume_next_sg(struct drbd_device *device)
{
- write_lock_irq(&global_state_lock);
- _drbd_resume_next(device);
- write_unlock_irq(&global_state_lock);
+ lock_all_resources();
+ drbd_resume_next(device);
+ unlock_all_resources();
}

void suspend_other_sg(struct drbd_device *device)
{
- write_lock_irq(&global_state_lock);
- _drbd_pause_after(device);
- write_unlock_irq(&global_state_lock);
+ lock_all_resources();
+ drbd_pause_after(device);
+ unlock_all_resources();
}

-/* caller must hold global_state_lock */
+/* caller must lock_all_resources() */
enum drbd_ret_code drbd_resync_after_valid(struct drbd_device *device, int o_minor)
{
struct drbd_device *odev;
@@ -1579,15 +1575,15 @@ enum drbd_ret_code drbd_resync_after_valid(struct drbd_device *device, int o_min
}
}

-/* caller must hold global_state_lock */
+/* caller must lock_all_resources() */
void drbd_resync_after_changed(struct drbd_device *device)
{
- int changes;
+ int changed;

do {
- changes = _drbd_pause_after(device);
- changes |= _drbd_resume_next(device);
- } while (changes);
+ changed = drbd_pause_after(device);
+ changed |= drbd_resume_next(device);
+ } while (changed);
}

void drbd_rs_controller_reset(struct drbd_device *device)
@@ -1707,19 +1703,14 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
} else {
mutex_lock(device->state_mutex);
}
- clear_bit(B_RS_H_DONE, &device->flags);

- /* req_lock: serialize with drbd_send_and_submit() and others
- * global_state_lock: for stable sync-after dependencies */
- spin_lock_irq(&device->resource->req_lock);
- write_lock(&global_state_lock);
+ lock_all_resources();
+ clear_bit(B_RS_H_DONE, &device->flags);
/* Did some connection breakage or IO error race with us? */
if (device->state.conn < C_CONNECTED
|| !get_ldev_if_state(device, D_NEGOTIATING)) {
- write_unlock(&global_state_lock);
- spin_unlock_irq(&device->resource->req_lock);
- mutex_unlock(device->state_mutex);
- return;
+ unlock_all_resources();
+ goto out;
}

ns = drbd_read_state(device);
@@ -1733,7 +1724,7 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
else /* side == C_SYNC_SOURCE */
ns.pdsk = D_INCONSISTENT;

- r = __drbd_set_state(device, ns, CS_VERBOSE, NULL);
+ r = _drbd_set_state(device, ns, CS_VERBOSE, NULL);
ns = drbd_read_state(device);

if (ns.conn < C_CONNECTED)
@@ -1754,7 +1745,7 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
device->rs_mark_left[i] = tw;
device->rs_mark_time[i] = now;
}
- _drbd_pause_after(device);
+ drbd_pause_after(device);
/* Forget potentially stale cached per resync extent bit-counts.
* Open coded drbd_rs_cancel_all(device), we already have IRQs
* disabled, and know the disk state is ok. */
@@ -1764,8 +1755,7 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
device->resync_wenr = LC_FREE;
spin_unlock(&device->al_lock);
}
- write_unlock(&global_state_lock);
- spin_unlock_irq(&device->resource->req_lock);
+ unlock_all_resources();

if (r == SS_SUCCESS) {
wake_up(&device->al_wait); /* for lc_reset() above */
@@ -1829,6 +1819,7 @@ void drbd_start_resync(struct drbd_device *device, enum drbd_conns side)
drbd_md_sync(device);
}
put_ldev(device);
+out:
mutex_unlock(device->state_mutex);
}

--
1.9.1

2015-07-30 11:33:39

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 07/19] drbd: Backport the "events2" command

From: Andreas Gruenbacher <[email protected]>

The events2 command originates from drbd-9 development. It features
more information but requires a incompatible change in output
format.
Therefore the previous events command continues to exist, the new
improved events2 command becomes available now.

This prepares the user-base for a later switch to the complete
drbd9 code base.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 45 +++
drivers/block/drbd/drbd_nl.c | 625 ++++++++++++++++++++++++++++++++-
drivers/block/drbd/drbd_receiver.c | 6 -
drivers/block/drbd/drbd_state.c | 424 +++++++++++++++++++++-
drivers/block/drbd/drbd_state_change.h | 63 ++++
include/linux/drbd.h | 16 +
include/linux/drbd_genl.h | 114 ++++++
7 files changed, 1281 insertions(+), 12 deletions(-)
create mode 100644 drivers/block/drbd/drbd_state_change.h

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index a990b8e..816afad 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -667,6 +667,8 @@ enum {
DEVICE_WORK_PENDING, /* tell worker that some device has pending work */
};

+enum which_state { NOW, OLD = NOW, NEW };
+
struct drbd_resource {
char *name;
#ifdef CONFIG_DEBUG_FS
@@ -785,6 +787,17 @@ struct drbd_connection {
} send;
};

+static inline bool has_net_conf(struct drbd_connection *connection)
+{
+ bool has_net_conf;
+
+ rcu_read_lock();
+ has_net_conf = rcu_dereference(connection->net_conf);
+ rcu_read_unlock();
+
+ return has_net_conf;
+}
+
void __update_timing_details(
struct drbd_thread_timing_details *tdp,
unsigned int *cb_nr,
@@ -1017,6 +1030,12 @@ static inline struct drbd_peer_device *first_peer_device(struct drbd_device *dev
return list_first_entry_or_null(&device->peer_devices, struct drbd_peer_device, peer_devices);
}

+static inline struct drbd_peer_device *
+conn_peer_device(struct drbd_connection *connection, int volume_number)
+{
+ return idr_find(&connection->peer_devices, volume_number);
+}
+
#define for_each_resource(resource, _resources) \
list_for_each_entry(resource, _resources, resources)

@@ -1452,6 +1471,9 @@ extern int is_valid_ar_handle(struct drbd_request *, sector_t);


/* drbd_nl.c */
+
+extern struct mutex notification_mutex;
+
extern void drbd_suspend_io(struct drbd_device *device);
extern void drbd_resume_io(struct drbd_device *device);
extern char *ppsize(char *buf, unsigned long long size);
@@ -1665,6 +1687,29 @@ struct sib_info {
};
void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib);

+extern void notify_resource_state(struct sk_buff *,
+ unsigned int,
+ struct drbd_resource *,
+ struct resource_info *,
+ enum drbd_notification_type);
+extern void notify_device_state(struct sk_buff *,
+ unsigned int,
+ struct drbd_device *,
+ struct device_info *,
+ enum drbd_notification_type);
+extern void notify_connection_state(struct sk_buff *,
+ unsigned int,
+ struct drbd_connection *,
+ struct connection_info *,
+ enum drbd_notification_type);
+extern void notify_peer_device_state(struct sk_buff *,
+ unsigned int,
+ struct drbd_peer_device *,
+ struct peer_device_info *,
+ enum drbd_notification_type);
+extern void notify_helper(enum drbd_notification_type, struct drbd_device *,
+ struct drbd_connection *, const char *, int);
+
/*
* inline helper functions
*************************/
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 7bca1bc..e3218f7 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -36,6 +36,7 @@
#include "drbd_int.h"
#include "drbd_protocol.h"
#include "drbd_req.h"
+#include "drbd_state_change.h"
#include <asm/unaligned.h>
#include <linux/drbd_limits.h>
#include <linux/kthread.h>
@@ -75,11 +76,17 @@ int drbd_adm_get_status(struct sk_buff *skb, struct genl_info *info);
int drbd_adm_get_timeout_type(struct sk_buff *skb, struct genl_info *info);
/* .dumpit */
int drbd_adm_get_status_all(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb);

#include <linux/drbd_genl_api.h>
#include "drbd_nla.h"
#include <linux/genl_magic_func.h>

+static atomic_t drbd_genl_seq = ATOMIC_INIT(2); /* two. */
+static atomic_t notify_genl_seq = ATOMIC_INIT(2); /* two. */
+
+DEFINE_MUTEX(notification_mutex);
+
/* used blkdev_get_by_path, to claim our meta data device(s) */
static char *drbd_m_holder = "Hands off! this is DRBD's meta data device.";

@@ -349,6 +356,7 @@ int drbd_khelper(struct drbd_device *device, char *cmd)
sib.sib_reason = SIB_HELPER_PRE;
sib.helper_name = cmd;
drbd_bcast_event(device, &sib);
+ notify_helper(NOTIFY_CALL, device, connection, cmd, 0);
ret = call_usermodehelper(usermode_helper, argv, envp, UMH_WAIT_PROC);
if (ret)
drbd_warn(device, "helper command: %s %s %s exit code %u (0x%x)\n",
@@ -361,6 +369,7 @@ int drbd_khelper(struct drbd_device *device, char *cmd)
sib.sib_reason = SIB_HELPER_POST;
sib.helper_exit_code = ret;
drbd_bcast_event(device, &sib);
+ notify_helper(NOTIFY_RESPONSE, device, connection, cmd, ret);

if (current == connection->worker.task)
clear_bit(CALLBACK_PENDING, &connection->flags);
@@ -388,6 +397,7 @@ static int conn_khelper(struct drbd_connection *connection, char *cmd)

drbd_info(connection, "helper command: %s %s %s\n", usermode_helper, cmd, resource_name);
/* TODO: conn_bcast_event() ?? */
+ notify_helper(NOTIFY_CALL, NULL, connection, cmd, 0);

ret = call_usermodehelper(usermode_helper, argv, envp, UMH_WAIT_PROC);
if (ret)
@@ -399,6 +409,7 @@ static int conn_khelper(struct drbd_connection *connection, char *cmd)
usermode_helper, cmd, resource_name,
(ret >> 8) & 0xff, ret);
/* TODO: conn_bcast_event() ?? */
+ notify_helper(NOTIFY_RESPONSE, NULL, connection, cmd, ret);

if (ret < 0) /* Ignore any ERRNOs we got. */
ret = 0;
@@ -2248,8 +2259,31 @@ int drbd_adm_net_opts(struct sk_buff *skb, struct genl_info *info)
return 0;
}

+static void connection_to_info(struct connection_info *info,
+ struct drbd_connection *connection)
+{
+ info->conn_connection_state = connection->cstate;
+ info->conn_role = conn_highest_peer(connection);
+}
+
+static void peer_device_to_info(struct peer_device_info *info,
+ struct drbd_peer_device *peer_device)
+{
+ struct drbd_device *device = peer_device->device;
+
+ info->peer_repl_state =
+ max_t(enum drbd_conns, C_WF_REPORT_PARAMS, device->state.conn);
+ info->peer_disk_state = device->state.pdsk;
+ info->peer_resync_susp_user = device->state.user_isp;
+ info->peer_resync_susp_peer = device->state.peer_isp;
+ info->peer_resync_susp_dependency = device->state.aftr_isp;
+}
+
int drbd_adm_connect(struct sk_buff *skb, struct genl_info *info)
{
+ struct connection_info connection_info;
+ enum drbd_notification_type flags;
+ unsigned int peer_devices = 0;
struct drbd_config_context adm_ctx;
struct drbd_peer_device *peer_device;
struct net_conf *old_net_conf, *new_net_conf = NULL;
@@ -2350,6 +2384,22 @@ int drbd_adm_connect(struct sk_buff *skb, struct genl_info *info)
connection->peer_addr_len = nla_len(adm_ctx.peer_addr);
memcpy(&connection->peer_addr, nla_data(adm_ctx.peer_addr), connection->peer_addr_len);

+ idr_for_each_entry(&connection->peer_devices, peer_device, i) {
+ peer_devices++;
+ }
+
+ connection_to_info(&connection_info, connection);
+ flags = (peer_devices--) ? NOTIFY_CONTINUES : 0;
+ mutex_lock(&notification_mutex);
+ notify_connection_state(NULL, 0, connection, &connection_info, NOTIFY_CREATE | flags);
+ idr_for_each_entry(&connection->peer_devices, peer_device, i) {
+ struct peer_device_info peer_device_info;
+
+ peer_device_to_info(&peer_device_info, peer_device);
+ flags = (peer_devices--) ? NOTIFY_CONTINUES : 0;
+ notify_peer_device_state(NULL, 0, peer_device, &peer_device_info, NOTIFY_CREATE | flags);
+ }
+ mutex_unlock(&notification_mutex);
mutex_unlock(&adm_ctx.resource->conf_update);

rcu_read_lock();
@@ -2431,6 +2481,8 @@ static enum drbd_state_rv conn_try_disconnect(struct drbd_connection *connection
drbd_err(connection,
"unexpected rv2=%d in conn_try_disconnect()\n",
rv2);
+ /* Unlike in DRBD 9, the state engine has generated
+ * NOTIFY_DESTROY events before clearing connection->net_conf. */
}
return rv;
}
@@ -3417,8 +3469,18 @@ drbd_check_resource_name(struct drbd_config_context *adm_ctx)
return NO_ERROR;
}

+static void resource_to_info(struct resource_info *info,
+ struct drbd_resource *resource)
+{
+ info->res_role = conn_highest_role(first_connection(resource));
+ info->res_susp = resource->susp;
+ info->res_susp_nod = resource->susp_nod;
+ info->res_susp_fen = resource->susp_fen;
+}
+
int drbd_adm_new_resource(struct sk_buff *skb, struct genl_info *info)
{
+ struct drbd_connection *connection;
struct drbd_config_context adm_ctx;
enum drbd_ret_code retcode;
struct res_opts res_opts;
@@ -3453,14 +3515,32 @@ int drbd_adm_new_resource(struct sk_buff *skb, struct genl_info *info)

/* not yet safe for genl_family.parallel_ops */
mutex_lock(&resources_mutex);
- if (!conn_create(adm_ctx.resource_name, &res_opts))
- retcode = ERR_NOMEM;
+ connection = conn_create(adm_ctx.resource_name, &res_opts);
mutex_unlock(&resources_mutex);
+
+ if (connection) {
+ struct resource_info resource_info;
+
+ mutex_lock(&notification_mutex);
+ resource_to_info(&resource_info, connection->resource);
+ notify_resource_state(NULL, 0, connection->resource,
+ &resource_info, NOTIFY_CREATE);
+ mutex_unlock(&notification_mutex);
+ } else
+ retcode = ERR_NOMEM;
+
out:
drbd_adm_finish(&adm_ctx, info, retcode);
return 0;
}

+static void device_to_info(struct device_info *info,
+ struct drbd_device *device)
+{
+ info->dev_disk_state = device->state.disk;
+}
+
+
int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info)
{
struct drbd_config_context adm_ctx;
@@ -3495,6 +3575,36 @@ int drbd_adm_new_minor(struct sk_buff *skb, struct genl_info *info)

mutex_lock(&adm_ctx.resource->adm_mutex);
retcode = drbd_create_device(&adm_ctx, dh->minor);
+ if (retcode == NO_ERROR) {
+ struct drbd_device *device;
+ struct drbd_peer_device *peer_device;
+ struct device_info info;
+ unsigned int peer_devices = 0;
+ enum drbd_notification_type flags;
+
+ device = minor_to_device(dh->minor);
+ for_each_peer_device(peer_device, device) {
+ if (!has_net_conf(peer_device->connection))
+ continue;
+ peer_devices++;
+ }
+
+ device_to_info(&info, device);
+ mutex_lock(&notification_mutex);
+ flags = (peer_devices--) ? NOTIFY_CONTINUES : 0;
+ notify_device_state(NULL, 0, device, &info, NOTIFY_CREATE | flags);
+ for_each_peer_device(peer_device, device) {
+ struct peer_device_info peer_device_info;
+
+ if (!has_net_conf(peer_device->connection))
+ continue;
+ peer_device_to_info(&peer_device_info, peer_device);
+ flags = (peer_devices--) ? NOTIFY_CONTINUES : 0;
+ notify_peer_device_state(NULL, 0, peer_device, &peer_device_info,
+ NOTIFY_CREATE | flags);
+ }
+ mutex_unlock(&notification_mutex);
+ }
mutex_unlock(&adm_ctx.resource->adm_mutex);
out:
drbd_adm_finish(&adm_ctx, info, retcode);
@@ -3503,13 +3613,35 @@ out:

static enum drbd_ret_code adm_del_minor(struct drbd_device *device)
{
+ struct drbd_peer_device *peer_device;
+
if (device->state.disk == D_DISKLESS &&
/* no need to be device->state.conn == C_STANDALONE &&
* we may want to delete a minor from a live replication group.
*/
device->state.role == R_SECONDARY) {
+ struct drbd_connection *connection =
+ first_connection(device->resource);
+
_drbd_request_state(device, NS(conn, C_WF_REPORT_PARAMS),
CS_VERBOSE + CS_WAIT_COMPLETE);
+
+ /* If the state engine hasn't stopped the sender thread yet, we
+ * need to flush the sender work queue before generating the
+ * DESTROY events here. */
+ if (get_t_state(&connection->worker) == RUNNING)
+ drbd_flush_workqueue(&connection->sender_work);
+
+ mutex_lock(&notification_mutex);
+ for_each_peer_device(peer_device, device) {
+ if (!has_net_conf(peer_device->connection))
+ continue;
+ notify_peer_device_state(NULL, 0, peer_device, NULL,
+ NOTIFY_DESTROY | NOTIFY_CONTINUES);
+ }
+ notify_device_state(NULL, 0, device, NULL, NOTIFY_DESTROY);
+ mutex_unlock(&notification_mutex);
+
drbd_delete_device(device);
return NO_ERROR;
} else
@@ -3546,6 +3678,13 @@ static int adm_del_resource(struct drbd_resource *resource)
if (!idr_is_empty(&resource->devices))
return ERR_RES_IN_USE;

+ /* The state engine has stopped the sender thread, so we don't
+ * need to flush the sender work queue before generating the
+ * DESTROY event here. */
+ mutex_lock(&notification_mutex);
+ notify_resource_state(NULL, 0, resource, NULL, NOTIFY_DESTROY);
+ mutex_unlock(&notification_mutex);
+
mutex_lock(&resources_mutex);
list_del_rcu(&resource->resources);
mutex_unlock(&resources_mutex);
@@ -3644,7 +3783,6 @@ finish:

void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib)
{
- static atomic_t drbd_genl_seq = ATOMIC_INIT(2); /* two. */
struct sk_buff *msg;
struct drbd_genlmsghdr *d_out;
unsigned seq;
@@ -3679,3 +3817,484 @@ failed:
"Event seq:%u sib_reason:%u\n",
err, seq, sib->sib_reason);
}
+
+static void device_to_statistics(struct device_statistics *s,
+ struct drbd_device *device)
+{
+ memset(s, 0, sizeof(*s));
+ s->dev_upper_blocked = !may_inc_ap_bio(device);
+ if (get_ldev(device)) {
+ struct drbd_md *md = &device->ldev->md;
+ u64 *history_uuids = (u64 *)s->history_uuids;
+ struct request_queue *q;
+ int n;
+
+ spin_lock_irq(&md->uuid_lock);
+ s->dev_current_uuid = md->uuid[UI_CURRENT];
+ BUILD_BUG_ON(sizeof(s->history_uuids) < UI_HISTORY_END - UI_HISTORY_START + 1);
+ for (n = 0; n < UI_HISTORY_END - UI_HISTORY_START + 1; n++)
+ history_uuids[n] = md->uuid[UI_HISTORY_START + n];
+ for (; n < HISTORY_UUIDS; n++)
+ history_uuids[n] = 0;
+ s->history_uuids_len = HISTORY_UUIDS;
+ spin_unlock_irq(&md->uuid_lock);
+
+ s->dev_disk_flags = md->flags;
+ q = bdev_get_queue(device->ldev->backing_bdev);
+ s->dev_lower_blocked =
+ bdi_congested(&q->backing_dev_info,
+ (1 << WB_async_congested) |
+ (1 << WB_sync_congested));
+ put_ldev(device);
+ }
+ s->dev_size = drbd_get_capacity(device->this_bdev);
+ s->dev_read = device->read_cnt;
+ s->dev_write = device->writ_cnt;
+ s->dev_al_writes = device->al_writ_cnt;
+ s->dev_bm_writes = device->bm_writ_cnt;
+ s->dev_upper_pending = atomic_read(&device->ap_bio_cnt);
+ s->dev_lower_pending = atomic_read(&device->local_cnt);
+ s->dev_al_suspended = test_bit(AL_SUSPENDED, &device->flags);
+ s->dev_exposed_data_uuid = device->ed_uuid;
+}
+
+enum mdf_peer_flag {
+ MDF_PEER_CONNECTED = 1 << 0,
+ MDF_PEER_OUTDATED = 1 << 1,
+ MDF_PEER_FENCING = 1 << 2,
+ MDF_PEER_FULL_SYNC = 1 << 3,
+};
+
+static void peer_device_to_statistics(struct peer_device_statistics *s,
+ struct drbd_peer_device *peer_device)
+{
+ struct drbd_device *device = peer_device->device;
+
+ memset(s, 0, sizeof(*s));
+ s->peer_dev_received = device->recv_cnt;
+ s->peer_dev_sent = device->send_cnt;
+ s->peer_dev_pending = atomic_read(&device->ap_pending_cnt) +
+ atomic_read(&device->rs_pending_cnt);
+ s->peer_dev_unacked = atomic_read(&device->unacked_cnt);
+ s->peer_dev_out_of_sync = drbd_bm_total_weight(device) << (BM_BLOCK_SHIFT - 9);
+ s->peer_dev_resync_failed = device->rs_failed << (BM_BLOCK_SHIFT - 9);
+ if (get_ldev(device)) {
+ struct drbd_md *md = &device->ldev->md;
+
+ spin_lock_irq(&md->uuid_lock);
+ s->peer_dev_bitmap_uuid = md->uuid[UI_BITMAP];
+ spin_unlock_irq(&md->uuid_lock);
+ s->peer_dev_flags =
+ (drbd_md_test_flag(device->ldev, MDF_CONNECTED_IND) ?
+ MDF_PEER_CONNECTED : 0) +
+ (drbd_md_test_flag(device->ldev, MDF_CONSISTENT) &&
+ !drbd_md_test_flag(device->ldev, MDF_WAS_UP_TO_DATE) ?
+ MDF_PEER_OUTDATED : 0) +
+ /* FIXME: MDF_PEER_FENCING? */
+ (drbd_md_test_flag(device->ldev, MDF_FULL_SYNC) ?
+ MDF_PEER_FULL_SYNC : 0);
+ put_ldev(device);
+ }
+}
+
+static int nla_put_notification_header(struct sk_buff *msg,
+ enum drbd_notification_type type)
+{
+ struct drbd_notification_header nh = {
+ .nh_type = type,
+ };
+
+ return drbd_notification_header_to_skb(msg, &nh, true);
+}
+
+void notify_resource_state(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_resource *resource,
+ struct resource_info *resource_info,
+ enum drbd_notification_type type)
+{
+ struct resource_statistics resource_statistics;
+ struct drbd_genlmsghdr *dh;
+ bool multicast = false;
+ int err;
+
+ if (!skb) {
+ seq = atomic_inc_return(&notify_genl_seq);
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto failed;
+ multicast = true;
+ }
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_RESOURCE_STATE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_drbd_cfg_context(skb, resource, NULL, NULL) ||
+ nla_put_notification_header(skb, type) ||
+ ((type & ~NOTIFY_FLAGS) != NOTIFY_DESTROY &&
+ resource_info_to_skb(skb, resource_info, true)))
+ goto nla_put_failure;
+ resource_statistics.res_stat_write_ordering = resource->write_ordering;
+ err = resource_statistics_to_skb(skb, &resource_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto nla_put_failure;
+ genlmsg_end(skb, dh);
+ if (multicast) {
+ err = drbd_genl_multicast_events(skb, 0);
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto failed;
+ }
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+failed:
+ drbd_err(resource, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+void notify_device_state(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_device *device,
+ struct device_info *device_info,
+ enum drbd_notification_type type)
+{
+ struct device_statistics device_statistics;
+ struct drbd_genlmsghdr *dh;
+ bool multicast = false;
+ int err;
+
+ if (!skb) {
+ seq = atomic_inc_return(&notify_genl_seq);
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto failed;
+ multicast = true;
+ }
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_DEVICE_STATE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = device->minor;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_drbd_cfg_context(skb, device->resource, NULL, device) ||
+ nla_put_notification_header(skb, type) ||
+ ((type & ~NOTIFY_FLAGS) != NOTIFY_DESTROY &&
+ device_info_to_skb(skb, device_info, true)))
+ goto nla_put_failure;
+ device_to_statistics(&device_statistics, device);
+ device_statistics_to_skb(skb, &device_statistics, !capable(CAP_SYS_ADMIN));
+ genlmsg_end(skb, dh);
+ if (multicast) {
+ err = drbd_genl_multicast_events(skb, 0);
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto failed;
+ }
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+failed:
+ drbd_err(device, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+void notify_connection_state(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_connection *connection,
+ struct connection_info *connection_info,
+ enum drbd_notification_type type)
+{
+ struct connection_statistics connection_statistics;
+ struct drbd_genlmsghdr *dh;
+ bool multicast = false;
+ int err;
+
+ if (!skb) {
+ seq = atomic_inc_return(&notify_genl_seq);
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto failed;
+ multicast = true;
+ }
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_CONNECTION_STATE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_drbd_cfg_context(skb, connection->resource, connection, NULL) ||
+ nla_put_notification_header(skb, type) ||
+ ((type & ~NOTIFY_FLAGS) != NOTIFY_DESTROY &&
+ connection_info_to_skb(skb, connection_info, true)))
+ goto nla_put_failure;
+ connection_statistics.conn_congested = test_bit(NET_CONGESTED, &connection->flags);
+ connection_statistics_to_skb(skb, &connection_statistics, !capable(CAP_SYS_ADMIN));
+ genlmsg_end(skb, dh);
+ if (multicast) {
+ err = drbd_genl_multicast_events(skb, 0);
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto failed;
+ }
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+failed:
+ drbd_err(connection, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+void notify_peer_device_state(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_peer_device *peer_device,
+ struct peer_device_info *peer_device_info,
+ enum drbd_notification_type type)
+{
+ struct peer_device_statistics peer_device_statistics;
+ struct drbd_resource *resource = peer_device->device->resource;
+ struct drbd_genlmsghdr *dh;
+ bool multicast = false;
+ int err;
+
+ if (!skb) {
+ seq = atomic_inc_return(&notify_genl_seq);
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto failed;
+ multicast = true;
+ }
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_PEER_DEVICE_STATE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_drbd_cfg_context(skb, resource, peer_device->connection, peer_device->device) ||
+ nla_put_notification_header(skb, type) ||
+ ((type & ~NOTIFY_FLAGS) != NOTIFY_DESTROY &&
+ peer_device_info_to_skb(skb, peer_device_info, true)))
+ goto nla_put_failure;
+ peer_device_to_statistics(&peer_device_statistics, peer_device);
+ peer_device_statistics_to_skb(skb, &peer_device_statistics, !capable(CAP_SYS_ADMIN));
+ genlmsg_end(skb, dh);
+ if (multicast) {
+ err = drbd_genl_multicast_events(skb, 0);
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto failed;
+ }
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+failed:
+ drbd_err(peer_device, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+void notify_helper(enum drbd_notification_type type,
+ struct drbd_device *device, struct drbd_connection *connection,
+ const char *name, int status)
+{
+ struct drbd_resource *resource = device ? device->resource : connection->resource;
+ struct drbd_helper_info helper_info;
+ unsigned int seq = atomic_inc_return(&notify_genl_seq);
+ struct sk_buff *skb = NULL;
+ struct drbd_genlmsghdr *dh;
+ int err;
+
+ strlcpy(helper_info.helper_name, name, sizeof(helper_info.helper_name));
+ helper_info.helper_name_len = min(strlen(name), sizeof(helper_info.helper_name));
+ helper_info.helper_status = status;
+
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_NOIO);
+ err = -ENOMEM;
+ if (!skb)
+ goto fail;
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_HELPER);
+ if (!dh)
+ goto fail;
+ dh->minor = device ? device->minor : -1;
+ dh->ret_code = NO_ERROR;
+ mutex_lock(&notification_mutex);
+ if (nla_put_drbd_cfg_context(skb, resource, connection, device) ||
+ nla_put_notification_header(skb, type) ||
+ drbd_helper_info_to_skb(skb, &helper_info, true))
+ goto unlock_fail;
+ genlmsg_end(skb, dh);
+ err = drbd_genl_multicast_events(skb, 0);
+ skb = NULL;
+ /* skb has been consumed or freed in netlink_broadcast() */
+ if (err && err != -ESRCH)
+ goto unlock_fail;
+ mutex_unlock(&notification_mutex);
+ return;
+
+unlock_fail:
+ mutex_unlock(&notification_mutex);
+fail:
+ nlmsg_free(skb);
+ drbd_err(resource, "Error %d while broadcasting event. Event seq:%u\n",
+ err, seq);
+}
+
+static void notify_initial_state_done(struct sk_buff *skb, unsigned int seq)
+{
+ struct drbd_genlmsghdr *dh;
+ int err;
+
+ err = -EMSGSIZE;
+ dh = genlmsg_put(skb, 0, seq, &drbd_genl_family, 0, DRBD_INITIAL_STATE_DONE);
+ if (!dh)
+ goto nla_put_failure;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ if (nla_put_notification_header(skb, NOTIFY_EXISTS))
+ goto nla_put_failure;
+ genlmsg_end(skb, dh);
+ return;
+
+nla_put_failure:
+ nlmsg_free(skb);
+ pr_err("Error %d sending event. Event seq:%u\n", err, seq);
+}
+
+static void free_state_changes(struct list_head *list)
+{
+ while (!list_empty(list)) {
+ struct drbd_state_change *state_change =
+ list_first_entry(list, struct drbd_state_change, list);
+ list_del(&state_change->list);
+ forget_state_change(state_change);
+ }
+}
+
+static unsigned int notifications_for_state_change(struct drbd_state_change *state_change)
+{
+ return 1 +
+ state_change->n_connections +
+ state_change->n_devices +
+ state_change->n_devices * state_change->n_connections;
+}
+
+static int get_initial_state(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct drbd_state_change *state_change = (struct drbd_state_change *)cb->args[0];
+ unsigned int seq = cb->args[2];
+ unsigned int n;
+ enum drbd_notification_type flags = 0;
+
+ /* There is no need for taking notification_mutex here: it doesn't
+ matter if the initial state events mix with later state chage
+ events; we can always tell the events apart by the NOTIFY_EXISTS
+ flag. */
+
+ cb->args[5]--;
+ if (cb->args[5] == 1) {
+ notify_initial_state_done(skb, seq);
+ goto out;
+ }
+ n = cb->args[4]++;
+ if (cb->args[4] < cb->args[3])
+ flags |= NOTIFY_CONTINUES;
+ if (n < 1) {
+ notify_resource_state_change(skb, seq, state_change->resource,
+ NOTIFY_EXISTS | flags);
+ goto next;
+ }
+ n--;
+ if (n < state_change->n_connections) {
+ notify_connection_state_change(skb, seq, &state_change->connections[n],
+ NOTIFY_EXISTS | flags);
+ goto next;
+ }
+ n -= state_change->n_connections;
+ if (n < state_change->n_devices) {
+ notify_device_state_change(skb, seq, &state_change->devices[n],
+ NOTIFY_EXISTS | flags);
+ goto next;
+ }
+ n -= state_change->n_devices;
+ if (n < state_change->n_devices * state_change->n_connections) {
+ notify_peer_device_state_change(skb, seq, &state_change->peer_devices[n],
+ NOTIFY_EXISTS | flags);
+ goto next;
+ }
+
+next:
+ if (cb->args[4] == cb->args[3]) {
+ struct drbd_state_change *next_state_change =
+ list_entry(state_change->list.next,
+ struct drbd_state_change, list);
+ cb->args[0] = (long)next_state_change;
+ cb->args[3] = notifications_for_state_change(next_state_change);
+ cb->args[4] = 0;
+ }
+out:
+ return skb->len;
+}
+
+int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct drbd_resource *resource;
+ LIST_HEAD(head);
+
+ if (cb->args[5] >= 1) {
+ if (cb->args[5] > 1)
+ return get_initial_state(skb, cb);
+ if (cb->args[0]) {
+ struct drbd_state_change *state_change =
+ (struct drbd_state_change *)cb->args[0];
+
+ /* connect list to head */
+ list_add(&head, &state_change->list);
+ free_state_changes(&head);
+ }
+ return 0;
+ }
+
+ cb->args[5] = 2; /* number of iterations */
+ mutex_lock(&resources_mutex);
+ for_each_resource(resource, &drbd_resources) {
+ struct drbd_state_change *state_change;
+
+ state_change = remember_old_state(resource, GFP_KERNEL);
+ if (!state_change) {
+ if (!list_empty(&head))
+ free_state_changes(&head);
+ mutex_unlock(&resources_mutex);
+ return -ENOMEM;
+ }
+ copy_old_to_new_state_change(state_change);
+ list_add_tail(&state_change->list, &head);
+ cb->args[5] += notifications_for_state_change(state_change);
+ }
+ mutex_unlock(&resources_mutex);
+
+ if (!list_empty(&head)) {
+ struct drbd_state_change *state_change =
+ list_entry(head.next, struct drbd_state_change, list);
+ cb->args[0] = (long)state_change;
+ cb->args[3] = notifications_for_state_change(state_change);
+ list_del(&head); /* detach list from head */
+ }
+
+ cb->args[2] = cb->nlh->nlmsg_seq;
+ return get_initial_state(skb, cb);
+}
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 85856ec..d4c61af 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -1507,12 +1507,6 @@ static void conn_wait_active_ee_empty(struct drbd_connection *connection)
rcu_read_unlock();
}

-static struct drbd_peer_device *
-conn_peer_device(struct drbd_connection *connection, int volume_number)
-{
- return idr_find(&connection->peer_devices, volume_number);
-}
-
static int receive_Barrier(struct drbd_connection *connection, struct packet_info *pi)
{
int rv;
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 535ae47..bc4b45b 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -29,6 +29,7 @@
#include "drbd_int.h"
#include "drbd_protocol.h"
#include "drbd_req.h"
+#include "drbd_state_change.h"

struct after_state_chg_work {
struct drbd_work w;
@@ -37,6 +38,7 @@ struct after_state_chg_work {
union drbd_state ns;
enum chg_state_flags flags;
struct completion *done;
+ struct drbd_state_change *state_change;
};

enum sanitize_state_warnings {
@@ -48,9 +50,266 @@ enum sanitize_state_warnings {
IMPLICITLY_UPGRADED_PDSK,
};

+static void count_objects(struct drbd_resource *resource,
+ unsigned int *n_devices,
+ unsigned int *n_connections)
+{
+ struct drbd_device *device;
+ struct drbd_connection *connection;
+ int vnr;
+
+ *n_devices = 0;
+ *n_connections = 0;
+
+ idr_for_each_entry(&resource->devices, device, vnr)
+ (*n_devices)++;
+ for_each_connection(connection, resource) {
+ if (!has_net_conf(connection))
+ continue;
+ (*n_connections)++;
+ }
+}
+
+static struct drbd_state_change *alloc_state_change(unsigned int n_devices, unsigned int n_connections, gfp_t gfp)
+{
+ struct drbd_state_change *state_change;
+ unsigned int size, n;
+
+ size = sizeof(struct drbd_state_change) +
+ n_devices * sizeof(struct drbd_device_state_change) +
+ n_connections * sizeof(struct drbd_connection_state_change) +
+ n_devices * n_connections * sizeof(struct drbd_peer_device_state_change);
+ state_change = kmalloc(size, gfp);
+ if (!state_change)
+ return NULL;
+ state_change->n_devices = n_devices;
+ state_change->n_connections = n_connections;
+ state_change->devices = (void *)(state_change + 1);
+ state_change->connections = (void *)&state_change->devices[n_devices];
+ state_change->peer_devices = (void *)&state_change->connections[n_connections];
+ state_change->resource->resource = NULL;
+ for (n = 0; n < n_devices; n++)
+ state_change->devices[n].device = NULL;
+ for (n = 0; n < n_connections; n++)
+ state_change->connections[n].connection = NULL;
+ return state_change;
+}
+
+struct drbd_state_change *remember_old_state(struct drbd_resource *resource, gfp_t gfp)
+{
+ struct drbd_state_change *state_change;
+ struct drbd_device *device;
+ unsigned int n_devices;
+ struct drbd_connection *connection;
+ unsigned int n_connections;
+ int vnr;
+
+ struct drbd_device_state_change *device_state_change;
+ struct drbd_peer_device_state_change *peer_device_state_change;
+ struct drbd_connection_state_change *connection_state_change;
+
+retry:
+ rcu_read_lock();
+ count_objects(resource, &n_devices, &n_connections);
+ rcu_read_unlock();
+ state_change = alloc_state_change(n_devices, n_connections, gfp);
+ if (!state_change)
+ return NULL;
+
+ rcu_read_lock();
+ count_objects(resource, &n_devices, &n_connections);
+ if (n_devices != state_change->n_devices ||
+ n_connections != state_change->n_connections) {
+ kfree(state_change);
+ rcu_read_unlock();
+ goto retry;
+ }
+
+ kref_get(&resource->kref);
+ state_change->resource->resource = resource;
+ state_change->resource->role[OLD] =
+ conn_highest_role(first_connection(resource));
+ state_change->resource->susp[OLD] = resource->susp;
+ state_change->resource->susp_nod[OLD] = resource->susp_nod;
+ state_change->resource->susp_fen[OLD] = resource->susp_fen;
+
+ device_state_change = state_change->devices;
+ peer_device_state_change = state_change->peer_devices;
+ idr_for_each_entry(&resource->devices, device, vnr) {
+ kref_get(&device->kref);
+ device_state_change->device = device;
+ device_state_change->disk_state[OLD] = device->state.disk;
+
+ /* The peer_devices for each device have to be enumerated in
+ the order of the connections. We may not use for_each_peer_device() here. */
+ for_each_connection(connection, resource) {
+ struct drbd_peer_device *peer_device;
+
+ if (!has_net_conf(connection))
+ continue;
+ peer_device = conn_peer_device(connection, device->vnr);
+ peer_device_state_change->peer_device = peer_device;
+ peer_device_state_change->disk_state[OLD] =
+ device->state.pdsk;
+ peer_device_state_change->repl_state[OLD] =
+ max_t(enum drbd_conns,
+ C_WF_REPORT_PARAMS, device->state.conn);
+ peer_device_state_change->resync_susp_user[OLD] =
+ device->state.user_isp;
+ peer_device_state_change->resync_susp_peer[OLD] =
+ device->state.peer_isp;
+ peer_device_state_change->resync_susp_dependency[OLD] =
+ device->state.aftr_isp;
+ peer_device_state_change++;
+ }
+ device_state_change++;
+ }
+
+ connection_state_change = state_change->connections;
+ for_each_connection(connection, resource) {
+ if (!has_net_conf(connection))
+ continue;
+ kref_get(&connection->kref);
+ connection_state_change->connection = connection;
+ connection_state_change->cstate[OLD] =
+ connection->cstate;
+ connection_state_change->peer_role[OLD] =
+ conn_highest_peer(connection);
+ connection_state_change++;
+ }
+ rcu_read_unlock();
+
+ return state_change;
+}
+
+static void remember_new_state(struct drbd_state_change *state_change)
+{
+ struct drbd_resource_state_change *resource_state_change;
+ struct drbd_resource *resource;
+ unsigned int n;
+
+ if (!state_change)
+ return;
+
+ resource_state_change = &state_change->resource[0];
+ resource = resource_state_change->resource;
+
+ resource_state_change->role[NEW] =
+ conn_highest_role(first_connection(resource));
+ resource_state_change->susp[NEW] = resource->susp;
+ resource_state_change->susp_nod[NEW] = resource->susp_nod;
+ resource_state_change->susp_fen[NEW] = resource->susp_fen;
+
+ for (n = 0; n < state_change->n_devices; n++) {
+ struct drbd_device_state_change *device_state_change =
+ &state_change->devices[n];
+ struct drbd_device *device = device_state_change->device;
+
+ device_state_change->disk_state[NEW] = device->state.disk;
+ }
+
+ for (n = 0; n < state_change->n_connections; n++) {
+ struct drbd_connection_state_change *connection_state_change =
+ &state_change->connections[n];
+ struct drbd_connection *connection =
+ connection_state_change->connection;
+
+ connection_state_change->cstate[NEW] = connection->cstate;
+ connection_state_change->peer_role[NEW] =
+ conn_highest_peer(connection);
+ }
+
+ for (n = 0; n < state_change->n_devices * state_change->n_connections; n++) {
+ struct drbd_peer_device_state_change *peer_device_state_change =
+ &state_change->peer_devices[n];
+ struct drbd_device *device =
+ peer_device_state_change->peer_device->device;
+ union drbd_dev_state state = device->state;
+
+ peer_device_state_change->disk_state[NEW] = state.pdsk;
+ peer_device_state_change->repl_state[NEW] =
+ max_t(enum drbd_conns, C_WF_REPORT_PARAMS, state.conn);
+ peer_device_state_change->resync_susp_user[NEW] =
+ state.user_isp;
+ peer_device_state_change->resync_susp_peer[NEW] =
+ state.peer_isp;
+ peer_device_state_change->resync_susp_dependency[NEW] =
+ state.aftr_isp;
+ }
+}
+
+void copy_old_to_new_state_change(struct drbd_state_change *state_change)
+{
+ struct drbd_resource_state_change *resource_state_change = &state_change->resource[0];
+ unsigned int n_device, n_connection, n_peer_device, n_peer_devices;
+
+#define OLD_TO_NEW(x) \
+ (x[NEW] = x[OLD])
+
+ OLD_TO_NEW(resource_state_change->role);
+ OLD_TO_NEW(resource_state_change->susp);
+ OLD_TO_NEW(resource_state_change->susp_nod);
+ OLD_TO_NEW(resource_state_change->susp_fen);
+
+ for (n_connection = 0; n_connection < state_change->n_connections; n_connection++) {
+ struct drbd_connection_state_change *connection_state_change =
+ &state_change->connections[n_connection];
+
+ OLD_TO_NEW(connection_state_change->peer_role);
+ OLD_TO_NEW(connection_state_change->cstate);
+ }
+
+ for (n_device = 0; n_device < state_change->n_devices; n_device++) {
+ struct drbd_device_state_change *device_state_change =
+ &state_change->devices[n_device];
+
+ OLD_TO_NEW(device_state_change->disk_state);
+ }
+
+ n_peer_devices = state_change->n_devices * state_change->n_connections;
+ for (n_peer_device = 0; n_peer_device < n_peer_devices; n_peer_device++) {
+ struct drbd_peer_device_state_change *p =
+ &state_change->peer_devices[n_peer_device];
+
+ OLD_TO_NEW(p->disk_state);
+ OLD_TO_NEW(p->repl_state);
+ OLD_TO_NEW(p->resync_susp_user);
+ OLD_TO_NEW(p->resync_susp_peer);
+ OLD_TO_NEW(p->resync_susp_dependency);
+ }
+
+#undef OLD_TO_NEW
+}
+
+void forget_state_change(struct drbd_state_change *state_change)
+{
+ unsigned int n;
+
+ if (!state_change)
+ return;
+
+ if (state_change->resource->resource)
+ kref_put(&state_change->resource->resource->kref, drbd_destroy_resource);
+ for (n = 0; n < state_change->n_devices; n++) {
+ struct drbd_device *device = state_change->devices[n].device;
+
+ if (device)
+ kref_put(&device->kref, drbd_destroy_device);
+ }
+ for (n = 0; n < state_change->n_connections; n++) {
+ struct drbd_connection *connection =
+ state_change->connections[n].connection;
+
+ if (connection)
+ kref_put(&connection->kref, drbd_destroy_connection);
+ }
+ kfree(state_change);
+}
+
static int w_after_state_ch(struct drbd_work *w, int unused);
static void after_state_ch(struct drbd_device *device, union drbd_state os,
- union drbd_state ns, enum chg_state_flags flags);
+ union drbd_state ns, enum chg_state_flags flags,
+ struct drbd_state_change *);
static enum drbd_state_rv is_valid_state(struct drbd_device *, union drbd_state);
static enum drbd_state_rv is_valid_soft_transition(union drbd_state, union drbd_state, struct drbd_connection *);
static enum drbd_state_rv is_valid_transition(union drbd_state os, union drbd_state ns);
@@ -93,6 +352,7 @@ static enum drbd_role max_role(enum drbd_role role1, enum drbd_role role2)
return R_SECONDARY;
return R_UNKNOWN;
}
+
static enum drbd_role min_role(enum drbd_role role1, enum drbd_role role2)
{
if (role1 == R_UNKNOWN || role2 == R_UNKNOWN)
@@ -983,6 +1243,7 @@ _drbd_set_state(struct drbd_device *device, union drbd_state ns,
enum drbd_state_rv rv = SS_SUCCESS;
enum sanitize_state_warnings ssw;
struct after_state_chg_work *ascw;
+ struct drbd_state_change *state_change;

os = drbd_read_state(device);

@@ -1037,6 +1298,9 @@ _drbd_set_state(struct drbd_device *device, union drbd_state ns,
if (!is_sync_state(os.conn) && is_sync_state(ns.conn))
clear_bit(RS_DONE, &device->flags);

+ /* FIXME: Have any flags been set earlier in this function already? */
+ state_change = remember_old_state(device->resource, GFP_ATOMIC);
+
/* changes to local_cnt and device flags should be visible before
* changes to state, which again should be visible before anything else
* depending on that change happens. */
@@ -1047,6 +1311,8 @@ _drbd_set_state(struct drbd_device *device, union drbd_state ns,
device->resource->susp_fen = ns.susp_fen;
smp_wmb();

+ remember_new_state(state_change);
+
/* put replicated vs not-replicated requests in seperate epochs */
if (drbd_should_do_remote((union drbd_dev_state)os.i) !=
drbd_should_do_remote((union drbd_dev_state)ns.i))
@@ -1184,6 +1450,7 @@ _drbd_set_state(struct drbd_device *device, union drbd_state ns,
ascw->w.cb = w_after_state_ch;
ascw->device = device;
ascw->done = done;
+ ascw->state_change = state_change;
drbd_queue_work(&connection->sender_work,
&ascw->w);
} else {
@@ -1199,7 +1466,8 @@ static int w_after_state_ch(struct drbd_work *w, int unused)
container_of(w, struct after_state_chg_work, w);
struct drbd_device *device = ascw->device;

- after_state_ch(device, ascw->os, ascw->ns, ascw->flags);
+ after_state_ch(device, ascw->os, ascw->ns, ascw->flags, ascw->state_change);
+ forget_state_change(ascw->state_change);
if (ascw->flags & CS_WAIT_COMPLETE)
complete(ascw->done);
kfree(ascw);
@@ -1245,6 +1513,139 @@ int drbd_bitmap_io_from_worker(struct drbd_device *device,
return rv;
}

+void notify_resource_state_change(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_resource_state_change *resource_state_change,
+ enum drbd_notification_type type)
+{
+ struct drbd_resource *resource = resource_state_change->resource;
+ struct resource_info resource_info = {
+ .res_role = resource_state_change->role[NEW],
+ .res_susp = resource_state_change->susp[NEW],
+ .res_susp_nod = resource_state_change->susp_nod[NEW],
+ .res_susp_fen = resource_state_change->susp_fen[NEW],
+ };
+
+ notify_resource_state(skb, seq, resource, &resource_info, type);
+}
+
+void notify_connection_state_change(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_connection_state_change *connection_state_change,
+ enum drbd_notification_type type)
+{
+ struct drbd_connection *connection = connection_state_change->connection;
+ struct connection_info connection_info = {
+ .conn_connection_state = connection_state_change->cstate[NEW],
+ .conn_role = connection_state_change->peer_role[NEW],
+ };
+
+ notify_connection_state(skb, seq, connection, &connection_info, type);
+}
+
+void notify_device_state_change(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_device_state_change *device_state_change,
+ enum drbd_notification_type type)
+{
+ struct drbd_device *device = device_state_change->device;
+ struct device_info device_info = {
+ .dev_disk_state = device_state_change->disk_state[NEW],
+ };
+
+ notify_device_state(skb, seq, device, &device_info, type);
+}
+
+void notify_peer_device_state_change(struct sk_buff *skb,
+ unsigned int seq,
+ struct drbd_peer_device_state_change *p,
+ enum drbd_notification_type type)
+{
+ struct drbd_peer_device *peer_device = p->peer_device;
+ struct peer_device_info peer_device_info = {
+ .peer_repl_state = p->repl_state[NEW],
+ .peer_disk_state = p->disk_state[NEW],
+ .peer_resync_susp_user = p->resync_susp_user[NEW],
+ .peer_resync_susp_peer = p->resync_susp_peer[NEW],
+ .peer_resync_susp_dependency = p->resync_susp_dependency[NEW],
+ };
+
+ notify_peer_device_state(skb, seq, peer_device, &peer_device_info, type);
+}
+
+static void broadcast_state_change(struct drbd_state_change *state_change)
+{
+ struct drbd_resource_state_change *resource_state_change = &state_change->resource[0];
+ bool resource_state_has_changed;
+ unsigned int n_device, n_connection, n_peer_device, n_peer_devices;
+ void (*last_func)(struct sk_buff *, unsigned int, void *,
+ enum drbd_notification_type) = NULL;
+ void *uninitialized_var(last_arg);
+
+#define HAS_CHANGED(state) ((state)[OLD] != (state)[NEW])
+#define FINAL_STATE_CHANGE(type) \
+ ({ if (last_func) \
+ last_func(NULL, 0, last_arg, type); \
+ })
+#define REMEMBER_STATE_CHANGE(func, arg, type) \
+ ({ FINAL_STATE_CHANGE(type | NOTIFY_CONTINUES); \
+ last_func = (typeof(last_func))func; \
+ last_arg = arg; \
+ })
+
+ mutex_lock(&notification_mutex);
+
+ resource_state_has_changed =
+ HAS_CHANGED(resource_state_change->role) ||
+ HAS_CHANGED(resource_state_change->susp) ||
+ HAS_CHANGED(resource_state_change->susp_nod) ||
+ HAS_CHANGED(resource_state_change->susp_fen);
+
+ if (resource_state_has_changed)
+ REMEMBER_STATE_CHANGE(notify_resource_state_change,
+ resource_state_change, NOTIFY_CHANGE);
+
+ for (n_connection = 0; n_connection < state_change->n_connections; n_connection++) {
+ struct drbd_connection_state_change *connection_state_change =
+ &state_change->connections[n_connection];
+
+ if (HAS_CHANGED(connection_state_change->peer_role) ||
+ HAS_CHANGED(connection_state_change->cstate))
+ REMEMBER_STATE_CHANGE(notify_connection_state_change,
+ connection_state_change, NOTIFY_CHANGE);
+ }
+
+ for (n_device = 0; n_device < state_change->n_devices; n_device++) {
+ struct drbd_device_state_change *device_state_change =
+ &state_change->devices[n_device];
+
+ if (HAS_CHANGED(device_state_change->disk_state))
+ REMEMBER_STATE_CHANGE(notify_device_state_change,
+ device_state_change, NOTIFY_CHANGE);
+ }
+
+ n_peer_devices = state_change->n_devices * state_change->n_connections;
+ for (n_peer_device = 0; n_peer_device < n_peer_devices; n_peer_device++) {
+ struct drbd_peer_device_state_change *p =
+ &state_change->peer_devices[n_peer_device];
+
+ if (HAS_CHANGED(p->disk_state) ||
+ HAS_CHANGED(p->repl_state) ||
+ HAS_CHANGED(p->resync_susp_user) ||
+ HAS_CHANGED(p->resync_susp_peer) ||
+ HAS_CHANGED(p->resync_susp_dependency))
+ REMEMBER_STATE_CHANGE(notify_peer_device_state_change,
+ p, NOTIFY_CHANGE);
+ }
+
+ FINAL_STATE_CHANGE(NOTIFY_CHANGE);
+ mutex_unlock(&notification_mutex);
+
+#undef HAS_CHANGED
+#undef FINAL_STATE_CHANGE
+#undef REMEMBER_STATE_CHANGE
+}
+
/**
* after_state_ch() - Perform after state change actions that may sleep
* @device: DRBD device.
@@ -1253,13 +1654,16 @@ int drbd_bitmap_io_from_worker(struct drbd_device *device,
* @flags: Flags
*/
static void after_state_ch(struct drbd_device *device, union drbd_state os,
- union drbd_state ns, enum chg_state_flags flags)
+ union drbd_state ns, enum chg_state_flags flags,
+ struct drbd_state_change *state_change)
{
struct drbd_resource *resource = device->resource;
struct drbd_peer_device *peer_device = first_peer_device(device);
struct drbd_connection *connection = peer_device ? peer_device->connection : NULL;
struct sib_info sib;

+ broadcast_state_change(state_change);
+
sib.sib_reason = SIB_STATE_CHANGE;
sib.os = os;
sib.ns = ns;
@@ -1572,6 +1976,7 @@ struct after_conn_state_chg_work {
union drbd_state ns_max; /* new, max state, over all devices */
enum chg_state_flags flags;
struct drbd_connection *connection;
+ struct drbd_state_change *state_change;
};

static int w_after_conn_state_ch(struct drbd_work *w, int unused)
@@ -1584,6 +1989,8 @@ static int w_after_conn_state_ch(struct drbd_work *w, int unused)
struct drbd_peer_device *peer_device;
int vnr;

+ broadcast_state_change(acscw->state_change);
+ forget_state_change(acscw->state_change);
kfree(acscw);

/* Upon network configuration, we need to start the receiver */
@@ -1593,6 +2000,13 @@ static int w_after_conn_state_ch(struct drbd_work *w, int unused)
if (oc == C_DISCONNECTING && ns_max.conn == C_STANDALONE) {
struct net_conf *old_conf;

+ mutex_lock(&notification_mutex);
+ idr_for_each_entry(&connection->peer_devices, peer_device, vnr)
+ notify_peer_device_state(NULL, 0, peer_device, NULL,
+ NOTIFY_DESTROY | NOTIFY_CONTINUES);
+ notify_connection_state(NULL, 0, connection, NULL, NOTIFY_DESTROY);
+ mutex_unlock(&notification_mutex);
+
mutex_lock(&connection->resource->conf_update);
old_conf = connection->net_conf;
connection->my_addr_len = 0;
@@ -1823,6 +2237,7 @@ _conn_request_state(struct drbd_connection *connection, union drbd_state mask, u
enum drbd_conns oc = connection->cstate;
union drbd_state ns_max, ns_min, os;
bool have_mutex = false;
+ struct drbd_state_change *state_change;

if (mask.conn) {
rv = is_valid_conn_transition(oc, val.conn);
@@ -1868,10 +2283,12 @@ _conn_request_state(struct drbd_connection *connection, union drbd_state mask, u
goto abort;
}

+ state_change = remember_old_state(connection->resource, GFP_ATOMIC);
conn_old_common_state(connection, &os, &flags);
flags |= CS_DC_SUSP;
conn_set_state(connection, mask, val, &ns_min, &ns_max, flags);
conn_pr_state_change(connection, os, ns_max, flags);
+ remember_new_state(state_change);

acscw = kmalloc(sizeof(*acscw), GFP_ATOMIC);
if (acscw) {
@@ -1882,6 +2299,7 @@ _conn_request_state(struct drbd_connection *connection, union drbd_state mask, u
acscw->w.cb = w_after_conn_state_ch;
kref_get(&connection->kref);
acscw->connection = connection;
+ acscw->state_change = state_change;
drbd_queue_work(&connection->sender_work, &acscw->w);
} else {
drbd_err(connection, "Could not kmalloc an acscw\n");
diff --git a/drivers/block/drbd/drbd_state_change.h b/drivers/block/drbd/drbd_state_change.h
new file mode 100644
index 0000000..9e503a1
--- /dev/null
+++ b/drivers/block/drbd/drbd_state_change.h
@@ -0,0 +1,63 @@
+#ifndef DRBD_STATE_CHANGE_H
+#define DRBD_STATE_CHANGE_H
+
+struct drbd_resource_state_change {
+ struct drbd_resource *resource;
+ enum drbd_role role[2];
+ bool susp[2];
+ bool susp_nod[2];
+ bool susp_fen[2];
+};
+
+struct drbd_device_state_change {
+ struct drbd_device *device;
+ enum drbd_disk_state disk_state[2];
+};
+
+struct drbd_connection_state_change {
+ struct drbd_connection *connection;
+ enum drbd_conns cstate[2]; /* drbd9: enum drbd_conn_state */
+ enum drbd_role peer_role[2];
+};
+
+struct drbd_peer_device_state_change {
+ struct drbd_peer_device *peer_device;
+ enum drbd_disk_state disk_state[2];
+ enum drbd_conns repl_state[2]; /* drbd9: enum drbd_repl_state */
+ bool resync_susp_user[2];
+ bool resync_susp_peer[2];
+ bool resync_susp_dependency[2];
+};
+
+struct drbd_state_change {
+ struct list_head list;
+ unsigned int n_devices;
+ unsigned int n_connections;
+ struct drbd_resource_state_change resource[1];
+ struct drbd_device_state_change *devices;
+ struct drbd_connection_state_change *connections;
+ struct drbd_peer_device_state_change *peer_devices;
+};
+
+extern struct drbd_state_change *remember_old_state(struct drbd_resource *, gfp_t);
+extern void copy_old_to_new_state_change(struct drbd_state_change *);
+extern void forget_state_change(struct drbd_state_change *);
+
+extern void notify_resource_state_change(struct sk_buff *,
+ unsigned int,
+ struct drbd_resource_state_change *,
+ enum drbd_notification_type type);
+extern void notify_connection_state_change(struct sk_buff *,
+ unsigned int,
+ struct drbd_connection_state_change *,
+ enum drbd_notification_type type);
+extern void notify_device_state_change(struct sk_buff *,
+ unsigned int,
+ struct drbd_device_state_change *,
+ enum drbd_notification_type type);
+extern void notify_peer_device_state_change(struct sk_buff *,
+ unsigned int,
+ struct drbd_peer_device_state_change *,
+ enum drbd_notification_type type);
+
+#endif /* DRBD_STATE_CHANGE_H */
diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 15a1472..2c44d7e 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -339,6 +339,8 @@ enum drbd_state_rv {
#define MDF_AL_CLEAN (1 << 7)
#define MDF_AL_DISABLED (1 << 8)

+#define MAX_PEERS 32
+
enum drbd_uuid_index {
UI_CURRENT,
UI_BITMAP,
@@ -349,12 +351,26 @@ enum drbd_uuid_index {
UI_EXTENDED_SIZE /* Everything. */
};

+#define HISTORY_UUIDS MAX_PEERS
+
enum drbd_timeout_flag {
UT_DEFAULT = 0,
UT_DEGRADED = 1,
UT_PEER_OUTDATED = 2,
};

+enum drbd_notification_type {
+ NOTIFY_EXISTS,
+ NOTIFY_CREATE,
+ NOTIFY_CHANGE,
+ NOTIFY_DESTROY,
+ NOTIFY_CALL,
+ NOTIFY_RESPONSE,
+
+ NOTIFY_CONTINUES = 0x8000,
+ NOTIFY_FLAGS = NOTIFY_CONTINUES,
+};
+
#define UUID_JUST_CREATED ((__u64)4)

enum write_ordering_e {
diff --git a/include/linux/drbd_genl.h b/include/linux/drbd_genl.h
index 7b131ed..90304f8 100644
--- a/include/linux/drbd_genl.h
+++ b/include/linux/drbd_genl.h
@@ -250,6 +250,76 @@ GENL_struct(DRBD_NLA_DETACH_PARMS, 13, detach_parms,
__flg_field(1, DRBD_GENLA_F_MANDATORY, force_detach)
)

+GENL_struct(DRBD_NLA_RESOURCE_INFO, 15, resource_info,
+ __u32_field(1, 0, res_role)
+ __flg_field(2, 0, res_susp)
+ __flg_field(3, 0, res_susp_nod)
+ __flg_field(4, 0, res_susp_fen)
+ /* __flg_field(5, 0, res_weak) */
+)
+
+GENL_struct(DRBD_NLA_DEVICE_INFO, 16, device_info,
+ __u32_field(1, 0, dev_disk_state)
+)
+
+GENL_struct(DRBD_NLA_CONNECTION_INFO, 17, connection_info,
+ __u32_field(1, 0, conn_connection_state)
+ __u32_field(2, 0, conn_role)
+)
+
+GENL_struct(DRBD_NLA_PEER_DEVICE_INFO, 18, peer_device_info,
+ __u32_field(1, 0, peer_repl_state)
+ __u32_field(2, 0, peer_disk_state)
+ __u32_field(3, 0, peer_resync_susp_user)
+ __u32_field(4, 0, peer_resync_susp_peer)
+ __u32_field(5, 0, peer_resync_susp_dependency)
+)
+
+GENL_struct(DRBD_NLA_RESOURCE_STATISTICS, 19, resource_statistics,
+ __u32_field(1, 0, res_stat_write_ordering)
+)
+
+GENL_struct(DRBD_NLA_DEVICE_STATISTICS, 20, device_statistics,
+ __u64_field(1, 0, dev_size) /* (sectors) */
+ __u64_field(2, 0, dev_read) /* (sectors) */
+ __u64_field(3, 0, dev_write) /* (sectors) */
+ __u64_field(4, 0, dev_al_writes) /* activity log writes (count) */
+ __u64_field(5, 0, dev_bm_writes) /* bitmap writes (count) */
+ __u32_field(6, 0, dev_upper_pending) /* application requests in progress */
+ __u32_field(7, 0, dev_lower_pending) /* backing device requests in progress */
+ __flg_field(8, 0, dev_upper_blocked)
+ __flg_field(9, 0, dev_lower_blocked)
+ __flg_field(10, 0, dev_al_suspended) /* activity log suspended */
+ __u64_field(11, 0, dev_exposed_data_uuid)
+ __u64_field(12, 0, dev_current_uuid)
+ __u32_field(13, 0, dev_disk_flags)
+ __bin_field(14, 0, history_uuids, HISTORY_UUIDS * sizeof(__u64))
+)
+
+GENL_struct(DRBD_NLA_CONNECTION_STATISTICS, 21, connection_statistics,
+ __flg_field(1, 0, conn_congested)
+)
+
+GENL_struct(DRBD_NLA_PEER_DEVICE_STATISTICS, 22, peer_device_statistics,
+ __u64_field(1, 0, peer_dev_received) /* sectors */
+ __u64_field(2, 0, peer_dev_sent) /* sectors */
+ __u32_field(3, 0, peer_dev_pending) /* number of requests */
+ __u32_field(4, 0, peer_dev_unacked) /* number of requests */
+ __u64_field(5, 0, peer_dev_out_of_sync) /* sectors */
+ __u64_field(6, 0, peer_dev_resync_failed) /* sectors */
+ __u64_field(7, 0, peer_dev_bitmap_uuid)
+ __u32_field(9, 0, peer_dev_flags)
+)
+
+GENL_struct(DRBD_NLA_NOTIFICATION_HEADER, 23, drbd_notification_header,
+ __u32_field(1, DRBD_GENLA_F_MANDATORY, nh_type)
+)
+
+GENL_struct(DRBD_NLA_HELPER, 24, drbd_helper_info,
+ __str_field(1, DRBD_GENLA_F_MANDATORY, helper_name, 32)
+ __u32_field(2, DRBD_GENLA_F_MANDATORY, helper_status)
+)
+
/*
* Notifications and commands (genlmsghdr->cmd)
*/
@@ -382,3 +452,47 @@ GENL_op(DRBD_ADM_GET_TIMEOUT_TYPE, 26, GENL_doit(drbd_adm_get_timeout_type),
GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED))
GENL_op(DRBD_ADM_DOWN, 27, GENL_doit(drbd_adm_down),
GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_RESOURCE_STATE, 34, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_RESOURCE_INFO, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_RESOURCE_STATISTICS, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_DEVICE_STATE, 35, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_DEVICE_INFO, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_DEVICE_STATISTICS, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_CONNECTION_STATE, 36, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_CONNECTION_INFO, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_CONNECTION_STATISTICS, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_PEER_DEVICE_STATE, 37, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_PEER_DEVICE_INFO, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_PEER_DEVICE_STATISTICS, DRBD_F_REQUIRED))
+
+GENL_op(
+ DRBD_ADM_GET_INITIAL_STATE, 38,
+ GENL_op_init(
+ .dumpit = drbd_adm_get_initial_state,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY))
+
+GENL_notification(
+ DRBD_HELPER, 40, events,
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
+ GENL_tla_expected(DRBD_NLA_HELPER, DRBD_F_REQUIRED))
+
+GENL_notification(
+ DRBD_INITIAL_STATE_DONE, 41, events,
+ GENL_tla_expected(DRBD_NLA_NOTIFICATION_HEADER, DRBD_F_REQUIRED))
--
1.9.1

2015-07-30 11:32:48

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 08/19] drbd: Backport the "status" command

From: Andreas Gruenbacher <[email protected]>

The status command originates the drbd9 code base. While for now we
keep the status information in /proc/drbd available, this commit
allows the user base to gracefully migrate their monitoring
infrastructure to the new status reporting interface.

In drbd9 no status information is exposed through /proc/drbd.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 566 +++++++++++++++++++++++++++++++++++++------
include/linux/drbd_genl.h | 35 +++
include/linux/idr.h | 14 ++
3 files changed, 536 insertions(+), 79 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index e3218f7..6ed3876 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -76,6 +76,13 @@ int drbd_adm_get_status(struct sk_buff *skb, struct genl_info *info);
int drbd_adm_get_timeout_type(struct sk_buff *skb, struct genl_info *info);
/* .dumpit */
int drbd_adm_get_status_all(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_resources(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_devices(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_devices_done(struct netlink_callback *cb);
+int drbd_adm_dump_connections(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_connections_done(struct netlink_callback *cb);
+int drbd_adm_dump_peer_devices(struct sk_buff *skb, struct netlink_callback *cb);
+int drbd_adm_dump_peer_devices_done(struct netlink_callback *cb);
int drbd_adm_get_initial_state(struct sk_buff *skb, struct netlink_callback *cb);

#include <linux/drbd_genl_api.h>
@@ -2965,6 +2972,486 @@ nla_put_failure:
}

/*
+ * The generic netlink dump callbacks are called outside the genl_lock(), so
+ * they cannot use the simple attribute parsing code which uses global
+ * attribute tables.
+ */
+static struct nlattr *find_cfg_context_attr(const struct nlmsghdr *nlh, int attr)
+{
+ const unsigned hdrlen = GENL_HDRLEN + GENL_MAGIC_FAMILY_HDRSZ;
+ const int maxtype = ARRAY_SIZE(drbd_cfg_context_nl_policy) - 1;
+ struct nlattr *nla;
+
+ nla = nla_find(nlmsg_attrdata(nlh, hdrlen), nlmsg_attrlen(nlh, hdrlen),
+ DRBD_NLA_CFG_CONTEXT);
+ if (!nla)
+ return NULL;
+ return drbd_nla_find_nested(maxtype, nla, __nla_type(attr));
+}
+
+static void resource_to_info(struct resource_info *, struct drbd_resource *);
+
+int drbd_adm_dump_resources(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct drbd_genlmsghdr *dh;
+ struct drbd_resource *resource;
+ struct resource_info resource_info;
+ struct resource_statistics resource_statistics;
+ int err;
+
+ rcu_read_lock();
+ if (cb->args[0]) {
+ for_each_resource_rcu(resource, &drbd_resources)
+ if (resource == (struct drbd_resource *)cb->args[0])
+ goto found_resource;
+ err = 0; /* resource was probably deleted */
+ goto out;
+ }
+ resource = list_entry(&drbd_resources,
+ struct drbd_resource, resources);
+
+found_resource:
+ list_for_each_entry_continue_rcu(resource, &drbd_resources, resources) {
+ goto put_result;
+ }
+ err = 0;
+ goto out;
+
+put_result:
+ dh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, &drbd_genl_family,
+ NLM_F_MULTI, DRBD_ADM_GET_RESOURCES);
+ err = -ENOMEM;
+ if (!dh)
+ goto out;
+ dh->minor = -1U;
+ dh->ret_code = NO_ERROR;
+ err = nla_put_drbd_cfg_context(skb, resource, NULL, NULL);
+ if (err)
+ goto out;
+ err = res_opts_to_skb(skb, &resource->res_opts, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ resource_to_info(&resource_info, resource);
+ err = resource_info_to_skb(skb, &resource_info, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ resource_statistics.res_stat_write_ordering = resource->write_ordering;
+ err = resource_statistics_to_skb(skb, &resource_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ cb->args[0] = (long)resource;
+ genlmsg_end(skb, dh);
+ err = 0;
+
+out:
+ rcu_read_unlock();
+ if (err)
+ return err;
+ return skb->len;
+}
+
+static void device_to_statistics(struct device_statistics *s,
+ struct drbd_device *device)
+{
+ memset(s, 0, sizeof(*s));
+ s->dev_upper_blocked = !may_inc_ap_bio(device);
+ if (get_ldev(device)) {
+ struct drbd_md *md = &device->ldev->md;
+ u64 *history_uuids = (u64 *)s->history_uuids;
+ struct request_queue *q;
+ int n;
+
+ spin_lock_irq(&md->uuid_lock);
+ s->dev_current_uuid = md->uuid[UI_CURRENT];
+ BUILD_BUG_ON(sizeof(s->history_uuids) < UI_HISTORY_END - UI_HISTORY_START + 1);
+ for (n = 0; n < UI_HISTORY_END - UI_HISTORY_START + 1; n++)
+ history_uuids[n] = md->uuid[UI_HISTORY_START + n];
+ for (; n < HISTORY_UUIDS; n++)
+ history_uuids[n] = 0;
+ s->history_uuids_len = HISTORY_UUIDS;
+ spin_unlock_irq(&md->uuid_lock);
+
+ s->dev_disk_flags = md->flags;
+ q = bdev_get_queue(device->ldev->backing_bdev);
+ s->dev_lower_blocked =
+ bdi_congested(&q->backing_dev_info,
+ (1 << WB_async_congested) |
+ (1 << WB_sync_congested));
+ put_ldev(device);
+ }
+ s->dev_size = drbd_get_capacity(device->this_bdev);
+ s->dev_read = device->read_cnt;
+ s->dev_write = device->writ_cnt;
+ s->dev_al_writes = device->al_writ_cnt;
+ s->dev_bm_writes = device->bm_writ_cnt;
+ s->dev_upper_pending = atomic_read(&device->ap_bio_cnt);
+ s->dev_lower_pending = atomic_read(&device->local_cnt);
+ s->dev_al_suspended = test_bit(AL_SUSPENDED, &device->flags);
+ s->dev_exposed_data_uuid = device->ed_uuid;
+}
+
+static int put_resource_in_arg0(struct netlink_callback *cb, int holder_nr)
+{
+ if (cb->args[0]) {
+ struct drbd_resource *resource =
+ (struct drbd_resource *)cb->args[0];
+ kref_put(&resource->kref, drbd_destroy_resource);
+ }
+
+ return 0;
+}
+
+int drbd_adm_dump_devices_done(struct netlink_callback *cb) {
+ return put_resource_in_arg0(cb, 7);
+}
+
+static void device_to_info(struct device_info *, struct drbd_device *);
+
+int drbd_adm_dump_devices(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct nlattr *resource_filter;
+ struct drbd_resource *resource;
+ struct drbd_device *uninitialized_var(device);
+ int minor, err, retcode;
+ struct drbd_genlmsghdr *dh;
+ struct device_info device_info;
+ struct device_statistics device_statistics;
+ struct idr *idr_to_search;
+
+ resource = (struct drbd_resource *)cb->args[0];
+ if (!cb->args[0] && !cb->args[1]) {
+ resource_filter = find_cfg_context_attr(cb->nlh, T_ctx_resource_name);
+ if (resource_filter) {
+ retcode = ERR_RES_NOT_KNOWN;
+ resource = drbd_find_resource(nla_data(resource_filter));
+ if (!resource)
+ goto put_result;
+ cb->args[0] = (long)resource;
+ }
+ }
+
+ rcu_read_lock();
+ minor = cb->args[1];
+ idr_to_search = resource ? &resource->devices : &drbd_devices;
+ device = idr_get_next(idr_to_search, &minor);
+ if (!device) {
+ err = 0;
+ goto out;
+ }
+ idr_for_each_entry_continue(idr_to_search, device, minor) {
+ retcode = NO_ERROR;
+ goto put_result; /* only one iteration */
+ }
+ err = 0;
+ goto out; /* no more devices */
+
+put_result:
+ dh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, &drbd_genl_family,
+ NLM_F_MULTI, DRBD_ADM_GET_DEVICES);
+ err = -ENOMEM;
+ if (!dh)
+ goto out;
+ dh->ret_code = retcode;
+ dh->minor = -1U;
+ if (retcode == NO_ERROR) {
+ dh->minor = device->minor;
+ err = nla_put_drbd_cfg_context(skb, device->resource, NULL, device);
+ if (err)
+ goto out;
+ if (get_ldev(device)) {
+ struct disk_conf *disk_conf =
+ rcu_dereference(device->ldev->disk_conf);
+
+ err = disk_conf_to_skb(skb, disk_conf, !capable(CAP_SYS_ADMIN));
+ put_ldev(device);
+ if (err)
+ goto out;
+ }
+ device_to_info(&device_info, device);
+ err = device_info_to_skb(skb, &device_info, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+
+ device_to_statistics(&device_statistics, device);
+ err = device_statistics_to_skb(skb, &device_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ cb->args[1] = minor + 1;
+ }
+ genlmsg_end(skb, dh);
+ err = 0;
+
+out:
+ rcu_read_unlock();
+ if (err)
+ return err;
+ return skb->len;
+}
+
+int drbd_adm_dump_connections_done(struct netlink_callback *cb)
+{
+ return put_resource_in_arg0(cb, 6);
+}
+
+enum { SINGLE_RESOURCE, ITERATE_RESOURCES };
+
+int drbd_adm_dump_connections(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct nlattr *resource_filter;
+ struct drbd_resource *resource = NULL, *next_resource;
+ struct drbd_connection *uninitialized_var(connection);
+ int err = 0, retcode;
+ struct drbd_genlmsghdr *dh;
+ struct connection_info connection_info;
+ struct connection_statistics connection_statistics;
+
+ rcu_read_lock();
+ resource = (struct drbd_resource *)cb->args[0];
+ if (!cb->args[0]) {
+ resource_filter = find_cfg_context_attr(cb->nlh, T_ctx_resource_name);
+ if (resource_filter) {
+ retcode = ERR_RES_NOT_KNOWN;
+ resource = drbd_find_resource(nla_data(resource_filter));
+ if (!resource)
+ goto put_result;
+ cb->args[0] = (long)resource;
+ cb->args[1] = SINGLE_RESOURCE;
+ }
+ }
+ if (!resource) {
+ if (list_empty(&drbd_resources))
+ goto out;
+ resource = list_first_entry(&drbd_resources, struct drbd_resource, resources);
+ kref_get(&resource->kref);
+ cb->args[0] = (long)resource;
+ cb->args[1] = ITERATE_RESOURCES;
+ }
+
+ next_resource:
+ rcu_read_unlock();
+ mutex_lock(&resource->conf_update);
+ rcu_read_lock();
+ if (cb->args[2]) {
+ for_each_connection_rcu(connection, resource)
+ if (connection == (struct drbd_connection *)cb->args[2])
+ goto found_connection;
+ /* connection was probably deleted */
+ goto no_more_connections;
+ }
+ connection = list_entry(&resource->connections, struct drbd_connection, connections);
+
+found_connection:
+ list_for_each_entry_continue_rcu(connection, &resource->connections, connections) {
+ if (!has_net_conf(connection))
+ continue;
+ retcode = NO_ERROR;
+ goto put_result; /* only one iteration */
+ }
+
+no_more_connections:
+ if (cb->args[1] == ITERATE_RESOURCES) {
+ for_each_resource_rcu(next_resource, &drbd_resources) {
+ if (next_resource == resource)
+ goto found_resource;
+ }
+ /* resource was probably deleted */
+ }
+ goto out;
+
+found_resource:
+ list_for_each_entry_continue_rcu(next_resource, &drbd_resources, resources) {
+ mutex_unlock(&resource->conf_update);
+ kref_put(&resource->kref, drbd_destroy_resource);
+ resource = next_resource;
+ kref_get(&resource->kref);
+ cb->args[0] = (long)resource;
+ cb->args[2] = 0;
+ goto next_resource;
+ }
+ goto out; /* no more resources */
+
+put_result:
+ dh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, &drbd_genl_family,
+ NLM_F_MULTI, DRBD_ADM_GET_CONNECTIONS);
+ err = -ENOMEM;
+ if (!dh)
+ goto out;
+ dh->ret_code = retcode;
+ dh->minor = -1U;
+ if (retcode == NO_ERROR) {
+ struct net_conf *net_conf;
+
+ err = nla_put_drbd_cfg_context(skb, resource, connection, NULL);
+ if (err)
+ goto out;
+ net_conf = rcu_dereference(connection->net_conf);
+ if (net_conf) {
+ err = net_conf_to_skb(skb, net_conf, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ }
+ connection_to_info(&connection_info, connection);
+ err = connection_info_to_skb(skb, &connection_info, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ connection_statistics.conn_congested = test_bit(NET_CONGESTED, &connection->flags);
+ err = connection_statistics_to_skb(skb, &connection_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ cb->args[2] = (long)connection;
+ }
+ genlmsg_end(skb, dh);
+ err = 0;
+
+out:
+ rcu_read_unlock();
+ if (resource)
+ mutex_unlock(&resource->conf_update);
+ if (err)
+ return err;
+ return skb->len;
+}
+
+enum mdf_peer_flag {
+ MDF_PEER_CONNECTED = 1 << 0,
+ MDF_PEER_OUTDATED = 1 << 1,
+ MDF_PEER_FENCING = 1 << 2,
+ MDF_PEER_FULL_SYNC = 1 << 3,
+};
+
+static void peer_device_to_statistics(struct peer_device_statistics *s,
+ struct drbd_peer_device *peer_device)
+{
+ struct drbd_device *device = peer_device->device;
+
+ memset(s, 0, sizeof(*s));
+ s->peer_dev_received = device->recv_cnt;
+ s->peer_dev_sent = device->send_cnt;
+ s->peer_dev_pending = atomic_read(&device->ap_pending_cnt) +
+ atomic_read(&device->rs_pending_cnt);
+ s->peer_dev_unacked = atomic_read(&device->unacked_cnt);
+ s->peer_dev_out_of_sync = drbd_bm_total_weight(device) << (BM_BLOCK_SHIFT - 9);
+ s->peer_dev_resync_failed = device->rs_failed << (BM_BLOCK_SHIFT - 9);
+ if (get_ldev(device)) {
+ struct drbd_md *md = &device->ldev->md;
+
+ spin_lock_irq(&md->uuid_lock);
+ s->peer_dev_bitmap_uuid = md->uuid[UI_BITMAP];
+ spin_unlock_irq(&md->uuid_lock);
+ s->peer_dev_flags =
+ (drbd_md_test_flag(device->ldev, MDF_CONNECTED_IND) ?
+ MDF_PEER_CONNECTED : 0) +
+ (drbd_md_test_flag(device->ldev, MDF_CONSISTENT) &&
+ !drbd_md_test_flag(device->ldev, MDF_WAS_UP_TO_DATE) ?
+ MDF_PEER_OUTDATED : 0) +
+ /* FIXME: MDF_PEER_FENCING? */
+ (drbd_md_test_flag(device->ldev, MDF_FULL_SYNC) ?
+ MDF_PEER_FULL_SYNC : 0);
+ put_ldev(device);
+ }
+}
+
+int drbd_adm_dump_peer_devices_done(struct netlink_callback *cb)
+{
+ return put_resource_in_arg0(cb, 9);
+}
+
+int drbd_adm_dump_peer_devices(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ struct nlattr *resource_filter;
+ struct drbd_resource *resource;
+ struct drbd_device *uninitialized_var(device);
+ struct drbd_peer_device *peer_device = NULL;
+ int minor, err, retcode;
+ struct drbd_genlmsghdr *dh;
+ struct idr *idr_to_search;
+
+ resource = (struct drbd_resource *)cb->args[0];
+ if (!cb->args[0] && !cb->args[1]) {
+ resource_filter = find_cfg_context_attr(cb->nlh, T_ctx_resource_name);
+ if (resource_filter) {
+ retcode = ERR_RES_NOT_KNOWN;
+ resource = drbd_find_resource(nla_data(resource_filter));
+ if (!resource)
+ goto put_result;
+ }
+ cb->args[0] = (long)resource;
+ }
+
+ rcu_read_lock();
+ minor = cb->args[1];
+ idr_to_search = resource ? &resource->devices : &drbd_devices;
+ device = idr_find(idr_to_search, minor);
+ if (!device) {
+next_device:
+ minor++;
+ cb->args[2] = 0;
+ device = idr_get_next(idr_to_search, &minor);
+ if (!device) {
+ err = 0;
+ goto out;
+ }
+ }
+ if (cb->args[2]) {
+ for_each_peer_device(peer_device, device)
+ if (peer_device == (struct drbd_peer_device *)cb->args[2])
+ goto found_peer_device;
+ /* peer device was probably deleted */
+ goto next_device;
+ }
+ /* Make peer_device point to the list head (not the first entry). */
+ peer_device = list_entry(&device->peer_devices, struct drbd_peer_device, peer_devices);
+
+found_peer_device:
+ list_for_each_entry_continue_rcu(peer_device, &device->peer_devices, peer_devices) {
+ if (!has_net_conf(peer_device->connection))
+ continue;
+ retcode = NO_ERROR;
+ goto put_result; /* only one iteration */
+ }
+ goto next_device;
+
+put_result:
+ dh = genlmsg_put(skb, NETLINK_CB(cb->skb).portid,
+ cb->nlh->nlmsg_seq, &drbd_genl_family,
+ NLM_F_MULTI, DRBD_ADM_GET_PEER_DEVICES);
+ err = -ENOMEM;
+ if (!dh)
+ goto out;
+ dh->ret_code = retcode;
+ dh->minor = -1U;
+ if (retcode == NO_ERROR) {
+ struct peer_device_info peer_device_info;
+ struct peer_device_statistics peer_device_statistics;
+
+ dh->minor = minor;
+ err = nla_put_drbd_cfg_context(skb, device->resource, peer_device->connection, device);
+ if (err)
+ goto out;
+ peer_device_to_info(&peer_device_info, peer_device);
+ err = peer_device_info_to_skb(skb, &peer_device_info, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ peer_device_to_statistics(&peer_device_statistics, peer_device);
+ err = peer_device_statistics_to_skb(skb, &peer_device_statistics, !capable(CAP_SYS_ADMIN));
+ if (err)
+ goto out;
+ cb->args[1] = minor;
+ cb->args[2] = (long)peer_device;
+ }
+ genlmsg_end(skb, dh);
+ err = 0;
+
+out:
+ rcu_read_unlock();
+ if (err)
+ return err;
+ return skb->len;
+}
+/*
* Return the connection of @resource if @resource has exactly one connection.
*/
static struct drbd_connection *the_only_connection(struct drbd_resource *resource)
@@ -3818,85 +4305,6 @@ failed:
err, seq, sib->sib_reason);
}

-static void device_to_statistics(struct device_statistics *s,
- struct drbd_device *device)
-{
- memset(s, 0, sizeof(*s));
- s->dev_upper_blocked = !may_inc_ap_bio(device);
- if (get_ldev(device)) {
- struct drbd_md *md = &device->ldev->md;
- u64 *history_uuids = (u64 *)s->history_uuids;
- struct request_queue *q;
- int n;
-
- spin_lock_irq(&md->uuid_lock);
- s->dev_current_uuid = md->uuid[UI_CURRENT];
- BUILD_BUG_ON(sizeof(s->history_uuids) < UI_HISTORY_END - UI_HISTORY_START + 1);
- for (n = 0; n < UI_HISTORY_END - UI_HISTORY_START + 1; n++)
- history_uuids[n] = md->uuid[UI_HISTORY_START + n];
- for (; n < HISTORY_UUIDS; n++)
- history_uuids[n] = 0;
- s->history_uuids_len = HISTORY_UUIDS;
- spin_unlock_irq(&md->uuid_lock);
-
- s->dev_disk_flags = md->flags;
- q = bdev_get_queue(device->ldev->backing_bdev);
- s->dev_lower_blocked =
- bdi_congested(&q->backing_dev_info,
- (1 << WB_async_congested) |
- (1 << WB_sync_congested));
- put_ldev(device);
- }
- s->dev_size = drbd_get_capacity(device->this_bdev);
- s->dev_read = device->read_cnt;
- s->dev_write = device->writ_cnt;
- s->dev_al_writes = device->al_writ_cnt;
- s->dev_bm_writes = device->bm_writ_cnt;
- s->dev_upper_pending = atomic_read(&device->ap_bio_cnt);
- s->dev_lower_pending = atomic_read(&device->local_cnt);
- s->dev_al_suspended = test_bit(AL_SUSPENDED, &device->flags);
- s->dev_exposed_data_uuid = device->ed_uuid;
-}
-
-enum mdf_peer_flag {
- MDF_PEER_CONNECTED = 1 << 0,
- MDF_PEER_OUTDATED = 1 << 1,
- MDF_PEER_FENCING = 1 << 2,
- MDF_PEER_FULL_SYNC = 1 << 3,
-};
-
-static void peer_device_to_statistics(struct peer_device_statistics *s,
- struct drbd_peer_device *peer_device)
-{
- struct drbd_device *device = peer_device->device;
-
- memset(s, 0, sizeof(*s));
- s->peer_dev_received = device->recv_cnt;
- s->peer_dev_sent = device->send_cnt;
- s->peer_dev_pending = atomic_read(&device->ap_pending_cnt) +
- atomic_read(&device->rs_pending_cnt);
- s->peer_dev_unacked = atomic_read(&device->unacked_cnt);
- s->peer_dev_out_of_sync = drbd_bm_total_weight(device) << (BM_BLOCK_SHIFT - 9);
- s->peer_dev_resync_failed = device->rs_failed << (BM_BLOCK_SHIFT - 9);
- if (get_ldev(device)) {
- struct drbd_md *md = &device->ldev->md;
-
- spin_lock_irq(&md->uuid_lock);
- s->peer_dev_bitmap_uuid = md->uuid[UI_BITMAP];
- spin_unlock_irq(&md->uuid_lock);
- s->peer_dev_flags =
- (drbd_md_test_flag(device->ldev, MDF_CONNECTED_IND) ?
- MDF_PEER_CONNECTED : 0) +
- (drbd_md_test_flag(device->ldev, MDF_CONSISTENT) &&
- !drbd_md_test_flag(device->ldev, MDF_WAS_UP_TO_DATE) ?
- MDF_PEER_OUTDATED : 0) +
- /* FIXME: MDF_PEER_FENCING? */
- (drbd_md_test_flag(device->ldev, MDF_FULL_SYNC) ?
- MDF_PEER_FULL_SYNC : 0);
- put_ldev(device);
- }
-}
-
static int nla_put_notification_header(struct sk_buff *msg,
enum drbd_notification_type type)
{
diff --git a/include/linux/drbd_genl.h b/include/linux/drbd_genl.h
index 90304f8..2d0e5ad 100644
--- a/include/linux/drbd_genl.h
+++ b/include/linux/drbd_genl.h
@@ -453,6 +453,41 @@ GENL_op(DRBD_ADM_GET_TIMEOUT_TYPE, 26, GENL_doit(drbd_adm_get_timeout_type),
GENL_op(DRBD_ADM_DOWN, 27, GENL_doit(drbd_adm_down),
GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED))

+GENL_op(DRBD_ADM_GET_RESOURCES, 30,
+ GENL_op_init(
+ .dumpit = drbd_adm_dump_resources,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_RESOURCE_INFO, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_RESOURCE_STATISTICS, DRBD_GENLA_F_MANDATORY))
+
+GENL_op(DRBD_ADM_GET_DEVICES, 31,
+ GENL_op_init(
+ .dumpit = drbd_adm_dump_devices,
+ .done = drbd_adm_dump_devices_done,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_DEVICE_INFO, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_DEVICE_STATISTICS, DRBD_GENLA_F_MANDATORY))
+
+GENL_op(DRBD_ADM_GET_CONNECTIONS, 32,
+ GENL_op_init(
+ .dumpit = drbd_adm_dump_connections,
+ .done = drbd_adm_dump_connections_done,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_CONNECTION_INFO, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_CONNECTION_STATISTICS, DRBD_GENLA_F_MANDATORY))
+
+GENL_op(DRBD_ADM_GET_PEER_DEVICES, 33,
+ GENL_op_init(
+ .dumpit = drbd_adm_dump_peer_devices,
+ .done = drbd_adm_dump_peer_devices_done,
+ ),
+ GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_PEER_DEVICE_INFO, DRBD_GENLA_F_MANDATORY)
+ GENL_tla_expected(DRBD_NLA_PEER_DEVICE_STATISTICS, DRBD_GENLA_F_MANDATORY))
+
GENL_notification(
DRBD_RESOURCE_STATE, 34, events,
GENL_tla_expected(DRBD_NLA_CFG_CONTEXT, DRBD_F_REQUIRED)
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 013fd9b..083d61e 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -135,6 +135,20 @@ static inline void *idr_find(struct idr *idr, int id)
#define idr_for_each_entry(idp, entry, id) \
for (id = 0; ((entry) = idr_get_next(idp, &(id))) != NULL; ++id)

+/**
+ * idr_for_each_entry - continue iteration over an idr's elements of a given type
+ * @idp: idr handle
+ * @entry: the type * to use as cursor
+ * @id: id entry's key
+ *
+ * Continue to iterate over list of given type, continuing after
+ * the current position.
+ */
+#define idr_for_each_entry_continue(idp, entry, id) \
+ for ((entry) = idr_get_next((idp), &(id)); \
+ entry; \
+ ++id, (entry) = idr_get_next((idp), &(id)))
+
/*
* IDA - IDR based id allocator, use when translation from id to
* pointer isn't necessary.
--
1.9.1

2015-07-30 11:30:13

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 09/19] drbd: Deletion of an unnecessary check before the function call "lc_destroy"

From: Markus Elfring <[email protected]>

The lc_destroy() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <[email protected]>
Signed-off-by: Roland Kammerer <[email protected]>

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 6ed3876..727e63e 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1133,8 +1133,7 @@ static int drbd_check_al_size(struct drbd_device *device, struct disk_conf *dc)
lc_destroy(n);
return -EBUSY;
} else {
- if (t)
- lc_destroy(t);
+ lc_destroy(t);
}
drbd_md_mark_dirty(device); /* we changed device->act_log->nr_elemens */
return 0;
--
1.9.1

2015-07-30 11:30:59

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 10/19] drbd: Replace 0 with the more meaningful GFP_NOWAIT

GFP_NOWAIT has a value of 0. I.e. functionality not changed.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 727e63e..2178fb7 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -4289,7 +4289,7 @@ void drbd_bcast_event(struct drbd_device *device, const struct sib_info *sib)
if (nla_put_status_info(msg, device, sib))
goto nla_put_failure;
genlmsg_end(msg, d_out);
- err = drbd_genl_multicast_events(msg, 0);
+ err = drbd_genl_multicast_events(msg, GFP_NOWAIT);
/* msg has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4351,7 +4351,7 @@ void notify_resource_state(struct sk_buff *skb,
goto nla_put_failure;
genlmsg_end(skb, dh);
if (multicast) {
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4400,7 +4400,7 @@ void notify_device_state(struct sk_buff *skb,
device_statistics_to_skb(skb, &device_statistics, !capable(CAP_SYS_ADMIN));
genlmsg_end(skb, dh);
if (multicast) {
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4449,7 +4449,7 @@ void notify_connection_state(struct sk_buff *skb,
connection_statistics_to_skb(skb, &connection_statistics, !capable(CAP_SYS_ADMIN));
genlmsg_end(skb, dh);
if (multicast) {
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4499,7 +4499,7 @@ void notify_peer_device_state(struct sk_buff *skb,
peer_device_statistics_to_skb(skb, &peer_device_statistics, !capable(CAP_SYS_ADMIN));
genlmsg_end(skb, dh);
if (multicast) {
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
goto failed;
@@ -4545,7 +4545,7 @@ void notify_helper(enum drbd_notification_type type,
drbd_helper_info_to_skb(skb, &helper_info, true))
goto unlock_fail;
genlmsg_end(skb, dh);
- err = drbd_genl_multicast_events(skb, 0);
+ err = drbd_genl_multicast_events(skb, GFP_NOWAIT);
skb = NULL;
/* skb has been consumed or freed in netlink_broadcast() */
if (err && err != -ESRCH)
--
1.9.1

2015-07-30 11:33:40

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 11/19] drbd: Fix spurious disk-timeout

From: Lars Ellenberg <[email protected]>

(You should not use disk-timeout anyways,
see the man page for why...)

We add incoming requests to the tail of some ring list.
On local completion, requests are removed from that list.
The timer looks only at the head of that ring list,
so is supposed to only see the oldest request.
All protected by a spinlock.

The request object is created with timestamps zeroed out.
The timestamp was only filled in just before the actual submit.
But to actually submit the request, we need to give up the spinlock.

If you are unlucky, there is no older still pending request, the timer
looks at a new request with timestamp still zero (before it even was
submitted), and 0 + timeout is most likely older than "now".

Better assign the timestamp right when we put the
request object on said ring list.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_req.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index e9981b5..26c194d 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1166,7 +1166,6 @@ drbd_submit_req_private_bio(struct drbd_request *req)
* stable storage, and this is a WRITE, we may not even submit
* this bio. */
if (get_ldev(device)) {
- req->pre_submit_jif = jiffies;
if (drbd_insert_fault(device,
rw == WRITE ? DRBD_FAULT_DT_WR
: rw == READ ? DRBD_FAULT_DT_RD
@@ -1309,6 +1308,7 @@ static void drbd_send_and_submit(struct drbd_device *device, struct drbd_request
&device->pending_master_completion[rw == WRITE]);
if (req->private_bio) {
/* needs to be marked within the same spinlock */
+ req->pre_submit_jif = jiffies;
list_add_tail(&req->req_pending_local,
&device->pending_completion[rw == WRITE]);
_req_mod(req, TO_BE_SUBMITTED);
--
1.9.1

2015-07-30 11:35:19

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 12/19] drbd: drop remnants of connector -- we don't use it anymore in drbd 8.4

From: Lars Ellenberg <[email protected]>

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
include/linux/drbd.h | 1 -
1 file changed, 1 deletion(-)

diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 2c44d7e..392fc0e 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -25,7 +25,6 @@
*/
#ifndef DRBD_H
#define DRBD_H
-#include <linux/connector.h>
#include <asm/types.h>

#ifdef __KERNEL__
--
1.9.1

2015-07-30 11:33:38

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 13/19] drbd: drbdsetup detach of an unresponsive local disk should not block IO "forever"

From: Lars Ellenberg <[email protected]>

When detaching, we make sure no application IO is in-flight
by internally suspending IO, then trigger the state change,
wait for the result, and finally internally resume IO again.

Once we triggered the stat change to "Failed",
we expect it to change from Failed to Diskless.
(To avoid races, we actually wait for it to leave "Failed").

On an unresponsive local IO backend, this may not happen, ever.
Don't have a "hung" detach block IO "forever", but resume IO
before waiting for the state change to Diskless.

We may well be able to continue IO to and from a healthy peer.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 2178fb7..7229d8e 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1929,9 +1929,9 @@ static int adm_detach(struct drbd_device *device, int force)
retcode = drbd_request_state(device, NS(disk, D_FAILED));
drbd_md_put_buffer(device);
/* D_FAILED will transition to DISKLESS. */
+ drbd_resume_io(device);
ret = wait_event_interruptible(device->misc_wait,
device->state.disk != D_FAILED);
- drbd_resume_io(device);
if ((int)retcode == (int)SS_IS_DISKLESS)
retcode = SS_NOTHING_TO_DO;
if (ret)
--
1.9.1

2015-07-30 11:31:01

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 14/19] drbd: also bump UUIDs if a diskless primary connects

From: Lars Ellenberg <[email protected]>

If for some reason the primary lost its disk *and* the replication link
before it is able to communicate the disk loss, probably blocked IO,
then later is able to re-establish the connection, the peer needs to
bump its UUIDs just like it does when peer only loses the disk
and is able to communicate this in time.

Otherwise, a later re-attach of the disk on the primary may start a
resync in the "wrong" direction.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index bc4b45b..06afd4d 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -1781,7 +1781,7 @@ static void after_state_ch(struct drbd_device *device, union drbd_state os,
}

if (ns.pdsk < D_INCONSISTENT && get_ldev(device)) {
- if (os.peer == R_SECONDARY && ns.peer == R_PRIMARY &&
+ if (os.peer != R_PRIMARY && ns.peer == R_PRIMARY &&
device->ldev->md.uuid[UI_BITMAP] == 0 && ns.disk >= D_UP_TO_DATE) {
drbd_uuid_new_current(device);
drbd_send_uuids(peer_device);
--
1.9.1

2015-07-30 11:30:06

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 15/19] drbd: add comment why we want to first call local-io-error, then send state

From: Lars Ellenberg <[email protected]>

Even though we really want to get the state information about our bad
disk to the peer as soon as possible, it is useful to first call the
local-io-error handler.

People may chose to hard-reset the box from there.
If that looks and behaves exactly like a "regular node crash", without
bumping the data generation UUIDs on the peer in between, it makes it
easier to deal with.

If you intend to return from the local-io-error handler, then better
return as quickly as possible to avoid triggering other timeouts.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_state.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 06afd4d..a4e4505 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -1859,6 +1859,10 @@ static void after_state_ch(struct drbd_device *device, union drbd_state os,

was_io_error = test_and_clear_bit(WAS_IO_ERROR, &device->flags);

+ /* Intentionally call this handler first, before drbd_send_state().
+ * See: 2932204 drbd: call local-io-error handler early
+ * People may chose to hard-reset the box from this handler.
+ * It is useful if this looks like a "regular node crash". */
if (was_io_error && eh == EP_CALL_HELPER)
drbd_khelper(device, "local-io-error");

--
1.9.1

2015-07-30 11:30:58

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 16/19] drbd: drbd_panic_after_delayed_completion_of_aborted_request()

From: Lars Ellenberg <[email protected]>

The only way to make DRBD intentionally call panic is to
set a disk timeout, have that trigger, "abort" some request and complete
to upper layers, then have the backend IO subsystem later complete these
requests successfully regardless.

As the attached IO pages have been recycled for other purposes
meanwhile, this will cause unexpected random memory changes.
To prevent corruption, we rather panic in that case.

Make it obvious from stack traces that this was the case by introducing
drbd_panic_after_delayed_completion_of_aborted_request().

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_worker.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index ef70e1f..93777b1 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -199,6 +199,12 @@ void drbd_peer_request_endio(struct bio *bio, int error)
}
}

+void drbd_panic_after_delayed_completion_of_aborted_request(struct drbd_device *device)
+{
+ panic("drbd%u %s/%u potential random memory corruption caused by delayed completion of aborted local request\n",
+ device->minor, device->resource->name, device->vnr);
+}
+
/* read, readA or write requests on R_PRIMARY coming from drbd_make_request
*/
void drbd_request_endio(struct bio *bio, int error)
@@ -253,7 +259,7 @@ void drbd_request_endio(struct bio *bio, int error)
drbd_emerg(device, "delayed completion of aborted local request; disk-timeout may be too aggressive\n");

if (!error)
- panic("possible random memory corruption caused by delayed completion of aborted local request\n");
+ drbd_panic_after_delayed_completion_of_aborted_request(device);
}

/* to avoid recursion in __req_mod */
--
1.9.1

2015-07-30 11:35:42

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 17/19] drbd: improve network timeout detection

From: Lars Ellenberg <[email protected]>

Don't blame the peer for being unresponsive,
if we did not even ask the question yet.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_int.h | 2 +
drivers/block/drbd/drbd_req.c | 123 ++++++++++++++++++++++++++++++---------
drivers/block/drbd/drbd_worker.c | 2 +
3 files changed, 100 insertions(+), 27 deletions(-)

diff --git a/drivers/block/drbd/drbd_int.h b/drivers/block/drbd/drbd_int.h
index 816afad..cb5fb68 100644
--- a/drivers/block/drbd/drbd_int.h
+++ b/drivers/block/drbd/drbd_int.h
@@ -773,6 +773,8 @@ struct drbd_connection {
struct drbd_thread_timing_details r_timing_details[DRBD_THREAD_DETAILS_HIST];

struct {
+ unsigned long last_sent_barrier_jif;
+
/* whether this sender thread
* has processed a single write yet. */
bool seen_any_write_yet;
diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 26c194d..3c8ade7 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -1561,6 +1561,78 @@ int drbd_merge_bvec(struct request_queue *q, struct bvec_merge_data *bvm, struct
return limit;
}

+static bool net_timeout_reached(struct drbd_request *net_req,
+ struct drbd_connection *connection,
+ unsigned long now, unsigned long ent,
+ unsigned int ko_count, unsigned int timeout)
+{
+ struct drbd_device *device = net_req->device;
+
+ if (!time_after(now, net_req->pre_send_jif + ent))
+ return false;
+
+ if (time_in_range(now, connection->last_reconnect_jif, connection->last_reconnect_jif + ent))
+ return false;
+
+ if (net_req->rq_state & RQ_NET_PENDING) {
+ drbd_warn(device, "Remote failed to finish a request within %ums > ko-count (%u) * timeout (%u * 0.1s)\n",
+ jiffies_to_msecs(now - net_req->pre_send_jif), ko_count, timeout);
+ return true;
+ }
+
+ /* We received an ACK already (or are using protocol A),
+ * but are waiting for the epoch closing barrier ack.
+ * Check if we sent the barrier already. We should not blame the peer
+ * for being unresponsive, if we did not even ask it yet. */
+ if (net_req->epoch == connection->send.current_epoch_nr) {
+ drbd_warn(device,
+ "We did not send a P_BARRIER for %ums > ko-count (%u) * timeout (%u * 0.1s); drbd kernel thread blocked?\n",
+ jiffies_to_msecs(now - net_req->pre_send_jif), ko_count, timeout);
+ return false;
+ }
+
+ /* Worst case: we may have been blocked for whatever reason, then
+ * suddenly are able to send a lot of requests (and epoch separating
+ * barriers) in quick succession.
+ * The timestamp of the net_req may be much too old and not correspond
+ * to the sending time of the relevant unack'ed barrier packet, so
+ * would trigger a spurious timeout. The latest barrier packet may
+ * have a too recent timestamp to trigger the timeout, potentially miss
+ * a timeout. Right now we don't have a place to conveniently store
+ * these timestamps.
+ * But in this particular situation, the application requests are still
+ * completed to upper layers, DRBD should still "feel" responsive.
+ * No need yet to kill this connection, it may still recover.
+ * If not, eventually we will have queued enough into the network for
+ * us to block. From that point of view, the timestamp of the last sent
+ * barrier packet is relevant enough.
+ */
+ if (time_after(now, connection->send.last_sent_barrier_jif + ent)) {
+ drbd_warn(device, "Remote failed to answer a P_BARRIER (sent at %lu jif; now=%lu jif) within %ums > ko-count (%u) * timeout (%u * 0.1s)\n",
+ connection->send.last_sent_barrier_jif, now,
+ jiffies_to_msecs(now - connection->send.last_sent_barrier_jif), ko_count, timeout);
+ return true;
+ }
+ return false;
+}
+
+/* A request is considered timed out, if
+ * - we have some effective timeout from the configuration,
+ * with some state restrictions applied,
+ * - the oldest request is waiting for a response from the network
+ * resp. the local disk,
+ * - the oldest request is in fact older than the effective timeout,
+ * - the connection was established (resp. disk was attached)
+ * for longer than the timeout already.
+ * Note that for 32bit jiffies and very stable connections/disks,
+ * we may have a wrap around, which is catched by
+ * !time_in_range(now, last_..._jif, last_..._jif + timeout).
+ *
+ * Side effect: once per 32bit wrap-around interval, which means every
+ * ~198 days with 250 HZ, we have a window where the timeout would need
+ * to expire twice (worst case) to become effective. Good enough.
+ */
+
void request_timer_fn(unsigned long data)
{
struct drbd_device *device = (struct drbd_device *) data;
@@ -1570,11 +1642,14 @@ void request_timer_fn(unsigned long data)
unsigned long oldest_submit_jif;
unsigned long ent = 0, dt = 0, et, nt; /* effective timeout = ko_count * timeout */
unsigned long now;
+ unsigned int ko_count = 0, timeout = 0;

rcu_read_lock();
nc = rcu_dereference(connection->net_conf);
- if (nc && device->state.conn >= C_WF_REPORT_PARAMS)
- ent = nc->timeout * HZ/10 * nc->ko_count;
+ if (nc && device->state.conn >= C_WF_REPORT_PARAMS) {
+ ko_count = nc->ko_count;
+ timeout = nc->timeout;
+ }

if (get_ldev(device)) { /* implicit state.disk >= D_INCONSISTENT */
dt = rcu_dereference(device->ldev->disk_conf)->disk_timeout * HZ / 10;
@@ -1582,6 +1657,8 @@ void request_timer_fn(unsigned long data)
}
rcu_read_unlock();

+
+ ent = timeout * HZ/10 * ko_count;
et = min_not_zero(dt, ent);

if (!et)
@@ -1593,11 +1670,22 @@ void request_timer_fn(unsigned long data)
spin_lock_irq(&device->resource->req_lock);
req_read = list_first_entry_or_null(&device->pending_completion[0], struct drbd_request, req_pending_local);
req_write = list_first_entry_or_null(&device->pending_completion[1], struct drbd_request, req_pending_local);
- req_peer = connection->req_not_net_done;
+
/* maybe the oldest request waiting for the peer is in fact still
- * blocking in tcp sendmsg */
- if (!req_peer && connection->req_next && connection->req_next->pre_send_jif)
- req_peer = connection->req_next;
+ * blocking in tcp sendmsg. That's ok, though, that's handled via the
+ * socket send timeout, requesting a ping, and bumping ko-count in
+ * we_should_drop_the_connection().
+ */
+
+ /* check the oldest request we did successfully sent,
+ * but which is still waiting for an ACK. */
+ req_peer = connection->req_ack_pending;
+
+ /* if we don't have such request (e.g. protocoll A)
+ * check the oldest requests which is still waiting on its epoch
+ * closing barrier ack. */
+ if (!req_peer)
+ req_peer = connection->req_not_net_done;

/* evaluate the oldest peer request only in one timer! */
if (req_peer && req_peer->device != device)
@@ -1614,28 +1702,9 @@ void request_timer_fn(unsigned long data)
: req_write ? req_write->pre_submit_jif
: req_read ? req_read->pre_submit_jif : now;

- /* The request is considered timed out, if
- * - we have some effective timeout from the configuration,
- * with above state restrictions applied,
- * - the oldest request is waiting for a response from the network
- * resp. the local disk,
- * - the oldest request is in fact older than the effective timeout,
- * - the connection was established (resp. disk was attached)
- * for longer than the timeout already.
- * Note that for 32bit jiffies and very stable connections/disks,
- * we may have a wrap around, which is catched by
- * !time_in_range(now, last_..._jif, last_..._jif + timeout).
- *
- * Side effect: once per 32bit wrap-around interval, which means every
- * ~198 days with 250 HZ, we have a window where the timeout would need
- * to expire twice (worst case) to become effective. Good enough.
- */
- if (ent && req_peer &&
- time_after(now, req_peer->pre_send_jif + ent) &&
- !time_in_range(now, connection->last_reconnect_jif, connection->last_reconnect_jif + ent)) {
- drbd_warn(device, "Remote failed to finish a request within ko-count * timeout\n");
+ if (ent && req_peer && net_timeout_reached(req_peer, connection, now, ent, ko_count, timeout))
_conn_request_state(connection, NS(conn, C_TIMEOUT), CS_VERBOSE | CS_HARD);
- }
+
if (dt && oldest_submit_jif != now &&
time_after(now, oldest_submit_jif + dt) &&
!time_in_range(now, device->last_reattach_jif, device->last_reattach_jif + dt)) {
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 93777b1..02fad6c 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -1312,6 +1312,7 @@ static int drbd_send_barrier(struct drbd_connection *connection)
p->barrier = connection->send.current_epoch_nr;
p->pad = 0;
connection->send.current_epoch_writes = 0;
+ connection->send.last_sent_barrier_jif = jiffies;

return conn_send_command(connection, sock, P_BARRIER, sizeof(*p), NULL, 0);
}
@@ -1336,6 +1337,7 @@ static void re_init_if_first_write(struct drbd_connection *connection, unsigned
connection->send.seen_any_write_yet = true;
connection->send.current_epoch_nr = epoch;
connection->send.current_epoch_writes = 0;
+ connection->send.last_sent_barrier_jif = jiffies;
}
}

--
1.9.1

2015-07-30 11:33:43

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 18/19] drbd: fix NULL deref in remember_new_state

From: Lars Ellenberg <[email protected]>

The recent (not yet released) backport of the extended state broadcasts
to support the "events2" subcommand of drbdsetup had some glitches.

remember_old_state() would first count all connections with a
net_conf != NULL, then allocate a suitable array, then populate that
array with all connections found to have net_conf != NULL.

This races with the state change to C_STANDALONE,
and the NULL assignment there.

remember_new_state() then iterates over said connection array,
assuming that it would be fully populated.

But rcu_lock() just makes sure the thing some pointer points to,
if any, won't go away. It does not make the pointer itself immutable.

In fact there is no need to "filter" connections based on whether or not
they have a currently valid configuration. Just record them always, if
they don't have a config, that's fine, there will be no change then.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_state.c | 46 +++++++++++++----------------------------
1 file changed, 14 insertions(+), 32 deletions(-)

diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index a4e4505..f022e99 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -63,11 +63,8 @@ static void count_objects(struct drbd_resource *resource,

idr_for_each_entry(&resource->devices, device, vnr)
(*n_devices)++;
- for_each_connection(connection, resource) {
- if (!has_net_conf(connection))
- continue;
+ for_each_connection(connection, resource)
(*n_connections)++;
- }
}

static struct drbd_state_change *alloc_state_change(unsigned int n_devices, unsigned int n_connections, gfp_t gfp)
@@ -108,23 +105,13 @@ struct drbd_state_change *remember_old_state(struct drbd_resource *resource, gfp
struct drbd_peer_device_state_change *peer_device_state_change;
struct drbd_connection_state_change *connection_state_change;

-retry:
- rcu_read_lock();
+ /* Caller holds req_lock spinlock.
+ * No state, no device IDR, no connections lists can change. */
count_objects(resource, &n_devices, &n_connections);
- rcu_read_unlock();
state_change = alloc_state_change(n_devices, n_connections, gfp);
if (!state_change)
return NULL;

- rcu_read_lock();
- count_objects(resource, &n_devices, &n_connections);
- if (n_devices != state_change->n_devices ||
- n_connections != state_change->n_connections) {
- kfree(state_change);
- rcu_read_unlock();
- goto retry;
- }
-
kref_get(&resource->kref);
state_change->resource->resource = resource;
state_change->resource->role[OLD] =
@@ -133,6 +120,17 @@ retry:
state_change->resource->susp_nod[OLD] = resource->susp_nod;
state_change->resource->susp_fen[OLD] = resource->susp_fen;

+ connection_state_change = state_change->connections;
+ for_each_connection(connection, resource) {
+ kref_get(&connection->kref);
+ connection_state_change->connection = connection;
+ connection_state_change->cstate[OLD] =
+ connection->cstate;
+ connection_state_change->peer_role[OLD] =
+ conn_highest_peer(connection);
+ connection_state_change++;
+ }
+
device_state_change = state_change->devices;
peer_device_state_change = state_change->peer_devices;
idr_for_each_entry(&resource->devices, device, vnr) {
@@ -145,8 +143,6 @@ retry:
for_each_connection(connection, resource) {
struct drbd_peer_device *peer_device;

- if (!has_net_conf(connection))
- continue;
peer_device = conn_peer_device(connection, device->vnr);
peer_device_state_change->peer_device = peer_device;
peer_device_state_change->disk_state[OLD] =
@@ -165,20 +161,6 @@ retry:
device_state_change++;
}

- connection_state_change = state_change->connections;
- for_each_connection(connection, resource) {
- if (!has_net_conf(connection))
- continue;
- kref_get(&connection->kref);
- connection_state_change->connection = connection;
- connection_state_change->cstate[OLD] =
- connection->cstate;
- connection_state_change->peer_role[OLD] =
- conn_highest_peer(connection);
- connection_state_change++;
- }
- rcu_read_unlock();
-
return state_change;
}

--
1.9.1

2015-07-30 11:32:47

by Philipp Reisner

[permalink] [raw]
Subject: [PATCH 19/19] drbd: fix refcount error during detach of an already failed disk

From: Lars Ellenberg <[email protected]>

A D_FAILED disk transitions as quickly as possible to
D_DISKLESS. But in the "unresponsive local disk" case,
there remains a time window where a administrative detach command could
find the disk already failed, but some internal meta data IO against the
unresponsive local disk still pending.

In that case, drbd_md_get_buffer() will return NULL.
Don't unconditionally call drbd_md_put_buffer(), or it will cause
refcount imbalance, and prevent any further re-attach on this volume
(until it is deleted and re-created).

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 7229d8e..4c8d8e7 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1915,6 +1915,7 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
static int adm_detach(struct drbd_device *device, int force)
{
enum drbd_state_rv retcode;
+ void *buffer;
int ret;

if (force) {
@@ -1925,9 +1926,12 @@ static int adm_detach(struct drbd_device *device, int force)
}

drbd_suspend_io(device); /* so no-one is stuck in drbd_al_begin_io */
- drbd_md_get_buffer(device, __func__); /* make sure there is no in-flight meta-data IO */
- retcode = drbd_request_state(device, NS(disk, D_FAILED));
- drbd_md_put_buffer(device);
+ buffer = drbd_md_get_buffer(device, __func__); /* make sure there is no in-flight meta-data IO */
+ if (buffer) {
+ retcode = drbd_request_state(device, NS(disk, D_FAILED));
+ drbd_md_put_buffer(device);
+ } else /* already <= D_FAILED */
+ retcode = SS_NOTHING_TO_DO;
/* D_FAILED will transition to DISKLESS. */
drbd_resume_io(device);
ret = wait_event_interruptible(device->misc_wait,
--
1.9.1