LinuxLists.cc - [PATCH 00/17] RFC: Pile of DRBD fixes

2013-03-27 13:09:21

Subject: [PATCH 00/17] RFC: Pile of DRBD fixes

The first patch improves the idr_for_each_entry() macro and its
documentation, which is not specific to DRBD.

Each patch in this series addresses an independent issue, please refer
to the individual commit messages for a description.

Alexey Khoroshilov (1):
drbd: add module_put() on error path in drbd_proc_open()

George Spelvin (1):
idr: document exit conditions on idr_for_each_entry better

Lars Ellenberg (5):
drbd: only fail empty flushes if no good data is reachable
drbd: fix memory leak
drbd: validate resync_after dependency on attach already
drbd: fix drbd epoch write count for ahead/behind mode
drbd: fix if(); found by kbuild test robot

Philipp Reisner (10):
drbd: reset ap_in_flight counter for new connections
drbd: abort start of resync early, if it raced with connection
breakage
drbd: move invalidating the whole bitmap out of after_state ch()
drbd: fix effective error returned when refusing an invalidate
drbd: drop now useless duplicate state request from invalidate
drbd: fix spurious warning about bitmap being locked from detach
drbd: Fix disconnect to keep the peer disk state if connection breaks
during operation
drbd: Fix build error when CONFIG_CRYPTO_HMAC is not set
drbd: fix for deadlock when using automatic split-brain-recovery
drbd: use sched_setscheduler()

drivers/block/drbd/drbd_actlog.c | 2 +-
drivers/block/drbd/drbd_main.c | 7 +++-
drivers/block/drbd/drbd_nl.c | 71 ++++++++++++++++++++----------------
drivers/block/drbd/drbd_proc.c | 10 ++++-
drivers/block/drbd/drbd_receiver.c | 12 +++---
drivers/block/drbd/drbd_req.c | 26 +++++++------
drivers/block/drbd/drbd_req.h | 8 ++++
drivers/block/drbd/drbd_state.c | 28 +++++++-------
drivers/block/drbd/drbd_strings.c | 1 +
drivers/block/drbd/drbd_worker.c | 19 ++++++++--
include/linux/drbd.h | 5 ++-
include/linux/idr.h | 10 +++--
12 files changed, 123 insertions(+), 76 deletions(-)

--
1.7.9.5

2013-03-27 13:09:26

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 16/17] drbd: use sched_setscheduler()

It was unnoticed for some time that assigning to current->policy is
no longer sufficient to set a real time priority for a kernel thread.

Reported-by: Charlie Suffin <[email protected]>
Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_receiver.c | 6 ++++--
include/linux/drbd.h | 2 +-
2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index a75c0b1..0f449bb 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -5257,9 +5257,11 @@ int drbd_asender(struct drbd_thread *thi)
bool ping_timeout_active = false;
struct net_conf *nc;
int ping_timeo, tcp_cork, ping_int;
+ struct sched_param param = { .sched_priority = 2 };

- current->policy = SCHED_RR; /* Make this a realtime task! */
- current->rt_priority = 2; /* more important than all other tasks */
+ rv = sched_setscheduler(current, SCHED_RR, &param);
+ if (rv < 0)
+ conn_err(tconn, "drbd_asender: ERROR set priority, ret=%d\n", rv);

while (get_t_state(thi) == RUNNING) {
drbd_thread_current_set_cpu(thi);
diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 3163307..1b4d4ee 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -52,7 +52,7 @@
#endif

extern const char *drbd_buildtag(void);
-#define REL_VERSION "8.4.2"
+#define REL_VERSION "8.4.3"
#define API_VERSION 1
#define PRO_VERSION_MIN 86
#define PRO_VERSION_MAX 101
--
1.7.9.5

2013-03-27 13:09:31

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 15/17] drbd: fix for deadlock when using automatic split-brain-recovery

With an automatic after split-brain recovery policy of
"after-sb-1pri call-pri-lost-after-sb",
when trying to drbd_set_role() to R_SECONDARY,
we run into a deadlock.

This was first recognized and supposedly fixed by
2009-06-10 "Fixed a deadlock when using automatic split brain recovery when both nodes are"
replacing drbd_set_role() with drbd_change_state() in that code-path,
but the first hunk of that patch forgets to remove the drbd_set_role().

We apparently only ever tested the "two primaries" case.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_receiver.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 7af0cc7..a75c0b1 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -2662,7 +2662,6 @@ static int drbd_asb_recover_1p(struct drbd_conf *mdev) __must_hold(local)
if (hg == -1 && mdev->state.role == R_PRIMARY) {
enum drbd_state_rv rv2;

- drbd_set_role(mdev, R_SECONDARY, 0);
/* drbd_change_state() does not sleep while in SS_IN_TRANSIENT_STATE,
* we might be here in C_WF_REPORT_PARAMS which is transient.
* we do not need to wait for the after state change work either. */
--
1.7.9.5

2013-03-27 13:09:52

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 17/17] drbd: fix if(); found by kbuild test robot

From: Lars Ellenberg <[email protected]>

Recently introduced al_begin_io_nonblock() was returning -EBUSY,
even when it should return -EWOULDBLOCK.

Impact:
A few spurious wake_up() calls in prepare_al_transaction_nonblock().

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_actlog.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_actlog.c b/drivers/block/drbd/drbd_actlog.c
index 6afe173..6608076 100644
--- a/drivers/block/drbd/drbd_actlog.c
+++ b/drivers/block/drbd/drbd_actlog.c
@@ -389,7 +389,7 @@ int drbd_al_begin_io_nonblock(struct drbd_conf *mdev, struct drbd_interval *i)
if (unlikely(tmp != NULL)) {
struct bm_extent *bm_ext = lc_entry(tmp, struct bm_extent, lce);
if (test_bit(BME_NO_WRITES, &bm_ext->flags)) {
- if (!test_and_set_bit(BME_PRIORITY, &bm_ext->flags));
+ if (!test_and_set_bit(BME_PRIORITY, &bm_ext->flags))
return -EBUSY;
return -EWOULDBLOCK;
}
--
1.7.9.5

2013-03-27 13:09:24

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 02/17] drbd: reset ap_in_flight counter for new connections

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_receiver.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index 1921871..cd172b4 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -849,6 +849,7 @@ int drbd_connected(struct drbd_conf *mdev)
err = drbd_send_current_state(mdev);
clear_bit(USE_DEGR_WFC_T, &mdev->flags);
clear_bit(RESIZE_PENDING, &mdev->flags);
+ atomic_set(&mdev->ap_in_flight, 0);
mod_timer(&mdev->request_timer, jiffies + HZ); /* just start it here. */
return err;
}
--
1.7.9.5

2013-03-27 13:10:20

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 14/17] drbd: add module_put() on error path in drbd_proc_open()

From: Alexey Khoroshilov <[email protected]>

If single_open() fails in drbd_proc_open(), module refcount is left incremented.
The patch adds module_put() on the error path.

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <[email protected]>
Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_proc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/block/drbd/drbd_proc.c b/drivers/block/drbd/drbd_proc.c
index 56672a6..30fe0a5 100644
--- a/drivers/block/drbd/drbd_proc.c
+++ b/drivers/block/drbd/drbd_proc.c
@@ -313,8 +313,14 @@ static int drbd_seq_show(struct seq_file *seq, void *v)

static int drbd_proc_open(struct inode *inode, struct file *file)
{
- if (try_module_get(THIS_MODULE))
- return single_open(file, drbd_seq_show, PDE(inode)->data);
+ int err;
+
+ if (try_module_get(THIS_MODULE)) {
+ err = single_open(file, drbd_seq_show, PDE(inode)->data);
+ if (err)
+ module_put(THIS_MODULE);
+ return err;
+ }
return -ENODEV;
}

--
1.7.9.5

2013-03-27 13:09:23

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 03/17] drbd: abort start of resync early, if it raced with connection breakage

We've seen a spurious full resync, because a connection breakage
raced with drbd_start_resync(, C_SYNC_TARGET),
and the resulting state change request intended to start the resync
ended up looking like a local invalidate.

Fix:
Double check the state inside the lock,
and don't even request that state change,
if we had connection or IO problems.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_worker.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index f41e224..7f51f88 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -1653,7 +1653,9 @@ void drbd_start_resync(struct drbd_conf *mdev, enum drbd_conns side)
clear_bit(B_RS_H_DONE, &mdev->flags);

write_lock_irq(&global_state_lock);
- if (!get_ldev_if_state(mdev, D_NEGOTIATING)) {
+ /* Did some connection breakage or IO error race with us? */
+ if (mdev->state.conn < C_CONNECTED
+ || !get_ldev_if_state(mdev, D_NEGOTIATING)) {
write_unlock_irq(&global_state_lock);
mutex_unlock(mdev->state_mutex);
return;
--
1.7.9.5

2013-03-27 13:10:51

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 13/17] drbd: fix drbd epoch write count for ahead/behind mode

From: Lars Ellenberg <[email protected]>

The sanity check when receiving P_BARRIER_ACK does expect all write
requests with a given req->epoch to have been either all replicated,
or all not replicated.

Because req->epoch was assigned before calling maybe_pull_ahead(),
this expectation was not met, leading to an off-by-one in the sanity
check, and further to a "Protocol Error".

Fix: move the call to maybe_pull_ahead() a few lines up,
and assign req->epoch only after that.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_req.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index beefe65..c24379f 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -865,8 +865,10 @@ static void maybe_pull_ahead(struct drbd_conf *mdev)
bool congested = false;
enum drbd_on_congestion on_congestion;

+ rcu_read_lock();
nc = rcu_dereference(tconn->net_conf);
on_congestion = nc ? nc->on_congestion : OC_BLOCK;
+ rcu_read_unlock();
if (on_congestion == OC_BLOCK ||
tconn->agreed_pro_version < 96)
return;
@@ -960,14 +962,8 @@ static int drbd_process_write_request(struct drbd_request *req)
struct drbd_conf *mdev = req->w.mdev;
int remote, send_oos;

- rcu_read_lock();
remote = drbd_should_do_remote(mdev->state);
- if (remote) {
- maybe_pull_ahead(mdev);
- remote = drbd_should_do_remote(mdev->state);
- }
send_oos = drbd_should_send_out_of_sync(mdev->state);
- rcu_read_unlock();

/* Need to replicate writes. Unless it is an empty flush,
* which is better mapped to a DRBD P_BARRIER packet,
@@ -1087,9 +1083,13 @@ static void drbd_send_and_submit(struct drbd_conf *mdev, struct drbd_request *re
* but will re-aquire it before it returns here.
* Needs to be before the check on drbd_suspended() */
complete_conflicting_writes(req);
+ /* no more giving up req_lock from now on! */
+
+ /* check for congestion, and potentially stop sending
+ * full data updates, but start sending "dirty bits" only. */
+ maybe_pull_ahead(mdev);
}

- /* no more giving up req_lock from now on! */

if (drbd_suspended(mdev)) {
/* push back and retry: */
--
1.7.9.5

2013-03-27 13:11:31

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 12/17] drbd: Fix build error when CONFIG_CRYPTO_HMAC is not set

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_receiver.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c
index cd172b4..7af0cc7 100644
--- a/drivers/block/drbd/drbd_receiver.c
+++ b/drivers/block/drbd/drbd_receiver.c
@@ -4660,8 +4660,8 @@ static int drbd_do_features(struct drbd_tconn *tconn)
#if !defined(CONFIG_CRYPTO_HMAC) && !defined(CONFIG_CRYPTO_HMAC_MODULE)
static int drbd_do_auth(struct drbd_tconn *tconn)
{
- dev_err(DEV, "This kernel was build without CONFIG_CRYPTO_HMAC.\n");
- dev_err(DEV, "You need to disable 'cram-hmac-alg' in drbd.conf.\n");
+ conn_err(tconn, "This kernel was build without CONFIG_CRYPTO_HMAC.\n");
+ conn_err(tconn, "You need to disable 'cram-hmac-alg' in drbd.conf.\n");
return -1;
}
#else
--
1.7.9.5

2013-03-27 13:12:05

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 08/17] drbd: Fix disconnect to keep the peer disk state if connection breaks during operation

The issue was that if the connection broke while we did the
gracefull state change to C_DISCONNECTING (C_TEARDOWN), then
we returned a success code from the state engine. (SS_CW_NO_NEED)

The result of that is that we missed to call the fence-peer
script in such a case.

Fixed that by introducing a new error code (SS_OUTDATE_WO_CONN).
This one should never reach back into user space.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 7 +++++--
drivers/block/drbd/drbd_state.c | 14 +++++++-------
drivers/block/drbd/drbd_strings.c | 1 +
include/linux/drbd.h | 3 ++-
4 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 56bafdc..39e9a91 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2198,8 +2198,11 @@ static enum drbd_state_rv conn_try_disconnect(struct drbd_tconn *tconn, bool for
return SS_SUCCESS;
case SS_PRIMARY_NOP:
/* Our state checking code wants to see the peer outdated. */
- rv = conn_request_state(tconn, NS2(conn, C_DISCONNECTING,
- pdsk, D_OUTDATED), CS_VERBOSE);
+ rv = conn_request_state(tconn, NS2(conn, C_DISCONNECTING, pdsk, D_OUTDATED), 0);
+
+ if (rv == SS_OUTDATE_WO_CONN) /* lost connection before graceful disconnect succeeded */
+ rv = conn_request_state(tconn, NS(conn, C_DISCONNECTING), CS_VERBOSE);
+
break;
case SS_CW_FAILED_BY_PEER:
/* The peer probably wants to see us outdated. */
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 22e259f..90c5be2 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -642,6 +642,10 @@ is_valid_soft_transition(union drbd_state os, union drbd_state ns, struct drbd_t
&& os.conn < C_WF_REPORT_PARAMS)
rv = SS_NEED_CONNECTION; /* No NetworkFailure -> SyncTarget etc... */

+ if (ns.conn == C_DISCONNECTING && ns.pdsk == D_OUTDATED &&
+ os.conn < C_CONNECTED && os.pdsk > D_OUTDATED)
+ rv = SS_OUTDATE_WO_CONN;
+
return rv;
}

@@ -1748,13 +1752,9 @@ _conn_rq_cond(struct drbd_tconn *tconn, union drbd_state mask, union drbd_state
if (test_and_clear_bit(CONN_WD_ST_CHG_FAIL, &tconn->flags))
return SS_CW_FAILED_BY_PEER;

- rv = tconn->cstate != C_WF_REPORT_PARAMS ? SS_CW_NO_NEED : SS_UNKNOWN_ERROR;
-
- if (rv == SS_UNKNOWN_ERROR)
- rv = conn_is_valid_transition(tconn, mask, val, 0);
-
- if (rv == SS_SUCCESS)
- rv = SS_UNKNOWN_ERROR; /* cont waiting, otherwise fail. */
+ rv = conn_is_valid_transition(tconn, mask, val, 0);
+ if (rv == SS_SUCCESS && tconn->cstate == C_WF_REPORT_PARAMS)
+ rv = SS_UNKNOWN_ERROR; /* continue waiting */

return rv;
}
diff --git a/drivers/block/drbd/drbd_strings.c b/drivers/block/drbd/drbd_strings.c
index 9a664bd..58e08ff 100644
--- a/drivers/block/drbd/drbd_strings.c
+++ b/drivers/block/drbd/drbd_strings.c
@@ -89,6 +89,7 @@ static const char *drbd_state_sw_errors[] = {
[-SS_LOWER_THAN_OUTDATED] = "Disk state is lower than outdated",
[-SS_IN_TRANSIENT_STATE] = "In transient state, retry after next state change",
[-SS_CONCURRENT_ST_CHG] = "Concurrent state changes detected and aborted",
+ [-SS_OUTDATE_WO_CONN] = "Need a connection for a graceful disconnect/outdate peer",
[-SS_O_VOL_PEER_PRI] = "Other vol primary on peer not allowed by config",
};

diff --git a/include/linux/drbd.h b/include/linux/drbd.h
index 0c5a18e..3163307 100644
--- a/include/linux/drbd.h
+++ b/include/linux/drbd.h
@@ -319,7 +319,8 @@ enum drbd_state_rv {
SS_IN_TRANSIENT_STATE = -18, /* Retry after the next state change */
SS_CONCURRENT_ST_CHG = -19, /* Concurrent cluster side state change! */
SS_O_VOL_PEER_PRI = -20,
- SS_AFTER_LAST_ERROR = -21, /* Keep this at bottom */
+ SS_OUTDATE_WO_CONN = -21,
+ SS_AFTER_LAST_ERROR = -22, /* Keep this at bottom */
};

/* from drbd_strings.c */
--
1.7.9.5

2013-03-27 13:12:04

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 09/17] drbd: only fail empty flushes if no good data is reachable

From: Lars Ellenberg <[email protected]>

We completed empty flushes (blkdev_issue_flush()) with IO error
if we lost the local disk, even if we still have an established
replication link to a healthy remote disk.

Fix this to only report errors to upper layers,
if neither local nor remote data is reachable.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_req.c | 12 ++++++++----
drivers/block/drbd/drbd_req.h | 8 ++++++++
2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/block/drbd/drbd_req.c b/drivers/block/drbd/drbd_req.c
index 9f7ff1c..beefe65 100644
--- a/drivers/block/drbd/drbd_req.c
+++ b/drivers/block/drbd/drbd_req.c
@@ -263,8 +263,7 @@ void drbd_req_complete(struct drbd_request *req, struct bio_and_error *m)
else
root = &mdev->read_requests;
drbd_remove_request_interval(root, req);
- } else if (!(s & RQ_POSTPONED))
- D_ASSERT((s & (RQ_NET_MASK & ~RQ_NET_DONE)) == 0);
+ }

/* Before we can signal completion to the upper layers,
* we may need to close the current transfer log epoch.
@@ -755,6 +754,11 @@ int __req_mod(struct drbd_request *req, enum drbd_req_event what,
D_ASSERT(req->rq_state & RQ_NET_PENDING);
mod_rq_state(req, m, RQ_NET_PENDING, RQ_NET_OK|RQ_NET_DONE);
break;
+
+ case QUEUE_AS_DRBD_BARRIER:
+ start_new_tl_epoch(mdev->tconn);
+ mod_rq_state(req, m, 0, RQ_NET_OK|RQ_NET_DONE);
+ break;
};

return rv;
@@ -975,8 +979,8 @@ static int drbd_process_write_request(struct drbd_request *req)
/* The only size==0 bios we expect are empty flushes. */
D_ASSERT(req->master_bio->bi_rw & REQ_FLUSH);
if (remote)
- start_new_tl_epoch(mdev->tconn);
- return 0;
+ _req_mod(req, QUEUE_AS_DRBD_BARRIER);
+ return remote;
}

if (!remote && !send_oos)
diff --git a/drivers/block/drbd/drbd_req.h b/drivers/block/drbd/drbd_req.h
index c08d229..978cb1a 100644
--- a/drivers/block/drbd/drbd_req.h
+++ b/drivers/block/drbd/drbd_req.h
@@ -88,6 +88,14 @@ enum drbd_req_event {
QUEUE_FOR_NET_READ,
QUEUE_FOR_SEND_OOS,

+ /* An empty flush is queued as P_BARRIER,
+ * which will cause it to complete "successfully",
+ * even if the local disk flush failed.
+ *
+ * Just like "real" requests, empty flushes (blkdev_issue_flush()) will
+ * only see an error if neither local nor remote data is reachable. */
+ QUEUE_AS_DRBD_BARRIER,
+
SEND_CANCELED,
SEND_FAILED,
HANDED_OVER_TO_NETWORK,
--
1.7.9.5

2013-03-27 13:12:01

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 10/17] drbd: fix memory leak

From: Lars Ellenberg <[email protected]>

We forgot to free the disk_conf,
so for each attach/detach cycle we leaked 336 bytes.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_main.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 67d2bb3..1b93a726 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2819,6 +2819,7 @@ void drbd_free_bc(struct drbd_backing_dev *ldev)
blkdev_put(ldev->backing_bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
blkdev_put(ldev->md_bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);

+ kfree(ldev->disk_conf);
kfree(ldev);
}

--
1.7.9.5

2013-03-27 13:12:00

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 11/17] drbd: validate resync_after dependency on attach already

From: Lars Ellenberg <[email protected]>

We validated resync_after dependencies, if changed via disk-options.
But we did not validate them when first created via attach.
We also did not check or cleanup dependencies that used to be correct,
but now point to meanwhile removed minor devices.

If the drbd_resync_after_valid() validation in disk-options tried to
follow a dependency chain in this way, this could lead to NULL pointer
dereference.

Validate resync_after settings in drbd_adm_attach() already, as well as
in drbd_adm_disk_opts(), and and only reject dependency loops.
Depending on non-existing disks is allowed and equivalent to no dependency.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 6 ++++++
drivers/block/drbd/drbd_worker.c | 15 ++++++++++++---
2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 39e9a91..9e3f441 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -1381,6 +1381,12 @@ int drbd_adm_attach(struct sk_buff *skb, struct genl_info *info)
goto fail;
}

+ write_lock_irq(&global_state_lock);
+ retcode = drbd_resync_after_valid(mdev, new_disk_conf->resync_after);
+ write_unlock_irq(&global_state_lock);
+ if (retcode != NO_ERROR)
+ goto fail;
+
rcu_read_lock();
nc = rcu_dereference(mdev->tconn->net_conf);
if (nc) {
diff --git a/drivers/block/drbd/drbd_worker.c b/drivers/block/drbd/drbd_worker.c
index 7f51f88..891c0ec 100644
--- a/drivers/block/drbd/drbd_worker.c
+++ b/drivers/block/drbd/drbd_worker.c
@@ -1426,7 +1426,7 @@ static int _drbd_may_sync_now(struct drbd_conf *mdev)
int resync_after;

while (1) {
- if (!odev->ldev)
+ if (!odev->ldev || odev->state.disk == D_DISKLESS)
return 1;
rcu_read_lock();
resync_after = rcu_dereference(odev->ldev->disk_conf)->resync_after;
@@ -1434,7 +1434,7 @@ static int _drbd_may_sync_now(struct drbd_conf *mdev)
if (resync_after == -1)
return 1;
odev = minor_to_mdev(resync_after);
- if (!expect(odev))
+ if (!odev)
return 1;
if ((odev->state.conn >= C_SYNC_SOURCE &&
odev->state.conn <= C_PAUSED_SYNC_T) ||
@@ -1516,7 +1516,7 @@ enum drbd_ret_code drbd_resync_after_valid(struct drbd_conf *mdev, int o_minor)

if (o_minor == -1)
return NO_ERROR;
- if (o_minor < -1 || minor_to_mdev(o_minor) == NULL)
+ if (o_minor < -1 || o_minor > MINORMASK)
return ERR_RESYNC_AFTER;

/* check for loops */
@@ -1525,6 +1525,15 @@ enum drbd_ret_code drbd_resync_after_valid(struct drbd_conf *mdev, int o_minor)
if (odev == mdev)
return ERR_RESYNC_AFTER_CYCLE;

+ /* You are free to depend on diskless, non-existing,
+ * or not yet/no longer existing minors.
+ * We only reject dependency loops.
+ * We cannot follow the dependency chain beyond a detached or
+ * missing minor.
+ */
+ if (!odev || !odev->ldev || odev->state.disk == D_DISKLESS)
+ return NO_ERROR;
+
rcu_read_lock();
resync_after = rcu_dereference(odev->ldev->disk_conf)->resync_after;
rcu_read_unlock();
--
1.7.9.5

2013-03-27 13:13:20

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 07/17] drbd: fix spurious warning about bitmap being locked from detach

Introduced in drbd: always write bitmap on detach,
the bitmap bulk writeout on detach was indicating
it expected exclusive bitmap access.

Where I meant to say: expect no more modifications,
but testing/counting is still allowed.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_main.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a150b59..67d2bb3 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -3412,8 +3412,12 @@ static int w_go_diskless(struct drbd_work *w, int unused)
* end up here after a failed attach, before ldev was even assigned.
*/
if (mdev->bitmap && mdev->ldev) {
+ /* An interrupted resync or similar is allowed to recounts bits
+ * while we detach.
+ * Any modifications would not be expected anymore, though.
+ */
if (drbd_bitmap_io_from_worker(mdev, drbd_bm_write,
- "detach", BM_LOCKED_MASK)) {
+ "detach", BM_LOCKED_TEST_ALLOWED)) {
if (test_bit(WAS_READ_ERROR, &mdev->flags)) {
drbd_md_set_flag(mdev, MDF_FULL_SYNC);
drbd_md_sync(mdev);
--
1.7.9.5

2013-03-27 13:13:18

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 06/17] drbd: drop now useless duplicate state request from invalidate

Patch best viewed with git diff --ignore-space-change.

Now that we attempt the fallback to local bitmap operation
only when disconnected, we can safely drop the extra "silent"
state request from both invalidate and invalidate-remote.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 62 +++++++++++++++++++-----------------------
1 file changed, 28 insertions(+), 34 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index c49bda7..56bafdc 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2446,26 +2446,19 @@ int drbd_adm_invalidate(struct sk_buff *skb, struct genl_info *info)
wait_event(mdev->misc_wait, !test_bit(BITMAP_IO, &mdev->flags));
drbd_flush_workqueue(mdev);

- retcode = _drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_T), CS_ORDERED);
-
- /* If that did not work, try again,
- * but log failures this time (implicit CS_VERBOSE).
- *
- * If we happen to be C_STANDALONE R_SECONDARY,
- * just change to D_INCONSISTENT, and set all bits in the bitmap.
- * Otherwise, we just fail, to avoid races with the resync handshake.
+ /* If we happen to be C_STANDALONE R_SECONDARY, just change to
+ * D_INCONSISTENT, and set all bits in the bitmap. Otherwise,
+ * try to start a resync handshake as sync target for full sync.
*/
- if (retcode < SS_SUCCESS) {
- if (mdev->state.conn == C_STANDALONE && mdev->state.role == R_SECONDARY) {
- retcode = drbd_request_state(mdev, NS(disk, D_INCONSISTENT));
- if (retcode >= SS_SUCCESS) {
- if (drbd_bitmap_io(mdev, &drbd_bmio_set_n_write,
- "set_n_write from invalidate", BM_LOCKED_MASK))
- retcode = ERR_IO_MD_DISK;
- }
- } else
- retcode = drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_T));
- }
+ if (mdev->state.conn == C_STANDALONE && mdev->state.role == R_SECONDARY) {
+ retcode = drbd_request_state(mdev, NS(disk, D_INCONSISTENT));
+ if (retcode >= SS_SUCCESS) {
+ if (drbd_bitmap_io(mdev, &drbd_bmio_set_n_write,
+ "set_n_write from invalidate", BM_LOCKED_MASK))
+ retcode = ERR_IO_MD_DISK;
+ }
+ } else
+ retcode = drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_T));
drbd_resume_io(mdev);

out:
@@ -2519,21 +2512,22 @@ int drbd_adm_invalidate_peer(struct sk_buff *skb, struct genl_info *info)
wait_event(mdev->misc_wait, !test_bit(BITMAP_IO, &mdev->flags));
drbd_flush_workqueue(mdev);

- retcode = _drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_S), CS_ORDERED);
- if (retcode < SS_SUCCESS) {
- if (mdev->state.conn == C_STANDALONE && mdev->state.role == R_PRIMARY) {
- /* The peer will get a resync upon connect anyways. Just make that
- into a full resync. */
- retcode = drbd_request_state(mdev, NS(pdsk, D_INCONSISTENT));
- if (retcode >= SS_SUCCESS) {
- if (drbd_bitmap_io(mdev, &drbd_bmio_set_susp_al,
- "set_n_write from invalidate_peer",
- BM_LOCKED_SET_ALLOWED))
- retcode = ERR_IO_MD_DISK;
- }
- } else
- retcode = drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_S));
- }
+ /* If we happen to be C_STANDALONE R_PRIMARY, just set all bits
+ * in the bitmap. Otherwise, try to start a resync handshake
+ * as sync source for full sync.
+ */
+ if (mdev->state.conn == C_STANDALONE && mdev->state.role == R_PRIMARY) {
+ /* The peer will get a resync upon connect anyways. Just make that
+ into a full resync. */
+ retcode = drbd_request_state(mdev, NS(pdsk, D_INCONSISTENT));
+ if (retcode >= SS_SUCCESS) {
+ if (drbd_bitmap_io(mdev, &drbd_bmio_set_susp_al,
+ "set_n_write from invalidate_peer",
+ BM_LOCKED_SET_ALLOWED))
+ retcode = ERR_IO_MD_DISK;
+ }
+ } else
+ retcode = drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_S));
drbd_resume_io(mdev);

out:
--
1.7.9.5

2013-03-27 13:13:48

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 05/17] drbd: fix effective error returned when refusing an invalidate

Since commit
drbd: Disallow the peer_disk_state to be D_OUTDATED while connected
trying to invalidate a disconnected Primary returned an error code
that did not really match the situation:
"Refusing to be Outdated while Connected"

Insert two more specific conditions into is_valid_state(),
changing that to "Need access to UpToDate data",
respectively "Need a connection to start verify or resync".

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_state.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 3bc686f..22e259f 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -570,6 +570,13 @@ is_valid_state(struct drbd_conf *mdev, union drbd_state ns)
mdev->tconn->agreed_pro_version < 88)
rv = SS_NOT_SUPPORTED;

+ else if (ns.role == R_PRIMARY && ns.disk < D_UP_TO_DATE && ns.pdsk < D_UP_TO_DATE)
+ rv = SS_NO_UP_TO_DATE_DISK;
+
+ else if ((ns.conn == C_STARTING_SYNC_S || ns.conn == C_STARTING_SYNC_T) &&
+ ns.pdsk == D_UNKNOWN)
+ rv = SS_NEED_CONNECTION;
+
else if (ns.conn >= C_CONNECTED && ns.pdsk == D_UNKNOWN)
rv = SS_CONNECTED_OUTDATES;

--
1.7.9.5

2013-03-27 13:14:12

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 01/17] idr: document exit conditions on idr_for_each_entry better

From: George Spelvin <[email protected]>

And some manual common subexpression elimination which may help the
compiler produce smaller code.

Signed-off-by: George Spelvin <[email protected]>
Signed-off-by: Philipp Reisner <[email protected]>
---
include/linux/idr.h | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 2640c7e..6ece058 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -122,11 +122,13 @@ static inline void *idr_find(struct idr *idr, int id)
* @idp: idr handle
* @entry: the type * to use as cursor
* @id: id entry's key
+ *
+ * @entry and @id do not need to be initialized before the loop, and
+ * after normal terminatinon @entry is left with the value NULL. This
+ * is convenient for a "not found" value.
*/
-#define idr_for_each_entry(idp, entry, id) \
- for (id = 0, entry = (typeof(entry))idr_get_next((idp), &(id)); \
- entry != NULL; \
- ++id, entry = (typeof(entry))idr_get_next((idp), &(id)))
+#define idr_for_each_entry(idp, entry, id) \
+ for (id = 0; ((entry) = idr_get_next(idp, &(id))) != NULL; ++id)

/*
* Don't use the following functions. These exist only to suppress
--
1.7.9.5

2013-03-27 13:14:10

by Philipp Reisner

[permalink] [raw]

Subject: [PATCH 04/17] drbd: move invalidating the whole bitmap out of after_state ch()

To avoid other state change requests, after passing through
sanitize_state(), to be mistaken for an invalidate,
move the "set all bits as out-of-sync" into the invalidate path.

Make invalidate and invalidate-remote behave consistently wrt.
current connection state (need either an established replication link,
or really be disconnected). Also mention that in the documentation.

Signed-off-by: Philipp Reisner <[email protected]>
Signed-off-by: Lars Ellenberg <[email protected]>
---
drivers/block/drbd/drbd_nl.c | 36 ++++++++++++++++++++----------------
drivers/block/drbd/drbd_state.c | 7 -------
2 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 42fda4a..c49bda7 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2448,19 +2448,23 @@ int drbd_adm_invalidate(struct sk_buff *skb, struct genl_info *info)

retcode = _drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_T), CS_ORDERED);

- if (retcode < SS_SUCCESS && retcode != SS_NEED_CONNECTION)
- retcode = drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_T));
-
- while (retcode == SS_NEED_CONNECTION) {
- spin_lock_irq(&mdev->tconn->req_lock);
- if (mdev->state.conn < C_CONNECTED)
- retcode = _drbd_set_state(_NS(mdev, disk, D_INCONSISTENT), CS_VERBOSE, NULL);
- spin_unlock_irq(&mdev->tconn->req_lock);
-
- if (retcode != SS_NEED_CONNECTION)
- break;
-
- retcode = drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_T));
+ /* If that did not work, try again,
+ * but log failures this time (implicit CS_VERBOSE).
+ *
+ * If we happen to be C_STANDALONE R_SECONDARY,
+ * just change to D_INCONSISTENT, and set all bits in the bitmap.
+ * Otherwise, we just fail, to avoid races with the resync handshake.
+ */
+ if (retcode < SS_SUCCESS) {
+ if (mdev->state.conn == C_STANDALONE && mdev->state.role == R_SECONDARY) {
+ retcode = drbd_request_state(mdev, NS(disk, D_INCONSISTENT));
+ if (retcode >= SS_SUCCESS) {
+ if (drbd_bitmap_io(mdev, &drbd_bmio_set_n_write,
+ "set_n_write from invalidate", BM_LOCKED_MASK))
+ retcode = ERR_IO_MD_DISK;
+ }
+ } else
+ retcode = drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_T));
}
drbd_resume_io(mdev);

@@ -2517,9 +2521,9 @@ int drbd_adm_invalidate_peer(struct sk_buff *skb, struct genl_info *info)

retcode = _drbd_request_state(mdev, NS(conn, C_STARTING_SYNC_S), CS_ORDERED);
if (retcode < SS_SUCCESS) {
- if (retcode == SS_NEED_CONNECTION && mdev->state.role == R_PRIMARY) {
- /* The peer will get a resync upon connect anyways.
- * Just make that into a full resync. */
+ if (mdev->state.conn == C_STANDALONE && mdev->state.role == R_PRIMARY) {
+ /* The peer will get a resync upon connect anyways. Just make that
+ into a full resync. */
retcode = drbd_request_state(mdev, NS(pdsk, D_INCONSISTENT));
if (retcode >= SS_SUCCESS) {
if (drbd_bitmap_io(mdev, &drbd_bmio_set_susp_al,
diff --git a/drivers/block/drbd/drbd_state.c b/drivers/block/drbd/drbd_state.c
index 0fe220c..3bc686f 100644
--- a/drivers/block/drbd/drbd_state.c
+++ b/drivers/block/drbd/drbd_state.c
@@ -1377,13 +1377,6 @@ static void after_state_ch(struct drbd_conf *mdev, union drbd_state os,
&drbd_bmio_set_n_write, &abw_start_sync,
"set_n_write from StartingSync", BM_LOCKED_TEST_ALLOWED);

- /* We are invalidating our self... */
- if (os.conn < C_CONNECTED && ns.conn < C_CONNECTED &&
- os.disk > D_INCONSISTENT && ns.disk == D_INCONSISTENT)
- /* other bitmap operation expected during this phase */
- drbd_queue_bitmap_io(mdev, &drbd_bmio_set_n_write, NULL,
- "set_n_write from invalidate", BM_LOCKED_MASK);
-
/* first half of local IO error, failure to attach,
* or administrative detach */
if (os.disk != D_FAILED && ns.disk == D_FAILED) {
--
1.7.9.5

2013-03-28 16:11:22

by Jens Axboe

[permalink] [raw]

Subject: Re: [PATCH 00/17] RFC: Pile of DRBD fixes

On Wed, Mar 27 2013, Philipp Reisner wrote:
> The first patch improves the idr_for_each_entry() macro and its
> documentation, which is not specific to DRBD.
>
> Each patch in this series addresses an independent issue, please refer
> to the individual commit messages for a description.

Applied to for-3.10/drivers, thanks.

--
Jens Axboe