I plan to submit this to Linus tomorrow, hopefully for 3.7, unless there are
complaints. It is in my for-next branch now.
Thanks,
NeilBrown
From aec6f821ed92fac5ae4f1db50279a3999de5872a Mon Sep 17 00:00:00 2001
From: NeilBrown <[email protected]>
Date: Thu, 2 Oct 2014 13:45:00 +1000
Subject: [PATCH] md/raid5: disable 'DISCARD' by default due to safety
concerns.
It has come to my attention (thanks Martin) that 'discard_zeroes_data'
is only a hint. Some devices in some cases don't do what it
says on the label.
The use of DISCARD in RAID5 depends on reads from discarded regions
being predictably zero. If a write to a previously discarded region
performs a read-modify-write cycle it assumes that the parity block
was consistent with the data blocks. If all were zero, this would
be the case. If some are and some aren't this would not be the case.
This could lead to data corruption after a device failure when
data needs to be reconstructed from the parity.
As we cannot trust 'discard_zeroes_data', ignore it by default
and so disallow DISCARD on all raid4/5/6 arrays.
As many devices are trustworthy, and as there are benefits to using
DISCARD, add a module parameter to over-ride this caution and cause
DISCARD to work if discard_zeroes_data is set.
If a site want to enable DISCARD on some arrays but not on others they
should select DISCARD support at the filesystem level, and set the
raid456 module parameter.
raid456.devices_handle_discard_safely=Y
As this is a data-safety issue, I believe this patch is suitable for
-stable.
DISCARD support for RAID456 was added in 3.7
Cc: Shaohua Li <[email protected]>
Cc: "Martin K. Petersen" <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: Heinz Mauelshagen <[email protected]>
Cc: [email protected] (3.7+)
Fixes: 620125f2bf8ff0c4969b79653b54d7bcc9d40637
Signed-off-by: NeilBrown <[email protected]>
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 183588b11fc1..9f0fbecd1eb5 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -64,6 +64,10 @@
#define cpu_to_group(cpu) cpu_to_node(cpu)
#define ANY_GROUP NUMA_NO_NODE
+static bool devices_handle_discard_safely = false;
+module_param(devices_handle_discard_safely, bool, 0644);
+MODULE_PARM_DESC(devices_handle_discard_safely,
+ "Set to Y if all devices in each array reliably return zeroes on reads from discarded regions");
static struct workqueue_struct *raid5_wq;
/*
* Stripe cache
@@ -6208,7 +6212,7 @@ static int run(struct mddev *mddev)
mddev->queue->limits.discard_granularity = stripe;
/*
* unaligned part of discard request will be ignored, so can't
- * guarantee discard_zerors_data
+ * guarantee discard_zeroes_data
*/
mddev->queue->limits.discard_zeroes_data = 0;
@@ -6233,6 +6237,18 @@ static int run(struct mddev *mddev)
!bdev_get_queue(rdev->bdev)->
limits.discard_zeroes_data)
discard_supported = false;
+ /* Unfortunately, discard_zeroes_data is not currently
+ * a guarantee - just a hint. So we only allow DISCARD
+ * if the sysadmin has confirmed that only safe devices
+ * are in use by setting a module parameter.
+ */
+ if (!devices_handle_discard_safely) {
+ if (discard_supported) {
+ pr_info("md/raid456: discard support disabled due to uncertainty.\n");
+ pr_info("Set raid456.devices_handle_discard_safely=Y to override.\n");
+ }
+ discard_supported = false;
+ }
}
if (discard_supported &&
On Thu, Oct 02 2014 at 12:04am -0400,
NeilBrown <[email protected]> wrote:
>
> I plan to submit this to Linus tomorrow, hopefully for 3.7, unless there are
> complaints. It is in my for-next branch now.
>
> Thanks,
> NeilBrown
>
>
> From aec6f821ed92fac5ae4f1db50279a3999de5872a Mon Sep 17 00:00:00 2001
> From: NeilBrown <[email protected]>
> Date: Thu, 2 Oct 2014 13:45:00 +1000
> Subject: [PATCH] md/raid5: disable 'DISCARD' by default due to safety
> concerns.
>
> It has come to my attention (thanks Martin) that 'discard_zeroes_data'
> is only a hint. Some devices in some cases don't do what it
> says on the label.
>
> The use of DISCARD in RAID5 depends on reads from discarded regions
> being predictably zero. If a write to a previously discarded region
> performs a read-modify-write cycle it assumes that the parity block
> was consistent with the data blocks. If all were zero, this would
> be the case. If some are and some aren't this would not be the case.
> This could lead to data corruption after a device failure when
> data needs to be reconstructed from the parity.
>
> As we cannot trust 'discard_zeroes_data', ignore it by default
> and so disallow DISCARD on all raid4/5/6 arrays.
>
> As many devices are trustworthy, and as there are benefits to using
> DISCARD, add a module parameter to over-ride this caution and cause
> DISCARD to work if discard_zeroes_data is set.
>
> If a site want to enable DISCARD on some arrays but not on others they
> should select DISCARD support at the filesystem level, and set the
> raid456 module parameter.
> raid456.devices_handle_discard_safely=Y
>
> As this is a data-safety issue, I believe this patch is suitable for
> -stable.
> DISCARD support for RAID456 was added in 3.7
>
> Cc: Shaohua Li <[email protected]>
> Cc: "Martin K. Petersen" <[email protected]>
> Cc: Mike Snitzer <[email protected]>
> Cc: Heinz Mauelshagen <[email protected]>
> Cc: [email protected] (3.7+)
> Fixes: 620125f2bf8ff0c4969b79653b54d7bcc9d40637
> Signed-off-by: NeilBrown <[email protected]>
Acked-by: Mike Snitzer <[email protected]>
Thanks Neil, I'll get something comparable staged for dm-raid for 3.18
(less urgency given that dm-raid doesn't yet support discards).
On Thu, Oct 02 2014 at 9:52am -0400,
Mike Snitzer <[email protected]> wrote:
> Thanks Neil, I'll get something comparable staged for dm-raid for 3.18
> (less urgency given that dm-raid doesn't yet support discards).
FYI, I've staged the following dm-raid change in linux-next for 3.18:
https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=for-next&id=bfc3f335f9886b68d281240aab751b96259d8860
>>>>> "Neil" == NeilBrown <[email protected]> writes:
Neil> I plan to submit this to Linus tomorrow, hopefully for 3.7, unless
Neil> there are complaints. It is in my for-next branch now.
Acked-by: Martin K. Petersen <[email protected]>
--
Martin K. Petersen Oracle Linux Engineering