(I originally reported this problem here:
https://bugzilla.kernel.org/show_bug.cgi?id=200439)
When I updated from 4.14 to 4.16, my LVM snapshotting script broke for
no apparent reason.
My script has the following line, and it fails like this:
+ lvcreate --size 5G --snapshot --name snap0 --permission r
/dev/mapper/vg0-lvol_rootfs
device-mapper: create ioctl on
vg0-snap0-cowLVM-sDdIeh9cecWdaNyRfZC31mxgfwTa4sOeHMJXVOykGVRtfP6Aii7IHvwS066AOLOM-cow
failed: Device or resource busy
Failed to lock logical volume vg0/lvol_rootfs.
Aborting. Manual intervention required.
At the same time, some errors appear in dmesg as well:
[ 26.145279] generic_make_request: Trying to write to read-only
block-device dm-3 (partno 0)
[ 26.145288] device-mapper: persistent snapshot: write_header failed
[ 26.145847] device-mapper: table: 253:4: snapshot: Failed to read
snapshot metadata
[ 26.145851] device-mapper: ioctl: error adding target to table
I bisected the vanilla kernel, and the first bad commit is
[721c7fc701c71f693307d274d2b346a1ecd4a534] block: fail op_is_write()
requests to read-only partitions
On Thu, Aug 2, 2018 at 2:26 PM WGH <[email protected]> wrote:
>
> (I originally reported this problem here:
> https://bugzilla.kernel.org/show_bug.cgi?id=200439)
>
> When I updated from 4.14 to 4.16, my LVM snapshotting script broke for
> no apparent reason.
>
> My script has the following line, and it fails like this:
> + lvcreate --size 5G --snapshot --name snap0 --permission r
> /dev/mapper/vg0-lvol_rootfs
> device-mapper: create ioctl on
> vg0-snap0-cowLVM-sDdIeh9cecWdaNyRfZC31mxgfwTa4sOeHMJXVOykGVRtfP6Aii7IHvwS066AOLOM-cow
> failed: Device or resource busy
> Failed to lock logical volume vg0/lvol_rootfs.
> Aborting. Manual intervention required.
>
> At the same time, some errors appear in dmesg as well:
> [ 26.145279] generic_make_request: Trying to write to read-only
> block-device dm-3 (partno 0)
> [ 26.145288] device-mapper: persistent snapshot: write_header failed
> [ 26.145847] device-mapper: table: 253:4: snapshot: Failed to read
> snapshot metadata
> [ 26.145851] device-mapper: ioctl: error adding target to table
>
> I bisected the vanilla kernel, and the first bad commit is
> [721c7fc701c71f693307d274d2b346a1ecd4a534] block: fail op_is_write()
> requests to read-only partitions
Adding Mike and dm-devel.
From a quick look, --permission r sets DM_READONLY_FLAG, which makes dm
mark the disk read-only with set_disk_ro(dm_disk(md), 1) in do_resume().
A bit later it tries to write to the disk from write_header():
return chunk_io(ps, ps->header_area, 0, REQ_OP_WRITE, 0, 1);
Thanks,
Ilya
On 08/02/2018 04:31 PM, Ilya Dryomov wrote:
> On Thu, Aug 2, 2018 at 2:26 PM WGH <[email protected]> wrote:
>> (I originally reported this problem here:
>> https://bugzilla.kernel.org/show_bug.cgi?id=200439)
>>
>> When I updated from 4.14 to 4.16, my LVM snapshotting script broke for
>> no apparent reason.
>>
>> My script has the following line, and it fails like this:
>> + lvcreate --size 5G --snapshot --name snap0 --permission r
>> /dev/mapper/vg0-lvol_rootfs
>> device-mapper: create ioctl on
>> vg0-snap0-cowLVM-sDdIeh9cecWdaNyRfZC31mxgfwTa4sOeHMJXVOykGVRtfP6Aii7IHvwS066AOLOM-cow
>> failed: Device or resource busy
>> Failed to lock logical volume vg0/lvol_rootfs.
>> Aborting. Manual intervention required.
>>
>> At the same time, some errors appear in dmesg as well:
>> [ 26.145279] generic_make_request: Trying to write to read-only
>> block-device dm-3 (partno 0)
>> [ 26.145288] device-mapper: persistent snapshot: write_header failed
>> [ 26.145847] device-mapper: table: 253:4: snapshot: Failed to read
>> snapshot metadata
>> [ 26.145851] device-mapper: ioctl: error adding target to table
>>
>> I bisected the vanilla kernel, and the first bad commit is
>> [721c7fc701c71f693307d274d2b346a1ecd4a534] block: fail op_is_write()
>> requests to read-only partitions
> Adding Mike and dm-devel.
>
> From a quick look, --permission r sets DM_READONLY_FLAG, which makes dm
> mark the disk read-only with set_disk_ro(dm_disk(md), 1) in do_resume().
> A bit later it tries to write to the disk from write_header():
>
> return chunk_io(ps, ps->header_area, 0, REQ_OP_WRITE, 0, 1);
>
> Thanks,
>
> Ilya
After further investigation, this was fixed on lvm2 side (userspace) in
https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3
(snapshot: keep COW writable for read-only volumes).
So I guess that's it. Time to poke my distribution package maintainers
to bump the package version.
On Thu, Aug 2, 2018 at 8:16 AM WGH <[email protected]> wrote:
>
> On 08/02/2018 04:31 PM, Ilya Dryomov wrote:
> >
> > From a quick look, --permission r sets DM_READONLY_FLAG, which makes dm
> > mark the disk read-only with set_disk_ro(dm_disk(md), 1) in do_resume().
> > A bit later it tries to write to the disk from write_header():
> >
> > return chunk_io(ps, ps->header_area, 0, REQ_OP_WRITE, 0, 1);
> >
> > Thanks,
> >
> > Ilya
>
> After further investigation, this was fixed on lvm2 side (userspace) in
> https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3
> (snapshot: keep COW writable for read-only volumes).
>
> So I guess that's it. Time to poke my distribution package maintainers
> to bump the package version.
That is *not* how kernel development is supposed to work. If your
script used to work, it should continue to work.
Why did it use to work despite that read-only flag? And what was it
that actually broke this?
We remain bug-for-bug compatible with older kernel versions when
people depend on the bugs. Unless the old bugs are security issues,
and even then we try to make it _look_ like we work the same way.
Or was it a user-space lvm tool that broke in the first place, and the
kernel release update was a red herring?
Linus
On Thu, Aug 2, 2018 at 6:41 PM Linus Torvalds
<[email protected]> wrote:
>
> On Thu, Aug 2, 2018 at 8:16 AM WGH <[email protected]> wrote:
> >
> > On 08/02/2018 04:31 PM, Ilya Dryomov wrote:
> > >
> > > From a quick look, --permission r sets DM_READONLY_FLAG, which makes dm
> > > mark the disk read-only with set_disk_ro(dm_disk(md), 1) in do_resume().
> > > A bit later it tries to write to the disk from write_header():
> > >
> > > return chunk_io(ps, ps->header_area, 0, REQ_OP_WRITE, 0, 1);
> > >
> > > Thanks,
> > >
> > > Ilya
> >
> > After further investigation, this was fixed on lvm2 side (userspace) in
> > https://sourceware.org/git/?p=lvm2.git;a=commit;h=a6fdb9d9d70f51c49ad11a87ab4243344e6701a3
> > (snapshot: keep COW writable for read-only volumes).
> >
> > So I guess that's it. Time to poke my distribution package maintainers
> > to bump the package version.
>
> That is *not* how kernel development is supposed to work. If your
> script used to work, it should continue to work.
>
> Why did it use to work despite that read-only flag? And what was it
> that actually broke this?
I think it was my commit 721c7fc701c7 ("block: fail op_is_write()
requests to read-only partitions"). The block layer was previously
allowing non-blkdev_write_iter() writes to read-only disks and
partitions. dm's chunk_io() boils down to submit_bio(), so in 4.16
it started to fail if the underlying device was marked read-only.
>
> We remain bug-for-bug compatible with older kernel versions when
> people depend on the bugs. Unless the old bugs are security issues,
> and even then we try to make it _look_ like we work the same way.
>
> Or was it a user-space lvm tool that broke in the first place, and the
> kernel release update was a red herring?
Apparently at least some versions of lvm(8) instruct dm to mark the COW
device read-only even though it is always written to. It comes up only
if the user asks for a read-only snapshot (the default is writable).
One option might be to ignore the supplied DM_READONLY_FLAG for COW
devices. Marking the COW device (i.e. the exception store) read-only
is probably never sane...
Thanks,
Ilya
On Thu, Aug 2, 2018 at 11:18 AM Ilya Dryomov <[email protected]> wrote:
>
> I think it was my commit 721c7fc701c7 ("block: fail op_is_write()
> requests to read-only partitions"). The block layer was previously
> allowing non-blkdev_write_iter() writes to read-only disks and
> partitions. dm's chunk_io() boils down to submit_bio(), so in 4.16
> it started to fail if the underlying device was marked read-only.
Ugh. That commit does look like the obviously right thing to do, and
our old behavior was obviously wrong.
At the same time, we really really shouldn't break user flow, even if
that flow was due to an obvious bug.
Dang.
> Apparently at least some versions of lvm(8) instruct dm to mark the COW
> device read-only even though it is always written to. It comes up only
> if the user asks for a read-only snapshot (the default is writable).
>
> One option might be to ignore the supplied DM_READONLY_FLAG for COW
> devices. Marking the COW device (i.e. the exception store) read-only
> is probably never sane...
That sounds like possibly the sanest fixlet for this - perhaps
together with a warning message saying "ignoring DM_READONLY_FLAG for
COW device" so that people see that they are doing insane things.
But some of our internal DM_READONLY_FLAG use obviously comes from
other sources that we have to honor (ie some grepping shows me code
like
if (get_disk_ro(disk))
param->flags |= DM_READONLY_FLAG;
where we obviously can *not* override the get_disk_ro() state. So it
depends a lot on how this all happened. But it looks like in this
particular case, it all came from one inconsistent lvmcreate command.
Or - if there really is only _one_ user that noticed this, and a new
lvm2 release fixes his problem, maybe we can say "we broke it, but the
only person who reported it can work around it".
WGH (sorry, no idea what your real name is) - what's the source of the
script that broke? Was it some system script you got from outside and
likely to affect others too?
Or was it just some local thing you wrote yourself and was
unintentionally buggy and nobody else is likely to hit this?
Because if the latter, if you can work around it and you're the only
user this hits, we might just be able to ignore it.
Linus
On 08/02/2018 09:32 PM, Linus Torvalds wrote:
> WGH (sorry, no idea what your real name is) - what's the source of the
> script that broke? Was it some system script you got from outside and
> likely to affect others too?
>
> Or was it just some local thing you wrote yourself and was
> unintentionally buggy and nobody else is likely to hit this?
>
> Because if the latter, if you can work around it and you're the only
> user this hits, we might just be able to ignore it.
The script in question is written by me and contains literally two lines:
lvcreate --size 5G --snapshot --name snap0 --permission r
/dev/mapper/vg0-lvol_rootfs
mount /dev/mapper/vg0-snap0 /mnt/rootfs_snapshot
The script is not buggy (I think), it was written under simple
assumption that --permission r works. And it does - unless you happen to
have combination of kernel >=4.16 and lvm2 <2.02.178.
The commit in question appeared only in 4.16, and this kernel version is
not widespread yet. You have to be running both recent kernel and
not-so-recent lvm2 to be bitten by this. This probably explains why no
one else reported this problem.
Workaround certainly exists: I can just create read-write snapshot, but
mount it read-only. The reason why I didn't discover the workaround
earlier is that after unsuccessful read-only snapshot creation system
ends up in some weird state where read-write snapshots are also failing
with the same error (until reboot). So I got a wrong impression that
read-write snapshots were broken as well.
On 08/03/2018 12:32 AM, WGH wrote:
> On 08/02/2018 09:32 PM, Linus Torvalds wrote:
>> WGH (sorry, no idea what your real name is) - what's the source of the
>> script that broke? Was it some system script you got from outside and
>> likely to affect others too?
>>
>> Or was it just some local thing you wrote yourself and was
>> unintentionally buggy and nobody else is likely to hit this?
>>
>> Because if the latter, if you can work around it and you're the only
>> user this hits, we might just be able to ignore it.
> The script in question is written by me and contains literally two lines:
>
> lvcreate --size 5G --snapshot --name snap0 --permission r
> /dev/mapper/vg0-lvol_rootfs
> mount /dev/mapper/vg0-snap0 /mnt/rootfs_snapshot
>
> The script is not buggy (I think), it was written under simple
> assumption that --permission r works. And it does - unless you happen to
> have combination of kernel >=4.16 and lvm2 <2.02.178.
>
> The commit in question appeared only in 4.16, and this kernel version is
> not widespread yet. You have to be running both recent kernel and
> not-so-recent lvm2 to be bitten by this. This probably explains why no
> one else reported this problem.
>
> Workaround certainly exists: I can just create read-write snapshot, but
> mount it read-only. The reason why I didn't discover the workaround
> earlier is that after unsuccessful read-only snapshot creation system
> ends up in some weird state where read-write snapshots are also failing
> with the same error (until reboot). So I got a wrong impression that
> read-write snapshots were broken as well.
>
I've just found one public report of this bug, though:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442
On Thu, Aug 2, 2018 at 2:39 PM WGH <[email protected]> wrote:
>
> I've just found one public report of this bug, though:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442
Yeah, it does sound like we should fix this issue.
Ilya?
Linus
On Thu, Aug 02 2018 at 5:52pm -0400,
Linus Torvalds <[email protected]> wrote:
> On Thu, Aug 2, 2018 at 2:39 PM WGH <[email protected]> wrote:
> >
> > I've just found one public report of this bug, though:
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442
>
> Yeah, it does sound like we should fix this issue.
Debian is notorious for having a stale and/or custom lvm2.
Generally speaking, it is recommended that lvm2 not be older than the
kernel (but the opposite is fine).
Given the narrow scope of users impacted (read-only snapshot creation is
very niche, certainly not the default): I don't think it is worth making
fiddly changes in dm-snapshot to warn the user and override the
permissions for the cow device.
Mike
Dne 2.8.2018 v 23:52 Linus Torvalds napsal(a):
> On Thu, Aug 2, 2018 at 2:39 PM WGH <[email protected]> wrote:
>>
>> I've just found one public report of this bug, though:
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900442
>
> Yeah, it does sound like we should fix this issue.
>
Hi
IMHO (as the author of fixing lvm2 patch) user should not be upgrading kernels
and keep running older lvm2 user-land tool (and there are very good reasons
for this).
Kernel had a bug which has been fixed, lvm2 misused this kernel bug and was
also fixed. Keeping kernel bug present allowing certain device to write to
read-only devices can be possibly seen as some security risk.
Also the number of users who ever create a read-only snapshot is probably very
low.
Maybe there could be some 'dm-snapshot' loading modinfo option to allowing
to create in case user really wants to have this bug being present in kernel
(reinstantiate old buggy logic), but on default the user should get error when
it tries to write to read-only volume and should upgrade lvm2.
Regards
Zdenek
On Fri, Aug 03, 2018 at 09:31:03AM -0400, Mike Snitzer wrote:
>
> Debian is notorious for having a stale and/or custom lvm2.
> Generally speaking, it is recommended that lvm2 not be older than the
> kernel (but the opposite is fine).
On Fri, Aug 03, 2018 at 03:31:18PM +0200, Zdenek Kabelac wrote:
> IMHO (as the author of fixing lvm2 patch) user should not be upgrading
> kernels and keep running older lvm2 user-land tool (and there are very good
> reasons for this).
I'm going to have to strenuously disagree.
In *general* it's quite common for users to update their kernel
without updating their userspace. For example, I as a *developer*, I
am often running bleeding kernels (e.g., at the moment I am running
something based on 4.18-rc6 on a Debian testing system; and it's not
at all uncommon for users to run a newer kernel on older
distribution).
This is the *first* I've heard that I should be continuously updating
lvm because I'm running bleeding edge kernels --- and I would claim
that this is entirely unreasonable.
I'll also note that very often users will update kernels while running
distribution userspace. And if you are using Linode, very often
*Linode* will offer a newer kernel to better take advantage of the
Linode VM, and this is done without needing to install the Linode
kernel into the userspace.
It *used* to be the case that users running RHEL 2 or RHEL 3 could try
updating to the latest upstream kernel, and everything would break and
fall apart. This was universally considered to be a failure, and a
Bad Thing. So if LVM2 is not backwards compatible, and breaks in the
face of newer kernels running older distributions, that is a bug.
If there is a fundamental bug in the userspace API, and it can't be
fixed without a serious security bug, sometimes we need to have an
exception to the "you can't mandate newer userspace" rule. But I
don't think this falls into this category; how would a user "exploit"
what people are calling a "security bug" to break root?
- Ted
[ Dammit. I haven't had to shout and curse at people for a while, but
this is ABSOLUTELY THE MOST IMPORTANT THING IN THE UNIVERSE WHEN IT
COMES TO SOFTWARE DEVELOPMENT ]
On Fri, Aug 3, 2018 at 6:31 AM Zdenek Kabelac <[email protected]> wrote:
>
> IMHO (as the author of fixing lvm2 patch) user should not be upgrading kernels
> and keep running older lvm2 user-land tool (and there are very good reasons
> for this).
Yeah, HELL NO!
Guess what? You're wrong. YOU ARE MISSING THE #1 KERNEL RULE.
We do not regress, and we do not regress exactly because your are 100% wrong.
And the reason you state for your opinion is in fact exactly *WHY* you
are wrong.
Your "good reasons" are pure and utter garbage.
The whole point of "we do not regress" is so that people can upgrade
the kernel and never have to worry about it.
> Kernel had a bug which has been fixed
That is *ENTIRELY* immaterial.
Guys, whether something was buggy or not DOES NOT MATTER.
Why?
Bugs happen. That's a fact of life. Arguing that "we had to break
something because we were fixing a bug" is completely insane. We fix
tens of bugs every single day, thinking that "fixing a bug" means that
we can break something is simply NOT TRUE.
So bugs simply aren't even relevant to the discussion. They happen,
they get found, they get fixed, and it has nothing to do with "we
break users".
Because the only thing that matters IS THE USER.
How hard is that to understand?
Anybody who uses "but it was buggy" as an argument is entirely missing
the point. As far as the USER was concerned, it wasn't buggy - it
worked for him/her.
Maybe it worked *because* the user had taken the bug into account,
maybe it worked because the user didn't notice - again, it doesn't
matter. It worked for the user.
Breaking a user workflow for a "bug" is absolutely the WORST reason
for breakage you can imagine.
It's basically saying "I took something that worked, and I broke it,
but now it's better". Do you not see how f*cking insane that statement
is?
And without users, your program is not a program, it's a pointless
piece of code that you might as well throw away.
Seriously. This is *why* the #1 rule for kernel development is "we
don't break users". Because "I fixed a bug" is absolutely NOT AN
ARGUMENT if that bug fix broke a user setup. You actually introduced a
MUCH BIGGER bug by "fixing" something that the user clearly didn't
even care about.
And dammit, we upgrade the kernel ALL THE TIME without upgrading any
other programs at all. It is absolutely required, because flag-days
and dependencies are horribly bad.
And it is also required simply because I as a kernel developer do not
upgrade random other tools that I don't even care about as I develop
the kernel, and I want any of my users to feel safe doing the same
time.
So no. Your rule is COMPLETELY wrong. If you cannot upgrade a kernel
without upgrading some other random binary, then we have a problem.
Linus
On Fri, Aug 03 2018 at 11:20am -0400,
Theodore Y. Ts'o <[email protected]> wrote:
> On Fri, Aug 03, 2018 at 09:31:03AM -0400, Mike Snitzer wrote:
> >
> > Debian is notorious for having a stale and/or custom lvm2.
> > Generally speaking, it is recommended that lvm2 not be older than the
> > kernel (but the opposite is fine).
>
> On Fri, Aug 03, 2018 at 03:31:18PM +0200, Zdenek Kabelac wrote:
> > IMHO (as the author of fixing lvm2 patch) user should not be upgrading
> > kernels and keep running older lvm2 user-land tool (and there are very good
> > reasons for this).
>
> I'm going to have to strenuously disagree.
>
> In *general* it's quite common for users to update their kernel
> without updating their userspace. For example, I as a *developer*, I
> am often running bleeding kernels (e.g., at the moment I am running
> something based on 4.18-rc6 on a Debian testing system; and it's not
> at all uncommon for users to run a newer kernel on older
> distribution).
>
> This is the *first* I've heard that I should be continuously updating
> lvm because I'm running bleeding edge kernels --- and I would claim
> that this is entirely unreasonable.
It isn't a hard requirement. But relative to a newer kernel, you simply
cannot deny that a newer lvm2 will work better than on older lvm2. Not
even speaking about this block regression. Lessons are learned, fixes
are made, support for the newer kernel's targets are available, etc etc.
That is where the suggestion to keep lvm2 updated along with the kernel
comes from.
It isn't about "oh we regress _all_ the time.. screw users!".
> I'll also note that very often users will update kernels while running
> distribution userspace. And if you are using Linode, very often
> *Linode* will offer a newer kernel to better take advantage of the
> Linode VM, and this is done without needing to install the Linode
> kernel into the userspace.
>
> It *used* to be the case that users running RHEL 2 or RHEL 3 could try
> updating to the latest upstream kernel, and everything would break and
> fall apart. This was universally considered to be a failure, and a
> Bad Thing. So if LVM2 is not backwards compatible, and breaks in the
> face of newer kernels running older distributions, that is a bug.
Please stop with the overreaction and making this something it isn't.
On Fri, Aug 03 2018 at 12:37pm -0400,
Linus Torvalds <[email protected]> wrote:
> [ Dammit. I haven't had to shout and curse at people for a while, but
> this is ABSOLUTELY THE MOST IMPORTANT THING IN THE UNIVERSE WHEN IT
> COMES TO SOFTWARE DEVELOPMENT ]
>
> On Fri, Aug 3, 2018 at 6:31 AM Zdenek Kabelac <[email protected]> wrote:
> >
> > IMHO (as the author of fixing lvm2 patch) user should not be upgrading kernels
> > and keep running older lvm2 user-land tool (and there are very good reasons
> > for this).
>
> Yeah, HELL NO!
>
> Guess what? You're wrong. YOU ARE MISSING THE #1 KERNEL RULE.
Nobody ever said there wasn't breakage. And yes, Zdenek papered over
the regression introduced by commit 721c7fc701c71 in userspace (lvm2)
rather than sound the alarm that the kernel regressed.
> We do not regress, and we do not regress exactly because your are 100% wrong.
We clearly _do_ regress. Hence this thread.
> And the reason you state for your opinion is in fact exactly *WHY* you
> are wrong.
>
> Your "good reasons" are pure and utter garbage.
>
> The whole point of "we do not regress" is so that people can upgrade
> the kernel and never have to worry about it.
>
> > Kernel had a bug which has been fixed
>
> That is *ENTIRELY* immaterial.
>
> Guys, whether something was buggy or not DOES NOT MATTER.
>
> Why?
>
> Bugs happen. That's a fact of life. Arguing that "we had to break
> something because we were fixing a bug" is completely insane. We fix
> tens of bugs every single day, thinking that "fixing a bug" means that
> we can break something is simply NOT TRUE.
>
> So bugs simply aren't even relevant to the discussion. They happen,
> they get found, they get fixed, and it has nothing to do with "we
> break users".
>
> Because the only thing that matters IS THE USER.
>
> How hard is that to understand?
It isn't.
But you're tearing the head off of a userspace developer (Zdenek).
> Anybody who uses "but it was buggy" as an argument is entirely missing
> the point. As far as the USER was concerned, it wasn't buggy - it
> worked for him/her.
>
> Maybe it worked *because* the user had taken the bug into account,
> maybe it worked because the user didn't notice - again, it doesn't
> matter. It worked for the user.
>
> Breaking a user workflow for a "bug" is absolutely the WORST reason
> for breakage you can imagine.
>
> It's basically saying "I took something that worked, and I broke it,
> but now it's better". Do you not see how f*cking insane that statement
> is?
>
> And without users, your program is not a program, it's a pointless
> piece of code that you might as well throw away.
>
> Seriously. This is *why* the #1 rule for kernel development is "we
> don't break users". Because "I fixed a bug" is absolutely NOT AN
> ARGUMENT if that bug fix broke a user setup. You actually introduced a
> MUCH BIGGER bug by "fixing" something that the user clearly didn't
> even care about.
>
> And dammit, we upgrade the kernel ALL THE TIME without upgrading any
> other programs at all. It is absolutely required, because flag-days
> and dependencies are horribly bad.
>
> And it is also required simply because I as a kernel developer do not
> upgrade random other tools that I don't even care about as I develop
> the kernel, and I want any of my users to feel safe doing the same
> time.
>
> So no. Your rule is COMPLETELY wrong. If you cannot upgrade a kernel
> without upgrading some other random binary, then we have a problem.
As I explained to Ted in my previous reply to this thread: using an lvm2
that is of the same vintage of the kernel is generally going to provide
a more robust user experience (fixes, features, etc have been
introduced). But both lvm2 and dm strive to never break users -- just
like the rest of Linux.
Anyway, what would you like done about this block regression? The
dm-snapshot use-case isn't compelling because it impacts 2 users (that
we know of) but there could be other scenarios (outside of DM) that
could impact more -- though you'd think we'd have heard about them by
now.
Do you want to revert 4.16's commit 721c7fc701c71 ?
Mike
On Fri, Aug 3, 2018 at 11:39 AM Mike Snitzer <[email protected]> wrote:
>
> Please stop with the overreaction and making this something it isn't.
It's not an overreaction when people get their scripts broken, and
some developers then argue "that's not a serious bug".
Guys, this needs to be fixed. With all the stupid and fundamentyally
incorrect excuses, I am now no longer even willing to maintain any
other course of action.
If you develop for the Linux kernel, you need to realize that
"breaking user space" is simply not acceptable. And if you cannot live
with that, then you should stop working on the kernel. Because I will
refuse to continue to pull from you as a developer.
At worst, I'll just revert the original commit entirely. I was hoping
we'd be able to avoid that, partly because the commit looks fine, but
partly because it also doesn't revert cleanly.
Or I'll just do something like this, since it seems like it's the lvm
people who have the hardest time with understanding the simple rules:
diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
index b810ea77e6b1..fcfab812e025 100644
--- a/drivers/md/dm-ioctl.c
+++ b/drivers/md/dm-ioctl.c
@@ -1049,7 +1049,12 @@ static int do_resume(struct dm_ioctl *param)
return PTR_ERR(old_map);
}
- if (dm_table_get_mode(new_map) & FMODE_WRITE)
+ /*
+ * This used to do
+ * dm_table_get_mode(new_map) & FMODE_WRITE
+ * but the lvm tools got this wrong, and will
+ * continue to write to "read-only" volumes.
+ if (0)
set_disk_ro(dm_disk(md), 0);
else
set_disk_ro(dm_disk(md), 1);
which seems to target the actual problem more directly.
Linus
On Fri, Aug 03 2018 at 2:57pm -0400,
Linus Torvalds <[email protected]> wrote:
> On Fri, Aug 3, 2018 at 11:39 AM Mike Snitzer <[email protected]> wrote:
> >
> > Please stop with the overreaction and making this something it isn't.
>
> It's not an overreaction when people get their scripts broken, and
> some developers then argue "that's not a serious bug".
>
> Guys, this needs to be fixed. With all the stupid and fundamentyally
> incorrect excuses, I am now no longer even willing to maintain any
> other course of action.
>
> If you develop for the Linux kernel, you need to realize that
> "breaking user space" is simply not acceptable. And if you cannot live
> with that, then you should stop working on the kernel. Because I will
> refuse to continue to pull from you as a developer.
WTF!?
> At worst, I'll just revert the original commit entirely. I was hoping
> we'd be able to avoid that, partly because the commit looks fine, but
> partly because it also doesn't revert cleanly.
>
> Or I'll just do something like this, since it seems like it's the lvm
> people who have the hardest time with understanding the simple rules:
I'll be your whipping boy all you like.
But you're making Zdenek's response into mine and threathening to no
longer pull from me.
Over what?
A block regression that an lvm2 developer papered over.
> diff --git a/drivers/md/dm-ioctl.c b/drivers/md/dm-ioctl.c
> index b810ea77e6b1..fcfab812e025 100644
> --- a/drivers/md/dm-ioctl.c
> +++ b/drivers/md/dm-ioctl.c
> @@ -1049,7 +1049,12 @@ static int do_resume(struct dm_ioctl *param)
> return PTR_ERR(old_map);
> }
>
> - if (dm_table_get_mode(new_map) & FMODE_WRITE)
> + /*
> + * This used to do
> + * dm_table_get_mode(new_map) & FMODE_WRITE
> + * but the lvm tools got this wrong, and will
> + * continue to write to "read-only" volumes.
> + if (0)
> set_disk_ro(dm_disk(md), 0);
> else
> set_disk_ro(dm_disk(md), 1);
>
> which seems to target the actual problem more directly.
How does that pass for a fix to this issue?
That'll unilaterally mark all dm device's readonly.
On Fri, Aug 3, 2018 at 11:54 AM Mike Snitzer <[email protected]> wrote:
>
>
> As I explained to Ted in my previous reply to this thread: using an lvm2
> that is of the same vintage of the kernel is generally going to provide
> a more robust user experience
You said that yes.
And it is completely irrelevant.
The fact is, if you use an older lvm2, then a newer kernel still needs
to work. Your "more robust experience" argument has nothing
what-so-ever to do with that.
Will you get new features from newer user land tools? Sure, usually.
And entirely immaterial to a kernel regression.
Will newer user land tools hopefully fix other issues? You'd hope so,
but again - immaterial.
So why are you bringing up a complete red herring? It's entirely
immaterial to the actual issue at hand.
I would _hope_ that other projects hjave the same "no regressions"
rule that the kernel has, but I know many don't. But whatever other
projects are out there, and whatever other rules _they_ have for their
development is also entirely immaterial to the kernel.
The kernel has a simple rule: no user regressions.
Yes, we've had to break that rule very occasionally - when the
semantics are a huge honking security issue and cannot possibly be
hidden any other way, then we obviously have to break them.
So it has happened. It's happily quite rare.
But in this case, the issue is that the block layer now enforces the
read-only protection more. And it seems to be the case that the lvm
tools set the read-only flag even when they then depended on being
able to write to them, because we didn't use to.
So just judging from that description, I do suspect that "we can't
depend on the lvm read-only flag", so a patch like
"let's not turn DM_READONLY_FLAG into actually set_disk_ro(dm_disk(md), 1)"
makes sense.
Obviously, if we can limit that more, that would be lovely.
But dammit, NOBODY gets to say "oh, you should just update user land tools".
Because when they do, I will explode. And I'm 1000% serious that I
will refuse to work with people who continue to say that or continue
to make excuses.
And user land developers should damn well know about this. The fact
that they are apparently not clued in about kernel rules is what
allowed this bug to go undiscovered and unreported for much too long.
Apparently the lvm2 user land developers *did* notice the breakage,
but instead of reporting it as a kernel bug, they worked around it.
So user land developers should actually know that if the kernel stops
working for them, they should *not* work around it. Sure, fix your
program, but let the kernel people know.
And kernel people should know that "oh, the user land people already
changed their behavior" is *not* a "I don't need to care about it".
Unless the user land fix was so long ago that nobody cares any more.
Linus
On Fri, Aug 3, 2018 at 12:06 PM Mike Snitzer <[email protected]> wrote:
>
> But you're making Zdenek's response into mine and threathening to no
> longer pull from me.
No. I'm *very* unhappy about how you seem to think that Zdenek's
response was even half-way ok, and you jumped in when Ted said it
wasn't.
Linus
Dne 3.8.2018 v 18:37 Linus Torvalds napsal(a):
> [ Dammit. I haven't had to shout and curse at people for a while, but
> this is ABSOLUTELY THE MOST IMPORTANT THING IN THE UNIVERSE WHEN IT
> COMES TO SOFTWARE DEVELOPMENT ]
>
> On Fri, Aug 3, 2018 at 6:31 AM Zdenek Kabelac <[email protected]> wrote:
>>
>> IMHO (as the author of fixing lvm2 patch) user should not be upgrading kernels
>> and keep running older lvm2 user-land tool (and there are very good reasons
>> for this).
>
> Yeah, HELL NO!
>
> Guess what? You're wrong. YOU ARE MISSING THE #1 KERNEL RULE.
>
> We do not regress, and we do not regress exactly because your are 100% wrong.
>
Hi Linus
Sorry to hear you so crazy about this :) - but my comment was mainly targeted
to 'distribution maintainers' trying to push the latest kernel and preserving
old tools (not providing upgraded packages). If it's worth to upgrade
kernel - it should be worth to upgrade surrounding packages, especially if
they are very tightly bind to work with kernel.
I just think you are overreacting here and you could trust me I personally do
A LOT to keep everything as much usable & compatible in lvm2 as we can across
all kernels, all distributions and many architectures.
Lvm2 is always tracking individual bugs in kernel and reports them to user or
tries to bypass...
>> Kernel had a bug which has been fixed
>
> That is *ENTIRELY* immaterial.
>
> Guys, whether something was buggy or not DOES NOT MATTER.
>
> Why?
>
> Bugs happen. That's a fact of life. Arguing that "we had to break
> something because we were fixing a bug" is completely insane. We fix
> tens of bugs every single day, thinking that "fixing a bug" means that
> we can break something is simply NOT TRUE.
From my userland POV - leaving kernel write to devices that are supposed to
be read-only 'just because' it's kernel is wrong - the fact it has NOT been
discover for so long means - there are not many users and not many testers of
this combination.
But if kernel people do want to make a big stress case about this - I'm the
last one to object - I'm just unsure what kind of bugs are supposed to be
preserved and how it's going to be recognized which bugs are usable and which
are fixable....
On a funny note - security exploits had also many users - so why fixing them....
> So no. Your rule is COMPLETELY wrong. If you cannot upgrade a kernel
> without upgrading some other random binary, then we have a problem.
It's was not meant as a rule, just recommendation - and I think we can end up
this fight over nothing here - there is probably very low number of users of
this combination....
Regards
Zdenek
On Fri, Aug 3, 2018 at 12:06 PM Mike Snitzer <[email protected]> wrote:
>
> How does that pass for a fix to this issue?
>
> That'll unilaterally mark all dm device's readonly.
Well, if it wasn't honored before anyway, then...
But yes, a much more targeted patch would be preferred.
And in fact, maybe the right thing to do is to not revert the original
commit entirely, but instead just weaken it a lot. Turn the "you did a
write request on a RO disk" into a WARN_ON_ONCE() instead of a hard
error.
Something like the attached patch.
WGH, do you build your own kernels? Does this attached (untested)
patch make things work for you? It should give a big warning, but let
the old behavior through..
Linus
On Fri, Aug 03 2018 at 3:09pm -0400,
Linus Torvalds <[email protected]> wrote:
> On Fri, Aug 3, 2018 at 11:54 AM Mike Snitzer <[email protected]> wrote:
> >
> >
> > As I explained to Ted in my previous reply to this thread: using an lvm2
> > that is of the same vintage of the kernel is generally going to provide
> > a more robust user experience
>
> You said that yes.
>
> And it is completely irrelevant.
>
> The fact is, if you use an older lvm2, then a newer kernel still needs
> to work. Your "more robust experience" argument has nothing
> what-so-ever to do with that.
>
> Will you get new features from newer user land tools? Sure, usually.
> And entirely immaterial to a kernel regression.
I was merely giving context for the suggestion of keeping lvm2 updated.
Not saying it was relevant for this regression.
> Will newer user land tools hopefully fix other issues? You'd hope so,
> but again - immaterial.
>
> So why are you bringing up a complete red herring? It's entirely
> immaterial to the actual issue at hand.
I was trying to give context for the "best to update lvm2 anyway"
disclaimer that was used. Yeah, it was specious.
And Zdenek exposed way more surface area for you to attack with his
reply to this thread. My initial response to this thread was far more
understated but was effectively: read-only dm-snapshot is rare, I'm
inclined to just let this be.
And yeah, that isn't a good excuse to ignore it but: dm-snapshot is a
steaming pile as compared to dm thin-provisioning so dm-snapshot users
who then go off the beaten path are already masochistic. SO the 2 users
who noticed can cope..
But that too is a cop-out.
> I would _hope_ that other projects hjave the same "no regressions"
> rule that the kernel has, but I know many don't. But whatever other
> projects are out there, and whatever other rules _they_ have for their
> development is also entirely immaterial to the kernel.
>
> The kernel has a simple rule: no user regressions.
>
> Yes, we've had to break that rule very occasionally - when the
> semantics are a huge honking security issue and cannot possibly be
> hidden any other way, then we obviously have to break them.
>
> So it has happened. It's happily quite rare.
>
> But in this case, the issue is that the block layer now enforces the
> read-only protection more. And it seems to be the case that the lvm
> tools set the read-only flag even when they then depended on being
> able to write to them, because we didn't use to.
>
> So just judging from that description, I do suspect that "we can't
> depend on the lvm read-only flag", so a patch like
>
> "let's not turn DM_READONLY_FLAG into actually set_disk_ro(dm_disk(md), 1)"
>
> makes sense.
>
> Obviously, if we can limit that more, that would be lovely.
>
> But dammit, NOBODY gets to say "oh, you should just update user land tools".
I'll have a closer look at all this.
Could be DM in general is lacking for read-only permissions when you
have complex stacking involved.
> Because when they do, I will explode. And I'm 1000% serious that I
> will refuse to work with people who continue to say that or continue
> to make excuses.
>
> And user land developers should damn well know about this. The fact
> that they are apparently not clued in about kernel rules is what
> allowed this bug to go undiscovered and unreported for much too long.
> Apparently the lvm2 user land developers *did* notice the breakage,
> but instead of reporting it as a kernel bug, they worked around it.
Yeap, they did.. I was unaware myself.
> So user land developers should actually know that if the kernel stops
> working for them, they should *not* work around it. Sure, fix your
> program, but let the kernel people know.
Agreed.
> And kernel people should know that "oh, the user land people already
> changed their behavior" is *not* a "I don't need to care about it".
> Unless the user land fix was so long ago that nobody cares any more.
I never didn't care. I just didn't care much. Because "dm-snapshot".
Mike
On Fri, Aug 3, 2018 at 12:18 PM Zdenek Kabelac <[email protected]> wrote:
>
> From my userland POV - leaving kernel write to devices that are supposed to
> be read-only 'just because' it's kernel is wrong - the fact it has NOT been
> discover for so long means - there are not many users and not many testers of
> this combination.
Sure. But at the same time, the "read-only" issue from a _security_
standpoint should never be about some device state.
Why? Because the Unix security rules aren't about "read-only devices".
They are about permissions, and if you don't want your users to have
write permissions to a device, then they shouldn't have those
permissions.
So the "set_disk_ro()" interface is simply not for security. Anybody
who uses it for security is seriously confused.
No, the "set_disk_ro()" interface is so that you can say "you
physically cannot write to this medium". It's meant for things like
CD-ROM devices, or for a floppy device when you notice that the
controller reports that the floppy itself is physically
write-protected.
THAT is what the new code checks for, and that is also why ignoring
the check really shouldn't be a security issue.
Because if it turns out that somebody wrote to it, and the write
succeeded, then obviously the "set_disk_ro()" usage was simply wrong.
Now, I do agree that it 100% makes sense to have a layer like md/lvm
be able to virtually mark volumes read-only. If only exactly to
emulate the "this is like a write-protected floppy or a cd-rom"
behavior.
So the DM_READONLY_FLAG makes conceptual sense.
But at the same time, if the DM_READONLY_FLAG was _wrong_, then it
also makes a ton of sense to just say "oh, it was wrong, we'll ignore
it".
Exactly because it was never supposed to be about security, and it was
about other things.
See?
Linus
On Fri, Aug 03 2018 at 3:11pm -0400,
Linus Torvalds <[email protected]> wrote:
> On Fri, Aug 3, 2018 at 12:06 PM Mike Snitzer <[email protected]> wrote:
> >
> > But you're making Zdenek's response into mine and threathening to no
> > longer pull from me.
>
> No. I'm *very* unhappy about how you seem to think that Zdenek's
> response was even half-way ok, and you jumped in when Ted said it
> wasn't.
I'm inclined not to make people who aren't kernel developers cry over
spilt milk.
But zero tolerance of kernel regressions is important. And that mantra
wasn't properly impressed upon lvm2 developers.
On Fri, Aug 3, 2018 at 12:30 PM Mike Snitzer <[email protected]> wrote:
>
> I was trying to give context for the "best to update lvm2 anyway"
> disclaimer that was used. Yeah, it was specious.
So I'll apologize for the strong words, and I'll just say - not
excuse, but explanation - that this is the *ONE* thing that continues
to drive me absolutely bat-shit insane.
It's the *ONE* thing I will refuse to even discuss. Pretty much any
technical thing, I'll happily discuss, and I'll admit when I was
completely wrong and a f*cking moron.
Hey, it happens.
But the "no regressions" thing is _so_ important to me, and _so_
frustrating when I get the feeling that people ignore it, that I just
go ballistic.
I'm sorry for going ballistic, but there it is. It keeps happening
every once in a while. I thought we'd gotten mostly over it.
Linus
On Fri, Aug 03, 2018 at 11:20:34AM -0400, Theodore Y. Ts'o wrote:
> It *used* to be the case that users running RHEL 2 or RHEL 3 could try
> updating to the latest upstream kernel, and everything would break and
> fall apart. This was universally considered to be a failure, and a
> Bad Thing. So if LVM2 is not backwards compatible, and breaks in the
> face of newer kernels running older distributions, that is a bug.
Brings back memories! For those who wonder what this is all about,
with LVM1, if the version of kernel and userspace didn't match it simply
stopped. No booting into the previous version of your kernel after
upgrading unless you reverted userspace as well! Led to all sorts of
not-so-fancy workarounds.
As a reaction to this, LVM2 (through libdevmapper and anything else
using the dm ioctls as documented in dm-ioctl.h) passes a version number
up to the kernel (to say "I know about kernels with dm up to and
including version x.y.z"), so there is an option for an in-kernel
workaround here to limit its compatibility mode to the broken userspace
versions.
Anything passing in version 4.37.0 or earlier (which is the version in
dm-ioctl.h when this kernel patch was applied) could be assumed to
require the old behaviour. check_version() is where this version is
seen, so it would either store it for later checking or do the check and
store a flag to invoke compatible behaviour later.
Alasdair
On Fri, Aug 03, 2018 at 08:56:36PM +0100, Alasdair G Kergon wrote:
> Anything passing in version 4.37.0 or earlier (which is the version in
If taking this approach, it might be better to use the current version
i.e. where we add the kernel-side fix. IOW anything compiling against
a uapi header taken from a kernel up to now will see the old behaviour,
but anything newer (after we next increment the kernel dm version) will
be required to change its userspace code if it has this problem in it.
The problematic versions of lvm2 ship with versions up to and including
4.36.0 in this field so should be caught either way.
Alasdair
On Fri, Aug 3, 2018 at 1:08 PM Alasdair G Kergon <[email protected]> wrote:
>
> If taking this approach, it might be better to use the current version
> i.e. where we add the kernel-side fix. IOW anything compiling against
> a uapi header taken from a kernel up to now will see the old behaviour,
> but anything newer (after we next increment the kernel dm version) will
> be required to change its userspace code if it has this problem in it.
No, please don't do things like that.
It's a nightmare. It just means that different people randomly have
different behavior depending on what development environment they had,
which is horrible for any situation where you do cross-builds, but
it's also horrible from a testing standpoint.
And it's a *huge* nightmare for stable releases and backporting fixes.
Absolutely *no* user space shjould ever care about kernel versions. If
you want to use a new system call or something, just *use* if, and
then if it fails, fall back to the old one. Never ever check "what's
the kernel version" or anything like that.
Instead, what I suggest we do is:
Case 1: if a user program notice that a new kernel breaks something, then
(a) report it to kernel people with a test-case
(b) if you notice that the *reason* a new kernel broke something is
because you had a bug and you can just fix that bug and it will always
work on all kernels, by all means just fix the bug. Obviously you
don't want to just keep buggy code around just because it was found by
a kernel change
(c) but even if (b) happens, do that report. In fact, you now have a
perfect test-case for the kernel people you report it to: you can tell
them *exactly* what changed, and what the two situations were.
(d) and if whoever you reported the kernel breakage to doesn't seem
to take it seriously, just escalate it to me.
Note that for the (d) case, there are situations where even _I_ won't
necessarily take it seriously. If you're the only user of some
program, I may just go "Oh, you already fixed your own problem, I
don't need to worry". Or if it turns out that the breakage wasn't
"user flow", but some test-suite regression, I will tend to ignore
those. Test suites are by definition for finding _behavior_, but
behavior can change - it's actual _breakage_ I worry about.
But also notice how none of that "case 1" has any versioning related
to it. Yes, you'll have old versions of the user program before the
bug was fixed, but they will *not* look at kernel versions, and the
new versions certainly shouldn't either.
Case 2: the kernel side. We get that breakage reported, but it turns
out that different versions of the workflow act differently, and maybe
some versions depend on the new behavior, and some depend on the old
behavior.
And yes, this happens.
We have been in the situation where we get a bug-report for something
we changed three years ago, and by now, all modern user space depends
on the new "fixed" behavior, but some embedded user who hadn't
upgraded a kernel in years is unhappy.
It's happened several times, and if it really is "it's been years",
even I will go "tough luck, I guess you're stuck on your old kernel".
Because we can't fix it.
But in a situation like this, where we really want to encourage new
behavior but we have somebody who reported the breakage in a timely
manner (ie it's fairly recent), _and_ it's a system tool like lvm2
that actually gives us a version number, at that point the *kernel*
might decide "this is an old binary, I will now use some versioning to
fall back on old behavior".
Because for the kernel, the compatibility really is a #1 concern, and
if we have to check version numbers or infer compatibility issues some
other way, we'll do it.
But that "different behavior for different application versions" is
something we generally avoid at all costs. We do have a couple of
cases where we deduce what people want based on behavior. But it is
very very rare, and we generally want to avoid it if there is _any_
other model for it.
So even in case 2, we do try to avoid versioning. More often we add a
new flag, and say "hey, if you want the new behavior, use the new flag
to say so". Not versioning, but explicit "I want the new behavior"
Not all system calls take flags, of course.
Linus
On Fri, Aug 03, 2018 at 01:42:52PM -0700, Linus Torvalds wrote:
> So even in case 2, we do try to avoid versioning. More often we add a
> new flag, and say "hey, if you want the new behavior, use the new flag
> to say so". Not versioning, but explicit "I want the new behavior"
There are spare flags available in dm-ioctl - one could indeed turn on
the new more-restrictive behaviour.
Alasdair
On Fri, Aug 03, 2018 at 03:30:37PM -0400, Mike Snitzer wrote:
>
> I was trying to give context for the "best to update lvm2 anyway"
> disclaimer that was used. Yeah, it was specious.
Well, it seemed to indicate a certain attitude that both Linus and I
are concerned about. I tried to use more of a "pursuading" style to
impress why that attitude was not ideal/correct. Linus used a much
more assertive style (e.g., "Hell, no!").
> And yeah, that isn't a good excuse to ignore it but: dm-snapshot is a
> steaming pile as compared to dm thin-provisioning...
On a side note, this is the first that I've heard the assertion that
dm-thin was better than dm-snapshot. My impression was that
dm-snapshot was a proven code base, that only did one thing and (as
far as I could tell) did it well. In contrast, dm-thin is much newer
code, **far** more complex, with functionality and corner cases
approaching that of a file system --- and just to be even more
exciting, it doesn't have an fsck/repair tool to deal with corrupted
metadata.
In your opinion, is it because you disagree with the assumption that
dm-thin is scary? Or is the argument that dm-snapshot is even
scarier?
- Ted
P.S. It could be that my impression is wrong/out-dated, but the
kernel documentation still says that userspace tools for checking and
repairing the metadata are "under development". As a file system
developer, the reaction this inspires is best summed up as:
https://imgflip.com/memetemplate/50971393/Scared-Face
Dne 4.8.2018 v 07:20 Theodore Y. Ts'o napsal(a):
> On Fri, Aug 03, 2018 at 03:30:37PM -0400, Mike Snitzer wrote:
>>
>> I was trying to give context for the "best to update lvm2 anyway"
>> disclaimer that was used. Yeah, it was specious.
>
> Well, it seemed to indicate a certain attitude that both Linus and I
> are concerned about. I tried to use more of a "pursuading" style to
> impress why that attitude was not ideal/correct. Linus used a much
> more assertive style (e.g., "Hell, no!").
My whole point was - when someone has 'enough time' to compile fresh new
kernel, spending couple seconds to build also recent lvm2 should have be no
issue. Bugs are continually fixed, new feature from newer kernel are being
used, known problems are being addressed - that's all what it means.
Bug are fixed not just in kernel, but also in user-space.
>
>> And yeah, that isn't a good excuse to ignore it but: dm-snapshot is a
>> steaming pile as compared to dm thin-provisioning...
>
> On a side note, this is the first that I've heard the assertion that
> dm-thin was better than dm-snapshot. My impression was that
> dm-snapshot was a proven code base, that only did one thing and (as
> far as I could tell) did it well. In contrast, dm-thin is much newer
> code, **far** more complex, with functionality and corner cases
> approaching that of a file system --- and just to be even more
> exciting, it doesn't have an fsck/repair tool to deal with corrupted
> metadata.
>
> In your opinion, is it because you disagree with the assumption that
> dm-thin is scary? Or is the argument that dm-snapshot is even
> scarier?
dm-snapshost has really outdated design - it's been useful in the old age
where megabyte was hell lot of space.
Nowadays, when users do need to handle snapshots in multi gigabyte sizes and
moreover have number of snapshots from the same volume taken over the time,
want to take snapshot of snapshot of snapshot, the old snapshot simple kills
all the performance, uses tons of resources and becomes serious bottleneck of
your system and has lots of usability limitation.
That's where thin provisioning will shine....
However if you still need to take short-living smallish snapshot and you want
to use non-provisioned origin device, there the snapshot will still work as
proven technology - just no one can expect it will scale for the recent
device size explosion...
Regards
Zdenek
On 08/03/2018 10:22 PM, Linus Torvalds wrote:
> And in fact, maybe the right thing to do is to not revert the original
> commit entirely, but instead just weaken it a lot. Turn the "you did a
> write request on a RO disk" into a WARN_ON_ONCE() instead of a hard
> error.
>
> Something like the attached patch.
>
> WGH, do you build your own kernels? Does this attached (untested)
> patch make things work for you? It should give a big warning, but let
> the old behavior through..
The patch works for me.
However, there's no text messsage in the kernel log, just a traceback. I
think that's because WARN_ONCE is supposed to take condition as a first
argument.
On Sat, Aug 04 2018 at 1:20am -0400,
Theodore Y. Ts'o <[email protected]> wrote:
> On Fri, Aug 03, 2018 at 03:30:37PM -0400, Mike Snitzer wrote:
> >
> > I was trying to give context for the "best to update lvm2 anyway"
> > disclaimer that was used. Yeah, it was specious.
>
> Well, it seemed to indicate a certain attitude that both Linus and I
> are concerned about. I tried to use more of a "pursuading" style to
> impress why that attitude was not ideal/correct. Linus used a much
> more assertive style (e.g., "Hell, no!").
[I debated just ignoring this portion of your reply but it needs to be
dealt with directly]
I prefer how Linus handled it (at least he was timely with his
follow-ups). Your initial reply where you joined a fragment of my
initial reply with Zdenek's (we sent simultaneously, each half way
around the world) served to merge Zdenek and myself into one fictional
straw-man you both could attack. If you have something to say to _me_
address me directly; don't put words in my mouth because you thought I
had a complete mind-meld with someone else.
And please don't act like this wasn't already beaten to death yesterday;
which left me (as DM maintainer) initially _unwarrantedly_ compromised.
There was a block regression that I wasn't aware of but someone on my
broader team (Zdenek) papered over it in userspace rather than report it
as a regression.
I did brush off the seriousness of side-effects on readonly dm-snapshot
("Because dm-snapshot"). But that doesn't speak to some systemic
"problem" you seem to be concerned about.
> > And yeah, that isn't a good excuse to ignore it but: dm-snapshot is a
> > steaming pile as compared to dm thin-provisioning...
>
> On a side note, this is the first that I've heard the assertion that
> dm-thin was better than dm-snapshot.
You don't follow DM much, that's fine. But thinp is considerably more
powerful for modern use-cases.
> My impression was that dm-snapshot was a proven code base, that only
> did one thing and (as far as I could tell) did it well. In contrast,
> dm-thin is much newer code, **far** more complex, with functionality
> and corner cases approaching that of a file system ---
dm-snapshot's scaling is _awful_. This is due to the N-way copy-out
penalty associated with N snapshots. So lots of snapshots perform very
very slowly. Even one snapshot is slow compared to dm-thinp.
dm-thin (2011) certainly is newer than dm-snapshot (well before 2005),
and yes dm-thin is complex, but dm-snapshot's code isn't exactly
"simple". The on-disk layout is but that simplicity contributes to why
it doesn't scale at all.
DM thin is a modern approach to snapshots grounded in the same
btree-based concepts and design used by btrfs. Given dm-thinp's
requirements and how it has been deployed and pushed _hard_ it really is
holding up amazingly well.
> and just to be even more exciting, it [dm-thin] doesn't have an
> fsck/repair tool to deal with corrupted metadata.
That's one definition for "exciting" on your Friday night ... ;)
The documentation was outdated, see this thread:
https://www.redhat.com/archives/dm-devel/2018-July/msg00200.html
Where I shared that this Documentation update was staged for 4.19:
https://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-4.19&id=6c7413c0f5ab61d2339cf516d25bb44d71f4bad4
That said, thin_repair has shown itself to be hit-or-miss. There are
certain corruptions that it doesn't cope with well (leaving the metadata
"repaired" but the result is an empty thin-pool). Those cases are more
rare but still occur.
So repairing thinp corruption can require escalations through
"enterprise support" (which results in fixes to thin_repair, etc).
> In your opinion, is it because you disagree with the assumption that
> dm-thin is scary? Or is the argument that dm-snapshot is even
> scarier?
Apples and oranges. DM thinp is complex but necessarily so.
dm-snapshot is still complex yet only covers legacy and narrow (read:
now useless) use-cases.
In the same thread I referenced above, see how Drew Hastings is looking
to use DM thinp to host VM guest storage, which implies a scaling
dm-snapshot has _zero_ hope of providing:
https://www.redhat.com/archives/dm-devel/2018-August/msg00088.html
> P.S. It could be that my impression is wrong/out-dated, but the
> kernel documentation still says that userspace tools for checking and
> repairing the metadata are "under development". As a file system
> developer, the reaction this inspires is best summed up as:
>
> https://imgflip.com/memetemplate/50971393/Scared-Face
Already addressed this.
On Sat, Aug 04, 2018 at 10:36:50AM +0200, Zdenek Kabelac wrote:
> dm-snapshost has really outdated design - it's been useful in the old age
> where megabyte was hell lot of space.
>
> Nowadays, when users do need to handle snapshots in multi gigabyte sizes and
> moreover have number of snapshots from the same volume taken over the time,
> want to take snapshot of snapshot of snapshot, the old snapshot simple kills
> all the performance, uses tons of resources and becomes serious bottleneck
> of your system and has lots of usability limitation.
Fair enough. I don't think I would consider that makes dm-snapshot a
"steaming pile". For me, protection against data loss is Job One.
> That's where thin provisioning will shine....
The dm-thin development might want to take a look at what's currently
in Documentation/device-mapper/thin-privisioning.txt:
Status
======
These targets are very much still in the EXPERIMENTAL state. Please
do not yet rely on them in production. But do experiment and offer us
feedback. Different use cases will have different performance
characteristics, for example due to fragmentation of the data volume.
If you find this software is not performing as expected please mail
[email protected] with details and we'll try our best to improve
things for you.
Userspace tools for checking and repairing the metadata are under
development.
Saying that dm-snapshot is a steaming pile and dm-thin is what
everyone should use doesn't seem to be consistent with the above.
Cheers,
- Ted
On Sat, Aug 4, 2018 at 3:03 AM WGH <[email protected]> wrote:
>
> >
> The patch works for me.
>
> However, there's no text messsage in the kernel log, just a traceback. I
> think that's because WARN_ONCE is supposed to take condition as a first
> argument.
Duh.
It needs to be WARN_ONCE(1, ...);
I obviously didn't test that patch, but I _did_ compile it. I wonder
why I didn't get a compiler warning for it...
[ Goes off and looks ]
Oh, because the "bio_devname(bio, b)" argument ended up being
interpreted as the format string, and since it was a dynamic string
the compiler felt it was all fine. Just bad luck.
Anyway, just out of curiosity, what was the traceback?
I'm not entirely happy with that patch either (even after the obvious
fix to add the "1" argument), but it does seem like the minimal
temporary workaround for now.
Linus
On Sat, Aug 04 2018 at 12:22pm -0400,
Theodore Y. Ts'o <[email protected]> wrote:
> On Sat, Aug 04, 2018 at 10:36:50AM +0200, Zdenek Kabelac wrote:
> > dm-snapshost has really outdated design - it's been useful in the old age
> > where megabyte was hell lot of space.
> >
> > Nowadays, when users do need to handle snapshots in multi gigabyte sizes and
> > moreover have number of snapshots from the same volume taken over the time,
> > want to take snapshot of snapshot of snapshot, the old snapshot simple kills
> > all the performance, uses tons of resources and becomes serious bottleneck
> > of your system and has lots of usability limitation.
>
> Fair enough. I don't think I would consider that makes dm-snapshot a
> "steaming pile". For me, protection against data loss is Job One.
What's your point Ted? Do you have _any_ intention of actually using
anything DM or is this just a way for you to continue to snipe at it?
> > That's where thin provisioning will shine....
>
> The dm-thin development might want to take a look at what's currently
> in Documentation/device-mapper/thin-privisioning.txt:
>
> Status
> ======
>
> These targets are very much still in the EXPERIMENTAL state. Please
> do not yet rely on them in production. But do experiment and offer us
> feedback. Different use cases will have different performance
> characteristics, for example due to fragmentation of the data volume.
>
> If you find this software is not performing as expected please mail
> [email protected] with details and we'll try our best to improve
> things for you.
>
> Userspace tools for checking and repairing the metadata are under
> development.
>
> Saying that dm-snapshot is a steaming pile and dm-thin is what
> everyone should use doesn't seem to be consistent with the above.
Maybe read your email from earlier today before repeating yourself:
https://lkml.org/lkml/2018/8/4/366
On Sat, Aug 04 2018 at 1:04pm -0400,
Linus Torvalds <[email protected]> wrote:
> On Sat, Aug 4, 2018 at 3:03 AM WGH <[email protected]> wrote:
> >
> > >
> > The patch works for me.
> >
> > However, there's no text messsage in the kernel log, just a traceback. I
> > think that's because WARN_ONCE is supposed to take condition as a first
> > argument.
>
> Duh.
>
> It needs to be WARN_ONCE(1, ...);
>
> I obviously didn't test that patch, but I _did_ compile it. I wonder
> why I didn't get a compiler warning for it...
>
> [ Goes off and looks ]
>
> Oh, because the "bio_devname(bio, b)" argument ended up being
> interpreted as the format string, and since it was a dynamic string
> the compiler felt it was all fine. Just bad luck.
>
> Anyway, just out of curiosity, what was the traceback?
>
> I'm not entirely happy with that patch either (even after the obvious
> fix to add the "1" argument), but it does seem like the minimal
> temporary workaround for now.
I agree.
Please feel free to add my:
Acked-by: Mike Snitzer <[email protected]>
On Sat, Aug 04, 2018 at 02:18:47PM -0400, Mike Snitzer wrote:
> > Fair enough. I don't think I would consider that makes dm-snapshot a
> > "steaming pile". For me, protection against data loss is Job One.
>
> What's your point Ted? Do you have _any_ intention of actually using
> anything DM or is this just a way for you to continue to snipe at it?
My point is that putting down dm-snapshot by calling it a "steaming
pile" because it can't perform well on workloads that weren't a
requirement when it was first designed is neither accurate nor fair.
And steering users away from it by badmouthing to a technology which
ever so often, requires enterprise support to recover, is something
that *I* at least would classify as "marginal".
Maybe it's just that file system developers have higher standards. I
know that Dave Chinner at LSF/MM commented that using some of the
things he has been developing for XFS subvolume support might be
interesting precisely because it could provide some of the facilities
currently provided by thin provisioning (perhaps not all of them; I'm
not sure how well his virtual block remapping layer would handle
hundreds of snapshots) but with file system tools which have a lot
more seasoning and where people have spent a lot of effort on data
recovery tools.
In any case, I do use DM quite a lot. I use LVM2 and dm-snapshot (and
it's been working just *fine* for my use cases). I've wanted to use
dm-thin, but have been put off by it being labeled as experimental and
by some of the comments about how robust its recovery tools are. If
there was documentation about how an advanced user/developer could use
low level tools to do manual repair of a thin pool when the automated
procedures didn't work, without having to pay $$$ to some company for
"enterprise support", I'd be a lot more willing to give it a try.
Sorry, I just care a *lot* about data robustness.
> Maybe read your email from earlier today before repeating yourself:
> https://lkml.org/lkml/2018/8/4/366
Apologies. I'm currently staying at an Assisted Living facility
keeping an eye on my Dad this week, and the internet at the senior
living center has been.... marginal. As a result I've been reading my
e-mail in batches, and so I hadn't seen the e-mail you had posted
earlier before I had sent my reply.
- Ted
On 08/04/2018 08:04 PM, Linus Torvalds wrote:
> Anyway, just out of curiosity, what was the traceback?
>
> I'm not entirely happy with that patch either (even after the obvious
> fix to add the "1" argument), but it does seem like the minimal
> temporary workaround for now.
Here it is.
On Sat, Aug 04 2018 at 3:37pm -0400,
Theodore Y. Ts'o <[email protected]> wrote:
> On Sat, Aug 04, 2018 at 02:18:47PM -0400, Mike Snitzer wrote:
> > > Fair enough. I don't think I would consider that makes dm-snapshot a
> > > "steaming pile". For me, protection against data loss is Job One.
> >
> > What's your point Ted? Do you have _any_ intention of actually using
> > anything DM or is this just a way for you to continue to snipe at it?
>
> My point is that putting down dm-snapshot by calling it a "steaming
> pile" because it can't perform well on workloads that weren't a
> requirement when it was first designed is neither accurate nor fair.
As a person who has written a fair amount of dm-snapshot code I'm free
to have my opinion. It is slow. Period.
If it works for you, great. But it isn't adequate for most modern
usecases I'm aware of.
> And steering users away from it by badmouthing to a technology which
> ever so often, requires enterprise support to recover, is something
> that *I* at least would classify as "marginal".
dm-snapshot is slow, as such I will badmouth it because dm-thinp is a
much more capable replacement. I have to maintain both, so I'm free to
steer people according to my experience.
> Maybe it's just that file system developers have higher standards. I
> know that Dave Chinner at LSF/MM commented that using some of the
> things he has been developing for XFS subvolume support might be
> interesting precisely because it could provide some of the facilities
> currently provided by thin provisioning (perhaps not all of them; I'm
> not sure how well his virtual block remapping layer would handle
> hundreds of snapshots) but with file system tools which have a lot
> more seasoning and where people have spent a lot of effort on data
> recovery tools.
Even new XFS features will have bugs. Just because XFS's fsck is
historically robust, oversights and bugs happen when new features are
added. And AFAIK future XFS would be looking to leverage DM thinp via
its producer/consumer model. But this is going off on a tangent now.
> In any case, I do use DM quite a lot. I use LVM2 and dm-snapshot (and
> it's been working just *fine* for my use cases). I've wanted to use
> dm-thin, but have been put off by it being labeled as experimental and
> by some of the comments about how robust its recovery tools are.
The Documentation was stale. I personally don't reference it so the
need to update it got overlooked.
> If there was documentation about how an advanced user/developer could
> use low level tools to do manual repair of a thin pool when the
> automated procedures didn't work, without having to pay $$$ to some
> company for "enterprise support", I'd be a lot more willing to give it
> a try.
We could certainly improve out documentation for the use of thin_check
and thin_repair. I know lvm2 has seen improvements to allow the metdata
voulme to be activated in standalone mode (activate the metadata volume
without the thin-pool or thin devices ontop) so that the thin_check and
thin_repair tools can be used on it. I'd imagine you aren't aware of
the lvm2 package's lvmthin manpage? See: man lvmthin
It'd likely be one of the documentation locations to see improvements.
I'll talk with others about where we can improve docs, the manpages for
thin_check and thin_repair are _very_ sparse. Anyway, room for
improvement for sure.
But demonizing "enterprise support" like you don't provide that to your
stake holders is bizarre. I again was candid and forthcoming about what
drives/catches the need for thin_check and thin_repair fixes and
improvements: it just so happens that "enterprise" deployments make use
of DM-thinp and have exposed the need for support more than community
users. I'm not saying I, or other DM thinp oriented developers,
wouldn't provided the same type of support if a community user like
yourself hit a problem. It is just that enterprise users are the
prontlines of advanced usage and scale. Deploying hundreds of Gluster
servers with every brick layered ontop of DM thinp historically exposed
issues. Those issues get fixed and benefit everyone.
This discussion, and my need to explain how "enterprise support" drives
innovation, is so.. weird.
> Sorry, I just care a *lot* about data robustness.
You aren't alone.
> > Maybe read your email from earlier today before repeating yourself:
> > https://lkml.org/lkml/2018/8/4/366
>
> Apologies. I'm currently staying at an Assisted Living facility
> keeping an eye on my Dad this week, and the internet at the senior
> living center has been.... marginal. As a result I've been reading my
> e-mail in batches, and so I hadn't seen the e-mail you had posted
> earlier before I had sent my reply.
Best wishes. I've been dealing with stresses in my personal life
myself. Might explain why we've had the awkwardness in this thread.
Mike