2017-07-12 17:07:14

by Richard W.M. Jones

[permalink] [raw]
Subject: Fast symlinks stored slow


https://bugzilla.redhat.com/show_bug.cgi?id=1470157

To cut a long story short, we were using libext2fs to create
filesystems where short symlinks (< 60 bytes) were stored the same way
as long symlinks, ie. stored as an ordinary file instead of being
stored in the inode.

This actually worked fine until very recently when this change was
made to the upstream kernel:

commit 407cd7fb83c0ebabb490190e673d8c71ee7df97e
Author: Tahsin Erdogan <[email protected]>
Date: Tue Jul 4 00:11:21 2017 -0400

ext4: change fast symlink test to not rely on i_blocks

which broke these filesystems.

I think the reason we were creating filesystems wrongly in the first
place is because our code has been around since about 2008, and the
nice ext2fs_symlink function that deals properly with fast/slow
symlinks wasn't added until 2013.

It's not too much trouble for us to recreate the incorrect
filesystems. Mostly we're creating one-off throwaway filesystems for
appliances anyway and they don't live for long.

But I suppose this might be a warning that other incorrect filesystems
exist which will break with Linux >= 4.13.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v


2017-07-12 20:52:32

by Richard W.M. Jones

[permalink] [raw]
Subject: Re: Fast symlinks stored slow

On Wed, Jul 12, 2017 at 06:07:11PM +0100, Richard W.M. Jones wrote:
> To cut a long story short, we were using libext2fs to create
> filesystems where short symlinks (< 60 bytes) were stored the same way
> as long symlinks, ie. stored as an ordinary file instead of being
> stored in the inode.

As a further data point, e2fsck does not complain about these
filesystems.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-p2v converts physical machines to virtual machines. Boot with a
live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v

2017-07-12 21:36:25

by Tahsin Erdogan

[permalink] [raw]
Subject: Re: Fast symlinks stored slow

>> To cut a long story short, we were using libext2fs to create
>> filesystems where short symlinks (< 60 bytes) were stored the same way
>> as long symlinks, ie. stored as an ordinary file instead of being
>> stored in the inode.
>
> As a further data point, e2fsck does not complain about these
> filesystems.

This sounds bad. We may have to revert back to i_blocks based logic.

As an example, lets say inode's EXT4_INODE_EXTENTS flag is clear,
i_size is 2 and i_data[0] contains 0x00004141. Without inspecting
xattrs and i_blocks, we can't determine whether a) the target is "AA"
or b) block 0x4141 contains two bytes that is the actual symlink
target.

2017-07-12 23:17:41

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Fast symlinks stored slow

On Wed, Jul 12, 2017 at 06:07:11PM +0100, Richard W.M. Jones wrote:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1470157
>
> To cut a long story short, we were using libext2fs to create
> filesystems where short symlinks (< 60 bytes) were stored the same way
> as long symlinks, ie. stored as an ordinary file instead of being
> stored in the inode.
>
> I think the reason we were creating filesystems wrongly in the first
> place is because our code has been around since about 2008, and the
> nice ext2fs_symlink function that deals properly with fast/slow
> symlinks wasn't added until 2013.

Thanks for the report. I had been hesitant about making this change
(and had been pushing back from those who were advocating for this
change) precisely because I was afraid that this might be a situation.

What convinced me to accept the change is that (a) I had scanned all
of the old kernels and old versions of e2fsprogs and convinced myself
that aside from someone manually creating symlinks using low-level
libext2fs, symlinks < 60 bytes would never be stored in external
blocks, and (b) using the i_blocks logic to determine whether or not
we had a slow link was getting really painful.

> It's not too much trouble for us to recreate the incorrect
> filesystems. Mostly we're creating one-off throwaway filesystems for
> appliances anyway and they don't live for long.
>
> But I suppose this might be a warning that other incorrect filesystems
> exist which will break with Linux >= 4.13.

So I see this is going to break libvert and libguestfs. So people who
are running existing distribution userspaces and then upgrade to 4.13
will break.

Hmm... I suppose we could add back support to let the kernel to use
the i_blocks logic if the ea_inode feature is not enabled. E2fsck
would still complain so we can try to gradually force userspace to do
things "correctly", but at least we would be backwards compatible.

Comments?

- Ted

2017-07-13 08:02:15

by Richard W.M. Jones

[permalink] [raw]
Subject: Re: Fast symlinks stored slow

On Wed, Jul 12, 2017 at 07:17:37PM -0400, Theodore Ts'o wrote:
> On Wed, Jul 12, 2017 at 06:07:11PM +0100, Richard W.M. Jones wrote:
> >
> > https://bugzilla.redhat.com/show_bug.cgi?id=1470157
> >
> > To cut a long story short, we were using libext2fs to create
> > filesystems where short symlinks (< 60 bytes) were stored the same way
> > as long symlinks, ie. stored as an ordinary file instead of being
> > stored in the inode.
> >
> > I think the reason we were creating filesystems wrongly in the first
> > place is because our code has been around since about 2008, and the
> > nice ext2fs_symlink function that deals properly with fast/slow
> > symlinks wasn't added until 2013.
>
> Thanks for the report. I had been hesitant about making this change
> (and had been pushing back from those who were advocating for this
> change) precisely because I was afraid that this might be a situation.
>
> What convinced me to accept the change is that (a) I had scanned all
> of the old kernels and old versions of e2fsprogs and convinced myself
> that aside from someone manually creating symlinks using low-level
> libext2fs, symlinks < 60 bytes would never be stored in external
> blocks, and (b) using the i_blocks logic to determine whether or not
> we had a slow link was getting really painful.
>
> > It's not too much trouble for us to recreate the incorrect
> > filesystems. Mostly we're creating one-off throwaway filesystems for
> > appliances anyway and they don't live for long.
> >
> > But I suppose this might be a warning that other incorrect filesystems
> > exist which will break with Linux >= 4.13.
>
> So I see this is going to break libvert and libguestfs. So people who
> are running existing distribution userspaces and then upgrade to 4.13
> will break.
>
> Hmm... I suppose we could add back support to let the kernel to use
> the i_blocks logic if the ea_inode feature is not enabled. E2fsck
> would still complain so we can try to gradually force userspace to do
> things "correctly", but at least we would be backwards compatible.
>
> Comments?

>From my point of view it's not too much trouble to recreate these
filesystems, and we've already proposed a fix for supermin so it
creates symlinks properly[1].

I think it might be a good idea to get e2fsck to complain about these
filesystems though. It'll at least tell you how widespread (or
otherwise) the problem might be.

Rich.

[1] https://www.redhat.com/archives/libguestfs/2017-July/msg00084.html

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org

2017-07-13 16:50:02

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Fast symlinks stored slow

On Thu, Jul 13, 2017 at 09:02:13AM +0100, Richard W.M. Jones wrote:
>
> From my point of view it's not too much trouble to recreate these
> filesystems, and we've already proposed a fix for supermin so it
> creates symlinks properly[1].

My concern is that people using libguestfs, et.al., on Fedura XX and
then try to decide to upgrade to the 4.13 kernel. So it sounds like
the exposure could be pretty large. Am I wrong?

Thanks,

- Ted

2017-07-13 17:13:37

by Richard W.M. Jones

[permalink] [raw]
Subject: Re: Fast symlinks stored slow

On Thu, Jul 13, 2017 at 12:49:59PM -0400, Theodore Ts'o wrote:
> On Thu, Jul 13, 2017 at 09:02:13AM +0100, Richard W.M. Jones wrote:
> >
> > From my point of view it's not too much trouble to recreate these
> > filesystems, and we've already proposed a fix for supermin so it
> > creates symlinks properly[1].
>
> My concern is that people using libguestfs, et.al., on Fedura XX and
> then try to decide to upgrade to the 4.13 kernel. So it sounds like
> the exposure could be pretty large. Am I wrong?

In this case we're using libext2fs to build an appliance filesystem,
used to boot a small Linux system which is then run under qemu by
libguestfs. This appliance is completely rebuilt automatically under
many circumstances, for example a host package upgrade (eg. upgrading
the kernel), so it's not a long-lived filesystem that would cause a
problem. Rebuilding only takes a few seconds.

The process is described in more detail here:
http://libguestfs.org/supermin.1.html#SUPERMIN-APPLIANCES

>From our point of view the only issue are some prebuilt appliances
which we have provided to other distributions that cannot / don't want
to use supermin (http://download.libguestfs.org/binaries/appliance/)
and at some point I'm going to have to rebuild these using the fixed
supermin.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top

2017-07-13 18:50:41

by Theodore Ts'o

[permalink] [raw]
Subject: Re: Fast symlinks stored slow

On Thu, Jul 13, 2017 at 06:13:35PM +0100, Richard W.M. Jones wrote:
> In this case we're using libext2fs to build an appliance filesystem,
> used to boot a small Linux system which is then run under qemu by
> libguestfs. This appliance is completely rebuilt automatically under
> many circumstances, for example a host package upgrade (eg. upgrading
> the kernel), so it's not a long-lived filesystem that would cause a
> problem. Rebuilding only takes a few seconds.
>
> The process is described in more detail here:
> http://libguestfs.org/supermin.1.html#SUPERMIN-APPLIANCES
>
> From our point of view the only issue are some prebuilt appliances
> which we have provided to other distributions that cannot / don't want
> to use supermin (http://download.libguestfs.org/binaries/appliance/)
> and at some point I'm going to have to rebuild these using the fixed
> supermin.

OK, so the risk is if there are other people who are using supermin to
create appliances. (One potential use case we might need to
investigate are services such as SuSE Studio, since it can create
turnkey VM appliances for its users.) If these applianes are
distributed end users (as opposed to being automatically rebuilt as in
your use case), that's when we would potentially be at risk.

Did I get that right?

- Ted

2017-07-13 20:27:22

by Richard W.M. Jones

[permalink] [raw]
Subject: Re: Fast symlinks stored slow

On Thu, Jul 13, 2017 at 02:50:37PM -0400, Theodore Ts'o wrote:
> On Thu, Jul 13, 2017 at 06:13:35PM +0100, Richard W.M. Jones wrote:
> > In this case we're using libext2fs to build an appliance filesystem,
> > used to boot a small Linux system which is then run under qemu by
> > libguestfs. This appliance is completely rebuilt automatically under
> > many circumstances, for example a host package upgrade (eg. upgrading
> > the kernel), so it's not a long-lived filesystem that would cause a
> > problem. Rebuilding only takes a few seconds.
> >
> > The process is described in more detail here:
> > http://libguestfs.org/supermin.1.html#SUPERMIN-APPLIANCES
> >
> > From our point of view the only issue are some prebuilt appliances
> > which we have provided to other distributions that cannot / don't want
> > to use supermin (http://download.libguestfs.org/binaries/appliance/)
> > and at some point I'm going to have to rebuild these using the fixed
> > supermin.
>
> OK, so the risk is if there are other people who are using supermin to
> create appliances. (One potential use case we might need to
> investigate are services such as SuSE Studio, since it can create
> turnkey VM appliances for its users.) If these applianes are
> distributed end users (as opposed to being automatically rebuilt as in
> your use case), that's when we would potentially be at risk.

I can't speak about SuSE Studio, but supermin appliances aren't
distributed to end users, but get built on the fly on end user
machines.

I think you may be confusing supermin and libguestfs. Supermin is a
component we use to make libguestfs work, but it's not how libguestfs
makes new filesystems.

For example if you write:

$ guestfish -N disk.img=fs:ext4 -m /dev/sda1 touch /foo : ln-s /foo /bar
$ ll disk.img
-rw-rw-r--. 1 rjones rjones 104857600 Jul 13 21:26 disk.img

then the result is a new ext4 filesystem in a disk image, containing a
symlink. But it was created using the *kernel* + the symlink(2)
system call (not using libext2fs), and in all cases it was in the past
and will be in the future created correctly.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines. Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top