2016-05-18 00:46:17

by Kamran Khan

Subject: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

I'm trying to understand the difference in jbd2 behavior across Ubuntu
14.04 and Centos 7.1. Will appreciate any help.

The uber goal is to resize the root filesystem without a reboot.
Basically, all the necessary files are copied to a tmpfs, a pivot_root
is performed, and then the old root is unmounted.

On Ubuntu 14.04, after the old processes are killed I verify that no
processes are holding handles to oldroot.

> root@kakhan-ubuntu:~# fuser -vm /oldroot
> USER PID ACCESS COMMAND
> /oldroot: root kernel mount /oldroot

jbd2 is still running:

> root@kakhan-ubuntu:~# lsof | grep sda1
> jbd2/sda1 176 root cwd DIR 0,19 340 2 /
> jbd2/sda1 176 root rtd DIR 0,19 340 2 /
> jbd2/sda1 176 root txt unknown /proc/176/exe
> root@kakhan-ubuntu:~# ps -f -p 176
> UID PID PPID C STIME TTY TIME CMD
> root 176 2 0 17:19 ? 00:00:00 [jbd2/sda1-8]

I can unmount the filesystem and do an fsck:

> root@kakhan-ubuntu:~# umount /oldroot
> root@kakhan-ubuntu:~# e2fsck -yf /dev/sda1
> ...
> /dev/sda1: 64967/1831424 files (0.1% non-contiguous), 480018/7323904 blocks

jbd2 does *not* hold a handle to the now unmounted filesystem:

> root@kakhan-ubuntu:~# lsof | grep sda1
> root@kakhan-ubuntu:~#

All good.

On CentOS 7.1, I verify that no processes are holding handles to oldroot.

> [root@kakhan-centos ~]# fuser -vm /oldroot
> USER PID ACCESS COMMAND
> /oldroot: root kernel mount /oldroot

I can successfully unmount the filesystem but can't fsck it:

> [root@kakhan-centos ~]# umount /oldroot
> [root@kakhan-centos ~]# e2fsck -yf /dev/sda1
> e2fsck 1.42.9 (28-Dec-2013)
> /dev/sda1 is in use.
> e2fsck: Cannot continue, aborting.

/dev/sda1 does not appear in /proc/mounts. jbd2 seems to be the only
thing that still cares about sda1:

> [root@kakhan-centos ~]# lsof | grep sda1
> jbd2/sda1 394 root cwd DIR 0,14 340 22591 /
> jbd2/sda1 394 root rtd DIR 0,14 340 22591 /
> jbd2/sda1 394 root txt unknown /proc/394/exe
> [root@kakhan-centos ~]# ps -f -p 394
> UID PID PPID C STIME TTY TIME CMD
> root 394 2 0 00:15 ? 00:00:00 [jbd2/sda1-8]

What I'm confused about is, why is the behavior different even though
journaling is _enabled_ in *both* cases?

On Ubuntu:

> root@kakhan-ubuntu:~# dumpe2fs /dev/sda1 | grep features
> dumpe2fs 1.42.9 (4-Feb-2014)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
> Journal features: journal_incompat_revoke

On CentOS:

> [root@kakhan-centos ~]# dumpe2fs /dev/sda1 | grep features
> dumpe2fs 1.42.9 (28-Dec-2013)
> Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery extent 64bit flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize
> Journal features: journal_incompat_revoke journal_64bit

Any ideas?

Thanks,
--
Kamran.

http://inspirated.com/


2016-05-18 01:33:08

by Kamran Khan

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

Found the same issue in a thread from last year:

http://lists.openwall.net/linux-ext4/2015/02/18/6

Basically, it's got nothing to do with my root fs manipulations. The
problem is: after unmounting an ext4 filesystem jbd2 holds a lock to
it forever. It's impossible to do anything with the block device,
e.g., e2fsck fails because open fails with -EBUSY.

The thread became stale. I'll be glad to help with collection of any
additional information.

--
Kamran.

http://inspirated.com/

2016-05-18 02:14:23

by Kamran Khan

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

So it seems like the jbd2 locking bug was fixed in the kernel
somewhere between versions 3.10 and 4.2 :) .

Would be great if someone can recall a bug/commit # for the fix.

--
Kamran.

http://inspirated.com/

2016-05-18 02:17:22

by Eric Sandeen

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

On 5/17/16 7:46 PM, Kamran Khan wrote:
> I'm trying to understand the difference in jbd2 behavior across Ubuntu
> 14.04 and Centos 7.1. Will appreciate any help.

For starters, what kernel versions are those? (I know what centos
is, "3.10.0" with updates, which I can check out, but I have no idea
what might be in the Ubuntu distro)

-Eric

2016-05-18 02:35:08

by Kamran Khan

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

Yup, the kernel versions are drastically different. It's 3.10 vs 4.2.

The problem on 3.10 though is, while jbd2 holds on to the unmounted
device I cannot even rmmod jbd2 or ext4 *even if no other ext
filesystems are mounted*. That lock makes it all but impossible to do
anything with the block device.
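
A quick way to see why rmmod refuses is to read the module reference counts. This is only a sketch: it assumes the modules are loadable (not built in) and that the kernel exposes `/sys/module/<name>/refcnt`; the helper name `module_refcnt` and its optional second argument (an alternate sysfs root, handy for testing) are made up for illustration.

```shell
# module_refcnt MODULE [SYS_ROOT]
# Print the reference count the kernel reports for MODULE.
# SYS_ROOT defaults to /sys; it is a hypothetical knob for testing only.
module_refcnt() {
    f="${2:-/sys}/module/$1/refcnt"
    if [ ! -r "$f" ]; then
        # No refcnt file: module not loaded, built in, or unloading disabled.
        echo "no refcnt for $1" >&2
        return 1
    fi
    cat "$f"
}
```

Usage would be something like `module_refcnt ext4; module_refcnt jbd2`. A nonzero count for ext4 while nothing appears mounted suggests that a mount of the filesystem still exists somewhere the usual tools are not looking.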

--
Kamran.

http://inspirated.com/

2016-05-18 02:41:10

by Eric Sandeen

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

On 5/17/16 9:35 PM, Kamran Khan wrote:
> Yup, the kernel versions are drastically different. It's 3.10 vs 4.2.
>
> The problem on 3.10 though is, while jbd2 holds on to the unmounted
> device I cannot even rmmod jbd2 or ext4 *even if no other ext
> filesystems are mounted*. That lock makes it all but impossible to do
> anything with the block device.

You may as well try Centos7.2, at least, there are 50+ updates to
jbd2 & ext4 since 7.1. If it still persists we can dig further.

-Eric

2016-05-18 02:55:15

by Kamran Khan

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

On Tue, May 17, 2016 at 7:41 PM, Eric Sandeen <[email protected]> wrote:
> You may as well try Centos7.2, at least, there are 50+ updates to
> jbd2 & ext4 since 7.1. If it still persists we can dig further.

I am already on CentOS 7.2 with the latest kernel:

> [root@kakhan-centos ~]# cat /etc/centos-release
> CentOS Linux release 7.2.1511 (Core)
> [root@kakhan-centos ~]# uname -r
> 3.10.0-327.18.2.el7.x86_64

--
Kamran.

http://inspirated.com/

2016-05-18 03:16:25

by Eric Sandeen

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

On 5/17/16 9:55 PM, Kamran Khan wrote:
> On Tue, May 17, 2016 at 7:41 PM, Eric Sandeen <[email protected]> wrote:
>> You may as well try Centos7.2, at least, there are 50+ updates to
>> jbd2 & ext4 since 7.1. If it still persists we can dig further.
>
> I am already on CentOS 7.2 with the latest kernel:
>
>> [root@kakhan-centos ~]# cat /etc/centos-release
>> CentOS Linux release 7.2.1511 (Core)
>> [root@kakhan-centos ~]# uname -r
>> 3.10.0-327.18.2.el7.x86_64

Sigh, ok, your first post said 7.1...

> I'm trying to understand the difference in jbd2 behavior across Ubuntu
> 14.04 and Centos 7.1. Will appreciate any help.

So which one did you actually test?

-Eric

2016-05-18 03:22:48

by Kamran Khan

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

On Tue, May 17, 2016 at 8:16 PM, Eric Sandeen <[email protected]> wrote:
> Sigh, ok, your first post said 7.1...
> So which one did you actually test?

That was a stupid typo, I am sorry. I have tested only on 7.2.
--
Kamran.

http://inspirated.com/

2016-05-18 06:18:41

by Darrick J. Wong

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

On Tue, May 17, 2016 at 09:41:07PM -0500, Eric Sandeen wrote:
> On 5/17/16 9:35 PM, Kamran Khan wrote:
> > Yup, the kernel versions are drastically different. It's 3.10 vs 4.2.
> >
> > The problem on 3.10 though is, while jbd2 holds on to the unmounted
> > device I cannot even rmmod jbd2 or ext4 *even if no other ext
> > filesystems are mounted*. That lock makes it all but impossible to do
> > anything with the block device.
>
> You may as well try Centos7.2, at least, there are 50+ updates to
> jbd2 & ext4 since 7.1. If it still persists we can dig further.

Just to throw some gasoline on this fire, I hit the same set of symptoms
a couple of weeks ago while trying to umount /home on 4.5.0 + Ubuntu 16.04.
Ted mused that it could be some process running in a funny mount namespace.
Or systemd dragons. Or something.

<shrug> I rebooted and haven't seen it since, tho...

--D

2016-05-18 14:12:50

by Theodore Ts'o

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

On Tue, May 17, 2016 at 11:18:34PM -0700, Darrick J. Wong wrote:
> Just to throw some gasoline on this fire, I hit the same set of symptoms
> a couple of weeks ago while trying to umount /home on 4.5.0 + Ubuntu 16.04.
> Ted mused that it could be some process running in a funny mount namespace.
> Or systemd dragons. Or something.

Try the following to see if some process is playing namespace games:

find /proc -name mounts | xargs grep /dev/sda3

(replace /dev/sda3 with the device that you think is unmounted).

When you find the process, kill it. (Or try doing a service XXX
restart assuming that the device has been unmounted in the "normal"
mount namespace.)
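
The one-liner above prints the matching mount-table lines. A slightly longer sketch of the same check, which reports the PID and command name of each process whose mount table still lists the device, might look like this. The function name `find_mount_holders` and its optional second argument (an alternate proc root, useful only for testing) are illustrative and not part of any existing tool.

```shell
# find_mount_holders DEVICE [PROC_ROOT]
# Print "PID COMM" for every process whose mount table
# (PROC_ROOT/PID/mounts) lists DEVICE as a mount source.
# PROC_ROOT defaults to /proc; it is a hypothetical knob for testing.
find_mount_holders() {
    dev=$1
    proc_root=${2:-/proc}
    for m in "$proc_root"/[0-9]*/mounts; do
        [ -r "$m" ] || continue    # process exited or file not readable
        # Field 1 of each mounts line is the mount source (the device).
        if awk -v d="$dev" '$1 == d { found = 1 } END { exit !found }' \
               "$m" 2>/dev/null; then
            pid_dir=${m%/mounts}
            pid=${pid_dir##*/}
            comm=$(cat "$pid_dir/comm" 2>/dev/null)
            printf '%s %s\n' "$pid" "${comm:-?}"
        fi
    done
}
```

Run it as root, for example `find_mount_holders /dev/sda1`. Matching on the first field exactly avoids false hits such as /dev/sda10 when looking for /dev/sda1, which a plain grep would report.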

Another simple thing to try doing is to go into single user mode,
which will kill off all/most of your userspace processes, which may
also have the desired effect. Much more of a blunt hammer, though.

- Ted

2016-05-18 19:01:44

by Kamran Khan

Subject: Re: Difference in jbd2 behavior between CentOS and Ubuntu while unmounting

On Wed, May 18, 2016 at 7:12 AM, Theodore Ts'o <[email protected]> wrote:
> Try the following to see if some process is playing namespace games:
>
> find /proc -name mounts | xargs grep /dev/sda3
>
> (replace /dev/sda3 with the device that you think is unmounted).
>
> When you find the process, kill it. (Or try doing a service XXX
> restart assuming that the device has been unmounted in the "normal"
> mount namespace.)

That did the trick! systemd-udevd was the culprit. Somehow it wasn't
appearing in lsof/fuser outputs but had the block device listed in its
mounts.

Killed it and the block device was released for further operations.
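
One plausible reason the process stayed invisible to lsof and fuser is that it ran in its own mount namespace, where the filesystem was still mounted even though no file on it was open. That is an assumption, not something confirmed in this thread. Under that assumption, a rough sketch for spotting such processes is to compare each process's `/proc/PID/ns/mnt` link against a reference process. The function name `list_foreign_mount_ns` and the optional proc-root argument are hypothetical.

```shell
# list_foreign_mount_ns [REF_PID] [PROC_ROOT]
# Print "PID COMM NAMESPACE" for every process whose mount namespace
# differs from that of REF_PID (default 1, i.e. init).
# PROC_ROOT defaults to /proc; it is a hypothetical knob for testing.
list_foreign_mount_ns() {
    ref_pid=${1:-1}
    proc_root=${2:-/proc}
    # The ns/mnt symlink target looks like "mnt:[4026531840]".
    ref_ns=$(readlink "$proc_root/$ref_pid/ns/mnt") || return 1
    for d in "$proc_root"/[0-9]*; do
        ns=$(readlink "$d/ns/mnt" 2>/dev/null) || continue
        [ "$ns" = "$ref_ns" ] && continue    # same namespace as reference
        printf '%s %s %s\n' "${d##*/}" "$(cat "$d/comm" 2>/dev/null)" "$ns"
    done
}
```

Reading other processes' namespace links generally requires root. Any process this prints has its own view of what is mounted, so its `/proc/PID/mounts` is worth checking for the device in question.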

Thanks!!
--
Kamran.

http://inspirated.com/