2010-05-04 20:37:01

by bugzilla-daemon

Subject: [Bug 15906] New: serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906

Summary: serious performance regression in "umount" on ext4
over LVM
Product: File System
Version: 2.5
Kernel Version: 2.6.33
Platform: All
OS/Version: Linux
Tree: Mainline
Status: NEW
Severity: normal
Priority: P1
Component: ext4
AssignedTo: [email protected]
ReportedBy: [email protected]
Regression: Yes


As root, if /tmp is on an LV (or possibly even without this):

cd /tmp
dd if=/dev/zero of=test.ext4 bs=1 count=1 seek=1G
mkfs.ext4 -F test.ext4
mkdir -p /mnt/test
mount -o loop test.ext4 /mnt/test
echo $(seq 65536) | (cd /mnt/test; xargs touch)
time umount /mnt/test

Prior to 2.6.32, this took a second or so. Now it takes over a minute.

Reproduced in both Ubuntu 10.04 (2.6.32) and Fedora 12 (2.6.33).
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/543617

The current Ubuntu work-around is only useful on an otherwise idle system.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.


2010-05-04 20:37:33

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #1 from Kees Cook <[email protected]> 2010-05-04 20:37:32 ---
Sorry, I meant Fedora 13 (Beta).

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-04 20:41:56

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #2 from Kees Cook <[email protected]> 2010-05-04 20:41:55 ---
https://bugzilla.redhat.com/show_bug.cgi?id=588930

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-04 20:52:48

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #3 from Kees Cook <[email protected]> 2010-05-04 20:52:46 ---
As an interesting oddity, creating an ext3 filesystem but then mounting it as
ext4 shows the same pathological behavior. Mounting it as ext3, it's fine
again.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-04 21:05:27

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Eric Sandeen <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #4 from Eric Sandeen <[email protected]> 2010-05-04 21:05:24 ---
As I mentioned on the RH bz, it seems barrier-related.

I thought most barrier enhancements/changes were before 2.6.31, though. Was
this really fast for you on .31? Lots of lvm barrier changes went into .30.

Mounting either the host fs, the loopback fs, or both with "nobarrier" avoids
all the cache flushes from the barriers, and it's speedy again.

-Eric
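
To make the nobarrier variant concrete, here is an untested sketch of the
reproducer from the report with barriers disabled on both layers (it assumes
/tmp is itself an ext4 mount, so the ext4 "nobarrier" option applies to both
the host fs and the loopback fs):

mount -o remount,nobarrier /tmp                    # host fs: barriers off
dd if=/dev/zero of=/tmp/test.ext4 bs=1 count=1 seek=1G
mkfs.ext4 -F /tmp/test.ext4
mkdir -p /mnt/test
mount -o loop,nobarrier /tmp/test.ext4 /mnt/test   # loop fs: barriers off
seq 65536 | (cd /mnt/test; xargs touch)
time umount /mnt/test                              # speedy again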

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-04 21:11:06

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #5 from Kees Cook <[email protected]> 2010-05-04 21:11:03 ---
Hm, this was really a reduced test-case; the initial situations where I
noticed this were umounting LVM snapshots, and later umounting aufs unions (in
an effort to get away from the 3x slow-down of snapshot IO).

Doing this is just as slow as the loopback case for me:

VG=systemvg
lvcreate -n testlv -L1G $VG
mkfs.ext4 -F /dev/$VG/testlv
mkdir -p /mnt/test
mount /dev/$VG/testlv /mnt/test
seq 65536 | (cd /mnt/test; xargs touch)
time umount /mnt/test


Should 68db1961bbf4e16c220ccec4a780e966bc1fece3 be expected to fix it too?

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-04 21:12:53

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #6 from Kees Cook <[email protected]> 2010-05-04 21:12:51 ---
Hm, it seemed to get progressively worse after 2.6.28 (when it was okay): it
was not good in 2.6.31, and bad in 2.6.32. I don't have data for 2.6.29 or
.30, though.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-04 21:34:42

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906


keith mannthey <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #7 from keith mannthey <[email protected]> 2010-05-04 21:34:38 ---
Eric is spot on.

As of 2.6.30, dm/lvm started supporting barriers; this can cause serious
performance issues depending on your configuration.

Try mounting with the "-o nobarrier" option to see if your performance is
restored.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-04 21:48:31

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #8 from Eric Sandeen <[email protected]> 2010-05-04 21:48:27 ---
Kees, no; 68db1961bbf4e16c220ccec4a780e966bc1fece3 made the test on loopback
slower, because now a barrier from a filesystem on a loopback device issues an
fsync on the underlying filesystem, which issues a cache flush on the
underlying block device .... you get the picture.

Unless we really have gratuitous barrier issues here, there's probably not a
whole lot to be done, though maybe some batching is possible.

As for the business about ext3, there is nothing mysterious or odd there
either: ext3 mounts with barriers off by default (despite my best efforts),
and mounting it with the ext4 driver does turn them on.
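
That ext3-versus-ext4-driver behavior is easy to check from the shell; a
sketch reusing the loopback recipe from the report (the image name test.ext3
is illustrative, and -t selects which driver mounts the same on-disk
filesystem):

dd if=/dev/zero of=/tmp/test.ext3 bs=1 count=1 seek=1G
mkfs.ext3 -F /tmp/test.ext3
mkdir -p /mnt/test

mount -t ext3 -o loop /tmp/test.ext3 /mnt/test     # ext3 driver: barriers off
seq 65536 | (cd /mnt/test; xargs touch)
time umount /mnt/test                              # fast

mount -t ext4 -o loop /tmp/test.ext3 /mnt/test     # ext4 driver: barriers on
seq 65536 | (cd /mnt/test; xargs touch)
time umount /mnt/test                              # pathological again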

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 00:28:48

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #9 from Kees Cook <[email protected]> 2010-05-05 00:28:42 ---
Well, it seems to me that there is something wrong somewhere in here, since
"sync; umount" takes seconds and "umount" alone takes minutes. I can, however,
confirm that adding "-o nobarrier" causes the symptom to vanish.
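
That comparison is easy to reproduce with the original testcase; a sketch,
assuming the loopback image from the report has already been created:

mount -o loop /tmp/test.ext4 /mnt/test
seq 65536 | (cd /mnt/test; xargs touch)
time umount /mnt/test                    # minutes

mount -o loop /tmp/test.ext4 /mnt/test
seq 65536 | (cd /mnt/test; xargs touch)
time sh -c 'sync; umount /mnt/test'      # seconds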

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 03:18:08

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Theodore Tso <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #10 from Theodore Tso <[email protected]> 2010-05-05 03:18:02 ---
To be fair, there does seem to be a real problem here. See my comment here:

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/543617/comments/12

It looks like, if there are dirty inodes, whatever writeback path is used by
umount(8) is much less efficient than the one used by sync(8). Specifically,
it seems to be doing a journal commit between each dirty inode which is
flushed out.
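
One way to see the inefficiency Ted describes is to count what actually
reaches the block device during the umount; a rough, untested sketch using
blktrace (the device name is an assumption -- use whatever block device
ultimately backs the filesystem -- and the exact barrier/flush letters in the
RWBS column vary by kernel and blktrace version):

blktrace -d /dev/sdb1 -o - | blkparse -i - > /tmp/trace.txt &
time umount /mnt/test
kill %1
grep -c 'WB' /tmp/trace.txt    # crude count of barrier-flagged writes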

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 04:00:09

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #11 from Eric Sandeen <[email protected]> 2010-05-05 04:00:04 ---
Yep, fair enough, and on closer inspection it's doing way more IO than
expected, wrapping the journal 4 times. :)

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 04:46:18

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #12 from Eric Sandeen <[email protected]> 2010-05-05 04:46:16 ---

so basically:

sync_inodes_sb
  bdi_sync_writeback (sync_mode = WB_SYNC_ALL)
    ...
      writeback_inodes_wb
        writeback_single_inode
          ext4_write_inode
            if (WB_SYNC_ALL)
              ext4_force_commit <--- ouch.
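
That final ext4_force_commit() call can be watched live with the function
tracer; a sketch, assuming debugfs is mounted at /sys/kernel/debug and the
kernel has CONFIG_FUNCTION_TRACER enabled:

cd /sys/kernel/debug/tracing
echo ext4_force_commit > set_ftrace_filter
echo function > current_tracer
time umount /mnt/test
grep -c ext4_force_commit trace    # expect roughly one hit per dirty inode
echo nop > current_tracer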

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 07:28:18

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Dmitry Monakhov <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #13 from Dmitry Monakhov <[email protected]> 2010-05-05 07:28:10 ---
Yep, I already know about that issue. In fact it was broken by the following
commit:

From 03ba3782e8dcc5b0e1efe440d33084f066e38cae Mon Sep 17 00:00:00 2001
From: Jens Axboe <[email protected]>
Date: Wed, 9 Sep 2009 09:08:54 +0200
Subject: [PATCH] writeback: switch to per-bdi threads for flushing data

The problem is that __sync_filesystem(0) no longer works on umount, because
the sb cannot be pinned: the s_umount sem is downed for write and s_root is
NULL.

In fact ext3 is also broken in the "-o barrier=1" case. The attached patch
fixes the original regression, but there is one more issue left: the delalloc
case.

A dirty inode is still dirty even after the first call of
writeback_single_inode() (which is called from __sync_filesystem(0)), because
delalloc block allocation happens during the inode write and re-dirties it.
So it takes a second __sync_filesystem() call to clear the dirty flags. I'm
currently working on that issue and hope to post a solution today.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 07:30:06

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #14 from Dmitry Monakhov <[email protected]> 2010-05-05 07:30:04 ---
Created an attachment (id=26224)
--> (https://bugzilla.kernel.org/attachment.cgi?id=26224)
Patch that fixes the original regression

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 08:27:45

by Dmitry Monakhov

Subject: Re: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

[email protected] writes:

Hi Jens,

Just FYI, we have found a regression which was caused by your famous
writeback patch 03ba3782e8dcc5b0e1efe440d33084f066e38cae.
I'm not allowed to add you to the CC in BZ; that's why I wrote this mail.

Before the patch, __sync_filesystem() called writeback_single_inode()
directly, but now it is called indirectly (from the flush-X:X task),
which requires the super_block in question to be pinned.
But it is impossible to pin this SB on umount, because we already
hold the s_umount sem for write, so effectively we have already pinned that
SB. So my proposal is to treat umount similarly to WB_SYNC_ALL, and skip
the pinning stage.

> https://bugzilla.kernel.org/show_bug.cgi?id=15906
>
>
> Dmitry Monakhov <[email protected]> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> CC| |[email protected]
>
>
>
>
> --- Comment #13 from Dmitry Monakhov <[email protected]> 2010-05-05 07:28:10 ---
> Yep, I already know about that issue. In fact it was broken by the following
> commit:
>
> From 03ba3782e8dcc5b0e1efe440d33084f066e38cae Mon Sep 17 00:00:00 2001
> From: Jens Axboe <[email protected]>
> Date: Wed, 9 Sep 2009 09:08:54 +0200
> Subject: [PATCH] writeback: switch to per-bdi threads for flushing data
>
> The problem is that __sync_filesystem(0) no longer works on umount, because
> the sb cannot be pinned: the s_umount sem is downed for write and s_root is
> NULL.
>
> In fact ext3 is also broken in the "-o barrier=1" case. The attached patch
> fixes the original regression, but there is one more issue left: the
> delalloc case.
>
> A dirty inode is still dirty even after the first call of
> writeback_single_inode() (which is called from __sync_filesystem(0)),
> because delalloc block allocation happens during the inode write and
> re-dirties it. So it takes a second __sync_filesystem() call to clear the
> dirty flags. I'm currently working on that issue and hope to post a solution
> today.

2010-05-05 08:40:58

by Jens Axboe

Subject: Re: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

On Wed, May 05 2010, Dmitry Monakhov wrote:
> [email protected] writes:
>
> Hi Jens,
>
> Just FYI, we have found a regression which was caused by your famous
> writeback patch 03ba3782e8dcc5b0e1efe440d33084f066e38cae
> I'm not allowed to add you to CC in BZ, that's why i wrote this mail.

You need to use [email protected] as that is what I use there.

> Before the patch __sync_filesystem() called writeback_single_inode()
> directly, but now it is called indirectly (from flush-X:X task)
> which require a super_block in question to be pinned.
> But this is impossible to pin this SB on umount because we already
> hold s_umount sem for write, so effectively we already pinned that SB.
> So my proposal is to treat umount similar to WB_SYNC_ALL, and skip
> pining stage.

Hmm, I see; yes, that is a bug. How about adding a WB_SYNC_NONE_PINNED or
something to that effect, which acts like WB_SYNC_NONE but where the caller is
required to hold s_umount already (like WB_SYNC_ALL)?

Something like the below. It compiles, but is not tested.

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 4b37f7c..4327465 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -245,19 +245,20 @@ static void bdi_sync_writeback(struct backing_dev_info *bdi,
  * @bdi: the backing device to write from
  * @sb: write inodes from this super_block
  * @nr_pages: the number of pages to write
+ * @sb_locked: caller already holds sb umount sem.
  *
  * Description:
  *   This does WB_SYNC_NONE opportunistic writeback. The IO is only
  *   started when this function returns, we make no guarentees on
- *   completion. Caller need not hold sb s_umount semaphore.
+ *   completion. Caller specifies whether sb umount sem is held already or not.
  *
  */
 void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
-                         long nr_pages)
+                         long nr_pages, int sb_locked)
 {
         struct wb_writeback_args args = {
                 .sb             = sb,
-                .sync_mode      = WB_SYNC_NONE,
+                .sync_mode      = sb_locked ? WB_SYNC_NONE_PIN : WB_SYNC_NONE,
                 .nr_pages       = nr_pages,
                 .range_cyclic   = 1,
         };
@@ -577,7 +578,8 @@ static enum sb_pin_state pin_sb_for_writeback(struct writeback_control *wbc,
         /*
          * Caller must already hold the ref for this
          */
-        if (wbc->sync_mode == WB_SYNC_ALL) {
+        if (wbc->sync_mode == WB_SYNC_ALL ||
+            wbc->sync_mode == WB_SYNC_NONE_PIN) {
                 WARN_ON(!rwsem_is_locked(&sb->s_umount));
                 return SB_NOT_PINNED;
         }
@@ -1183,6 +1185,18 @@ static void wait_sb_inodes(struct super_block *sb)
         iput(old_inode);
 }
 
+static void __writeback_inodes_sb(struct super_block *sb, int sb_locked)
+{
+        unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
+        unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
+        long nr_to_write;
+
+        nr_to_write = nr_dirty + nr_unstable +
+                        (inodes_stat.nr_inodes - inodes_stat.nr_unused);
+
+        bdi_start_writeback(sb->s_bdi, sb, nr_to_write, sb_locked);
+}
+
 /**
  * writeback_inodes_sb - writeback dirty inodes from given super_block
  * @sb: the superblock
@@ -1194,18 +1208,23 @@ static void wait_sb_inodes(struct super_block *sb)
  */
 void writeback_inodes_sb(struct super_block *sb)
 {
-        unsigned long nr_dirty = global_page_state(NR_FILE_DIRTY);
-        unsigned long nr_unstable = global_page_state(NR_UNSTABLE_NFS);
-        long nr_to_write;
-
-        nr_to_write = nr_dirty + nr_unstable +
-                        (inodes_stat.nr_inodes - inodes_stat.nr_unused);
-
-        bdi_start_writeback(sb->s_bdi, sb, nr_to_write);
+        __writeback_inodes_sb(sb, 0);
 }
 EXPORT_SYMBOL(writeback_inodes_sb);
 
 /**
+ * writeback_inodes_sb_locked - writeback dirty inodes from given super_block
+ * @sb: the superblock
+ *
+ * Like writeback_inodes_sb(), except the caller already holds the
+ * sb umount sem.
+ */
+void writeback_inodes_sb_locked(struct super_block *sb)
+{
+        __writeback_inodes_sb(sb, 1);
+}
+
+/**
  * writeback_inodes_sb_if_idle - start writeback if none underway
  * @sb: the superblock
  *
diff --git a/fs/sync.c b/fs/sync.c
index 92b2281..de6a441 100644
--- a/fs/sync.c
+++ b/fs/sync.c
@@ -42,7 +42,7 @@ static int __sync_filesystem(struct super_block *sb, int wait)
         if (wait)
                 sync_inodes_sb(sb);
         else
-                writeback_inodes_sb(sb);
+                writeback_inodes_sb_locked(sb);
 
         if (sb->s_op->sync_fs)
                 sb->s_op->sync_fs(sb, wait);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index bd0e3c6..90e677a 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -103,7 +103,7 @@ int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
 void bdi_unregister(struct backing_dev_info *bdi);
 int bdi_setup_and_register(struct backing_dev_info *, char *, unsigned int);
 void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
-                                long nr_pages);
+                                long nr_pages, int sb_locked);
 int bdi_writeback_task(struct bdi_writeback *wb);
 int bdi_has_dirty_io(struct backing_dev_info *bdi);
 
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 36520de..3cd39b0 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -19,6 +19,7 @@ extern struct list_head inode_unused;
 enum writeback_sync_modes {
         WB_SYNC_NONE,   /* Don't wait on anything */
         WB_SYNC_ALL,    /* Wait on every mapping */
+        WB_SYNC_NONE_PIN,       /* Like WB_SYNC_NONE, but s_umount held */
 };
 
 /*
@@ -73,6 +74,7 @@ struct writeback_control {
 struct bdi_writeback;
 int inode_wait(void *);
 void writeback_inodes_sb(struct super_block *);
+void writeback_inodes_sb_locked(struct super_block *);
 int writeback_inodes_sb_if_idle(struct super_block *);
 void sync_inodes_sb(struct super_block *);
 void writeback_inodes_wbc(struct writeback_control *wbc);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 0b19943..49d3508 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -597,7 +597,7 @@ static void balance_dirty_pages(struct address_space *mapping,
             (!laptop_mode && ((global_page_state(NR_FILE_DIRTY)
                                + global_page_state(NR_UNSTABLE_NFS))
                                   > background_thresh)))
-                bdi_start_writeback(bdi, NULL, 0);
+                bdi_start_writeback(bdi, NULL, 0, 0);
 }
 
 void set_page_dirty_balance(struct page *page, int page_mkwrite)

--
Jens Axboe


2010-05-05 09:06:50

by Dmitry Monakhov

Subject: Re: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

Jens Axboe <[email protected]> writes:

> On Wed, May 05 2010, Dmitry Monakhov wrote:
>> [email protected] writes:
>>
>> Hi Jens,
>>
>> Just FYI, we have found a regression which was caused by your famous
>> writeback patch 03ba3782e8dcc5b0e1efe440d33084f066e38cae
>> I'm not allowed to add you to CC in BZ, that's why i wrote this mail.
>
> You need to use [email protected] as that is what I use there.
>
>> Before the patch __sync_filesystem() called writeback_single_inode()
>> directly, but now it is called indirectly (from flush-X:X task)
>> which require a super_block in question to be pinned.
>> But this is impossible to pin this SB on umount because we already
>> hold s_umount sem for write, so effectively we already pinned that SB.
>> So my proposal is to treat umount similar to WB_SYNC_ALL, and skip
>> pining stage.
>
> Hmm I see, yes that is a bug. How about adding a WB_SYNC_NONE_PINNED or
> something to that effect, which acts like WB_SYNC_NONE but the caller is
> required to hold the s_umount already (like WB_SYNC_ALL).
Yes, the core idea definitely looks better. The caller must know under what
conditions it submits work to the flush task, and umount is just one such
'special' situation. BTW, do we have a writeback-subsystem maintainer who is
responsible for taking care of this patch?
>
> Something like the below. It compiles, but is not tested.
>
> [patch quoted in full -- snipped; see above]

2010-05-05 19:44:14

by bugzilla-daemon

Subject: [Bug 15906] serious performance regression in "umount" on ext4 over LVM

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Jens Axboe <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #15 from Theodore Tso <[email protected]> 2010-05-05 19:44:04 ---
The summary for this bug should probably be changed, yes? This should happen
with filesystems other than ext4 that use barriers (e.g., btrfs, XFS), and it
should happen on a single disk just as easily as on LVM. The latter I can
confirm --- you can see the performance degradation quite easily on a single
disk, so the mention of LVM in the summary is probably misleading.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 20:34:57

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Kees Cook <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
Summary|serious performance |performance regression in
|regression in "umount" on |"umount" of filesystems
|ext4 over LVM |using boundaries
Summary|serious performance |performance regression in
|regression in "umount" on |"umount" of filesystems
|ext4 over LVM |using barriers




--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 20:35:09

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Kees Cook <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
Summary|serious performance |performance regression in
|regression in "umount" on |"umount" of filesystems
|ext4 over LVM |using boundaries
Summary|serious performance |performance regression in
|regression in "umount" on |"umount" of filesystems
|ext4 over LVM |using barriers




--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 21:36:08

by Greg Freemyer

Subject: Re: [Bug 15906] performance regression in "umount" of filesystems using barriers

QAqPqqaAAaQw

On 5/5/10, [email protected]
<[email protected]> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=15906
>
>
> Kees Cook <[email protected]> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> Summary|serious performance |performance regression in
> |regression in "umount" on |"umount" of filesystems
> |ext4 over LVM |using boundaries
> Summary|serious performance |performance regression in
> |regression in "umount" on |"umount" of filesystems
> |ext4 over LVM |using barriers
>
>
>
>
> --
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are watching the assignee of the bug.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Sent from my mobile device

Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
CNN/TruTV Aired Forensic Imaging Demo -
http://insession.blogs.cnn.com/2010/03/23/how-computer-evidence-gets-retrieved/

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

2010-05-05 21:36:56

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #16 from [email protected] 2010-05-05 21:36:46 ---
QAqPqqaAAaQw

On 5/5/10, [email protected]
<[email protected]> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=15906
>
>
> Kees Cook <[email protected]> changed:
>
> What |Removed |Added
> ----------------------------------------------------------------------------
> Summary|serious performance |performance regression in
> |regression in "umount" on |"umount" of filesystems
> |ext4 over LVM |using boundaries
> Summary|serious performance |performance regression in
> |regression in "umount" on |"umount" of filesystems
> |ext4 over LVM |using barriers
>
>
>
>
> --
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are watching the assignee of the bug.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-05 21:55:34

by Greg Freemyer

Subject: Re: [Bug 15906] performance regression in "umount" of filesystems using barriers

That was the equivalent of a butt dial. I'd delete it from the
bugzilla if I knew how.

Sorry
Greg

On Wed, May 5, 2010 at 5:36 PM, <[email protected]> wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=15906
>
>
>
>
>
> --- Comment #16 from [email protected] 2010-05-05 21:36:46 ---
> QAqPqqaAAaQw
>
> On 5/5/10, [email protected]
> <[email protected]> wrote:
>> https://bugzilla.kernel.org/show_bug.cgi?id=15906
>>
>>
>> Kees Cook <[email protected]> changed:
>>
>> What |Removed |Added
>> ----------------------------------------------------------------------------
>> Summary|serious performance |performance regression in
>> |regression in "umount" on |"umount" of filesystems
>> |ext4 over LVM |using boundaries
>> Summary|serious performance |performance regression in
>> |regression in "umount" on |"umount" of filesystems
>> |ext4 over LVM |using barriers
>>
>>
>>
>>
>> --
>> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
>> ------- You are receiving this mail because: -------
>> You are watching the assignee of the bug.
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
>> the body of a message to [email protected]
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> --
> Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
> ------- You are receiving this mail because: -------
> You are watching the assignee of the bug.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>



--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
CNN/TruTV Aired Forensic Imaging Demo -
http://insession.blogs.cnn.com/2010/03/23/how-computer-evidence-gets-retrieved/

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

2010-05-06 03:49:06

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #17 from Dmitry Monakhov <[email protected]> 2010-05-06 03:48:58 ---
(In reply to comment #15)
> The summary for this bug should probably be changed, yes? This should happen
> with filesystems other than ext4 that use barriers (e.g., btrfs, XFS), and
> it should happen on a single disk just as easily as on LVM. The latter I can
> confirm --- you can see the performance degradation quite easily on a single
> disk, so the mention of LVM in the summary is probably misleading.

Just a note: the performance degradation is not that huge for XFS. As far as
I understand, this is because XFS batches several inode writes into one log
commit, so fewer barriers are required. But nonetheless this is a generic
writeback issue.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-06 07:09:08

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #18 from Jens Axboe <[email protected]> 2010-05-06 07:08:55 ---
Created an attachment (id=26255)
--> (https://bugzilla.kernel.org/attachment.cgi?id=26255)
Explicitly pass in whether sb is pinned or not

Revision two of this patch. I got rid of WB_SYNC_NONE_PIN, since we should not
further muddy the sync type with whether or not the sb is pinned. Instead,
let's add a specific flag for this.

Untested, but it compiles.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-06 07:10:37

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #19 from Jens Axboe <[email protected]> 2010-05-06 07:10:30 ---
BTW, I believe that Chris has observed this problem on btrfs as well. If you
have a lot of dirty inodes and do the umount, all the writeback will be done
with WB_SYNC_ALL, which will cause a log commit for each on btrfs. The end
result is the same: umount runs a lot slower than it otherwise should.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-07 09:47:25

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #20 from Dmitry Monakhov <[email protected]> 2010-05-07 09:45:32 ---
I can confirm that Jens's patch fixes the original regression for me, so IMHO
the first part is well done.

About the second one:
(In reply to comment #13)
> A dirty inode is still dirty even after the first call of
> writeback_single_inode() (which is called from __sync_filesystem(0)),
> because delalloc block allocation happens during the inode write and
> re-dirties it. So it takes a second __sync_filesystem() call to clear the
> dirty flags. I'm currently working on that issue and hope to post a solution
> today.

The proposed patch has been posted and may be found here:
http://marc.info/?l=linux-fsdevel&m=127322500915287&w=2

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-08 14:30:33

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Thierry Vignaud <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #21 from Dmitry Monakhov <[email protected]> 2010-05-07 10:49:00 ---
#########################################
# Testcase
#Prep stage
mkdir -p /tmp/FILES
for ((i=0;i<1000;i++)); do echo test > /tmp/FILES/file-$i; done
tar cf /tmp/files.tar /tmp/FILES
#Actual measurements
FSTYPE=ext4
OPT='-obarrier=1'
mkfs.$FSTYPE /dev/sdb1
mount /dev/sdb1 /mnt $OPT
time tar xf /tmp/files.tar -C /mnt/
time umount /mnt
#########################################

w/o update dflags patch (first timing is tar, second is umount):
real 0m0.142s
user 0m0.005s
sys 0m0.093s

real 0m8.506s
user 0m0.000s
sys 0m0.016s

with update dflags patch (first timing is tar, second is umount):
real 0m0.105s
user 0m0.006s
sys 0m0.086s

real 0m0.148s
user 0m0.002s
sys 0m0.007s

There seems to be no visible difference for btrfs and xfs.
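
For comparison, the same measurement can be looped over filesystem types; a
sketch (barriers default to on for xfs and btrfs, and you may need your mkfs
version's force flag if /dev/sdb1 already holds a filesystem):

for FSTYPE in xfs btrfs; do
    mkfs.$FSTYPE /dev/sdb1
    mount /dev/sdb1 /mnt
    time tar xf /tmp/files.tar -C /mnt/
    time umount /mnt
done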

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-17 03:28:57

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Dmitry Monakhov <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
Attachment #26224|0 |1
is obsolete| |




--- Comment #22 from Dmitry Monakhov <[email protected]> 2010-05-17 03:28:50 ---
(From update of attachment 26224)
Superseded by Jens' fix.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-17 03:44:22

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #23 from Dmitry Monakhov <[email protected]> 2010-05-17 03:44:19 ---
Created an attachment (id=26403)
--> (https://bugzilla.kernel.org/attachment.cgi?id=26403)
Fix async inodes writeback for FS with delalloc

Attaching the second patch, now that it has passed a one-week review cycle on
the list.

It seems that we have agreement on both issues. IMHO they are 99.999%
candidates for 2.6.35-rc1 (and then for stable). Who will take care of the
patches? Jens, will you?

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-17 10:52:59

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #24 from Jens Axboe <[email protected]> 2010-05-17 10:52:54 ---
I'll add both patches to for-2.6.35 and mark them with a stable tag.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-19 18:58:00

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Andrew Morton <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
CC| |[email protected]
Resolution| |CODE_FIX




--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-19 19:12:12

by bugzilla-daemon

[permalink] [raw]
Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #25 from Kees Cook <[email protected]> 2010-05-19 19:12:08 ---
Thanks again to everyone who dug into this. I'd been pulling my hair out for
months on it. :)

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-21 01:05:09

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Chuck Ebbert <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #26 from Chuck Ebbert <[email protected]> 2010-05-21 01:04:55 ---
(In reply to comment #18)
> Created an attachment (id=26255)
> --> (https://bugzilla.kernel.org/attachment.cgi?id=26255) [details]
> Explicitly pass in whether sb is pinned or not
>

This patch is triggering a warning:

[ http://www.kerneloops.org/oops.php?number=3142224 ]

WARNING: at fs/fs-writeback.c:597 writeback_inodes_wb+0x229/0x380()
Hardware name: MS-7250
Modules linked in: fuse sunrpc ip6t_REJECT nf_conntrack_ipv6 ip6table_filter
ip6_tables ipv6 ext2 dm_multipath kvm_amd kvm uinput snd_hda_codec_atihdmi
snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep snd_seq
snd_seq_device rndis_wlan snd_pcm cfg80211 snd_timer snd rfkill soundcore
snd_page_alloc rndis_host edac_core ppdev cdc_ether i2c_nforce2 usbnet k8temp
edac_mce_amd parport_pc forcedeth mii parport serio_raw sha256_generic
aes_x86_64 aes_generic cbc dm_crypt firewire_ohci pata_acpi firewire_core
crc_itu_t floppy ata_generic pata_amd sata_nv radeon ttm drm_kms_helper drm
i2c_algo_bit i2c_core [last unloaded: scsi_wait_scan]
Pid: 1035, comm: flush-253:0 Not tainted 2.6.33.4-95.fc13.x86_64 #1
Call Trace:
[] warn_slowpath_common+0x77/0x8f
[] warn_slowpath_null+0xf/0x11
[] writeback_inodes_wb+0x229/0x380
[] wb_writeback+0x13d/0x1bc
[] ? call_rcu_sched+0x10/0x12
[] ? call_rcu+0x9/0xb
[] wb_do_writeback+0x69/0x14d
[] bdi_writeback_task+0x3a/0xaa
[] ? bdi_start_fn+0x0/0xd4
[] bdi_start_fn+0x6c/0xd4
[] ? bdi_start_fn+0x0/0xd4
[] kthread+0x7a/0x82
[] kernel_thread_helper+0x4/0x10
[] ? kthread+0x0/0x82
[] ? kernel_thread_helper+0x0/0x10

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-21 02:07:12

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #27 from Eric Sandeen <[email protected]> 2010-05-21 02:07:01 ---
static int pin_sb_for_writeback(struct writeback_control *wbc,
                                struct inode *inode, struct super_block **psb)
{
        struct super_block *sb = inode->i_sb;

        /*
         * If this sb is already pinned, nothing more to do. If not and
         * *psb is non-NULL, unpin the old one first
         */
        if (sb == *psb)
                return 0;
        else if (*psb)
                unpin_sb_for_writeback(psb);

        /*
         * Caller must already hold the ref for this
         */
        if (wbc->sync_mode == WB_SYNC_ALL || wbc->sb_pinned) {
                WARN_ON(!rwsem_is_locked(&sb->s_umount));    <<----- here
                return 0;
        }

http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=commitdiff;h=e913fc825dc685a444cb4c1d0f9d32f372f5986

is what Jens had upstream, and I think

http://git.kernel.org/?p=linux/kernel/git/axboe/linux-2.6-block.git;a=commitdiff;h=30fd1e551a730d942e91109762c942786be0ef7c

is the fix for the issue you see here...

> Even if the writeout itself isn't a data integrity operation, we need
> to ensure that the caller doesn't drop the sb umount sem before we
> have actually done the writeback.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-21 06:08:16

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #28 from Jens Axboe <[email protected]> 2010-05-21 06:08:09 ---
Eric is spot on; that latter commit does indeed fix up this issue. The problem
is that the caller holds the umount sem when initiating the WB_SYNC_NONE
writeback, and thus it passes down ->sb_pinned == 1. But since we clear the
work state for WB_SYNC_NONE early, the caller can then drop the umount sem
before we are actually done writing out inodes.

So the fix is queued up with the original patch; when it goes upstream, there
should be no warnings.

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-22 04:38:18

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906





--- Comment #29 from Eric Sandeen <[email protected]> 2010-05-22 04:38:10 ---
(Fedora has the updated patches now, btw - 2.6.33.4-102.fc13.x86_64 I think)

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

2010-05-26 16:44:46

by bugzilla-daemon

Subject: [Bug 15906] performance regression in "umount" of filesystems using barriers

https://bugzilla.kernel.org/show_bug.cgi?id=15906


Brian Bloniarz <[email protected]> changed:

What |Removed |Added
----------------------------------------------------------------------------
CC| |[email protected]




--- Comment #30 from Brian Bloniarz <[email protected]> 2010-05-26 16:44:37 ---
I'm trying to get these fixes merged into the Ubuntu release, because Ubuntu's
workaround has issues. The downstream bug for this is
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/585092

I backported them to Ubuntu's kernel, which took a little fixup. You'll find
the patches attached on that bug:
0001-writeback-fix-WB_SYNC_NONE-writeback-from-umount.patch
0002-writeback-Update-dirty-flags-in-two-steps.patch
0003-writeback-ensure-that-WB_SYNC_NONE-writeback-with-sb.patch

If anyone who's familiar with these code paths could review my backports, I'd
appreciate it. They apply cleanly to mainline v2.6.32.11, which is the base
for Ubuntu's kernel.
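
For anyone wanting to try the backports, a sketch of applying them (the patch
file names are from the comment above; the tree path and patch locations are
assumptions):

cd linux-2.6.32.11
for p in ../0001-writeback-fix-WB_SYNC_NONE-writeback-from-umount.patch \
         ../0002-writeback-Update-dirty-flags-in-two-steps.patch \
         ../0003-writeback-ensure-that-WB_SYNC_NONE-writeback-with-sb.patch; do
    patch -p1 --dry-run < "$p" && patch -p1 < "$p"
done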

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.