LinuxLists.cc - BUG: jbd2 slowing down file copies even though no journaling file system is used

2012-05-12 23:21:05

Subject: BUG: jbd2 slowing down file copies even though no journaling file system is used

Hey guys,

I have very slow and non-consistant transfer rates when copying files to Linux.

My system:
Ubuntu 12.04 Server x64 (same issue with Debian Squeez), 2 Gb RAM, AMD
240e processor

Hard disks:
1 * 500 Gb Seagate Momentus XT Hybrid HDD
6 * 1.5 - 2 TB Samsung HDD

The Momentus contains the OS in an encrypted LWM (dm-crypt) - one ext2
for booting, one ext2 and one LVM partition (all primary).

The other bigger hard disks each contain one encrypted Truecrypt
partition. Most are with ext4, one is with ext3. I join them to one
folder using "union-fs" (read only).

I use samba for transfer of data from my Windows 7 PC's.

The files stored are usually around 20 Gb in size (movies), and I
don't really care much about journaling and integrity (most partitions
are mounted read-only anyways as they are full ;) )

------------
The good scenario

Copying files FROM the server to a Windows client is really OK
performance wise. I see a 50% network utilization (1 Gbit network) and
it's quite constant around 46 Mb/sec.

TOP shows:
13681 root????? 20?? 0???? 0??? 0??? 0 S?? 40? 0.0?? 0:35.05 kworker/0:3
13632 windows?? 20?? 0? 122m 4916 3632 R?? 37? 0.3?? 3:05.52 smbd
13678 root????? 20?? 0???? 0??? 0??? 0 S?? 29? 0.0?? 0:34.57 kworker/0:0
?1797 root????? 20?? 0? 448m 1440? 420 S?? 22? 0.1?? 2:06.89 unionfs-fuse

IOTOP shows:
Total DISK READ:????? 50.40 M/s | Total DISK WRITE:?????? 0.00 B/s
? TID? PRIO? USER???? DISK READ? DISK WRITE? SWAPIN???? IO>??? COMMAND
13632 be/4 windows??? 50.40 M/s??? 0.00 B/s? 0.00 % 14.68 % smbd -F
10918 be/4 root??????? 8.12 M/s??? 0.00 B/s? 0.00 % 10.32 %
unionfs-fuse -o allow_other /share/h~hdd/6/Video
/share/multimedia/movies
10682 be/4 root??????? 8.36 M/s??? 0.00 B/s? 0.00 % 10.27 %
unionfs-fuse -o allow_other /share/h~hdd/6/Video
/share/multimedia/movies
10915 be/4 root??????? 8.24 M/s??? 0.00 B/s? 0.00 % 10.17 %
unionfs-fuse -o allow_other /share/h~hdd/6/Video
/share/multimedia/movies
?1798 be/4 root??????? 8.00 M/s??? 0.00 B/s? 0.00 % 10.15 %
unionfs-fuse -o allow_other /share/h~hdd/6/Video
/share/multimedia/movies
11282 be/4 root??????? 9.57 M/s??? 0.00 B/s? 0.00 % 10.13 %
unionfs-fuse -o allow_other /share/h~hdd/6/Video
/share/multimedia/movies
?1799 be/4 root??????? 8.12 M/s??? 0.00 B/s? 0.00 %? 8.78 %
unionfs-fuse -o allow_other

We can see that there is no "jbd" here - which is expected based on
what I found on the web (we're reading so we don't need journal
creation here).

------------
The bad scenario:

I do also have to transfer data to this server. And here comes the problem.

I copy the file to a specific hard disk / samba share (not to the union-fs).

And here, the data transfer is just impossibly slow.

Now before we end up in "it's encryption", let me say this is not the
case and why:

I have one standard ext2 partition on the Momentus (300 Gb), ext2 has
no journaling (which I really don't require).

I copy the file to the ext2 partition. Average transfer rate around 29
Mb/sec. However, there are a lot of spikes in the network transfer.
Peak at around 35% and it goes down to 20% in a wave form with
sometimes even 0% transfer. I made a video of this with my cell and
can share it.

IOTOP shows this in some cases, mostly jbd2 is not there:
? PRIO? USER???? DISK READ? DISK WRITE? SWAPIN???? IO>??? COMMAND
? 416 be/3 root??????? 0.00 B/s?? 97.65 M/s? 0.00 % 13.74 % [jbd2/dm-1-8]
?1067 be/4 root??????? 0.00 B/s??? 3.92 K/s? 0.00 %? 0.00 % nmbd -D
13632 be/4 windows???? 0.00 B/s?? 25.98 M/s? 0.00 %? 0.00 % smbd -F
??? 1 be/4 root??????? 0.00 B/s??? 0.00 B/s? 0.00 %? 0.00 % init
??? 2 be/4 root??????? 0.00 B/s??? 0.00 B/s? 0.00 %? 0.00 % [kthreadd]
??? 3 be/4 root??????? 0.00 B/s??? 0.00 B/s? 0.00 %? 0.00 % [ksoftirqd/0]
?1029 be/4 root??????? 0.00 B/s??? 0.00 B/s? 0.00 %? 0.00 % sshd -D
??? 6 rt/4 root??????? 0.00 B/s??? 0.00 B/s? 0.00 %? 0.00 % [migration/0]

What I can see in iotop is that smbd sometimes goes up to 40 and also
down to 10 Mb/sec. Now why is it like that?

The network? No - I did a "cp /dev/zero /share/test/z" to see how
quickly that would be. It shows that even here, jbd2 kicks in with 99%
I/O sometimes - and during that time there is NO disk access at all
acc. to iotop, 0 Kb read and written.

I don't doubt iotop - what I see there correlates very well to what I
see on Windows in the network traffic.

Now, my guess here would be something like - the kernel sees that
there has been data written, it tries to find out what has to go into
a journal (jbd2) but as it's such a lot of data this takes some time
and during this time nothing else is possible as the entire CPU(s) are
used up.

Just a guess as I don't know how the kernel works one bit ;)

So now I really hope you can help me here. These write transfer rates
are just so low that there has to be a bug here somewhere, Linux can
definately be faster I am sure :) And after spending days on this
analysis and reinstalling different linux distro's I really dont want
to go back to Windows yet (yes, there I have a constant write
performance of around 48 Mb/sec.

Whatever I can do to help let me know!

/proc/version:
Linux version 3.2.0-23-generic (buildd@crested) (gcc version 4.6.3
(Ubuntu/Linaro 4.6.3-1ubuntu4) ) #36-Ubuntu SMP Tue Apr 10 20:39:51
UTC 2012

/proc/cpuinfo:
processor?????? : 0
vendor_id?????? : AuthenticAMD
cpu family????? : 16
model?????????? : 6
model name????? : AMD Athlon(tm) II X2 240e Processor
stepping??????? : 2
microcode?????? : 0x10000c7

/proc/modules:
xts 12756 64 - Live 0x0000000000000000
gf128mul 14951 1 xts, Live 0x0000000000000000
ext2 73795 1 - Live 0x0000000000000000
snd_hda_codec_hdmi 32474 1 - Live 0x0000000000000000
snd_hda_intel 33773 0 - Live 0x0000000000000000
snd_hda_codec 127706 2 snd_hda_codec_hdmi,snd_hda_intel, Live 0x0000000000000000
snd_hwdep 13668 1 snd_hda_codec, Live 0x0000000000000000
snd_pcm 97188 3 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec, Live
0x0000000000000000
snd_timer 29990 1 snd_pcm, Live 0x0000000000000000
snd 78855 6 snd_hda_codec_hdmi,snd_hda_intel,snd_hda_codec,snd_hwdep,snd_pcm,snd_timer,
Live 0x0000000000000000
edac_core 53746 0 - Live 0x0000000000000000
edac_mce_amd 23709 0 - Live 0x0000000000000000
mac_hid 13253 0 - Live 0x0000000000000000
k10temp 13166 0 - Live 0x0000000000000000
soundcore 15091 1 snd, Live 0x0000000000000000
snd_page_alloc 18529 2 snd_hda_intel,snd_pcm, Live 0x0000000000000000
sp5100_tco 13791 0 - Live 0x0000000000000000
shpchp 37277 0 - Live 0x0000000000000000
i2c_piix4 13301 0 - Live 0x0000000000000000
lp 17799 0 - Live 0x0000000000000000
parport 46562 1 lp, Live 0x0000000000000000
dm_crypt 23125 9 - Live 0x0000000000000000
pata_jmicron 12747 0 - Live 0x0000000000000000
sata_sil 13539 4 - Live 0x0000000000000000
radeon 804372 1 - Live 0x0000000000000000
wmi 19256 0 - Live 0x0000000000000000
ttm 76949 1 radeon, Live 0x0000000000000000
drm_kms_helper 46978 1 radeon, Live 0x0000000000000000
drm 242038 3 radeon,ttm,drm_kms_helper, Live 0x0000000000000000
i2c_algo_bit 13423 1 radeon, Live 0x0000000000000000
r8169 62099 0 - Live 0x0000000000000000

I read on the other data which might be needed, but that is such a lot
and I don't know if it's really needed here for this example. If you
need it, let me know!

And yes I am quite sure this issue is reproducable. I have a second
motherboard in use plus a different CPU and the results are just the
same (faster due to faster CPU but the issue is still the same).

Thanks in advance,
Bjoern

2012-05-13 04:40:57

by Theodore Ts'o

[permalink] [raw]

Subject: Re: BUG: jbd2 slowing down file copies even though no journaling file system is used

On Sun, May 13, 2012 at 01:21:00AM +0200, Bj?rn Christoph wrote:
>
> IOTOP shows this in some cases, mostly jbd2 is not there:
> ? PRIO? USER???? DISK READ? DISK WRITE? SWAPIN???? IO>??? COMMAND
> ? 416 be/3 root??????? 0.00 B/s?? 97.65 M/s? 0.00 % 13.74 % [jbd2/dm-1-8]

Well, the file system in that has the huge write bandwidth is
whichever device is associated with device mapper device dm-1 (i.e.,
with a device mapper minor number of 1). What does "ls -l
/dev/mapper" show you?

Whatever file system is associated with it is clearly generating a lot
of journal activity.

Something that may help in determining what process is generating all
of this journal activity (which is likely the result of something
calling fsync a lot) is to try this:

echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable

... then wait for a minute or so, and then capture the output of:

cat /sys/kernel/debug/tracing/trace

See if that shows up anything useful.

Regards,

- Ted

2012-05-13 08:55:31

by Björn Christoph

[permalink] [raw]

Subject: Re: BUG: jbd2 slowing down file copies even though no journaling file system is used

Ted,

dm-1 points to my Linux-OS partition which is actually the encrypted
LVM one. However, I could also see that on some copies this actually
points to the related truecrypt partition so that makes sense there
(the dm-1 doesn't imo but not important at this point).

"echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable"
didn't work for me, I got a "permission denied" error. So then I
figured out I have to use "sudo -i" to get a root shell, then I can
actually echo into that file ? Very strange but an Ubuntu
issue/feature I guess.

However, I don't see that the file /sys/kernel/debug/tracing/trace
changes a lot? All I see is this even though I am clearly copying data
onto the disk via samba:

# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<...>-2096 [001] 1111.151516: ext4_sync_file_enter: dev
252,1 ino 3410181 parent 3410196 datasync 1
<...>-2096 [000] 1111.201520: ext4_sync_file_enter: dev
252,1 ino 3410181 parent 3410196 datasync 0
<...>-2096 [001] 1111.252939: ext4_sync_file_enter: dev
252,1 ino 3410181 parent 3410196 datasync 1
<...>-2096 [000] 1111.290241: ext4_sync_file_enter: dev
252,1 ino 3410181 parent 3410196 datasync 0
<...>-2096 [001] 1111.341757: ext4_sync_file_enter: dev
252,1 ino 3410181 parent 3410196 datasync 1
<...>-2096 [000] 1111.367540: ext4_sync_file_enter: dev
252,1 ino 3410181 parent 3410196 datasync 0
<...>-2096 [001] 1111.419511: ext4_sync_file_enter: dev
252,1 ino 3410181 parent 3410196 datasync 1
<...>-2096 [000] 1111.456354: ext4_sync_file_enter: dev
252,1 ino 3410181 parent 3410196 datasync 0

PID 2096 is sshd acc. to "top" (I think this was my local copying to a
file from /dev/zero?).

Even if I copy data now, the file never changes, I always see the same
"cat" output?

I changed the entry back to 0, rebooted, changed it back to 1, started
copying a file to an ext4 partition and cat
/sys/kernel/debug/tracing/trace is just plain empty (shows "tracer:
nop", TASK-PID etc but no data).

I'm a bit lost on how to gather the required traces :(

Thanks and regards,
Bjoern

2012/5/13 Ted Ts'o <[email protected]>:
> On Sun, May 13, 2012 at 01:21:00AM +0200, Bj?rn Christoph wrote:
>>
>> IOTOP shows this in some cases, mostly jbd2 is not there:
>> ? PRIO? USER???? DISK READ? DISK WRITE? SWAPIN???? IO>??? COMMAND
>> ? 416 be/3 root??????? 0.00 B/s?? 97.65 M/s? 0.00 % 13.74 % [jbd2/dm-1-8]
>
> Well, the file system in that has the huge write bandwidth is
> whichever device is associated with device mapper device dm-1 (i.e.,
> with a device mapper minor number of 1). ?What does "ls -l
> /dev/mapper" show you?
>
> Whatever file system is associated with it is clearly generating a lot
> of journal activity.
>
> Something that may help in determining what process is generating all
> of this journal activity (which is likely the result of something
> calling fsync a lot) is to try this:
>
> echo 1 > /sys/kernel/debug/tracing/events/ext4/ext4_sync_file_enter/enable
>
> ... then wait for a minute or so, and then capture the output of:
>
> cat /sys/kernel/debug/tracing/trace
>
> See if that shows up anything useful.
>
> Regards,
>
> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?- Ted

2012-05-16 21:03:13

by Roland Eggner

[permalink] [raw]

Subject: Re: BUG: jbd2 slowing down file copies even though no journaling file system is used

Hi Björn!

“Cc: LKML” added … sorry for the duplicate to your personal address.

On 2012-05-13 Sun 01:21, Björn Christoph wrote:
> I have very slow and non-consistant transfer rates when copying files to Linux.
>
> My system:
> Ubuntu 12.04 Server x64 (same issue with Debian Squeez), 2 Gb RAM, AMD
> 240e processor
>
> Hard disks:
> 1 * 500 Gb Seagate Momentus XT Hybrid HDD
> 6 * 1.5 - 2 TB Samsung HDD
>
> The Momentus contains the OS in an encrypted LWM (dm-crypt) - one ext2
> for booting, one ext2 and one LVM partition (all primary).
>
> The other bigger hard disks each contain one encrypted Truecrypt
> partition. Most are with ext4, one is with ext3. I join them to one
> folder using "union-fs" (read only).
>
> I use samba for transfer of data from my Windows 7 PC's.
>
> The files stored are usually around 20 Gb in size (movies), and I
> don't really care much about journaling and integrity (most partitions
> are mounted read-only anyways as they are full ;) )
>
> ------------
> The good scenario
>
> Copying files FROM the server to a Windows client is really OK
> performance wise. I see a 50% network utilization (1 Gbit network) and
> it's quite constant around 46 Mb/sec.
>
> … …
>
> ------------
> The bad scenario:
>
> I do also have to transfer data to this server. And here comes the problem.
>
> I copy the file to a specific hard disk / samba share (not to the union-fs).
>
> And here, the data transfer is just impossibly slow.
>
> Now before we end up in "it's encryption", let me say this is not the
> case and why:
>
> I have one standard ext2 partition on the Momentus (300 Gb), ext2 has
> no journaling (which I really don't require).
>
> I copy the file to the ext2 partition. Average transfer rate around 29
> Mb/sec. However, there are a lot of spikes in the network transfer.
> Peak at around 35% and it goes down to 20% in a wave form with
> sometimes even 0% transfer.
> … …

I guess, your write performance problem is related to a large amount of triple
indirect blocks and high file fragmentation:

(1) Are you aware of the fundamental difference between blocklist filesystems
(ext2, ext3) and extent based filesystems (ext4 with option extents enabled,
XFS, JFS, …)?

If not, imagine the huge metadata overhead of reading a 20G sized file stored
in millions of triple indirect blocks (= pointers to pointers to pointers to
blocks), compared to reading the same 20G sized file stored in just 10 extents.

(2) Did you check “tind blocks” and “non-contiguous files” in the output of
fsck command? If you want further help, please post output of df and fsck.
df output shows in the first column the device names to use for fsck command:

df -BG -T
sudo fsck -C 0 -f -n /dev/yourdevice

(3) Did you create your ext4-filesystems by using mkfs.ext4 or by conversion
of ext3 filesystems? In the latter case, did you run following commands
(partition backup recommended prior to first trial)?

sudo tune2fs -O extents /dev/yourdevice
sudo tune2fs -I 256 /dev/yourdevice

Note, that enabling extents affects only files written in the future, not
already written files. Thus a "backup - mkfs.ext4 - restore"-cycle is
preferable. Related documentation e.g.:

http://kernel.org/doc/Documentation/filesystems/ext4.txt
man tune2fs
man mkfs.ext4

Background:

6 years ago I compared Linux filesystems and encountered this case:
(a) ext3 filesystem size some 20G, holding a few files with size 0.5G … 4G.
e2fsck required almost 1 hour for checking.
(b) XFS filesystem on the very same partition, holding exactly the same
fileset. xfs_check completed within a few seconds.

Since then and after some other considerations I use XFS for all my plain and
encrypted Linux filesystems. My shell prompt automatically calls a script
“cooked” by me, which executes sync as soon as cpuload drops to idle or nearly
idle. This protects my filesystems against the famous “zero-sized files
problem” of XFS after power interruptions, without any noticeable performance
downsides.

In terms of file fragmentation XFS most likely surpasses every other file-
system, particularly in nearly full conditions. For your use case with “most
partitions are mounted read-only anyways as they are full” this sounds
attractive, or what would you say?

http://en.wikipedia.org/wiki/XFS#Extent_based_allocation
http://kernel.org/doc/Documentation/filesystems/xfs.txt
http://xfs.org/index.php/XFS

--
Regards
Roland

Attachments:

(No filename) (4.40 kB)
(No filename) (198.00 B)
Download all attachments