From: Alexey Lyashkov
Subject: some large dir testing results
Date: Thu, 20 Apr 2017 22:00:48 +0300
Message-ID: <52B4B404-9FE0-4586-A02A-3451AA5BE089@gmail.com>
To: linux-ext4
Cc: Andreas Dilger, Artem Blagodarenko

Hi All,

I ran some testing in my environment with the large dir patches provided by Artem.
Each test runs 11 loops: 20680000 mknod objects are created per loop in the normal dir case and 206800000 in the large_dir case (see the script below). The FS was reformatted before each loop, and the files were created in the root dir so that inodes and blocks are allocated from GD#0 and up. The journal is internal, with a size of 4G. The kernel is RHEL 7.2 based with the Lustre patches.

Test script:

#!/bin/bash
# DEV and MNT are expected to be set to the test block device and its mount point.
LOOPS=11

for i in `seq ${LOOPS}`; do
    mkfs -t ext4 -F -I 256 -J size=4096 ${DEV}
    mount -t ldiskfs ${DEV} ${MNT}
    pushd ${MNT}
    /usr/lib/lustre/tests/createmany -m test 20680000 >& /tmp/small-mknod${i}
    popd
    umount ${DEV}
done

for i in `seq ${LOOPS}`; do
    mkfs -t ext4 -F -I 256 -J size=4096 -O large_dir ${DEV}
    mount -t ldiskfs ${DEV} ${MNT}
    pushd ${MNT}
    /usr/lib/lustre/tests/createmany -m test 206800000 >& /tmp/large-mknod${i}
    popd
    umount ${DEV}
done

The tests were run on two nodes: the first node has storage built as a RAID10 of fast HDDs, the second node has an NVMe block device.

The current directory code gives roughly similar results on both nodes for the first test:
- HDD node: 56k-65k creates/s
- SSD node: ~80k creates/s

But the large_dir testing shows a large difference between the nodes:
- HDD node: creation rate drops to 11k creates/s
- SSD node: creation rate drops to 46k creates/s

Initial analysis points to several problems.

0) CPU load isn't high, and perf top says the ldiskfs functions aren't hot (2%-3% CPU); most of the time is spent in the dir entry checking function.

1) lookup spends a long time reading a directory block to verify that the file does not exist. I think this is because of directory block fragmentation.

[root@pink03 ~]# cat /proc/100993/stack
[] sleep_on_buffer+0xe/0x20
[] __wait_on_buffer+0x2a/0x30
[] ldiskfs_bread+0x7c/0xc0 [ldiskfs]
[] __ldiskfs_read_dirblock+0x4a/0x400 [ldiskfs]
[] ldiskfs_dx_find_entry+0xef/0x200 [ldiskfs]
[] ldiskfs_find_entry+0x4cb/0x570 [ldiskfs]
[] ldiskfs_lookup+0x75/0x230 [ldiskfs]
[] lookup_real+0x1d/0x50
[] __lookup_hash+0x42/0x60
[] filename_create+0x98/0x180
[] user_path_create+0x41/0x60
[] SyS_mknodat+0xda/0x220
[] SyS_mknod+0x1d/0x20
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff
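A quick way to check the fragmentation theory is to dump the on-disk layout of the directory after a run. This is only a sketch using stock e2fsprogs against the unmounted ${DEV} from the script above; the directory of interest is the filesystem root, since createmany ran in the mount point itself:

# Extent/block map of the root directory inode; widely scattered extents
# would suggest the dirblock reads in ldiskfs_dx_find_entry() turn into seeks.
debugfs -R "stat /" ${DEV}

# Free space fragmentation of the whole filesystem, to see how much room
# the allocator had to keep the directory blocks contiguous.
e2freefrag ${DEV}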
2) Some JBD2 problems: the create thread waits on a shadow BH from a committing transaction.

[root@pink03 ~]# cat /proc/100993/stack
[] sleep_on_shadow_bh+0xe/0x20 [jbd2]
[] do_get_write_access+0x2dd/0x4e0 [jbd2]
[] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
[] __ldiskfs_journal_get_write_access+0x3b/0x80 [ldiskfs]
[] __ldiskfs_new_inode+0x447/0x1300 [ldiskfs]
[] ldiskfs_create+0xd8/0x190 [ldiskfs]
[] vfs_create+0xcd/0x130
[] SyS_mknodat+0x1f0/0x220
[] SyS_mknod+0x1d/0x20
[] system_call_fastpath+0x16/0x1b
[] 0xffffffffffffffff

[root@pink03 ~]# cat /proc/100993/stack
[] sleep_on_shadow_bh+0xe/0x20 [jbd2]
[] do_get_write_access+0x2dd/0x4e0 [jbd2]
[] jbd2_journal_get_write_access+0x27/0x40 [jbd2]
[] __ldiskfs_journal_get_write_access+0x3b/0x80 [ldiskfs]
[] ldiskfs_mb_mark_diskspace_used+0x7d/0x4f0 [ldiskfs]
[] ldiskfs_mb_new_blocks+0x2ac/0x5d0 [ldiskfs]
[] ldiskfs_ext_map_blocks+0x49d/0xed0 [ldiskfs]
[] ldiskfs_map_blocks+0x179/0x590 [ldiskfs]
[] ldiskfs_getblk+0x65/0x200 [ldiskfs]
[] ldiskfs_bread+0x27/0xc0 [ldiskfs]
[] ldiskfs_append+0x7e/0x150 [ldiskfs]
[] do_split+0xa9/0x900 [ldiskfs]
[] ldiskfs_dx_add_entry+0xc2/0xbc0 [ldiskfs]
[] ldiskfs_add_entry+0x254/0x6e0 [ldiskfs]
[] ldiskfs_add_nondir+0x20/0x80 [ldiskfs]
[] ldiskfs_create+0x114/0x190 [ldiskfs]
[] vfs_create+0xcd/0x130
[] SyS_mknodat+0x1f0/0x220
[] SyS_mknod+0x1d/0x20
[] system_call_fastpath+0x16/0x1b

I know several jbd2 improvements by Kara haven't landed in RHEL7, but I don't think they would bring a big improvement, since the SSD node shows a smaller perf drop.
I think the performance drops because of the additional seeks needed to access the directory data or for inode allocation.

Alex
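P.S. For anyone repeating the test: the per-device jbd2 stats can show where the journal time goes while createmany runs, which may help confirm whether commit time is behind the shadow BH waits. A minimal sketch (the directory name under /proc/fs/jbd2 depends on the device, e.g. sdb1-8, so the wildcard is only for illustration):

# Sample the jbd2 transaction statistics every 10 seconds during the run.
while sleep 10; do
    date
    cat /proc/fs/jbd2/*/info
done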