Subject: Re: 'ls -lrt' performance issue on large dir while dir is being modified
From: Dai Ngo
To: "linux-nfs@vger.kernel.org"
Date: Thu, 19 Dec 2019 20:01:16 -0800
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:68.0) Gecko/20100101 Thunderbird/68.2.2
Hi Anna, Trond,

I made a mistake with the 5.5 numbers: the VM that runs 5.5 has some
problems. There is no regression with 5.5; here are the new numbers:

Upstream Linux 5.5.0-rc1 [ORI] 93296: 3m10.917s   197891: 10m35.789s
Upstream Linux 5.5.0-rc1 [MOD] 98614: 1m59.649s   192801: 3m55.003s

My apologies for the mistake. Since there is no regression with 5.5,
I'd like to get your opinion on the change that reverts the call in
nfs_force_use_readdirplus from invalidate_mapping_pages back to
nfs_zap_mapping, to prevent the current 'ls' from restarting
READDIRPLUS3 from cookie 0. I'm not quite sure about the intention of
the prior change from nfs_zap_mapping to invalidate_mapping_pages,
which is why I'm seeking advice. Or do you have any suggestions to
achieve the same effect?

Thanks,
-Dai

On 12/17/19 4:34 PM, Dai Ngo wrote:
> Hi,
>
> I'd like to report an issue where 'ls -lrt' on an NFSv3 client takes
> a very long time to display the contents of a large directory
> (100k - 200k files) while the directory is being modified by
> another NFSv3 client.
>
> The problem can be reproduced using 3 systems: one serves as the
> NFS server, one runs the client doing the 'ls -lrt', and another
> runs the client that creates files on the server.
>
> Client1 creates files using this simple script:
>
>> #!/bin/sh
>>
>> if [ $# -lt 2 ]; then
>>         echo "Usage: $0 number_of_files base_filename"
>>         exit 1
>> fi
>>
>> nfiles=$1
>> fname=$2
>> echo "creating $nfiles files using filename[$fname]..."
>> i=0
>> while [ $i -lt $nfiles ]; do
>>         i=`expr $i + 1`
>>         echo "xyz" > $fname$i
>>         echo "$fname$i"
>> done
>
> Client2 runs 'time ls -lrt /tmp/mnt/bd1 | wc -l' in a loop.
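For clarity, the Client2 loop described above can be written out as a
small script. This is a reconstruction, not the script actually used:
the mount point /tmp/mnt/bd1 comes from the report, and the iteration
count is an arbitrary assumption.

```shell
#!/bin/sh
# Reconstruction of the Client2 measurement loop: repeatedly time a
# full 'ls -lrt' of the test directory and count the entries listed.
# DIR and ITERS are assumed defaults; override them via the environment.
DIR=${DIR:-/tmp/mnt/bd1}
ITERS=${ITERS:-5}
i=0
while [ "$i" -lt "$ITERS" ]; do
    i=$((i + 1))
    echo "iteration $i:"
    time ls -lrt "$DIR" 2>/dev/null | wc -l
done
```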
>
> The network traces and dtrace probes showed numerous READDIRPLUS3
> requests restarting from cookie 0, which seemed to indicate that the
> cached pages of the directory were invalidated, causing the pages
> to be refilled starting from cookie 0 until the current requested
> cookie. The cached page invalidation was tracked down to
> nfs_force_use_readdirplus(). To verify, I made the modification
> below, ran the test for various kernel versions, and captured the
> results shown below.
>
> The modification is:
>
>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>> index a73e2f8bd8ec..5d4a64555fa7 100644
>> --- a/fs/nfs/dir.c
>> +++ b/fs/nfs/dir.c
>> @@ -444,7 +444,7 @@ void nfs_force_use_readdirplus(struct inode *dir)
>>      if (nfs_server_capable(dir, NFS_CAP_READDIRPLUS) &&
>>          !list_empty(&nfsi->open_files)) {
>>          set_bit(NFS_INO_ADVISE_RDPLUS, &nfsi->flags);
>> -        invalidate_mapping_pages(dir->i_mapping, 0, -1);
>> +        nfs_zap_mapping(dir, dir->i_mapping);
>>      }
>>  }
>
> Note that after this change, I did not see READDIRPLUS3 restarting
> with cookie 0 anymore.
>
> Below are the summary results of 'ls -lrt'. For each kernel version
> compared, there is one row for the original kernel and one row for
> the kernel with the above modification.
>
> I cloned dtrace-linux from here:
> github.com/oracle/dtrace-linux-kernel
>
> dtrace-linux 5.1.0-rc4 [ORI] 89191: 2m59.32s   193071: 6m7.810s
> dtrace-linux 5.1.0-rc4 [MOD] 98771: 1m55.900s  191322: 3m48.668s
>
> I cloned upstream Linux from here:
> git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
>
> Upstream Linux 5.5.0-rc1 [ORI] 87891: 5m11.089s  160974: 14m4.384s
> Upstream Linux 5.5.0-rc1 [MOD] 87075: 5m2.057s   161421: 14m33.615s
>
> Please note that these are relative performance numbers and are used
> to illustrate the issue only.
>
> For reference, on the original dtrace-linux it takes about 9s for
> 'ls -ltr' to complete on a directory with 200k files if the directory
> is not modified while 'ls' is running.
>
> The numbers for the original upstream Linux are *really* bad, and the
> modification did not seem to have any effect; not sure why...
> it could be that something else is going on here.
>
> The cache invalidation in nfs_force_use_readdirplus seems too
> drastic and might need to be reviewed. Even though this change
> helps, it did not get the 'ls' performance to where it's
> expected to be. I think even though READDIRPLUS3 was used, the
> attribute cache was invalidated due to the directory modification,
> causing attribute cache misses resulting in the calls to
> nfs_force_use_readdirplus as shown in this stack trace:
>
>   0  17586     page_cache_tree_delete:entry
>               vmlinux`remove_mapping+0x14
>               vmlinux`invalidate_inode_page+0x7c
>               vmlinux`invalidate_mapping_pages+0x1dd
>               nfs`nfs_force_use_readdirplus+0x47
>               nfs`__dta_nfs_lookup_revalidate_478+0x5dd
>               vmlinux`d_revalidate.part.24+0x10
>               vmlinux`lookup_fast+0x254
>               vmlinux`walk_component+0x49
>               vmlinux`path_lookupat+0x79
>               vmlinux`filename_lookup+0xaf
>               vmlinux`user_path_at_empty+0x36
>               vmlinux`vfs_statx+0x77
>               vmlinux`SYSC_newlstat+0x3d
>               vmlinux`SyS_newlstat+0xe
>               vmlinux`do_syscall_64+0x79
>               vmlinux`entry_SYSCALL_64+0x18d
>
> Besides the overhead of refilling the page caches from cookie 0,
> I think the reason 'ls' still takes so long to complete is that
> the client has to send a bunch of additional LOOKUP/ACCESS requests
> over the wire to service the stat(2) calls from 'ls' due to the
> attribute cache misses.
>
> Please let me know what you think and if any additional
> information is needed.
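As a side note, one way to see the extra LOOKUP/ACCESS traffic without
capturing network traces is to read the per-op counters the Linux NFS
client exports in /proc/self/mountstats. This is a hedged sketch, not
part of the test above; it assumes a Linux client with the standard
mountstats per-op statistics format, and would be run once before and
once after an 'ls -lrt' of the test directory to compare counts.

```shell
# Dump the LOOKUP, ACCESS and READDIRPLUS op counters for any NFS
# mounts visible to this process. On a client without NFS mounts the
# script degrades to a placeholder message instead of failing.
if grep -qs 'fstype nfs' /proc/self/mountstats; then
    counters=$(grep -E '^[[:space:]]*(LOOKUP|ACCESS|READDIRPLUS):' \
        /proc/self/mountstats)
else
    counters="no NFS mounts found"
fi
echo "$counters"
```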
>
> Thanks,
> -Dai
>