From: Zach Brown
Subject: Re: [PATCH 2/2] ext4: Handle readdir when a file is converted from inline to block based.
Date: Thu, 28 Mar 2013 11:44:12 -0700
Message-ID: <20130328184412.GF16651@lenny.home.zabbo.net>
References: <1364466899-5599-1-git-send-email-tm@tao.ma> <1364466899-5599-2-git-send-email-tm@tao.ma>
In-Reply-To: <1364466899-5599-2-git-send-email-tm@tao.ma>
To: Tao Ma
Cc: linux-ext4@vger.kernel.org

> Zach reported that if a dir is inlined, the offset is within the inode, while
> if we have done the conversion, the dir now will have a block offset or even
> a hashed pos. The good thing is that ext4 is also prepared to handle some
> situation that the dir is changed during many calls of getdents.

This doesn't fix the problem.

The problem isn't using the right code path within ext4 for either
inline or normal block directories. The problem is that offsets for
existing files change.

Yeah, ext4 also has this problem when it converts from classic linear
dirents to hashed dirents, but I bet that basically doesn't happen any
more. Inline dirs are making the problem happen for every single
directory as it grows.

There are two ways to experience the bug:

1) nfs clients getting the wrong entry because the offset has changed
   from the time that they got it from the server

2) more worryingly: a concurrent readdir() can see duplicate entries
   from simply advancing f_pos as it does normally

Here's a quick little demonstration of the second case:

d_off: 2 d_name: ., f_pos 2
d_off: 4 d_name: .., f_pos 4
d_off: 16 d_name: a, f_pos 16
d_off: 28 d_name: b, f_pos 28
d_off: 40 d_name: c, f_pos 40
d_off: 371778706554281332 d_name: .., f_pos 18446744071750344052
d_off: 1068979911240654558 d_name: b, f_pos 18446744072795659998
d_off: 6187216788877381273 d_name: c, f_pos 1633586841
d_off: 6280769109141524706 d_name: e, f_pos 1386254562

Run the following in a newly created empty dir with inline_data:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/syscall.h>

struct linux_dirent {
	long d_ino;
	off_t d_off;
	unsigned short d_reclen;
	char d_name[];
};

int main(int argc, char **argv)
{
	/*
	 * On x86-64 sizeof(dent) should be just big enough for one record
	 * with a one or two character name, so each getdents call returns
	 * a single entry.
	 */
	struct linux_dirent dent;
	char name[2] = {0,};
	int i;
	int ret;
	int fd;

	fd = open(".", O_RDONLY | O_DIRECTORY);
	if (fd < 0) {
		printf("open(\".\", O_RDONLY|O_DIRECTORY) failed: %d (%s)\n",
		       errno, strerror(errno));
		exit(1);
	}

	for (i = 0; i < 26; i++) {
		/* grow the directory by one file, then read one more entry */
		name[0] = 'a' + i;
		mknod(name, S_IFREG|0755, 0);

		ret = syscall(SYS_getdents, fd, &dent, sizeof(dent));
		if (ret < 1)
			break;

		printf("d_off: %llu d_name: %s, f_pos %llu\n",
		       (unsigned long long)dent.d_off, dent.d_name,
		       (unsigned long long)lseek(fd, 0, SEEK_CUR));
	}

	return 0;
}
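
To spot the duplicates without reading through the output by hand, the
test could remember every name it has already been given. The following
is only a rough sketch of that idea: struct seen_names, note_name() and
the fixed table sizes are made up for illustration and are not part of
the program above or of any ext4 interface.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 64	/* arbitrary cap, plenty for the 26 files created */
#define NAME_LEN 256	/* room for a NAME_MAX sized name plus the nul */

/* Hypothetical helper state: every name returned by getdents so far. */
struct seen_names {
	char names[MAX_NAMES][NAME_LEN];
	int count;
};

/*
 * Record name in the table.
 * Returns 1 if the name was already recorded (a duplicate entry),
 * 0 if it is new, or -1 if the table is full.
 */
int note_name(struct seen_names *seen, const char *name)
{
	int i;

	for (i = 0; i < seen->count; i++) {
		if (strcmp(seen->names[i], name) == 0)
			return 1;
	}

	if (seen->count == MAX_NAMES)
		return -1;

	/* snprintf always nul terminates, truncating over-long names */
	snprintf(seen->names[seen->count], NAME_LEN, "%s", name);
	seen->count++;
	return 0;
}
```

With a zero-initialized struct seen_names declared next to dent, calling
note_name(&seen, dent.d_name) after each successful getdents and printing
a warning when it returns 1 should flag the repeated "..", "b" and "c"
entries seen in the output above.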