Date: Fri, 20 Mar 2020 12:49:40 +0100
From: Jan Kara
To: Ritesh Harjani
Cc: linux-ext4@vger.kernel.org, "Theodore Y.
 Ts'o", "Aneesh Kumar K.V", Jan Kara
Subject: Re: Ext4 corruption with VM images as 3 > drop_caches
Message-ID: <20200320114940.GA20455@quack2.suse.cz>
In-Reply-To: <20200320053451.B7AD0AE04D@d06av26.portsmouth.uk.ibm.com>

On Fri 20-03-20 11:04:50, Ritesh Harjani wrote:
> On 3/19/20 6:54 PM, Ritesh Harjani wrote:
> > On 3/18/20 9:17 AM, Aneesh Kumar K.V wrote:
> > > Hi,
> > >
> > > With new vm install I am finding corruption with the vm image if I
> > > follow up the install with echo 3 > /proc/sys/vm/drop_caches
> > >
> > > The file system reports below error.
> > >
> > > Begin: Running /scripts/local-bottom ... done.
> > > Begin: Running /scripts/init-bottom ...
> > > [    4.916017] EXT4-fs error (device vda2): ext4_lookup:1700: inode
> > > #787185: comm sh: iget: checksum invalid
> > > done.
> > > [    5.244312] EXT4-fs error (device vda2): ext4_lookup:1700: inode
> > > #917954: comm init: iget: checksum invalid
> > > [    5.257246] EXT4-fs error (device vda2): ext4_lookup:1700: inode
> > > #917954: comm init: iget: checksum invalid
> > > /sbin/init: error while loading shared libraries: libc.so.6: cannot
> > > open shared object file: Error 74
> > > [    5.271207] Kernel panic - not syncing: Attempted to kill init!
> > > exitcode=0x00007f00
> > >
> > > And debugfs reports
> > >
> > > debugfs:  stat <917954>
> > > Inode: 917954   Type: bad type    Mode:  0000   Flags: 0x0
> > > Generation: 0    Version: 0x00000000
> > > User:     0   Group:     0   Size: 0
> > > File ACL: 0
> > > Links: 0   Blockcount: 0
> > > Fragment:  Address: 0    Number: 0    Size: 0
> > > ctime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> > > atime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> > > mtime: 0x00000000 -- Wed Dec 31 18:00:00 1969
> > > Size of extra inode fields: 0
> > > Inode checksum: 0x00000000
> > > BLOCKS:
> > > debugfs:
> > >
> > > Bisecting this finds
> > > Commit 244adf6426ee31a83f397b700d964cff12a247d3("ext4: make
> > > dioread_nolock the default")
> > > as bad. If I revert the same on top of linus
> > > upstream(fb33c6510d5595144d585aa194d377cf74d31911)
> > > I don't hit the corruption anymore.
> >
> > Tried replicating this and could easily replicate it on Power box.
> > I tried to reproduce this on x86 too, but could not reproduce on x86.
> > Now one difference on Power could be that pagesize is 64K and fs
> > blocksize is 4K.
> >
> > The issue looks like the guest qemu image file is not properly written
> > back, after host does echo 3 > drop_caches. (correct me if this is not
> > the case).
>
> Ok. So tried this issue with passing "cache=directsync" parameter to
> drive file. This parameter says it should bypass the host side page
> cache. With this parameter, I don't see this issue on Power box.

OK, so this likely means that there is something hosed in the writeback path
using unwritten extents when blocksize < pagesize. Maybe we miss some
conversion of unwritten extent to a written one and thus after dropping
caches we effectively lose data?

                                                                Honza

> > I tried replicating via below test, but it could not reproduce.
> >
> > Any idea what kind of unit test could be written for this?
> > I am not sure how exactly qemu is writing to its image file.
> >
> >
> > 1. Create 2 files. "mmap-file", "mmap-data".
> > 2. "mmap-file" is a 2GB sparse file. Then at some random offsets (tried
> > with both 64KB align and 4KB align offsets), try to write
> > pagesize/blocksize amount of known data pattern.
> > 3. These offsets (which are pagesize/blocksize align) are recorded into
> > "mmap-data" file via normal read/write calls.
> > 4. Then after we wrote to both files, we munmap the "mmap-file" and
> > close both of these files.
> > 5. Then we do echo 3 > drop_caches.
> > 6. Then in the verify phase, using the offsets written in "mmap-data"
> > file, I read the "mmap-file" to verify if its contents are proper or
> > not.
> > With that could not reproduce this issue.
> >
> >
> > -ritesh
> >
> >
>
-- 
Jan Kara
SUSE Labs, CR
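
For readers who want to try the six-step mmap test quoted above, here is a
minimal sketch of it in Python. It is a hypothetical reconstruction, not the
program used in the thread: the names write_phase, verify_phase, pattern,
drop_caches and MAGIC are made up for illustration, the 64K/4K sizes are the
values mentioned for the Power box, and the data pattern is an assumption.
As in the description, it only calls munmap and close (no msync). Per the
thread, this style of test did not reproduce the corruption.

```python
import mmap
import random
import struct

PAGE = 64 * 1024           # page size reported for the Power box
BLOCK = 4 * 1024           # ext4 block size reported in the thread
FILE_SIZE = 2 * 1024 ** 3  # 2GB sparse file (step 2)
MAGIC = 0xA5A5A5A5A5A5A5A5  # assumed constant so no pattern is all zeroes


def pattern(offset, length):
    """Known data for a chunk, derived from its offset (length % 8 == 0)."""
    word = struct.pack("<Q", offset ^ MAGIC)
    return word * (length // len(word))


def write_phase(data_path, log_path, count, align, chunk,
                size=FILE_SIZE, seed=0):
    """Steps 1-4: write patterns through mmap, record offsets, munmap."""
    rng = random.Random(seed)
    slots = (size - chunk) // align + 1  # keep offset + chunk within the file
    offsets = sorted({rng.randrange(slots) * align for _ in range(count)})
    with open(data_path, "w+b") as f:
        f.truncate(size)  # creates a sparse file of the requested size
        with mmap.mmap(f.fileno(), size) as m:  # shared, writable mapping
            for off in offsets:
                m[off:off + chunk] = pattern(off, chunk)
        # leaving the inner "with" unmaps the file; no explicit msync
    with open(log_path, "w") as log:  # offsets go in via normal write()
        for off in offsets:
            log.write(f"{off}\n")
    return offsets


def drop_caches():
    """Step 5: echo 3 > /proc/sys/vm/drop_caches (Linux, needs root)."""
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")


def verify_phase(data_path, log_path, chunk):
    """Step 6: re-read each recorded offset; return the mismatching ones."""
    with open(log_path) as log:
        offsets = [int(line) for line in log if line.strip()]
    bad = []
    with open(data_path, "rb") as f:
        for off in offsets:
            f.seek(off)
            if f.read(chunk) != pattern(off, chunk):
                bad.append(off)
    return bad
```

A run would call write_phase("mmap-file", "mmap-data", count=1024,
align=BLOCK, chunk=BLOCK), then drop_caches() as root, then verify_phase();
an empty list means every chunk read back intact. Swapping align and chunk
between BLOCK and PAGE covers the 4KB and 64KB variants mentioned above.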