Received: by 2002:a05:6902:102b:0:0:0:0 with SMTP id x11csp2225873ybt; Sun, 28 Jun 2020 12:06:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyJQddyb+WEFwWHt7OLSaDA7pN9YirVmmmjLr0DxSeDqyQ9g7bmew7k3CUK+f/dAKAJ6+tM X-Received: by 2002:a17:906:1998:: with SMTP id g24mr6332112ejd.11.1593371198096; Sun, 28 Jun 2020 12:06:38 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1593371198; cv=none; d=google.com; s=arc-20160816; b=BPQOOgPx1H79cFSNl8i6rSig7LzxikS35JKf2crOkCVGdPzGh6kj/fK8fxBq6thWon RWTqMJ80cIZVRXjpSucVSosoCne6q6YwvtBNWsKdpI87qwThU1lFiSdLkhqBwYEO7zN7 LhBBLCjA9BLGP6ZQT9PcPG9qw9wLvszAR1GXr8tbNhwpdScxLn98sCQmV2hFjWzlUaop XVxeVcGLapGFkZT+EZVglpFYGSb5U/6fYyZcmgpI4DKvk+2qu0lgifDquidwQ3oV8G/1 nvDHW5KIKLRvfvt9AwqoThuqxnvcOkXBaFqbtL/1mrGNajCRnF/UfOrDaiUGX5w4IeCw +1Xw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:mime-version:date:message-id:subject:from :to; bh=mrXloaHwX+rKgVH57XHjcnGXTJoRW+5W9hLWl1IdMTM=; b=dGe184573LxKw2wTRebmN5WV4cjlhRa+b2TmUrLerpU3bk74CelzhXAPhPU/tPkx13 Fn2ePtsbhgbVFx0LJI56Ln3Rc52x0Derxvo/Y+PgM8ceMySdFY0k+IhXRQ8VsfMFS0+W T3dwSTl3UUea3gsK+zbho91RqVnpSbeKqm8HPwA5gGWqW9Nz+dS53qG20YkYxM6NqBW9 TjYX4ylzQQjOjxNw6x6xrzrvCrYQ9tJEPW2cIHJqKw/RFAtxO2PwVNjzecIOyy9t1LzA lYkDCFWcaCT+kaBxTEkTirjuTCxpzx+QjaeOvraXI6pcWqJAlV7r52oT4XeEJ7ozZTwP gaSw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id p14si8896180ejr.470.2020.06.28.12.06.15; Sun, 28 Jun 2020 12:06:38 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726673AbgF1TCV (ORCPT + 99 others); Sun, 28 Jun 2020 15:02:21 -0400 Received: from mail.keff.org ([45.154.255.110]:56868 "EHLO mail.keff.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726585AbgF1TCV (ORCPT ); Sun, 28 Jun 2020 15:02:21 -0400 X-Greylist: delayed 374 seconds by postgrey-1.27 at vger.kernel.org; Sun, 28 Jun 2020 15:02:20 EDT Received: from [172.17.2.124] (unknown [45.154.255.254]) by mail.keff.org (Postfix) with ESMTPSA id 645003E3CABC for ; Sun, 28 Jun 2020 20:55:44 +0200 (CEST) To: linux-kernel@vger.kernel.org From: Sebastian Hyrwall Subject: BTRFS/EXT4 Data Corruption Message-ID: Date: Mon, 29 Jun 2020 01:55:40 +0700 MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X--MailScanner-Information: Please contact the ISP for more information X--MailScanner-ID: 645003E3CABC.AF4E3 X--MailScanner: Found to be clean X--MailScanner-From: sh@keff.org X-Spam-Status: No Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Hi Sorry if this is not the right place for this email but I can't think of another place (might be linux-fsdevel) Someone here is ought to be an expert in this. It all started as having file corruptions inside VMs that then led to alot of testing that resulted in replicatable results on the backend NAS. Tests where done by generating 100 1GB files from /dev/urandom to "volume1" (both BTRFS and EXT4 tested). MD5 hashing the files and then copying the files to "volume2". 2-4% of the files would fail the hash match every time the test was done. After alot of fiddling around it turned out that the problem goes away if doing "cp --sparse=never" when copying the files. This would to me exclude any hardware errors and feels more like something deeper inside the kernel. The box runs Kernel 3.10.105. Version >4 seems unaffected (not 100% confirmed, too few testboxes). Here is a diff between a hexdump of a failed file, 43861581c43861581 < 29d464c0: aca0 d68f 0ff4 0bad fa4M-5 1339 8148 30e8 .........E.9.H0. --- > 29d464c0: aca0 d68f 0ff4 0bad fa45 1339 8148 30e8 .........E.9.H0. 55989446c55989446 < 35654c50: 31f4 f7b5 40be 2188 c539 043b 35b4 abb5 1...@.!..9.;5... --- > 35654c50: 3174 f7b5 40be 2188 c539 043b 35b4 abb5 1t..@.!..9.;5... As you can see it's a single flipped bit (31f4, 3174). I'm not sure about "fa4M-5". Never seen "M-" before. Details, Linux 3.10.105, Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz, Volume ontop of lvm and md-raid, md2 : active raid5 sda3[0] sdj3[5] sdi3[4] sdf3[3] sde3[2] sdb3[1]       39046022720 blocks super 1.2 level 5, 64k chunk, algorithm 2 [6/6] [UUUUUU], cp (GNU coreutils) 8.24 BTRFS and EXT4 default mount options. // Sebastian H