From: Roman Mamedov
Subject: Re: tune2fs can't be used on a mounted ext4, or...?
Date: Tue, 12 Apr 2011 00:55:26 +0600
Message-ID: <20110412005526.3015a238@natsu>
In-Reply-To: <20110411131008.GB5802@thunk.org>
References: <20110410003954.4108b9c9@natsu> <14B9D41F-4D38-4F01-97E9-17E86DA578FC@dilger.ca> <20110410035005.64f565e3@natsu> <20110411131008.GB5802@thunk.org>
To: Ted Ts'o
Cc: Andreas Dilger, linux-ext4@vger.kernel.org, linux-raid@vger.kernel.org
List-Id: linux-ext4.vger.kernel.org

On Mon, 11 Apr 2011 09:10:08 -0400
Ted Ts'o wrote:

> Your symptoms don't sound familiar to me, other than the standard
> concerns about hardware induced file system inconsistency problems.

The thing is, I do not observe any random in-file data corruption, which
would point to a problem at a lower (block-device) level, so I do not
think it is a RAID or HDD problem.

The breakage seemed to be at the filesystem logic level, perhaps something
to do with allocation of space for new files? And since, immediately
before that, I had performed two operations possibly affecting it (tune2fs
stride size + online grow with resize2fs; rough commands in the P.S.
below), I thought this might be an ext4 problem.

While still in the same session, I then re-copied the affected files,
replacing their "shortened" copies, and they were written out fine the
second time. And after a reboot, no more file truncations have been
observed so far.

> Have you checked your logs carefully to make sure there weren't any
> hardware errors reported?

No, there weren't any errors in dmesg, or on the same console where 'cp'
would output its errors.

> If this is a hardware RAID system, is it regularly doing disk scrubbing?
> Has the hardware RAID reported anything unusual? How long have you been
> running in a degraded RAID 6 state?

It is an mdadm RAID6, and it does not report any problem. It was running
in a degraded state for only a short time (less than a day). And AFAIK,
running degraded with one disk missing is not a dangerous or risky
situation with RAID6. (On scrubbing, see the note in the P.S. below.)

> And have you tried shutting down the system and running fsck to make
> sure there weren't any file system corruption problems? When's the
> last time you've run fsck on the system?

I have unmounted it and run fsck just now. Admittedly, a long time had
passed since the last check.

# e2fsck /dev/md0
e2fsck 1.41.12 (17-May-2010)
/dev/md0 has gone 306 days without being checked, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: 367107/364412928 files (4.3% non-contiguous), 1219229259/1457626752 blocks

> If this is an LVM system, I'd strongly suggest that you set aside
> space you can take a snapshot, and then regularly take a snapshot, and
> then run fsck on the snapshot. If any problems are noted, you can
> then schedule downtime and fsck the entire system.

No, I don't use LVM there.
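
P.S. For reference, the two operations mentioned at the top were along
these lines. The stride/stripe_width figures here are illustrative rather
than my exact values, since the right numbers depend on the md chunk size
and the number of data disks; both commands were run with the filesystem
mounted:

# tune2fs -E stride=128,stripe_width=1280 /dev/md0
# resize2fs /dev/md0

(resize2fs with no explicit size grows the filesystem online to fill the
already-reshaped md device.)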
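
Also, on the scrubbing question: with md, a check of the whole array can
be triggered manually through sysfs, roughly like this (using my array's
device name):

# echo check > /sys/block/md0/md/sync_action
# cat /sys/block/md0/md/mismatch_cnt

mismatch_cnt shows how many inconsistencies between data and parity the
check has found once it completes.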
-- 
With respect,
Roman