Date: Fri, 24 Jan 2020 15:37:25 -0500
From: "Theodore Y. Ts'o"
To: Jean-Louis Dupond
Cc: linux-ext4@vger.kernel.org
Subject: Re: Filesystem corruption after unreachable storage
Message-ID: <20200124203725.GH147870@mit.edu>

On Fri, Jan 24, 2020 at 11:57:10AM +0100, Jean-Louis Dupond wrote:
>
> There was a short disruption of the SAN, which caused it to be
> unavailable for 20-25 minutes for the ESXi.

20-25 minutes is "short"?  I guess it depends on your definition / POV.  :-)

> What was observed in the VM was the following:
>

OK, to be expected.

>
> - After 1080 seconds (SCSI timeout of 180s * (5 retries + 1)):
> [5878128.028672] sd 0:0:1:0: timing out command, waited 1080s
> [5878128.028701] sd 0:0:1:0: [sdb] tag#592 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> [5878128.028703] sd 0:0:1:0: [sdb] tag#592 CDB: Write(10) 2a 00 06 0c b4 c8 00 00 08 00
> [5878128.028704] print_req_error: I/O error, dev sdb, sector 101496008
> [5878128.028736] EXT4-fs warning (device dm-2): ext4_end_bio:323: I/O error 10 writing to inode 3145791 (offset 0 size 0 starting block 12686745)
>

So you see the I/O is getting aborted.  Also expected.

> - When the SAN came back, then the filesystem went Read-Only:
> [5878601.212415] EXT4-fs error (device dm-2): ext4_journal_check_start:61: Detected aborted journal

Yep....

> Now I did a hard reset of the machine, and a manual fsck was needed
> to get it booting again.
> Fsck was showing the following:
> "Inodes that were part of a corrupted orphan linked list found."
>
> Manual fsck started with the following:
> Inodes that were part of a corrupted orphan linked list found.  Fix?
> Inode 165708 was part of the orphaned inode list.  FIXED
>
> Block bitmap differences: -(863328--863355)
> Fix?
>
> What worries me is that almost all of the VMs (out of 500) were
> showing the same error.

So that's a bit surprising...

> And even some (+-10) were completely corrupt.

What do you mean by "completely corrupt"?  Can you send an e2fsck
transcript of file systems that were "completely corrupt"?

> Is there for example a chance that the filesystem gets corrupted the
> moment the SAN storage was back accessible?

Hmm... the one possibility I can think of off the top of my head is
that in order to mark the file system as containing an error, we need
to write to the superblock.  The head of the linked list of orphan
inodes is also in the superblock.  If that had gotten modified in the
intervening 20-25 minutes, it's possible that this would result in
orphaned inodes not on the linked list, causing that error.

It doesn't explain the more severe cases of corruption, though.

> I also have some snapshot available of a corrupted disk if some
> additional debugging info is required.

Before e2fsck was run?  Can you send me a copy of the output of
dumpe2fs run on that disk, and then a transcript of e2fsck -fy run on
a copy of that snapshot?

> It would be great to gather some feedback on how to improve the
> situation (next to of course having no SAN outage :)).

Something that you could consider is setting up your system to trigger
a panic/reboot on a hung task timeout, or when ext4 detects an error
(see the man pages of tune2fs and mke2fs and the -e option for those
programs).
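
For illustration only (the device name and the numbers are examples,
not taken from the report above), that setup looks roughly like this:

    # Panic instead of remounting read-only when ext4 hits an error:
    tune2fs -e panic /dev/sdb1        # example device

    # Reboot 10 seconds after a panic, and treat hung tasks as panics:
    sysctl -w kernel.panic=10
    sysctl -w kernel.hung_task_timeout_secs=300
    sysctl -w kernel.hung_task_panic=1

(Persist the sysctls via /etc/sysctl.d/ if you decide to keep them.)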
There are tradeoffs with this, but if you've lost the SAN for 15-30
minutes, the file systems are going to need to be checked anyway, and
the machine will certainly not be serving.  So forcing a reboot might
be the best thing to do.

> On KVM for example there is an unlimited timeout (afaik) until the
> storage is back, and the VM just continues running after storage
> recovery.

Well, you can adjust the SCSI timeout, if you want to give that a
try....

Cheers,

					- Ted
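
For reference, the SCSI timeout mentioned above is the per-device
command timeout exposed in sysfs; a minimal sketch (the device name
sdb and the 1800-second value are examples only; 180 seconds was the
value in effect in the report above):

    # Per-device SCSI command timeout, in seconds:
    cat /sys/block/sdb/device/timeout

    # Raise it so a transient SAN outage is ridden out instead of the
    # command timing out; this does not survive a reboot, so a udev
    # rule or boot script is needed to make it permanent:
    echo 1800 > /sys/block/sdb/device/timeout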