Date: Thu, 25 Oct 2007 02:24:56 +0100
From: Alasdair G Kergon
To: device-mapper development
Cc: linux-kernel@vger.kernel.org
Subject: Re: [dm-devel] [PATCH] dm: noflush resizing (0/3)
Message-ID: <20071025012456.GK10006@agk.fab.redhat.com>
In-Reply-To: <471FB83D.4060307@ce.jp.nec.com>

On Wed, Oct 24, 2007 at 05:25:17PM -0400, Jun'ichi Nomura wrote:
> - For some device-mapper targets (multipath and mirror),
>   the mapping table sometimes has to be replaced to cope with device
>   failure.
>   OTOH, device-mapper flushes all pending I/Os upon table replacement
>   and may result in I/O errors, if there are device failures.
>   'noflush' suspend is used to let dm queue the pending I/Os
>   instead of flushing them.
>   Since it's not possible for user space program to tell whether
>   the suspend could cause I/O error, they always use
>   'noflush' to suspend mirror/multipath targets.
>
> - Currently resizing is disabled for 'noflush' suspend.
>   Resizing occurs in the course of table replacement.
>   To resize the device under use, device-mapper needs to get its
>   bdev inode. However, using bdget() in this case could cause deadlock
>   by waiting for I_LOCK where an I/O process holding I_LOCK is
>   waiting for completion of table replacement.

Before reviewing the details of the proposed workaround, I'd like to see a
deeper analysis of the problem to see that there isn't a cleaner way to
resolve this.

For example:

Question)
  What are the realistic situations we must support that lead to a resize
  during table reload with I/O outstanding?

- The resize is the purpose of the reload; noflush is only set to avoid
  losing I/O if a path should fail.  So any outstanding I/O may be expected
  to be consistent with both the old and new sizes of the device.  E.g. If
  it's beyond the end of a shrinking device and userspace cared about not
  losing that I/O, it would have waited for that I/O to be flushed *before*
  issuing the resize.  If the I/O is beyond the end of the existing device
  but within the new size, userspace would have waited for the resize
  operation to complete before allowing the new I/O to be issued.

=> Is it OK for device-mapper to handle the device size check internally,
   rejecting any I/O that falls beyond the end of the table (it already must
   do this lookup anyway), and to update the size recorded in the inode
   later, after I/O is flowing through the device again, but (of course)
   before reporting that the resize operation is complete?

I.e. does it eliminate deadlocks if the bdget() and i_size_write() happen
after the 'resume'?

Alasdair
-- 
agk@redhat.com
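
To make the ordering in the question above concrete, here is a small
userspace C model.  It is only a sketch: it is not kernel code, it does not
call bdget() or i_size_write(), and every name in it (model_dev,
io_within_table, model_resize and so on) is made up for illustration.  It
assumes the size is tracked in sectors and that the table size and the
inode size can differ briefly.  It shows the sequence only and says nothing
about whether the deadlock actually goes away.

```c
/* Userspace model only; all names are hypothetical, not dm APIs. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t model_sector_t;

struct model_dev {
	model_sector_t table_size;  /* size of the live mapping table (sectors) */
	model_sector_t inode_size;  /* size published via the bdev inode (sectors) */
	bool suspended;             /* true between suspend and resume */
};

/*
 * Internal bounds check: accept I/O only if it lies wholly inside the
 * live table, regardless of what the inode currently reports.
 */
static bool io_within_table(const struct model_dev *d,
			    model_sector_t start, model_sector_t len)
{
	return len != 0 && start < d->table_size &&
	       len <= d->table_size - start;
}

/* Step 1: 'noflush' suspend; pending I/O would be queued, not flushed. */
static void model_suspend_noflush(struct model_dev *d)
{
	d->suspended = true;
}

/* Step 2: swap in the new table; the inode size is deliberately untouched. */
static void model_swap_table(struct model_dev *d, model_sector_t new_size)
{
	d->table_size = new_size;
}

/* Step 3: resume; I/O flows again and is checked against the new table. */
static void model_resume(struct model_dev *d)
{
	d->suspended = false;
}

/*
 * Step 4: publish the size (stand-in for bdget() + i_size_write()).
 * Refuses to run while suspended, which is the point of the reordering.
 */
static bool model_publish_size(struct model_dev *d)
{
	if (d->suspended)
		return false;
	d->inode_size = d->table_size;
	return true;
}

/* Whole resize; returns only once the inode size has been updated. */
static bool model_resize(struct model_dev *d, model_sector_t new_size)
{
	model_suspend_noflush(d);
	model_swap_table(d, new_size);
	model_resume(d);
	return model_publish_size(d);
}
```

In this model, between steps 2 and 4 the inode may still report the old
size, so the table bounds check is the only thing rejecting out-of-range
I/O during that window.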