From: jim owens
Subject: Re: [PATCH 3/3] Add timeout feature
Date: Mon, 14 Jul 2008 10:04:30 -0400
Message-ID: <487B5CEE.90404@hp.com>
References: <20080709005254.GQ11558@disturbed>
 <20080709010922.GE9957@mit.edu>
 <20080709061621.GA5260@infradead.org>
 <20080708234120.5072111f@infradead.org>
 <20080708235502.1c52a586@infradead.org>
 <20080709071346.GS11558@disturbed>
 <20080709110900.GI9957@mit.edu>
 <20080709114958.GV11558@disturbed>
 <4874C3E8.20804@hp.com>
 <88E7CDF01964465CB9F33DE11298271D@nsl.ad.nec.co.jp>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
Cc: mtk.manpages@googlemail.com, axboe@kernel.dk,
 linux-kernel@vger.kernel.org, dm-devel@redhat.com, xfs@oss.sgi.com,
 linux-ext4@vger.kernel.org, viro@ZenIV.linux.org.uk,
 akpm@linux-foundation.org, pavel@suse.cz, linux-fsdevel@vger.kernel.org,
 hch@infradead.org, Miklos Szeredi, Arjan van de Ven, Theodore Tso,
 Dave Chinner
To: Takashi Sato
Return-path:
Received: from g1t0027.austin.hp.com ([15.216.28.34]:26026 "EHLO
 g1t0027.austin.hp.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S1753492AbYGNOEn (ORCPT );
 Mon, 14 Jul 2008 10:04:43 -0400
In-Reply-To: <88E7CDF01964465CB9F33DE11298271D@nsl.ad.nec.co.jp>
Sender: linux-ext4-owner@vger.kernel.org
List-ID:

Takashi Sato wrote:

> What is the difference between the timeout and AUTO-THAW?
> When the kernel detects a deadlock, does it occur to solve it?

TIMEOUT is a user-specified limit for the freeze. It is not a
deadlock preventer or deadlock breaker.
The reason it exists is:

- middle of the night (low but not zero users)
- cron triggers freeze and hardware snapshot
- san is overloaded by tape copy traffic so hardware
  will take 2 hours to ack snapshot done
- user "company president" tries to create a report
  needed for an AM meeting with bankers
- with so few users, system will just patiently wait
  for hardware to finish
- after 10 minutes "company president" pages admin,
  admin's boss, and "IT vice president" in a real
  unhappy mood

AUTO-THAW is simply a name for the effect of all deadlock
preventer and deadlock breaker code that the kernel has in
the freeze implementation paths... if that code would
unfreeze the filesystem. We also implemented deadlock
preventer code that does not thaw the freeze.

None of the AUTO-THAW code is there to stop a stupid
userspace program caller of freeze. It handles things
like "a system in our cluster is going down so we must
have this filesystem unfrozen or the whole cluster will
crash".

In places where there could be a kernel deadlock we made
it "lock-only-if-non-blocking" and if we could not wait
to retry later, the failure to lock would trigger an
immediate unfreeze.

Deadlock prevention needs code in critical paths in more
than just filesystems. Sometimes this is as simple as an
"I can't wait on freeze" flag added to a vm-filesystem
interface.

Timers just don't work for keeping the kernel alive because
they don't trigger on resource exhaustion.

jim
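
The "lock-only-if-non-blocking" behaviour described above can be
sketched roughly as follows. This is a user-space model in C11, not
kernel code and not the implementation the message refers to: the
names fs_model, fs_freeze, fs_thaw, critical_path and the FS_*/PATH_*
constants are all hypothetical, and the freeze is assumed to be
representable as a single state word. A path that can wait backs off
without blocking; a path that cannot wait breaks the freeze and
proceeds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: every name here is illustrative only. */
enum fs_state { FS_IDLE, FS_FROZEN, FS_BUSY };

enum path_result {
    PATH_PROCEEDED,   /* lock taken without blocking */
    PATH_RETRY_LATER, /* could not lock; nothing was changed */
    PATH_FORCED_THAW  /* frozen and caller could not wait: freeze broken */
};

struct fs_model {
    atomic_int state;        /* one of enum fs_state */
    atomic_int forced_thaws; /* number of freezes broken by critical paths */
};

static void fs_init(struct fs_model *fs)
{
    assert(fs != NULL);
    atomic_init(&fs->state, FS_IDLE);
    atomic_init(&fs->forced_thaws, 0);
}

/* Freeze succeeds only if the filesystem is idle; it never waits. */
static bool fs_freeze(struct fs_model *fs)
{
    int expected = FS_IDLE;
    return atomic_compare_exchange_strong(&fs->state, &expected, FS_FROZEN);
}

/* Normal thaw: only has an effect if the filesystem is still frozen. */
static bool fs_thaw(struct fs_model *fs)
{
    int expected = FS_FROZEN;
    return atomic_compare_exchange_strong(&fs->state, &expected, FS_IDLE);
}

static enum path_result critical_path(struct fs_model *fs, bool can_retry_later)
{
    /* Lock only if that does not block. */
    int expected = FS_IDLE;
    if (atomic_compare_exchange_strong(&fs->state, &expected, FS_BUSY)) {
        /* ... critical work would go here ... */
        atomic_store(&fs->state, FS_IDLE);
        return PATH_PROCEEDED;
    }

    /* Lock failed. A caller that can wait simply backs off. */
    if (can_retry_later)
        return PATH_RETRY_LATER;

    /* Caller cannot wait: break the freeze and take over in one step. */
    expected = FS_FROZEN;
    if (atomic_compare_exchange_strong(&fs->state, &expected, FS_BUSY)) {
        atomic_fetch_add(&fs->forced_thaws, 1);
        /* ... critical work would go here ... */
        atomic_store(&fs->state, FS_IDLE);
        return PATH_FORCED_THAW;
    }

    /* State was not FS_FROZEN (another path holds it); the sketch
     * only reports contention and leaves the policy to the caller. */
    return PATH_RETRY_LATER;
}
```

Note that nothing in this sketch involves a timer: the unfreeze is
triggered by the failed lock attempt itself, which is the distinction
the message draws between AUTO-THAW and TIMEOUT.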