Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1750813AbVJJWto (ORCPT ); Mon, 10 Oct 2005 18:49:44 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1750906AbVJJWto (ORCPT ); Mon, 10 Oct 2005 18:49:44 -0400 Received: from artax.karlin.mff.cuni.cz ([195.113.31.125]:46559 "EHLO artax.karlin.mff.cuni.cz") by vger.kernel.org with ESMTP id S1750813AbVJJWtn (ORCPT ); Mon, 10 Oct 2005 18:49:43 -0400 Date: Tue, 11 Oct 2005 00:49:39 +0200 (CEST) From: Mikulas Patocka To: Anton Altaparmakov Cc: Glauber de Oliveira Costa , linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, ext2-devel@lists.sourceforge.net, hirofumi@mail.parknet.co.jp, linux-ntfs-dev@lists.sourceforge.net, aia21@cantab.net, hch@infradead.org, viro@zeniv.linux.org.uk, akpm@osdl.org Subject: Re: [PATCH] Use of getblk differs between locations In-Reply-To: Message-ID: References: <20051010204517.GA30867@br.ibm.com> <20051010214605.GA11427@br.ibm.com> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 4421 Lines: 94 On Mon, 10 Oct 2005, Anton Altaparmakov wrote: > On Mon, 10 Oct 2005, Mikulas Patocka wrote: >> On Mon, 10 Oct 2005, Glauber de Oliveira Costa wrote: >>> On Mon, Oct 10, 2005 at 10:20:07PM +0100, Anton Altaparmakov wrote: >>>> On Mon, 10 Oct 2005, Glauber de Oliveira Costa wrote: >>>>> I've just noticed that the use of sb_getblk differs between locations >>>>> inside the kernel. To be precise, in some locations there are tests >>>>> against its return value, and in some places there are not. >>>>> >>>>> According to the comments in __getblk definition, the tests are not >>>>> necessary, as the function always return a buffer_head (maybe a wrong >>>>> one), >>>> >>>> If you had read the source code rather than just the comments you would >>>> have seen that this is not true. It can return NULL (see >>>> fs/buffer.c::__getblk_slow()). Certainly I would prefer to keep the >>>> checks in NTFS, please. They may only be good for catching bugs but I >>>> like catching bugs rather than segfaulting due to a NULL dereference. >> >> The check should be rather a BUG() than dump_stack() and return NULL --- I >> think it's not right to write code to recover from programming errors. > > Why programming errors? It could be faulty memory or other corruption, > perhaps even caused by a different driver altogether (e.g. I found a bug > in ntfs last week which caused it to memset() to zero a random location in > memory of a random size and it caused a lot of strange effects like my You are lucky that it didn't hit buffers with inode table or something like that. > shell suddenly exiting and me being left on the login prompt...). Also it > could be that the function one day changes and it can return NULL. It is > far safer to do checking than to make assumptions about not being able to > return NULL. It is safest to panic() in case of overwriting random blocks of memory --- not even oops or BUG() but panic... --- you lose your running processes in that case but at least you won't lose anything on disk. I got inode table corrupted because of a completely unrelated bug (luckily it hit the part with /dev/ nodes that were easy to recreate --- if it hit some real important directory, I would have to reinstall). >> Filesystem drivers are supposed to pass correct blocksize to getblk(). --- >> even for users it's better to crash, because user whose machine has locked up >> on BUG() will report bug more likely than user whose machine has written stack >> dump into log and corrupted filesystem --- by the time he discovers the >> corruption and mesage he might not even remember what triggered it. >> >> As comment in buffer.c says, getblk will deadlock if the machine is out of >> memory. It is questionable whether to deadlock or return NULL and corrupt >> filesystem in this case --- deadlock is probably better. > > What do you mean corrupt filesystem? If a filesystem is written so badly > that it will cause corruption when a NULL is returned somewhere, I > certainly don't want to have anything to do with it. What should a filesystem driver do if it can't suddenly read or write any blocks on media? > Going BUG() is generally a bad thing if the error can be recovered from. > Certainly all my code attempts to recover from all error conditions it can > possibly encounter. > > I would much rather see NULL and then handle the error gracefully with an > error message than go BUG(). You can then still umount and remove the fs > module and everything works fine (you may need an fsck you may not depends > on how good your error handling is). If you do a BUG() you are guaranteed > to cause corruption... > > I only use BUG() when something really cannot happen unless there is a > bug in which case I want to know it... Of course, this "can't happen unless there is a bug" is exactly the case of __getblk_slow(). Mikulas > Best regards, > > Anton > -- > Anton Altaparmakov (replace at with @) > Unix Support, Computing Service, University of Cambridge, CB2 3QH, UK > Linux NTFS maintainer / IRC: #ntfs on irc.freenode.net > WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/ > - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/