Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1161156AbXAHF5K (ORCPT ); Mon, 8 Jan 2007 00:57:10 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1161166AbXAHF5J (ORCPT ); Mon, 8 Jan 2007 00:57:09 -0500 Received: from artax.karlin.mff.cuni.cz ([195.113.31.125]:57347 "EHLO artax.karlin.mff.cuni.cz" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1161156AbXAHF5I (ORCPT ); Mon, 8 Jan 2007 00:57:08 -0500 Date: Mon, 8 Jan 2007 06:57:06 +0100 (CET) From: Mikulas Patocka To: Miklos Szeredi Cc: pavel@ucw.cz, matthew@wil.cx, bhalevy@panasas.com, arjan@infradead.org, jaharkes@cs.cmu.edu, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, nfsv4@ietf.org Subject: Re: Finding hardlinks In-Reply-To: Message-ID: References: <4593E1B7.6080408@panasas.com> <20070102191504.GA5276@ucw.cz> <20070103115632.GA3062@elf.ucw.cz> <20070103135455.GA24620@parisc-linux.org> <20070104225929.GC8243@elf.ucw.cz> <20070105131235.GB4662@ucw.cz> X-Personality-Disorder: Schizoid MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII; format=flowed Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 1763 Lines: 38 >> And does it matter? If you rename a file, tar might skip it no matter of >> hardlink detection (if readdir races with rename, you can read none of the >> names of file, one or both --- all these are possible). >> >> If you have "dir1/a" hardlinked to "dir1/b" and while tar runs you delete >> both "a" and "b" and create totally new files "dir2/c" linked to "dir2/d", >> tar might hardlink both "c" and "d" to "a" and "b". >> >> No one guarantees you sane result of tar or cp -a while changing the tree. >> I don't see how is_samefile() could make it worse. > > There are several cases where changing the tree doesn't affect the > correctness of the tar or cp -a result. In some of these cases using > samefile() instead of st_ino _will_ result in a corrupted result. ... and those are what? If you create hardlinks while copying, you may have files duplicated instead of hardlinked in the backup. If you unlink hardlinks, cp will miss hardlinks too and create two copies of the same file (it searches the hash only for files with i_nlink > 1). If you rename files, the archive will be completely fscked up (either missing or duplicate files). > Generally samefile() is _weaker_ than the st_ino interface in > comparing the identity of two files without using massive amounts of > memory. You're searching for a better solution, not one that is > broken in a different way, aren't you? What is the relevant case where st_ino/st_dev works and samefile(char *path1, char *path2) doesn't? > Miklos Mikulas - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/