From: Andi Kleen
To: torvalds@transmeta.com (Linus Torvalds)
Cc: linux-kernel@vger.kernel.org, lse-tech@projects.sourceforge.net
Subject: Re: [PATCH] New dcache / inode hash tuning patch
Date: 28 Feb 2003 09:34:24 +0100

torvalds@transmeta.com (Linus Torvalds) writes:

[please read the whole post before answering. thanks]

> But quite frankly, if the hash list heads are actually noticeable
> memory users, the hash is likely to be _way_ too big. The list heads
> are _much_ smaller than the entries they point to, the hash list just
> shouldn't be big enough that it really matters.

On big machines it is currently 1+MB. IMHO this is way too big.

> - hash list cache footprint
>
> Again: the hash head array itself is at least dense in the cache, and
> each entry is _much_ smaller than the actual data structures it
> points to. So even if you improve the hash heads to be better from a
> cache standpoint, you're only getting a very small percentage of the
> real cache costs.

It would be possible to cache-line-optimize the layout of struct dentry
in addition. Maybe an interesting add-on project for someone... But for
the lookup walk only one cache line per dentry - the one containing
d_hash - should be needed, unless d_hash is unlucky enough to have its
two members straddle a cache line boundary, which I doubt.

> So let's say that the cache costs of the dcache is 4% (according to
> the oprofile run), then saving a few percent of that is not actually
> going to be noticeable at a user level.
>
> And the downsides of the hash list is that addition/removal is costlier
> due to the conditionals, and a non-conditional version (a common

But the list walking is faster. Testing for NULL generates much better
code on i386 than having to dedicate a register to hold the head pointer
to test against. And list walking happens more often than
insertion/deletion.

I believe the conditionals are completely lost in the noise compared to
the cache misses the two-pointer head version causes. You can execute a
lot of conditionals in the time it takes to service one cache miss!

Please take a look at the x86 assembly generated by list_for_each vs.
hlist_for_each. hlist_for_each looks much nicer, especially when you can
use the register normally wasted on the head for something else in the
loop body. (A standalone sketch of the two walks is below.)

In the case of dcache RCU it also made things simpler/faster because it
didn't require the complicated is_bucket race-breaker check.

> In other words: it may be that our current dentry hashes are too big,
> and that is certainly worth fixing if so. But the "hlist" approach very
> fundamentally cannot really help the _real_ problem very much, and it
> will (slightly) hurt the case where the hashes are actually cached.

I admit it is a kind of micro-optimization, but I believe it is a useful
one. Frankly, wasting two pointers per hash bucket in a potentially big
hash table is just s*d.
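[For illustration, a minimal standalone sketch of the two bucket layouts
and walks being compared. The structures and macros mirror the kernel's
list.h hlist code, but this is a toy user-space program, not the patch
itself:]

#include <stddef.h>
#include <stdio.h>

struct list_head {                      /* circular list: two pointers per bucket */
        struct list_head *next, *prev;
};

struct hlist_head {                     /* hlist: one pointer per bucket */
        struct hlist_node *first;
};

struct hlist_node {
        struct hlist_node *next, **pprev;
};

/* Circular walk: the head must stay live in a register for the compare. */
#define list_for_each(pos, head) \
        for ((pos) = (head)->next; (pos) != (head); (pos) = (pos)->next)

/* NULL-terminated walk: the loop ends on a plain test against zero. */
#define hlist_for_each(pos, head) \
        for ((pos) = (head)->first; (pos); (pos) = (pos)->next)

static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
        n->next = h->first;
        if (h->first)                   /* the extra conditional on insertion */
                h->first->pprev = &n->next;
        h->first = n;
        n->pprev = &h->first;
}

int main(void)
{
        struct hlist_head bucket = { NULL };
        struct hlist_node a, b, *pos;
        int n = 0;

        hlist_add_head(&a, &bucket);
        hlist_add_head(&b, &bucket);
        hlist_for_each(pos, &bucket)
                n++;
        printf("%d nodes in bucket\n", n);      /* prints: 2 nodes in bucket */
        return 0;
}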
> So I really think that the only longterm fix is to make the lookup data
> structures be "local" to the base of the lookup, in order to get away
> from the inherently non-local nature of the current hash lookups.

Yes, that may be a good idea. I don't have time to work on this though.
Still, even with local lookups, single-pointer buckets will likely
help ;)

Also, isn't it a bit late in the 2.5 cycle to think about radical
changes like local lookup? It sounds more like a nice 2.7 project. I
believe my patch, with a bit more tweaking (my current 64K hash table
seems to be too small), is suitable even for a soon-to-be-stable kernel.

Also my patch had some other changes that I believe should be included
anyway because they are independent improvements. It replaces the
max_dentries race-break hack with a better algorithm that detects cycles
during the walk (one way to do that is sketched below). It also does
more prefetching while walking the lists, which I believe to be useful.

-Andi
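[For illustration - a generic on-the-fly cycle check, not necessarily the
exact algorithm in the patch; walk_bucket_safely and never_match are
made-up names. A second pointer trails the walker at half speed, so if a
racing update ever closes the chain into a loop, the walker catches up
with the trailer and the walk can be aborted instead of spinning
forever. The real lookup would also prefetch(pos->next) to hide the
cache miss.]

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct hlist_node {
        struct hlist_node *next, **pprev;
};

/* Returns true if the walk ended cleanly (match or NULL), false if the
 * chain was found to contain a cycle. */
static bool walk_bucket_safely(struct hlist_node *first,
                               bool (*match)(struct hlist_node *))
{
        struct hlist_node *pos = first, *slow = first;
        bool advance_slow = false;

        while (pos) {
                if (match(pos))
                        return true;            /* found the entry */
                pos = pos->next;                /* full-speed walker */
                if (advance_slow)
                        slow = slow->next;      /* half-speed trailer */
                advance_slow = !advance_slow;
                if (pos && pos == slow)
                        return false;           /* walker lapped: cycle */
        }
        return true;                            /* clean NULL termination */
}

static bool never_match(struct hlist_node *n)
{
        (void)n;
        return false;
}

int main(void)
{
        struct hlist_node a, b, c;

        a.next = &b; b.next = &c; c.next = NULL;        /* sane chain */
        printf("clean chain ok: %d\n", walk_bucket_safely(&a, never_match));

        c.next = &a;                                    /* corrupted: loop */
        printf("cycle detected: %d\n", !walk_bucket_safely(&a, never_match));
        return 0;
}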