Date: Tue, 12 Mar 2002 04:19:58 +0000
From: wli@parcelfarce.linux.theplanet.co.uk
To: andrea@suse.de
Cc: "Richard B. Johnson", linux-kernel@vger.kernel.org, riel@surriel.com,
	hch@infradead.org, phillips@bonn-fries.net
Subject: Re: 2.4.19pre2aa1
Message-ID: <20020312041958.C687@holomorphy.com>

On Fri, Mar 08, 2002 at 01:22:39AM +0100, Andrea Arcangeli wrote:
> - "i" is the random, not predictable, input
> - "i" is generated by dropping the non-random bits of the "page"
>   pointer with a cute trick: it drops those bits with a shift right
>   by an immediate argument. It's not a division, so it's faster; it
>   could have been "page/sizeof(struct page)", but that would have
>   been slower.
> - s(i) takes into account not just the last "wait_table->shift" bits
>   of the random information in "i", but also the next "shift" bits,
>   so it's more random even if some of those lower or higher bits are
>   generating patterns for whatever reason.
> - finally, you mask out the resulting random information s(i) with
>   size-1, so you fit into the wait_table->head memory.
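As an aside, the scheme described in the bullets above can be sketched
roughly as follows. This is illustrative only: the function and
parameter names, and the particular shift values, are my assumptions,
not the actual kernel code.

```c
/* Hedged sketch of the described hashing scheme; names are
 * illustrative, not the kernel's. The table has (1UL << shift)
 * buckets. */
unsigned long hash_page(unsigned long page_addr,
                        unsigned long page_struct_shift,
                        unsigned long shift)
{
	unsigned long size = 1UL << shift;

	/* Drop the low, non-random bits: equivalent to dividing by
	 * sizeof(struct page) when that is a power of two, but a
	 * shift is cheaper than a division. */
	unsigned long i = page_addr >> page_struct_shift;

	/* Fold the next 'shift' bits down into the low 'shift' bits
	 * so both halves of the input contribute: s(i). */
	unsigned long s = i ^ (i >> shift);

	/* Mask down to the table size. */
	return s & (size - 1);
}
```

The masking step is why only the low bits of s(i) ever select a
bucket, which is exactly what the statistical tests below would
measure the quality of.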
> The result is that it exploits most of the random input, and
> physically consecutive pages are likely to use the same cacheline in
> case there is some pattern. That looks like a good algorithm to me.

I'm sorry, this reasoning is too flawed to address in detail. I'm a
programmer, not a schoolteacher. I'll drop a hint: "randomness" is not
the right word to use.

The way to make a meaningful statement is a statistical measurement of
how close the resulting bucket distribution is to uniform, since a
uniform distribution across buckets minimizes collisions. I suggest
the usual chi^2 and Kolmogorov-Smirnov tests (and their variants). The
results of these tests are simple to interpret: they compute the
probability that a deviation from uniformity at least as large as the
one measured could have occurred by chance, i.e. whether the
difference from the uniform distribution is statistically significant.
With these things in hand you should be able to make more meaningful
assertions about your hashing algorithms.

Cheers,
Bill

P.S.: The quality (or lack thereof) of this hash function is already
visible in histograms of the pagecache bucket distribution on larger
systems. Statistical measurements of its quality reflect this very
clearly as well. Someone needs to move the stable out of the Sahara
and closer to a lake.

http://www.samba.org/~anton/linux/pagecache/pagecache_before.png
is a histogram of the pagecache hash function's bucket distribution on
an SMP ppc64 machine after some benchmark run.

http://www.samba.org/~anton/linux/pagecache/pagecache_after.png
has a histogram of a Fibonacci hashing hash function's bucket
distribution on the same machine after an identical benchmark run.

(There is more to come from the hash table analysis front.)
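For concreteness, here is a minimal sketch of the chi^2 measurement
I'm suggesting: compute Pearson's statistic over a bucket-occupancy
histogram and compare it against the chi^2 critical value for
(nbuckets - 1) degrees of freedom. The function name and interface are
mine, purely for illustration.

```c
#include <stddef.h>

/* Pearson's chi^2 statistic for a bucket-occupancy histogram tested
 * against the uniform distribution. 'buckets[i]' is the number of
 * entries that hashed to bucket i; 'total' is the sum over all
 * buckets. A larger value means a worse (less uniform) spread; 0
 * means perfectly uniform. */
double chi_squared_uniform(const unsigned long *buckets,
                           size_t nbuckets, unsigned long total)
{
	double expected = (double)total / (double)nbuckets;
	double chi2 = 0.0;
	size_t i;

	for (i = 0; i < nbuckets; i++) {
		double d = (double)buckets[i] - expected;
		chi2 += d * d / expected;
	}
	/* Compare the result against the chi^2 critical value for
	 * (nbuckets - 1) degrees of freedom at your chosen
	 * significance level. */
	return chi2;
}
```

Feeding the histograms behind the PNGs below through something like
this is what turns "looks random" into a defensible claim.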