From: Ma Ling <[email protected]>
In order to reduce memory latency when last level cache miss occurs,
modern CPUs i.e. x86 and arm introduced Critical Word First(CWF) or
Early Restart(ER) to get data ASAP. For CWF if critical word is first member
in cache line, memory feed CPU with critical word, then fill others
data in cache line one by one, otherwise after critical word it must
cost more cycle to fill the remaining cache line. For Early First CPU will
restart until critical word in cache line reaches.
Hash value is critical word, so in this patch we place it as first member
in cache line(sock address is cache-line aligned), and it is also good for
Early Restart platform as well .
Thanks
Ling
---
include/net/sock.h | 14 +++++++-------
1 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/include/net/sock.h b/include/net/sock.h
index 4a45216..312cf31 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -128,10 +128,10 @@ struct net;
/**
* struct sock_common - minimal network layer representation of sockets
- * @skc_daddr: Foreign IPv4 addr
- * @skc_rcv_saddr: Bound local IPv4 addr
* @skc_hash: hash value used with various protocol lookup tables
* @skc_u16hashes: two u16 hash values used by UDP lookup tables
+ * @skc_daddr: Foreign IPv4 addr
+ * @skc_rcv_saddr: Bound local IPv4 addr
* @skc_family: network address family
* @skc_state: Connection state
* @skc_reuse: %SO_REUSEADDR setting
@@ -149,16 +149,16 @@ struct net;
* for struct sock and struct inet_timewait_sock.
*/
struct sock_common {
+ /* skc_hash belongs to first member in sock with cache-line aligned*/
+ union {
+ unsigned int skc_hash;
+ __u16 skc_u16hashes[2];
+ };
/* skc_daddr and skc_rcv_saddr must be grouped :
* cf INET_MATCH() and INET_TW_MATCH()
*/
__be32 skc_daddr;
__be32 skc_rcv_saddr;
-
- union {
- unsigned int skc_hash;
- __u16 skc_u16hashes[2];
- };
unsigned short skc_family;
volatile unsigned char skc_state;
unsigned char skc_reuse;
--
1.7.1
On Mon, 2012-11-26 at 11:29 +0800, [email protected] wrote:
> From: Ma Ling <[email protected]>
>
> In order to reduce memory latency when last level cache miss occurs,
> modern CPUs i.e. x86 and arm introduced Critical Word First(CWF) or
> Early Restart(ER) to get data ASAP. For CWF if critical word is first member
> in cache line, memory feed CPU with critical word, then fill others
> data in cache line one by one, otherwise after critical word it must
> cost more cycle to fill the remaining cache line. For Early First CPU will
> restart until critical word in cache line reaches.
>
> Hash value is critical word, so in this patch we place it as first member
> in cache line(sock address is cache-line aligned), and it is also good for
> Early Restart platform as well .
>
> Thanks
> Ling
networking patches should be sent to netdev.
(I understand this patch is more a generic one, but at least CC netdev)
You give no performance numbers for this change...
I never heard of this CWF/ER, where are the official Intel documents
about this, and what models really benefit from it ?
Also, why not moving skc_net as well ?
BTW, skc_daddr & skc_rcv_saddr are 'critical' as well, we use them in
INET_MATCH()
It seems we have a 32bit hole on 64bit arches, so we probably should
move inet_dport/inet_num in it. It could well remove a full cache line
miss (I'll send a patch for this after tests)
Thanks
On Sun, 2012-11-25 at 22:44 -0800, Eric Dumazet wrote:
> On Mon, 2012-11-26 at 11:29 +0800, [email protected] wrote:
> > From: Ma Ling <[email protected]>
> >
> > In order to reduce memory latency when last level cache miss occurs,
> > modern CPUs i.e. x86 and arm introduced Critical Word First(CWF) or
> > Early Restart(ER) to get data ASAP. For CWF if critical word is first member
> > in cache line, memory feed CPU with critical word, then fill others
> > data in cache line one by one, otherwise after critical word it must
> > cost more cycle to fill the remaining cache line. For Early First CPU will
> > restart until critical word in cache line reaches.
> >
> > Hash value is critical word, so in this patch we place it as first member
> > in cache line(sock address is cache-line aligned), and it is also good for
> > Early Restart platform as well .
> >
> > Thanks
> > Ling
>
> networking patches should be sent to netdev.
>
> (I understand this patch is more a generic one, but at least CC netdev)
>
> You give no performance numbers for this change...
>
> I never heard of this CWF/ER, where are the official Intel documents
> about this, and what models really benefit from it ?
[...]
CWF is a standard feature of SDRAM. Ulrich Drepper's series of articles
on memory covers this in part 2 <http://lwn.net/Articles/252125/>
section 3.5.2. As for whether it's slower to start fetching from the
middle, that may depend on the memory controller and memory type that
are used. Drepper's benchmark showed only a small penalty (<1%) for
fetching from the middle, though he didn't say anything particular about
the hardware configuration.
Ben.
--
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.
> networking patches should be sent to netdev.
>
> (I understand this patch is more a generic one, but at least CC netdev)
Ling: OK, this is my first inet patch, I will send to netdev later.
> You give no performance numbers for this change...
Ling: after I get machine, I will send out test result.
> I never heard of this CWF/ER, where are the official Intel documents
> about this, and what models really benefit from it ?
Ling:
Arm implemented it.
http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388f/Caccifbd.html
AMD also used it.
http://classes.soe.ucsc.edu/cmpe202/Fall04/papers/opteron.pdf
> Also, why not moving skc_net as well ?
>
> BTW, skc_daddr & skc_rcv_saddr are 'critical' as well, we use them in
> INET_MATCH()
Ling: in the looking-up routine, hash value is the most important key,
if it is matched, the other values have most possibility to be
satisfied, and CFW is limited by memory bandwidth(64bit usually), so
we only move hash value as critical first word.
Thanks
Ling
On Tue, 2012-11-27 at 21:48 +0800, Ling Ma wrote:
> Ling: in the looking-up routine, hash value is the most important key,
> if it is matched, the other values have most possibility to be
> satisfied, and CFW is limited by memory bandwidth(64bit usually), so
> we only move hash value as critical first word.
In practice, we have at most one TCP socket per hash slot.
99.9999 % of lookups need all fields to complete.
Your patch introduces a misalignment error. I am not sure all 64 bit
arches are able to cope with that gracefully.
It seems all CWF docs I could find are very old stuff, mostly academic,
without good performance data.
I was asking for up2date statements from Intel/AMD/... about current
cpus and current memory. Because optimizing for 10 years olds cpus is
not worth the pain.
I am assuming cpus are implementing the CWF/ER automatically, and that
only prefetches could have a slight disadvantage if the needed word is
not the first word in the cache line. Its not clear why the prefetch()
hint could not also use CWF. It seems it also could be done by the
hardware.
So before random patches in linux kernel adding their possible bugs, we
need a good study.
Thanks