Note that using memset() is better reserved for initialising variable-size
structures or buffers. Even if memset() is extremely optimised,
it is still not as fast as not doing anything.
  read_events(...) {
      struct io_event ent;
      memset(&ent, 0, sizeof(ent));
      while (...) {
          aio_read_evt(ctx, &ent);
      }
      ...
  }
Should be written (when "ent" has to be cleared):
  read_events(...) {
      struct io_event ent = {};
      while (...) {
          aio_read_evt(ctx, &ent);
      }
      ...
  }
Just compare the code GCC generates for:
  struct io_event ent;
  memset(&ent, 0, sizeof(ent));
  ent.data = 0;
  if (ent.obj != 0) printf("bad");
with the code it generates for:
  struct io_event ent = {};
  ent.data = 0;
  if (ent.obj != 0) printf("bad");
And that is without even mentioning complete variable elimination
when the structure is not used, unknown pointer alignment when the
memset function is not inlined, or aliasing optimisations.
Etienne.
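
For anyone who wants to reproduce the comparison, here is a minimal
self-contained sketch. The type struct io_event_sketch and the function
names are hypothetical stand-ins (the real struct io_event comes from the
kernel headers, and its field types are assumed here). It uses "= {0}"
rather than "= {}" because the empty initialiser is a GNU extension before
C23. Compiling with something like "gcc -O2 -S" and reading the assembly of
the two functions shows whether your compiler treats them differently; the
result depends on the compiler version and flags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct io_event. The field
 * names follow the snippets above; the field types are an assumption. */
struct io_event_sketch {
    uint64_t data;
    uint64_t obj;
    int64_t  res;
    int64_t  res2;
};

/* Variant 1: clear the structure with an explicit memset() call. */
uint64_t obj_after_memset(void)
{
    struct io_event_sketch ent;
    memset(&ent, 0, sizeof(ent));
    ent.data = 0;
    return ent.obj;
}

/* Variant 2: clear the structure with a zero initialiser. */
uint64_t obj_after_initialiser(void)
{
    struct io_event_sketch ent = {0};
    ent.data = 0;
    return ent.obj;
}

/* Both variants must return the same value; any difference is only in
 * the code the compiler generates to get there. */
void check_variants(void)
{
    assert(obj_after_memset() == obj_after_initialiser());
}
```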
On Thu, 2003-07-10 at 12:04, Etienne Lorrain wrote:
> Note that using memset() is better reserved for initialising variable-size
> structures or buffers. Even if memset() is extremely optimised,
> it is still not as fast as not doing anything.
This is not always true...
memset can be used as an optimized cache warm-up that avoids the
write-allocate behavior of normal writes. If you memset a structure
first and then fill it, this can halve the memory bandwidth used and
thus be up to twice as fast. This assumes an optimized memset, which
I think we *currently* don't have... but well, we can fix that ;)
Arjan van de Ven <[email protected]> writes:
> On Thu, 2003-07-10 at 12:04, Etienne Lorrain wrote:
> > Note that using memset() is better reserved for initialising variable-size
> > structures or buffers. Even if memset() is extremely optimised,
> > it is still not as fast as not doing anything.
>
> This is not always true...
> memset can be used as an optimized cache warm-up that avoids the
> write-allocate behavior of normal writes. If you memset a structure
> first and then fill it, this can halve the memory bandwidth used and
> thus be up to twice as fast. This assumes an optimized memset, which
> I think we *currently* don't have... but well, we can fix that ;)
You don't want to use such a memset unless you're clearing areas
that are significantly bigger than all your caches (> several MB).
The problem is that the instructions that avoid write-allocate usually also
force the result out of the cache. For small data sets that is typically a
loss if you want to use the data later, because the later use takes full
cache misses.
In the kernel such big buffers occur only very rarely; most operations are on
4K or less. For those, only in-cache operation is interesting.
-Andi
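
The trade-off above can be sketched in user space, assuming an x86 target
with SSE2. The helper names zero_streaming() and zero_buffer() and the 8 MB
threshold are hypothetical, chosen only to mirror the ">several MB" rule of
thumb; _mm_stream_si128() is the SSE2 non-temporal store intrinsic. This is
an illustration of the idea, not the kernel's memset, and it falls back to
plain memset() when SSE2 or 16-byte alignment is not available.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical cut-off: only buffers at least this large bypass the cache. */
#define STREAMING_THRESHOLD ((size_t)8 * 1024 * 1024)

/* Zero len bytes at dst using non-temporal stores when possible.
 * The streaming path needs dst to be 16-byte aligned and len to be a
 * multiple of 16; anything else falls back to memset(). */
void zero_streaming(void *dst, size_t len)
{
    assert(dst != NULL || len == 0);
#if defined(__SSE2__)
    if (((uintptr_t)dst % 16) == 0 && (len % 16) == 0) {
        __m128i zero = _mm_setzero_si128();
        char *p = dst;
        for (size_t i = 0; i < len; i += 16)
            _mm_stream_si128((__m128i *)(p + i), zero);
        _mm_sfence();   /* order the streaming stores before later stores */
        return;
    }
#endif
    memset(dst, 0, len);
}

/* Small buffers stay in cache for later use; only very large ones are
 * cleared with the cache-bypassing path. */
void zero_buffer(void *dst, size_t len)
{
    if (len >= STREAMING_THRESHOLD)
        zero_streaming(dst, len);
    else
        memset(dst, 0, len);
}
```

With this split, a 4K structure that is about to be filled and read goes
through the ordinary in-cache memset(), and only very large clears pay the
price of having their lines pushed out of the cache.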
On Thu, Jul 10, 2003 at 12:29:10PM +0200, Andi Kleen wrote:
> The problem is that the instructions that avoid write-allocate usually also
> force the result out of the cache.
That's for the current implementation; rep stosl may get the
write-allocate-avoiding behavior without the negative cache effects...
someday, maybe.