Greetings:
The patch only changes two lines which redefine the ac_uid/ac_gid fields as
uid_t/gid_t respectively. Fixes accounting for high uid/gids.
--Arthur Corliss
Bolverk's Lair -- http://arthur.corlissfamily.org/
Digital Mages -- http://www.digitalmages.com/
"Live Free or Die, the Only Way to Live" -- NH State Motto
On Thu, 4 Mar 2004, Arthur Corliss wrote:
> The patch only changes two lines which redefine the ac_uid/ac_gid fields as
> uid_t/gid_t respectively. Fixes accounting for high uid/gids.
Do the userspace commands that parse the acct files
know how to deal with this format change ?
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
On Thu, 4 Mar 2004, Rik van Riel wrote:
> On Thu, 4 Mar 2004, Arthur Corliss wrote:
>
> > The patch only changes two lines which redefine the ac_uid/ac_gid fields as
> > uid_t/gid_t respectively. Fixes accounting for high uid/gids.
>
> Do the userspace commands that parse the acct files
> know how to deal with this format change ?
Well, that's the ugly part. I had to munge glibc's sys/acct.h to reflect the
same changes as well to get the gnu tools to read the file correctly (the
accounting file does need to be started fresh with this struct, btw).
Here's the problem: the kernel uses 32 bit uids internally, but external code
would default to 16 bit due to the ifdefs.
Unfortunately, I don't how else to support accounting with high uids. All it
takes is two lines each in two files (linux/acct.h & sys/acct.h) to get
everything to work correctly.
So, what's the consensus: are we going to leave accounting broken for high
uids, or support only low uids? If I can get a patch accepted by the glibc
maintainers as well this will work for everyone.
I actually use high uids, so this is production issue for me. :-P
--Arthur Corliss
Bolverk's Lair -- http://arthur.corlissfamily.org/
Digital Mages -- http://www.digitalmages.com/
"Live Free or Die, the Only Way to Live" -- NH State Motto
> > Do the userspace commands that parse the acct files
> > know how to deal with this format change ?
>
> Well, that's the ugly part. I had to munge glibc's sys/acct.h to reflect the
> same changes as well to get the gnu tools to read the file correctly (the
> accounting file does need to be started fresh with this struct, btw).
In a first step, we could map all high uids onto one 'reserved' uid to
avoid requiring new userspace tools.
This is along the lines what I did when jiffies became 64 bits - just map
larger times onto the maximum time representable with 32 bits.
We didn't even exploit the fact that 34 bit times would be representable
in the current format, because existing userspace tools would probably
break then.
At the time I did that change, it seemed to be common consensus that too
few people cared about accurate BSD accounting to require new userspace
tools.
btw: if you actually push an incompatible change, could we do something
about large times as well?
Tim
On Thu, 4 Mar 2004, Tim Schmielau wrote:
> In a first step, we could map all high uids onto one 'reserved' uid to
> avoid requiring new userspace tools.
> This is along the lines what I did when jiffies became 64 bits - just map
> larger times onto the maximum time representable with 32 bits.
>
> We didn't even exploit the fact that 34 bit times would be representable
> in the current format, because existing userspace tools would probably
> break then.
> At the time I did that change, it seemed to be common consensus that too
> few people cared about accurate BSD accounting to require new userspace
> tools.
First off: there are no changes needed the current defacto accounting tools
on Linux (the GNU package). As long as they can reference the correct size of
the struct (as defined in acct.h) they work just fine. The GNU package
doesn't look like it's been updated since '98, and it works just fine with the
right header. I mentioned that I did update glibc's sys/acct.h because it
uses that in preference over the kernel header. Minor issue, but something
I'm looking at.
Second: I don't want new userspace tools, either, but I do want the ones
I've got to work, which is what they don't do when it reports the lower bits
of the uid field on high uids. In other words, the tools are *already*
broken. I realise that I'm probably a corner case in that most admins will
never assign high uids, but that really doesn't make me feel better about
broken tools. ;-)
> btw: if you actually push an incompatible change, could we do something
> about large times as well?
If the numbers we're logging are meaningless, then hell, yes, let's fix them
all!
--Arthur Corliss
Bolverk's Lair -- http://arthur.corlissfamily.org/
Digital Mages -- http://www.digitalmages.com/
"Live Free or Die, the Only Way to Live" -- NH State Motto
On Thu, 4 Mar 2004, Arthur Corliss wrote:
> Second: I don't want new userspace tools, either, but I do want the ones
> I've got to work, which is what they don't do when it reports the lower bits
> of the uid field on high uids. In other words, the tools are *already*
> broken. I realise that I'm probably a corner case in that most admins will
> never assign high uids, but that really doesn't make me feel better about
> broken tools. ;-)
But the current tools are only broken for the few people using high UIDs
(and generally on 64 bit archs, but that's a different story).
We shouldn't require people to recompile their userspace tools in the
middle of a stable kernel series. (OK, 2.6 has just started, but we don't
want to offend people upgrading from 2.4, either.)
How about the patch below? It requires a change to userspace tools if you
want to use high uids, but it dosn't break binary compatibility. It even
allows userspace to check whether high UIDs are supported, and allows
future incompatible format changes to be detected.
> > btw: if you actually push an incompatible change, could we do something
> > about large times as well?
>
> If the numbers we're logging are meaningless, then hell, yes, let's fix them
> all!
Well, they are not totally meaningless since we clip at the maximum
representable value instead of wrapping around.
Still, I'd prefer to do something about this as well. Will send out a
patch to deal with both things in a few minutes. (Note to Andrew/Linus:
please don't apply the patch below before considering the other patch :-)
Tim
diff -urpP --exclude-from dontdiff linux-2.6.4-rc1/include/linux/acct.h linux-2.6.4-rc1-acct/include/linux/acct.h
--- linux-2.6.4-rc1/include/linux/acct.h 2004-02-04 04:43:17.000000000 +0100
+++ linux-2.6.4-rc1-acct/include/linux/acct.h 2004-03-07 18:03:48.000000000 +0100
@@ -33,6 +33,7 @@ typedef __u16 comp_t;
*/
#define ACCT_COMM 16
+#define ACCT_VERSION 1
struct acct
{
@@ -56,7 +57,10 @@ struct acct
comp_t ac_swaps; /* Accounting Number of Swaps */
__u32 ac_exitcode; /* Accounting Exitcode */
char ac_comm[ACCT_COMM + 1]; /* Accounting Command Name */
- char ac_pad[10]; /* Accounting Padding Bytes */
+ __u16 ac_uid_hi; /* Accounting Real User ID */
+ __u16 ac_gid_hi; /* Accounting Real Group ID */
+ char ac_pad[5]; /* Accounting Padding Bytes */
+ char ac_version; /* Always set to ACCT_VERSION */
};
/*
--- linux-2.6.4-rc1/kernel/acct.c 2004-02-04 04:43:56.000000000 +0100
+++ linux-2.6.4-rc1-acct/kernel/acct.c 2004-03-07 18:03:39.000000000 +0100
@@ -270,11 +270,19 @@ void acct_auto_close(struct super_block
#define MANTSIZE 13 /* 13 bit mantissa. */
#define EXPSIZE 3 /* Base 8 (3 bit) exponent. */
#define MAXFRACT ((1 << MANTSIZE) - 1) /* Maximum fractional value. */
+#define MAXVAL ((unsigned long long) MAXFRACT << (EXPSIZE*((1<<EXPSIZE)-1)))
+ /* Maximum encodable value. */
static comp_t encode_comp_t(unsigned long value)
{
int exp, rnd;
+ /*
+ * On 64 bit platforms, value may be too large to fit into a comp_t.
+ */
+ if (value > MAXVAL)
+ return (comp_t) -1;
+
exp = rnd = 0;
while (value > MAXFRACT) {
rnd = value & (1 << (EXPSIZE - 1)); /* Round up? */
@@ -342,7 +350,9 @@ static void do_acct_process(long exitcod
ac.ac_stime = encode_comp_t(current->stime);
/* we really need to bite the bullet and change layout */
ac.ac_uid = current->uid;
+ ac.ac_uid_hi = current->uid >> 16;
ac.ac_gid = current->gid;
+ ac.ac_gid_hi = current->gid >> 16;
ac.ac_tty = current->tty ? old_encode_dev(tty_devnum(current->tty)) : 0;
ac.ac_flag = 0;
@@ -374,6 +384,7 @@ static void do_acct_process(long exitcod
ac.ac_majflt = encode_comp_t(current->maj_flt);
ac.ac_swaps = encode_comp_t(current->nswap);
ac.ac_exitcode = exitcode;
+ ac.ac_version = ACCT_VERSION;
/*
* Kernel segment override to datasegment and write it
On Sun, 7 Mar 2004, Tim Schmielau wrote:
> Still, I'd prefer to do something about this as well. Will send out a
> patch to deal with both things in a few minutes. (Note to Andrew/Linus:
> please don't apply the patch below before considering the other patch :-)
Problem was: BSD accounting is not able to report real/user/sys times of
more than 49 days with HZ=1000.
Possible solutions:
1) comp_t is actually able to encode 34 bit instead of 32 bit values,
which would be enough for 196 days.
Upside: - No incompatible format change needed.
- Maybe also useful for things like reporting memory usage
on >4 GByte Xeon machines (Actually it isn't, since memory
usage isn't acurately accounted for anyways.
Downside: - Supported by GNU acct package, but other tools might break
due to internal overflows.
- some (small) overhead.
Patch (against 2.6.4-rc1) for demonstration at
http://www.physik3.uni-rostock.de/tim/kernel/2.6/acct-02.patch
2) Report times in units of 1/USER_HZ.
Upside: - Hides the 100->1000HZ transition on i386
Downside: - Doesn't help other archs where HZ>=1000.
- some (small) overhead.
Patch (against 2.6.4-rc1 + my previous high uid patch) at
http://www.physik3.uni-rostock.de/tim/kernel/2.6/acct-user_hz-02.patch
3) Since milisecond resolution probably isn't useful anyways, report
times in units of 1/AHZ == 0.01 seconds. Good enough for 497 days.
Upside: - This is what the kernel already advertises anyways. Thus,
- supported by current userland tools, which would
magically start working correctly.
Downside: - Incompatible change during stable kernel series.
Patch below.
4) combine 1) and 3). Good for up to 1988 days.
5) Do nothing. Reported times don't wrap anyways, but max out at the
largest representable value.
Upside: Easy. Compatible with previous 2.6 kernels.
Downside: Incompatible with 2.4 kernels.
Not supported by current userspace.
If my previous patch for high uid accounting get's applied, I'd like
this to be resolved at the same time. This would allow (future)
userspace tools to work with all kernel versions by ckecking for
the signature of that patch.
Note that current 2.6 is already broken, which is a regression wrt. to
2.4 at least on i386.
Patch (against 2.6.4-rc1 + my previous high uid patch) below.
Tim
--- linux-2.6.4-rc1-acct/include/linux/acct.h 2004-03-07 18:03:48.000000000 +0100
+++ linux-2.6.4-rc1-acct4/include/linux/acct.h 2004-03-07 21:03:47.000000000 +0100
@@ -88,6 +88,28 @@ extern void acct_process(long exitcode);
#define acct_process(x) do { } while (0)
#endif
+/*
+ * Yet another HZ to *HZ helper function.
+ * See <linux/timer.h> for the original.
+ */
+
+#if (HZ % AHZ)==0
+# define jiffies_to_AHZ(x) ((x) / (HZ / AHZ))
+#else
+# define jiffies_to_AHZ(x) ((clock_t) jiffies_64_to_AHZ((u64) x))
+#endif
+
+static inline u64 jiffies_64_to_AHZ(u64 x)
+{
+#if (HZ % AHZ)==0
+ do_div(x, HZ / AHZ);
+#else
+ x *= AHZ;
+ do_div(x, HZ);
+#endif
+ return x;
+}
+
#endif /* __KERNEL */
#endif /* _LINUX_ACCT_H */
--- linux-2.6.4-rc1-acct/kernel/acct.c 2004-03-07 20:00:23.000000000 +0100
+++ linux-2.6.4-rc1-acct4/kernel/acct.c 2004-03-07 21:03:48.000000000 +0100
@@ -52,6 +52,7 @@
#include <linux/security.h>
#include <linux/vfs.h>
#include <linux/jiffies.h>
+#include <linux/times.h>
#include <asm/uaccess.h>
#include <asm/div64.h>
#include <linux/blkdev.h> /* sector_div */
@@ -341,13 +342,13 @@ static void do_acct_process(long exitcod
strlcpy(ac.ac_comm, current->comm, sizeof(ac.ac_comm));
- elapsed = get_jiffies_64() - current->start_time;
+ elapsed = jiffies_64_to_AHZ(get_jiffies_64() - current->start_time);
ac.ac_etime = encode_comp_t(elapsed < (unsigned long) -1l ?
(unsigned long) elapsed : (unsigned long) -1l);
- do_div(elapsed, HZ);
+ do_div(elapsed, AHZ);
ac.ac_btime = xtime.tv_sec - elapsed;
- ac.ac_utime = encode_comp_t(current->utime);
- ac.ac_stime = encode_comp_t(current->stime);
+ ac.ac_utime = encode_comp_t(jiffies_to_AHZ(current->utime));
+ ac.ac_stime = encode_comp_t(jiffies_to_AHZ(current->stime));
/*
* Some day we might change the layout to 32 bit uid/gid fields.
* But let's keep it compatible for now.
On Sun, 7 Mar 2004, Tim Schmielau wrote:
> But the current tools are only broken for the few people using high UIDs
> (and generally on 64 bit archs, but that's a different story).
This is broken on x86 as well. I guess I still have to question the logic of
logging bad data, even if you think the data is infrequent at best. Keep in
mind that in my environment I'm not using high UIDs because I actually have
that many accounts, I'm using them because each employee uses their employee
ID as their UID, which simplifies management for me. Again, this most likely
isn't typical, but it's not irrational, either.
> We shouldn't require people to recompile their userspace tools in the
> middle of a stable kernel series. (OK, 2.6 has just started, but we don't
> want to offend people upgrading from 2.4, either.)
I can understand this argument, and I would certainly agree for things that
are commonly used. But given the state of the BSD accounting tools (a package
that hasn't had a public update since 1998, and which has non-high uid broken
bits in it as well) I would hazard a guess that the impacted users is going to
be minimal, at best.
> How about the patch below? It requires a change to userspace tools if you
> want to use high uids, but it dosn't break binary compatibility. It even
> allows userspace to check whether high UIDs are supported, and allows
> future incompatible format changes to be detected.
I like it, and the addition of ac_version is a great idea. I might alter the
comment about 64-bit machines in acct.c, though. 32-bit UIDs affects 32-bit
machines as well.
> Well, they are not totally meaningless since we clip at the maximum
> representable value instead of wrapping around.
:-P I don't look at this any different than the byte-clipping we're doing with
UIDs. If we're logging data that's wrong, then you can't do accurate
accounting, period.
--Arthur Corliss
Bolverk's Lair -- http://arthur.corlissfamily.org/
Digital Mages -- http://www.digitalmages.com/
"Live Free or Die, the Only Way to Live" -- NH State Motto
[Cc: trimmed]
On Sun, 7 Mar 2004, Arthur Corliss wrote:
> On Sun, 7 Mar 2004, Tim Schmielau wrote:
>
> > But the current tools are only broken for the few people using high UIDs
> > (and generally on 64 bit archs, but that's a different story).
>
> This is broken on x86 as well. I guess I still have to question the logic of
> logging bad data, even if you think the data is infrequent at best.
Seems like I didn't get your point. What is broken on x86 other than high
UIDs?
> > We shouldn't require people to recompile their userspace tools in the
> > middle of a stable kernel series. (OK, 2.6 has just started, but we don't
> > want to offend people upgrading from 2.4, either.)
>
> I can understand this argument, and I would certainly agree for things that
> are commonly used. But given the state of the BSD accounting tools (a package
> that hasn't had a public update since 1998, and which has non-high uid broken
> bits in it as well) I would hazard a guess that the impacted users is going to
> be minimal, at best.
In which way are the BSD accounting tools broken? I'm unfamiliar with
them, but this might indeed allow us to estimate the user base.
> > How about the patch below? It requires a change to userspace tools if you
> > want to use high uids, but it dosn't break binary compatibility. It even
> > allows userspace to check whether high UIDs are supported, and allows
> > future incompatible format changes to be detected.
>
> I like it, and the addition of ac_version is a great idea. I might alter the
> comment about 64-bit machines in acct.c, though. 32-bit UIDs affects 32-bit
> machines as well.
The chunk with the 64-bit comment actually is independent of the high UID
problem. It's there to prevent logging of wromng data on 64-bit arches,
and is completely optimized away by the compiler on 32 bit ones.
I could as well separate it into a different patch.
I'll do a new version of the patch anyways, as I missed to change another
comment.
> > Well, they are not totally meaningless since we clip at the maximum
> > representable value instead of wrapping around.
>
> :-P I don't look at this any different than the byte-clipping we're doing with
> UIDs. If we're logging data that's wrong, then you can't do accurate
> accounting, period.
Yes, but how often do your users use more that 49 days of cputime?
I'll wait a bit to see whether my other posting generates any feedback.
Tim
On Mon, 8 Mar 2004, Tim Schmielau wrote:
> Seems like I didn't get your point. What is broken on x86 other than high
> UIDs?
I don't want to have us start talking in circles. As someone else pointed out
there's the time issues along with the uid/gid. Both are admittedly only
broken in corner cases not typically seen in the average server.
> In which way are the BSD accounting tools broken? I'm unfamiliar with
> them, but this might indeed allow us to estimate the user base.
The 'last' tool seems to have, at a minimum, some formatting issues. I don't
think anyone would have noticed since most distros seem to use the one bundled
with sysvinit, instead. The dump-utmp utility seems to share the problem in
that it doesn't print out completely useable/parseable output.
> The chunk with the 64-bit comment actually is independent of the high UID
> problem. It's there to prevent logging of wromng data on 64-bit arches,
> and is completely optimized away by the compiler on 32 bit ones.
> I could as well separate it into a different patch.
>
> I'll do a new version of the patch anyways, as I missed to change another
> comment.
Gotcha. I'll shut up, now. ;-)
> Yes, but how often do your users use more that 49 days of cputime?
I'm just being pendantic. I completely agree that a field with no predefined
range has to be capped. However, as a sys-admin I would say that I would
still find such long windows useful if someone was complaining about an
outtage for a service that basically only gets stopped/restarted during system
reboots. You could easily see what the history of interruptions would be for
those services.
> I'll wait a bit to see whether my other posting generates any feedback.
I'll wait as well. I'm very eager to see (at least) the uid issue resolved so
I don't have to keep hacking up kernel/glibc headers on new deployments.
--Arthur Corliss
Bolverk's Lair -- http://arthur.corlissfamily.org/
Digital Mages -- http://www.digitalmages.com/
"Live Free or Die, the Only Way to Live" -- NH State Motto
Arthur Corliss wrote:
>
>
> If the numbers we're logging are meaningless, then hell, yes, let's fix them
> all!
I believe the answer (to seamless backwards compatibility) lies in
struct acct's ac_pad[10] member.
3 options exist that I can see:
1) put the high 16 bits in there, with a magic # (at then end of?)
ac_pad. THe old tools will be none the wiser, the new tools will
autodetect which format the acct file is in. Ugly but easy.
2) just make the uid/gid 32bits, and put a magic# (at the end of?)
ac_pad. The old tools will choke, but the new tools will autodetect.
If you push the new tools out a couple of years ahead, then merge the
fix, acceptance will be fairly smooth. Clean but painful.
or
3) make the split of 16 bits interim with one magic#, make the tools
detect 3 formats, and in a few years, switch from the bastard 32bit to
the clean one (different magic #). This will give tools time to become
standard.
Combines best of both of the above.
You can do the above with the time stuff too, but 10 bytes spare might
constrain things a bit. Heck, make the struct bigger, as long as there
is a magic #, userspace should be ok. Right now, "file" command can't
tell what the heck the file is. Bit wasteful to put magic in every
record though.
While you're at it, make a switch for a tool that prints out
ac_exitcode, without reading the binary acct file (or it's dump).
Cheers,
--
Jeremy Jackson
Coplanar Networks
http://www.coplanar.net
> I believe the answer (to seamless backwards compatibility) lies in
> struct acct's ac_pad[10] member.
sure.
> 3 options exist that I can see:
>
> 1) put the high 16 bits in there, with a magic # (at then end of?)
> ac_pad. THe old tools will be none the wiser, the new tools will
> autodetect which format the acct file is in. Ugly but easy.
>
> 2) just make the uid/gid 32bits, and put a magic# (at the end of?)
> ac_pad. The old tools will choke, but the new tools will autodetect.
> If you push the new tools out a couple of years ahead, then merge the
> fix, acceptance will be fairly smooth. Clean but painful.
>
> or
> 3) make the split of 16 bits interim with one magic#, make the tools
> detect 3 formats, and in a few years, switch from the bastard 32bit to
> the clean one (different magic #). This will give tools time to become
> standard.
> Combines best of both of the above.
4) make ac_uid and ac_gid 32 bits and move them into the place formerly
occupied by ac_pad[10]. Establish comatibility fields ac_uid16 and
ac_gid16 in the old places of ac_uid/ac/gid. Introduce a magic (or not,
since checking for (ac_uid16==ac_uid && ac_gid16==ac_gid) will reveal
whether 32bit uids are supported).
Best of all worlds: old tools will keep working. Just recompiling them
will yield 32 bit uid/gid support. New tools can detect whether 32 bit
uid/gid are supported or not.
Problem is, ac_pad[10] doesn't provide enough space for two aligned
32 bit numbers _and_ a magic in ac_pad[10], which is the last byte of
struct acct. If we put in only the two 32 bit numbers, we've lost the
ability to establish a magic _in a prominent place_ for smooth
transitions in the future.
We might, however, put the magic into the implicit padding of ac_flag
(ugly, since this would require an arch dependant definition of stuct
acct), or put it into the uppermost three bits of ac_flag itself
(7 seven incompatible changes to struct acct should suffice in the
foreseeable future, but new flags would need to go into a different
field, breaking source compatibility with userspace acct code on
different archs, i.e. GNU acct tools).
I've posted a patch for 3) already. I'd be happy to go with 4) if we
decide to reserve the remaining bits in ac_flag for a version number.
If any source incompatible change is foreseeable in the 2.7 timeframe,
I'd prefer to stick with 3) though.
> You can do the above with the time stuff too, but 10 bytes spare might
> constrain things a bit. Heck, make the struct bigger, as long as there
> is a magic #, userspace should be ok. Right now, "file" command can't
> tell what the heck the file is. Bit wasteful to put magic in every
> record though.
Yeah, we might want a larger comp_t not only for times, but also for
ac_io. Unfortunately, the layout of comp_t is hard-coded into GNU acct,
so no way of keeping source code compatibility (other than uncompressed 64
bit fields).
Worse, I have to revoke my previous statement of GNU acct being able to
read 34 bits comp_t values. Yes, it stores the into a double, but it
uses a long as temporary variable. Sigh.
So a new patch is due anyways. Any further comments for that?
What to do about comp_t in 2.7?
Tim
> We might, however, put the magic into the implicit padding
> of ac_flag (ugly, since this would require an arch dependant
> definition of stuct acct), or put it into the uppermost
> three bits of ac_flag itself
First of all, none of this matters much if the format
is given a sysctl. The old format is default for now,
and the new one is default (only?) in a couple years.
Sun appears to have done something like this.
When fixing it, note that a 5-bit binary exponent
with denormals would beat the current float format.
Regarding the existing struct though...
Let's take a close look at this. I think there are 2 bytes
of padding on all Linux ports, and another 2 available
on everything except maybe m68k and/or arm. (that is, ports
that will put a u32 on any u16 boundry) Here is the current
struct, compactly formatted with 64-bit blocking:
struct linux_acct {
char ac_flag; // Flags
// 1 pad byte
__u16 ac_uid; // Real User ID
__u16 ac_gid; // Real Group ID
__u16 ac_tty; // Control Terminal
__u32 ac_btime; // Process Creation Time
comp_t ac_utime; // User Time
comp_t ac_stime; // System Time
comp_t ac_etime; // Elapsed Time
comp_t ac_mem; // Average Memory Usage
comp_t ac_io; // Chars Transferred
comp_t ac_rw; // Blocks Read or Written
comp_t ac_minflt; // Minor Pagefaults
comp_t ac_majflt; // Major Pagefaults
comp_t ac_swaps; // Number of Swaps
// 2 pad bytes (except m68k or arm maybe?)
__u32 ac_exitcode; // Exitcode
// hppa might pad this
char ac_comm[17]; // Command Name
// hppa might pad this
char ac_pad[10]; // Padding Bytes
// 1 pad byte (maybe 6 for hppa)
};
Just a sec... ARRRGH WHY DO PEOPLE LEAVE PADDING AND
GENERALLY MIS-ALIGN THINGS ALL THE TIME??????
So there you go. You have a pad byte after ac_flag
at a known location, and a pad byte after ac_pad that
might move a bit due to mis-alignments above it.
On Wed, 9 Mar 2004, Albert Cahalan wrote:
> First of all, none of this matters much if the format
> is given a sysctl. The old format is default for now,
> and the new one is default (only?) in a couple years.
> Sun appears to have done something like this.
Yeah, if we even want to introduce a new syscall for this...
But Documentation/Changes is not an empty file, i.e., we require
people to upgrade userspace anyways.
> When fixing it, note that a 5-bit binary exponent
> with denormals would beat the current float format.
Yes, but only by a short head. And, since comp_t is hard-coded into
current GNU acct tools, we can't keep source compatibility
(not that this matters too much anyways...)
I'm open for suggestions in this direction. Any reasonable ideas to
get more precision? (e.g. 16 bit mantissa and 4 bit base-whatever
exponent?)
> Regarding the existing struct though...
>
> Let's take a close look at this. I think there are 2 bytes
> of padding on all Linux ports, and another 2 available
> on everything except maybe m68k and/or arm. (that is, ports
> that will put a u32 on any u16 boundry) Here is the current
> struct, compactly formatted with 64-bit blocking:
>
> struct linux_acct {
> char ac_flag; // Flags
> // 1 pad byte
Yep, but depending on architecure I think the compiler is free to insert
the padding either before or after ac_flags.
[...]
>
> Just a sec... ARRRGH WHY DO PEOPLE LEAVE PADDING AND
> GENERALLY MIS-ALIGN THINGS ALL THE TIME??????
Yupp, I changed this for the new suggested binary-incompatible layout.
> So there you go. You have a pad byte after ac_flag
> at a known location, and a pad byte after ac_pad that
> might move a bit due to mis-alignments above it.
Yes, but I'd prefer an arch-independent and prominent position
(i.e. one that doesn't perturb alignment at the beginning of the struct)
Tim
On Wed, 2004-03-10 at 04:08, Tim Schmielau wrote:
> On Wed, 9 Mar 2004, Albert Cahalan wrote:
>
>> First of all, none of this matters much if the format
>> is given a sysctl. The old format is default for now,
>> and the new one is default (only?) in a couple years.
>> Sun appears to have done something like this.
>
> Yeah, if we even want to introduce a new syscall for this...
> But Documentation/Changes is not an empty file, i.e., we
> require people to upgrade userspace anyways.
In that case, don't forget to upgrade atop.
That's a version of top that uses BSD process
accounting to grab the transient processes
that wouldn't be seen in /proc before they die.
>> When fixing it, note that a 5-bit binary exponent
>> with denormals would beat the current float format.
>
> Yes, but only by a short head.
It's by 8 bits, with a stable 11-bit fraction
instead of a 10-bit to 12-bit variable-size one.
old: 1xxxxxxxxxxyy000000000000000000000
new: 1xxxxxxxxxxx000000000000000000000000000000
That's a 42-bit number instead of a 36-bit one.
The old base-8 exponent is wasteful, plus the old
format stores the leading 1 instead of using an
implied 1 and special exponent for leading 0.
> And, since comp_t is hard-coded into
> current GNU acct tools, we can't keep source compatibility
> (not that this matters too much anyways...)
> I'm open for suggestions in this direction. Any reasonable ideas to
> get more precision? (e.g. 16 bit mantissa and 4 bit base-whatever
> exponent?)
a. just what I said
b. 32-bit IEEE float (easy enough to encode by hand)
c. raw data -- is the space saving all that critical?
d. raw data with gzip-style compression
(note that gzip's deflate algorithm is in the kernel)
> > Regarding the existing struct though...
> >
> > Let's take a close look at this. I think there are 2 bytes
> > of padding on all Linux ports, and another 2 available
> > on everything except maybe m68k and/or arm. (that is, ports
> > that will put a u32 on any u16 boundry) Here is the current
> > struct, compactly formatted with 64-bit blocking:
> >
> > struct linux_acct {
> > char ac_flag; // Flags
> > // 1 pad byte
>
> Yep, but depending on architecure I think the compiler is free to insert
> the padding either before or after ac_flags.
I doubt it. I think the C standard has something to
say about this. In any case, I just checked a mix of
big-endian and little-endian boxes:
32-bit BE SPARC
64-bit BE SPARC
32-bit BE PowerPC
32-bit LE i386
64-bit LE x86-64
64-bit LE Alpha
In every case, 1-byte padding came after ac_flag.
I'm pretty sure ac_comm is too big as well. It has
room for 17 bytes. It needs room for 15, plus a '\0'
if you want one.
On Wed, 10 Mar 2004, Albert Cahalan wrote:
> In that case, don't forget to upgrade atop.
> That's a version of top that uses BSD process
> accounting to grab the transient processes
> that wouldn't be seen in /proc before they die.
OK, thanks for the reference. I wonder what other packages use
BSD accounting.
>
> >> When fixing it, note that a 5-bit binary exponent
> >> with denormals would beat the current float format.
> >
> > Yes, but only by a short head.
>
> It's by 8 bits, with a stable 11-bit fraction
> instead of a 10-bit to 12-bit variable-size one.
>
> old: 1xxxxxxxxxxyy000000000000000000000
> new: 1xxxxxxxxxxx000000000000000000000000000000
>
> That's a 42-bit number instead of a 36-bit one.
Huh, are we talking about the same format? I think comp_t rather looks
like this:
eeemmmmmmmmmmmmm
i.e., three bit (base 8) exponent, followed by 13 bits mantissa.
mantissa is right-aligned, so I don't understand why you are talking
about denormals at all.
The base 8 exponent indeed wastes up to two bits of the mantissa, but it
saves two bits of the exponent, so it's as good in the worst case and
gains one bit of precision on average.
The largest representable number (as I see it) is only (2^13-1) * 8^7
= 0x3ffe00000, i.e. a 34 bit number.
What's wrong with my view?
> The old base-8 exponent is wasteful, plus the old
> format stores the leading 1 instead of using an
> implied 1 and special exponent for leading 0.
I really don't know which leading 1 you are talking about.
> > get more precision? (e.g. 16 bit mantissa and 4 bit base-whatever
> > exponent?)
>
> a. just what I said
> b. 32-bit IEEE float (easy enough to encode by hand)
great idea - it even allows to keep source code compatibility with
(at least) the GNU acct tools!
> c. raw data -- is the space saving all that critical?
No, but I think it's just not worth expanding struct acct.
> d. raw data with gzip-style compression
>
> (note that gzip's deflate algorithm is in the kernel)
too unpredictable, I think.
e. suddenly pops to my mind: #define AHZ 10 /* ;-) */
but b. seems superior to me as it increases precision as well and
we can afford it.
> > Yep, but depending on architecure I think the compiler is free to insert
> > the padding either before or after ac_flags.
>
> I doubt it. I think the C standard has something to
> say about this.
I'd guess I'll have to update my K&R to ANSI on that...
> In any case, I just checked a mix of
> big-endian and little-endian boxes:
>
> 32-bit BE SPARC
> 64-bit BE SPARC
> 32-bit BE PowerPC
> 32-bit LE i386
> 64-bit LE x86-64
> 64-bit LE Alpha
>
> In every case, 1-byte padding came after ac_flag.
That sounds good. A version magic at the beginning of the struct would
allow us to extend it more easily.
Tim
On Wed, 10 Mar 2004, Albert Cahalan wrote:
> That's a 42-bit number instead of a 36-bit one.
OK, your format clearly wins. Especially since I think that comp_t can
only encode a 34-bit number.
But I favor your suggestion of 32-bit IEEE floats even more,
as it doesn't need a change to the GNU acct tools.
Sorry for the confusion
Tim
On Wed, 2004-03-10 at 12:47, Tim Schmielau wrote:
> On Wed, 10 Mar 2004, Albert Cahalan wrote:
>
> > That's a 42-bit number instead of a 36-bit one.
>
> OK, your format clearly wins. Especially since I think that comp_t can
> only encode a 34-bit number.
That is correct. 42 - 8 != 36
My diagram was right; the math was not.
> But I favor your suggestion of 32-bit IEEE floats even more,
> as it doesn't need a change to the GNU acct tools.
I'm surprised. Do the tools rely on a #define for this?
Is there a reason to have the whole struct be a
power of two? If so, and you don't wish to expand
it to 128 bytes, consider packing 3 80-byte records
and a 16-byte header into 256 bytes.
On Wed, 10 Mar 2004, Albert Cahalan wrote:
> > But I favor your suggestion of 32-bit IEEE floats even more,
> > as it doesn't need a change to the GNU acct tools.
>
> I'm surprised. Do the tools rely on a #define for this?
They check the types with autoconf. In case of a comp_t, the
hard-coded conversion routine is called, otherwise the value is
just cast to a double.
> Is there a reason to have the whole struct be a
> power of two?
Not that I'm aware of. But I hesitate to change the length of the
struct since seeking in a file with mixed field lengths would become
a nightmare. And why would we want to store a version number in each
acct record if we can't mix them afterwards?
So my current plot would be to expand the most interesting fields to
floats, using up the padding, and leave the others as comp_t.
Actually, the only field of interest here seems to be ac_etime where, as
Arthur Corliss pointed out, also precision matters.
Less impotant candidates might be ac_io, ac_rw and/or ac_minflt.
For ac_utime and ac_stime, on the other hand, 497 days of cpu time for
each job seems rather plentyful. Even more, if we'd fix the internal
overflow in GNU acct's comp_t conversion routine and have 1988 days at
hand.
Tim