2010-12-13 11:38:00

by Dan Carpenter

[permalink] [raw]
Subject: delayacct: alignment changes break iotop

Iotop uses hardcoded offsets to find the taskstats struct members.
This got changed in 2.6.37 so it now iotop doesn't work on amd64. The
offending commit is:

commit 85893120699f8bae8caa12a8ee18ab5fceac978e
Author: Jeff Mahoney <[email protected]>
Date: Wed Oct 27 15:34:43 2010 -0700

delayacct: align to 8 byte boundary on 64-bit systems

Brian Rogers gets the reported-by tag. The bugzilla entry is:
https://bugzilla.kernel.org/show_bug.cgi?id=24272

regards,
dan carpenter


2010-12-13 12:57:25

by Balbir Singh

[permalink] [raw]
Subject: Re: delayacct: alignment changes break iotop

* Dan Carpenter <[email protected]> [2010-12-13 14:37:45]:

> Iotop uses hardcoded offsets to find the taskstats struct members.
> This got changed in 2.6.37 so it now iotop doesn't work on amd64. The
> offending commit is:
>
> commit 85893120699f8bae8caa12a8ee18ab5fceac978e
> Author: Jeff Mahoney <[email protected]>
> Date: Wed Oct 27 15:34:43 2010 -0700
>
> delayacct: align to 8 byte boundary on 64-bit systems
>
> Brian Rogers gets the reported-by tag. The bugzilla entry is:
> https://bugzilla.kernel.org/show_bug.cgi?id=24272
>

Thanks for the report, looks like the change did not even bump the
version field. Sorry, its my fault, I should have caught it earlier.
iotop hard coding member offsets is not bad as long as we don't break
ABI (expected from us). Any chance you could dump the offsets before
and after the change?

--
Three Cheers,
Balbir

2010-12-13 13:10:34

by Dan Carpenter

[permalink] [raw]
Subject: Re: delayacct: alignment changes break iotop

On Mon, Dec 13, 2010 at 06:27:09PM +0530, Balbir Singh wrote:
> Thanks for the report, looks like the change did not even bump the
> version field. Sorry, its my fault, I should have caught it earlier.
> iotop hard coding member offsets is not bad as long as we don't break
> ABI (expected from us). Any chance you could dump the offsets before
> and after the change?
>

Here is what iotop expects from iotop/data.py

members_offsets = [
('blkio_delay_total', 40),
('swapin_delay_total', 56),
('read_bytes', 248),
('write_bytes', 256),
('cancelled_write_bytes', 264)
]

Bumping the version doesn't help, because iotop doesn't care.

What are the warning messages that prompted the change on IA64 in the
first place? Can't we just change the format for ia64 and leave the
other arches how they were before?

regards,
dan carpenter

2010-12-13 15:20:25

by Jeff Mahoney

[permalink] [raw]
Subject: Re: delayacct: alignment changes break iotop

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/13/2010 07:57 AM, Balbir Singh wrote:
> * Dan Carpenter <[email protected]> [2010-12-13 14:37:45]:
>
>> Iotop uses hardcoded offsets to find the taskstats struct members.
>> This got changed in 2.6.37 so it now iotop doesn't work on amd64. The
>> offending commit is:
>>
>> commit 85893120699f8bae8caa12a8ee18ab5fceac978e
>> Author: Jeff Mahoney <[email protected]>
>> Date: Wed Oct 27 15:34:43 2010 -0700
>>
>> delayacct: align to 8 byte boundary on 64-bit systems
>>
>> Brian Rogers gets the reported-by tag. The bugzilla entry is:
>> https://bugzilla.kernel.org/show_bug.cgi?id=24272
>>
>
> Thanks for the report, looks like the change did not even bump the
> version field. Sorry, its my fault, I should have caught it earlier.
> iotop hard coding member offsets is not bad as long as we don't break
> ABI (expected from us). Any chance you could dump the offsets before
> and after the change?

In your February response and again in September, you did suggest a
version bump. I'm not sure why that didn't get integrated but I still
don't see how it's necessary for code that actually follows the interface.

iotop doesn't. It's broken. It doesn't even honor that version field and
worse yet, it doesn't even honor the packet format which specifically
doesn't define hard offsets. Rather it defines a protocol that tags
fields and supplies the offsets in the packet.

The getdelays.c code that ships with the kernel even demonstrates this,
so there's no excuse for half-assing it like this.

- -Jeff

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iEYEARECAAYFAk0GObEACgkQLPWxlyuTD7IziQCfYMylgEurblaMRGYXEbWzcYag
K9oAmQGeHsXKjcSs++Kx31UFS02ZEEUf
=2v7T
-----END PGP SIGNATURE-----

2010-12-13 21:02:14

by Brian Rogers

[permalink] [raw]
Subject: Re: delayacct: alignment changes break iotop

On 12/13/2010 07:20 AM, Jeff Mahoney wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 12/13/2010 07:57 AM, Balbir Singh wrote:
>> * Dan Carpenter<[email protected]> [2010-12-13 14:37:45]:
>>
>>> Iotop uses hardcoded offsets to find the taskstats struct members.
>>> This got changed in 2.6.37 so it now iotop doesn't work on amd64. The
>>> offending commit is:
>>>
>>> commit 85893120699f8bae8caa12a8ee18ab5fceac978e
>>> Author: Jeff Mahoney<[email protected]>
>>> Date: Wed Oct 27 15:34:43 2010 -0700
>>>
>>> delayacct: align to 8 byte boundary on 64-bit systems
>>>
>>> Brian Rogers gets the reported-by tag. The bugzilla entry is:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=24272
>>>
>> Thanks for the report, looks like the change did not even bump the
>> version field. Sorry, its my fault, I should have caught it earlier.
>> iotop hard coding member offsets is not bad as long as we don't break
>> ABI (expected from us). Any chance you could dump the offsets before
>> and after the change?
> In your February response and again in September, you did suggest a
> version bump. I'm not sure why that didn't get integrated but I still
> don't see how it's necessary for code that actually follows the interface.
>
> iotop doesn't. It's broken. It doesn't even honor that version field and
> worse yet, it doesn't even honor the packet format which specifically
> doesn't define hard offsets. Rather it defines a protocol that tags
> fields and supplies the offsets in the packet.
>
> The getdelays.c code that ships with the kernel even demonstrates this,
> so there's no excuse for half-assing it like this.

From a cursory glance, it looks to me like iotop (mostly) does the
correct thing. The taskstats struct is received as one big chunk, so the
table of fixed offsets (within struct taskstats) is necessary.

There's a bug fix in the git repo that fixes the cause of misalignment:

commit 08211d209ae8fc7e67ea3bebb09979ff61c70f97
Author: Guillaume Chazarain <[email protected]>
Date: Sat Sep 4 13:57:43 2010 +0200

Instead of assuming the pid field is 4 bytes long, take its length
from the header.
This is needed for http://lkml.org/lkml/2010/2/12/167
[PATCH] delayacct: align to 8 byte boundary on 64-bit systems


With this version of iotop, there is no problem with the latest kernel.
I'm CCing the iotop author. It would be nice to have a release with this
fix.

Brian

2010-12-13 21:23:09

by Dan Carpenter

[permalink] [raw]
Subject: Re: delayacct: alignment changes break iotop

Yeah, but the rule is that we don't break userspace... :/

Fixing the bug in iotop was always going to be the easy part. It's
great that they've applied a fix to git repo but it still takes a while
to get the fix out to users so they don't notice a problem.

I think we're going to need to put an ifdef IA64 then size is 8 else
size is 4. I just started this thread because I was hoping someone
had a different ifdef to use, like maybe there are other arches which
care about the alignment and maybe there is a way to test for it.

regards,
dan carpenter

2010-12-14 07:02:56

by Dan Carpenter

[permalink] [raw]
Subject: [patch] delayacct: fix iotop on x86_64

We changed how the taskstats was exported to user space in:
85893120699 "delayacct: align to 8 byte boundary on 64-bit systems"
This was important because it fixes a run time warning on IA64. In
theory it shouldn't have broken anything, if you just assume that user
space programmers don't smoke crack all day long.

But actually it breaks iotop on x86_64.

Reported-by: Brian Rogers <[email protected]>
Signed-off-by: Dan Carpenter <[email protected]>

diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index c8231fb..a0758de 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -358,7 +358,19 @@ static struct taskstats *mk_reply(struct sk_buff *skb, int type, u32 pid)
* This causes lots of runtime warnings on systems requiring 8 byte
* alignment */
u32 pids[2] = { pid, 0 };
- int pid_size = ALIGN(sizeof(pid), sizeof(long));
+ int pid_size;
+
+ /*
+ * IA64 can't be aligned on a 4 byte boundary. But iotop on x86_64
+ * depends on the current struct layout. The next version of iotop
+ * will fix this so maybe we can move everything to the new code in
+ * a couple years.
+ */
+#if defined(CONFIG_IA64)
+ pid_size = ALIGN(sizeof(pid), sizeof(long));
+#else
+ pid_size = sizeof(u32);
+#endif

aggr = (type == TASKSTATS_TYPE_PID)
? TASKSTATS_TYPE_AGGR_PID

2010-12-14 08:02:45

by Balbir Singh

[permalink] [raw]
Subject: Re: [patch] delayacct: fix iotop on x86_64

* Dan Carpenter <[email protected]> [2010-12-14 10:02:43]:

> We changed how the taskstats was exported to user space in:
> 85893120699 "delayacct: align to 8 byte boundary on 64-bit systems"
> This was important because it fixes a run time warning on IA64. In
> theory it shouldn't have broken anything, if you just assume that user
> space programmers don't smoke crack all day long.
>
> But actually it breaks iotop on x86_64.
>
> Reported-by: Brian Rogers <[email protected]>
> Signed-off-by: Dan Carpenter <[email protected]>
>
> diff --git a/kernel/taskstats.c b/kernel/taskstats.c
> index c8231fb..a0758de 100644
> --- a/kernel/taskstats.c
> +++ b/kernel/taskstats.c
> @@ -358,7 +358,19 @@ static struct taskstats *mk_reply(struct sk_buff *skb, int type, u32 pid)
> * This causes lots of runtime warnings on systems requiring 8 byte
> * alignment */
> u32 pids[2] = { pid, 0 };
> - int pid_size = ALIGN(sizeof(pid), sizeof(long));
> + int pid_size;
> +
> + /*
> + * IA64 can't be aligned on a 4 byte boundary. But iotop on x86_64
> + * depends on the current struct layout. The next version of iotop
> + * will fix this so maybe we can move everything to the new code in
> + * a couple years.
> + */
> +#if defined(CONFIG_IA64)
> + pid_size = ALIGN(sizeof(pid), sizeof(long));
> +#else
> + pid_size = sizeof(u32);
> +#endif

I would rather abstract this better and I'd be apprehensive about the
fix if iotop was at fault to begin with, I would rather fix iotop.
IOW, are we fixing what iotop got wrong? Isn't it easier to backport
the correct behaviour in iotop. I understand we broke the ABI, but
user space can still live.

>
> aggr = (type == TASKSTATS_TYPE_PID)
> ? TASKSTATS_TYPE_AGGR_PID

--
Three Cheers,
Balbir

2010-12-14 08:16:33

by Dan Carpenter

[permalink] [raw]
Subject: Re: [patch] delayacct: fix iotop on x86_64

On Tue, Dec 14, 2010 at 01:32:39PM +0530, Balbir Singh wrote:
> I would rather abstract this better and I'd be apprehensive about the
> fix if iotop was at fault to begin with, I would rather fix iotop.
> IOW, are we fixing what iotop got wrong? Isn't it easier to backport
> the correct behaviour in iotop. I understand we broke the ABI, but
> user space can still live.
>

The iotop people are definitely at fault and we should throw salmon at
the developers next time when we see them at a conference. But in the
end, it's not really a matter of assigning blame to things. It's just
annoying for users if the program stops working and you have to google
to figure out why the new kernel it broke iotop.

It's simple enough to paper over the bug for now, then fix it properly
in a couple years when everyone has upgraded their user space.

regards,
dan carpenter

2010-12-14 20:16:58

by Andrew Morton

[permalink] [raw]
Subject: Re: [patch] delayacct: fix iotop on x86_64

On Tue, 14 Dec 2010 13:32:39 +0530
Balbir Singh <[email protected]> wrote:

> * Dan Carpenter <[email protected]> [2010-12-14 10:02:43]:
>
> > We changed how the taskstats was exported to user space in:
> > 85893120699 "delayacct: align to 8 byte boundary on 64-bit systems"
> > This was important because it fixes a run time warning on IA64. In
> > theory it shouldn't have broken anything, if you just assume that user
> > space programmers don't smoke crack all day long.
> >
> > But actually it breaks iotop on x86_64.
> >
> > Reported-by: Brian Rogers <[email protected]>
> > Signed-off-by: Dan Carpenter <[email protected]>
> >
> > diff --git a/kernel/taskstats.c b/kernel/taskstats.c
> > index c8231fb..a0758de 100644
> > --- a/kernel/taskstats.c
> > +++ b/kernel/taskstats.c
> > @@ -358,7 +358,19 @@ static struct taskstats *mk_reply(struct sk_buff *skb, int type, u32 pid)
> > * This causes lots of runtime warnings on systems requiring 8 byte
> > * alignment */
> > u32 pids[2] = { pid, 0 };
> > - int pid_size = ALIGN(sizeof(pid), sizeof(long));
> > + int pid_size;
> > +
> > + /*
> > + * IA64 can't be aligned on a 4 byte boundary. But iotop on x86_64
> > + * depends on the current struct layout. The next version of iotop
> > + * will fix this so maybe we can move everything to the new code in
> > + * a couple years.
> > + */
> > +#if defined(CONFIG_IA64)
> > + pid_size = ALIGN(sizeof(pid), sizeof(long));
> > +#else
> > + pid_size = sizeof(u32);
> > +#endif
>
> I would rather abstract this better

Well. Abstracting something tends to make it permanent. When you have
an ugly, special-case temporary hack, there is merit to having it
sitting there in the middle of the code staring you in the face. It's
very explicit and we won't forget about it.

> and I'd be apprehensive about the
> fix if iotop was at fault to begin with, I would rather fix iotop.
> IOW, are we fixing what iotop got wrong? Isn't it easier to backport
> the correct behaviour in iotop. I understand we broke the ABI, but
> user space can still live.

Nah, let's not knowingly break a userspace app.


This is a versioned interface, is it not? How is that supposed
to work? Should we have upped the version number when making this
change?

2010-12-14 20:21:15

by Jeff Mahoney

[permalink] [raw]
Subject: Re: [patch] delayacct: fix iotop on x86_64

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 12/14/2010 03:16 PM, Andrew Morton wrote:
> On Tue, 14 Dec 2010 13:32:39 +0530
> Balbir Singh <[email protected]> wrote:
>
>> * Dan Carpenter <[email protected]> [2010-12-14 10:02:43]:
>>
>>> We changed how the taskstats was exported to user space in:
>>> 85893120699 "delayacct: align to 8 byte boundary on 64-bit systems"
>>> This was important because it fixes a run time warning on IA64. In
>>> theory it shouldn't have broken anything, if you just assume that user
>>> space programmers don't smoke crack all day long.
>>>
>>> But actually it breaks iotop on x86_64.
>>>
>>> Reported-by: Brian Rogers <[email protected]>
>>> Signed-off-by: Dan Carpenter <[email protected]>
>>>
>>> diff --git a/kernel/taskstats.c b/kernel/taskstats.c
>>> index c8231fb..a0758de 100644
>>> --- a/kernel/taskstats.c
>>> +++ b/kernel/taskstats.c
>>> @@ -358,7 +358,19 @@ static struct taskstats *mk_reply(struct sk_buff *skb, int type, u32 pid)
>>> * This causes lots of runtime warnings on systems requiring 8 byte
>>> * alignment */
>>> u32 pids[2] = { pid, 0 };
>>> - int pid_size = ALIGN(sizeof(pid), sizeof(long));
>>> + int pid_size;
>>> +
>>> + /*
>>> + * IA64 can't be aligned on a 4 byte boundary. But iotop on x86_64
>>> + * depends on the current struct layout. The next version of iotop
>>> + * will fix this so maybe we can move everything to the new code in
>>> + * a couple years.
>>> + */
>>> +#if defined(CONFIG_IA64)
>>> + pid_size = ALIGN(sizeof(pid), sizeof(long));
>>> +#else
>>> + pid_size = sizeof(u32);
>>> +#endif
>>
>> I would rather abstract this better
>
> Well. Abstracting something tends to make it permanent. When you have
> an ugly, special-case temporary hack, there is merit to having it
> sitting there in the middle of the code staring you in the face. It's
> very explicit and we won't forget about it.
>
>> and I'd be apprehensive about the
>> fix if iotop was at fault to begin with, I would rather fix iotop.
>> IOW, are we fixing what iotop got wrong? Isn't it easier to backport
>> the correct behaviour in iotop. I understand we broke the ABI, but
>> user space can still live.
>
> Nah, let's not knowingly break a userspace app.
>
>
> This is a versioned interface, is it not? How is that supposed
> to work? Should we have upped the version number when making this
> change?

Perhaps. Balbir suggested it, but it didn't make it into the final
version. Not that it would have mattered. iotop doesn't look at the
version anyway.

- -Jeff

- --
Jeff Mahoney
SUSE Labs
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.16 (GNU/Linux)
Comment: Using GnuPG with SUSE - http://enigmail.mozdev.org/

iEYEARECAAYFAk0H0bUACgkQLPWxlyuTD7LJlwCeKLRuVKXIwZi7XMARDNXmBxkj
QC0An0up3AVv/G8T8JZbb+cpDFagKnj0
=ra4a
-----END PGP SIGNATURE-----

2010-12-15 17:42:34

by Balbir Singh

[permalink] [raw]
Subject: Re: [patch] delayacct: fix iotop on x86_64

* Andrew Morton <[email protected]> [2010-12-14 12:16:41]:

> On Tue, 14 Dec 2010 13:32:39 +0530
> Balbir Singh <[email protected]> wrote:
>
> > * Dan Carpenter <[email protected]> [2010-12-14 10:02:43]:
> >
> > > We changed how the taskstats was exported to user space in:
> > > 85893120699 "delayacct: align to 8 byte boundary on 64-bit systems"
> > > This was important because it fixes a run time warning on IA64. In
> > > theory it shouldn't have broken anything, if you just assume that user
> > > space programmers don't smoke crack all day long.
> > >
> > > But actually it breaks iotop on x86_64.
> > >
> > > Reported-by: Brian Rogers <[email protected]>
> > > Signed-off-by: Dan Carpenter <[email protected]>
> > >
> > > diff --git a/kernel/taskstats.c b/kernel/taskstats.c
> > > index c8231fb..a0758de 100644
> > > --- a/kernel/taskstats.c
> > > +++ b/kernel/taskstats.c
> > > @@ -358,7 +358,19 @@ static struct taskstats *mk_reply(struct sk_buff *skb, int type, u32 pid)
> > > * This causes lots of runtime warnings on systems requiring 8 byte
> > > * alignment */
> > > u32 pids[2] = { pid, 0 };
> > > - int pid_size = ALIGN(sizeof(pid), sizeof(long));
> > > + int pid_size;
> > > +
> > > + /*
> > > + * IA64 can't be aligned on a 4 byte boundary. But iotop on x86_64
> > > + * depends on the current struct layout. The next version of iotop
> > > + * will fix this so maybe we can move everything to the new code in
> > > + * a couple years.
> > > + */
> > > +#if defined(CONFIG_IA64)
> > > + pid_size = ALIGN(sizeof(pid), sizeof(long));
> > > +#else
> > > + pid_size = sizeof(u32);
> > > +#endif
> >
> > I would rather abstract this better
>
> Well. Abstracting something tends to make it permanent. When you have
> an ugly, special-case temporary hack, there is merit to having it
> sitting there in the middle of the code staring you in the face. It's
> very explicit and we won't forget about it.
>

OK, agreed and learnt

> > and I'd be apprehensive about the
> > fix if iotop was at fault to begin with, I would rather fix iotop.
> > IOW, are we fixing what iotop got wrong? Isn't it easier to backport
> > the correct behaviour in iotop. I understand we broke the ABI, but
> > user space can still live.
>
> Nah, let's not knowingly break a userspace app.
>

Fair enough!

>
> This is a versioned interface, is it not? How is that supposed
> to work? Should we have upped the version number when making this
> change?

--
Three Cheers,
Balbir