2008-01-04 10:18:18

by Yi Yang

[permalink] [raw]
Subject: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

If a device can't support wakeup, its /sys/devices/.../power/wakeup output is
empty, this is confusing, a user doesn't know if it supports wakeup feature
unless he/she read the ralated source code, for this case, it is more
reasonable to output "unsupported". Otherwise, no matter what value the user
sets to /sys/devices/.../power/wakeup, the result is the same: Invalid argument,
so the user doesn't know why.

This patch changes empty output to "unsupported" in order that a user knows
wakeup feature isn't supported by this device when he/she
'cat /sys/devices/.../power/wakeup', please consider to apply, thanks.

Signed-off-by: Yi Yang <[email protected]>
---
sysfs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/base/power/sysfs.c 2008-01-04 16:50:54.000000000 +0800
+++ b/drivers/base/power/sysfs.c 2008-01-04 17:14:42.000000000 +0800
@@ -49,7 +49,7 @@ wake_show(struct device * dev, struct de
{
return sprintf(buf, "%s\n", device_can_wakeup(dev)
? (device_may_wakeup(dev) ? enabled : disabled)
- : "");
+ : "unsupported");
}

static ssize_t


2008-01-04 11:48:34

by Pavel Machek

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

Hi!

> If a device can't support wakeup, its /sys/devices/.../power/wakeup output is
> empty, this is confusing, a user doesn't know if it supports wakeup feature
> unless he/she read the ralated source code, for this case, it is more
> reasonable to output "unsupported". Otherwise, no matter what value the user
> sets to /sys/devices/.../power/wakeup, the result is the same: Invalid argument,
> so the user doesn't know why.
>
> This patch changes empty output to "unsupported" in order that a user knows
> wakeup feature isn't supported by this device when he/she
> 'cat /sys/devices/.../power/wakeup', please consider to apply,
> thanks.

What about simply removing "wakuep" file if wakeup is not supported?
Pavel

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-01-04 16:09:44

by David Brownell

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

> > This patch changes empty output to "unsupported" in order that a user knows
> > wakeup feature isn't supported by this device when he/she
> > 'cat /sys/devices/.../power/wakeup', please consider to apply,
> > thanks.
>
> What about simply removing "wakuep" file if wakeup is not supported?

It may not *stay* unsupported, so I think changing it in either
of those permanent ways would be confusing/misleading.

For example, USB devices have multiple states ... minimally, an
unconfigured state, and a configured state. Some have multiple
configurations. Only configured states can be wakeup-capable.
So a given device might have three states, but support wakeup
except in one of them...

- Dave

2008-01-04 16:31:18

by David Brownell

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

> This patch changes empty output to "unsupported" in order that a user knows
> wakeup feature isn't supported by this device when he/she
> 'cat /sys/devices/.../power/wakeup', please consider to apply, thanks.

I don't much like this patch. Not just for the technical reasons
mentioned in my previous note ... but also because it doesn't update
the documention at the top, which clearly states that "\n" is
returned for "temporary or permanent inability to issue wakeup".
And then it gives the example I gave before ...

Though I suppose a patch that changes the *entire* userspace interface,
(which includes its documentation, and all out-of-tree users) would have
shown more clearly why it wasn't a good idea. ;)


> --- a/drivers/base/power/sysfs.c 2008-01-04 16:50:54.000000000 +0800
> +++ b/drivers/base/power/sysfs.c 2008-01-04 17:14:42.000000000 +0800
> @@ -49,7 +49,7 @@ wake_show(struct device * dev, struct de
> {
> return sprintf(buf, "%s\n", device_can_wakeup(dev)
> ? (device_may_wakeup(dev) ? enabled : disabled)
> - : "");
> + : "unsupported");
> }
>
> static ssize_t
>
>

2008-01-04 16:38:37

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Fri, 4 Jan 2008, David Brownell wrote:

> > > This patch changes empty output to "unsupported" in order that a user knows
> > > wakeup feature isn't supported by this device when he/she
> > > 'cat /sys/devices/.../power/wakeup', please consider to apply,
> > > thanks.
> >
> > What about simply removing "wakuep" file if wakeup is not supported?
>
> It may not *stay* unsupported, so I think changing it in either
> of those permanent ways would be confusing/misleading.

How about changing it to say "unavailable"? That doesn't imply
permanence.

Alan Stern

2008-01-04 16:52:35

by Olivier Galibert

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Fri, Jan 04, 2008 at 11:38:29AM -0500, Alan Stern wrote:
> How about changing it to say "unavailable"? That doesn't imply
> permanence.

How about not changing a userland-visible interface gratuitously?

OG.

2008-01-04 17:20:39

by Oliver Neukum

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

Am Freitag, 4. Januar 2008 17:52:14 schrieb Olivier Galibert:
> On Fri, Jan 04, 2008 at 11:38:29AM -0500, Alan Stern wrote:
> > How about changing it to say "unavailable"? That doesn't imply
> > permanence.
>
> How about not changing a userland-visible interface gratuitously?

Very good idea.

Regards
Oliver

2008-01-07 01:27:33

by Yi Yang

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Fri, 2008-01-04 at 12:48 +0100, Pavel Machek wrote:
> Hi!
>
> > If a device can't support wakeup, its /sys/devices/.../power/wakeup output is
> > empty, this is confusing, a user doesn't know if it supports wakeup feature
> > unless he/she read the ralated source code, for this case, it is more
> > reasonable to output "unsupported". Otherwise, no matter what value the user
> > sets to /sys/devices/.../power/wakeup, the result is the same: Invalid argument,
> > so the user doesn't know why.
> >
> > This patch changes empty output to "unsupported" in order that a user knows
> > wakeup feature isn't supported by this device when he/she
> > 'cat /sys/devices/.../power/wakeup', please consider to apply,
> > thanks.
>
> What about simply removing "wakuep" file if wakeup is not supported?
> Pavel
At the beginning, i did that as you said, but it can't work, some
power/wakeup will disappear although they can be disabled or enabled, so
it is only one feasible way to make it exist permanently.
>

2008-01-07 01:46:47

by Yi Yang

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Fri, 2008-01-04 at 08:09 -0800, David Brownell wrote:
> > > This patch changes empty output to "unsupported" in order that a user knows
> > > wakeup feature isn't supported by this device when he/she
> > > 'cat /sys/devices/.../power/wakeup', please consider to apply,
> > > thanks.
> >
> > What about simply removing "wakuep" file if wakeup is not supported?
>
> It may not *stay* unsupported, so I think changing it in either
> of those permanent ways would be confusing/misleading.
>
> For example, USB devices have multiple states ... minimally, an
> unconfigured state, and a configured state. Some have multiple
> configurations. Only configured states can be wakeup-capable.
> So a given device might have three states, but support wakeup
> except in one of them...
If so, we can change "unsupported" to "unconfigurable" or "inoperable"
which can cover the states "unconfigured", "unsupported" and other
unknown states. :-).
>
> - Dave
>

2008-01-07 01:55:37

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Monday, 7 of January 2008, Yi Yang wrote:
> On Fri, 2008-01-04 at 08:09 -0800, David Brownell wrote:
> > > > This patch changes empty output to "unsupported" in order that a user knows
> > > > wakeup feature isn't supported by this device when he/she
> > > > 'cat /sys/devices/.../power/wakeup', please consider to apply,
> > > > thanks.
> > >
> > > What about simply removing "wakuep" file if wakeup is not supported?
> >
> > It may not *stay* unsupported, so I think changing it in either
> > of those permanent ways would be confusing/misleading.
> >
> > For example, USB devices have multiple states ... minimally, an
> > unconfigured state, and a configured state. Some have multiple
> > configurations. Only configured states can be wakeup-capable.
> > So a given device might have three states, but support wakeup
> > except in one of them...
> If so, we can change "unsupported" to "unconfigurable" or "inoperable"
> which can cover the states "unconfigured", "unsupported" and other
> unknown states. :-).

The main problem with your patch is that is changes a documented behavior
visible by the user space. That should be done very cautiously.

Thanks,
Rafael

2008-01-07 01:56:51

by Yi Yang

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Fri, 2008-01-04 at 08:31 -0800, David Brownell wrote:
> > This patch changes empty output to "unsupported" in order that a user knows
> > wakeup feature isn't supported by this device when he/she
> > 'cat /sys/devices/.../power/wakeup', please consider to apply, thanks.
>
> I don't much like this patch. Not just for the technical reasons
> mentioned in my previous note ... but also because it doesn't update
> the documention at the top, which clearly states that "\n" is
> returned for "temporary or permanent inability to issue wakeup".
> And then it gives the example I gave before ...
>
> Though I suppose a patch that changes the *entire* userspace interface,
> (which includes its documentation, and all out-of-tree users) would have
> shown more clearly why it wasn't a good idea. ;)
Really, "\n" should be changed to show that change.
Anyway, "\n" isn't a good indicator for that state. :-)
>
>
> > --- a/drivers/base/power/sysfs.c 2008-01-04 16:50:54.000000000 +0800
> > +++ b/drivers/base/power/sysfs.c 2008-01-04 17:14:42.000000000 +0800
> > @@ -49,7 +49,7 @@ wake_show(struct device * dev, struct de
> > {
> > return sprintf(buf, "%s\n", device_can_wakeup(dev)
> > ? (device_may_wakeup(dev) ? enabled : disabled)
> > - : "");
> > + : "unsupported");
> > }
> >
> > static ssize_t
> >
> >

2008-01-07 01:59:42

by Yi Yang

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Fri, 2008-01-04 at 11:38 -0500, Alan Stern wrote:
> On Fri, 4 Jan 2008, David Brownell wrote:
>
> > > > This patch changes empty output to "unsupported" in order that a user knows
> > > > wakeup feature isn't supported by this device when he/she
> > > > 'cat /sys/devices/.../power/wakeup', please consider to apply,
> > > > thanks.
> > >
> > > What about simply removing "wakuep" file if wakeup is not supported?
> >
> > It may not *stay* unsupported, so I think changing it in either
> > of those permanent ways would be confusing/misleading.
>
> How about changing it to say "unavailable"? That doesn't imply
> permanence.
This should be an option.
>
> Alan Stern
>

2008-01-07 02:01:26

by Yi Yang

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Fri, 2008-01-04 at 17:52 +0100, Olivier Galibert wrote:
> On Fri, Jan 04, 2008 at 11:38:29AM -0500, Alan Stern wrote:
> > How about changing it to say "unavailable"? That doesn't imply
> > permanence.
>
> How about not changing a userland-visible interface gratuitously?
"empty" can't tell a user the state of wakeup of a device, so it is
necessary to change it.
>
> OG.
>

2008-01-07 02:08:55

by Yi Yang

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Fri, 2008-01-04 at 18:20 +0100, Oliver Neukum wrote:
> Am Freitag, 4. Januar 2008 17:52:14 schrieb Olivier Galibert:
> > On Fri, Jan 04, 2008 at 11:38:29AM -0500, Alan Stern wrote:
> > > How about changing it to say "unavailable"? That doesn't imply
> > > permanence.
> >
> > How about not changing a userland-visible interface gratuitously?
>
Why not do if we can make it better?
> Very good idea.
>
> Regards
> Oliver

2008-01-07 02:21:34

by Yi Yang

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Mon, 2008-01-07 at 02:57 +0100, Rafael J. Wysocki wrote:
> On Monday, 7 of January 2008, Yi Yang wrote:
> > On Fri, 2008-01-04 at 08:09 -0800, David Brownell wrote:
> > > > > This patch changes empty output to "unsupported" in order that a user knows
> > > > > wakeup feature isn't supported by this device when he/she
> > > > > 'cat /sys/devices/.../power/wakeup', please consider to apply,
> > > > > thanks.
> > > >
> > > > What about simply removing "wakuep" file if wakeup is not supported?
> > >
> > > It may not *stay* unsupported, so I think changing it in either
> > > of those permanent ways would be confusing/misleading.
> > >
> > > For example, USB devices have multiple states ... minimally, an
> > > unconfigured state, and a configured state. Some have multiple
> > > configurations. Only configured states can be wakeup-capable.
> > > So a given device might have three states, but support wakeup
> > > except in one of them...
> > If so, we can change "unsupported" to "unconfigurable" or "inoperable"
> > which can cover the states "unconfigured", "unsupported" and other
> > unknown states. :-).
>
> The main problem with your patch is that is changes a documented behavior
> visible by the user space. That should be done very cautiously.
Which applications are using this interface? if they assume "\n" means
the unconfigurable or unavailable state, this is very fragile.
>
> Thanks,
> Rafael

2008-01-07 03:49:57

by David Brownell

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Sunday 06 January 2008, Yi Yang wrote:
>
> > How about not changing a userland-visible interface gratuitously?
>
> "empty" can't tell a user the state of wakeup of a device, so it is
> necessary to change it.

Words don't say anything at all in isolation. For example,
here the current tristate behavior has been documented for
several years now ... and that's the only thing that could
have conveyed any meaning whatever.

I can't agree that it's even slightly "necessary" to make
such a grutuitous and backwards-incompatible change.

2008-01-07 16:43:43

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [linux-pm][PATCH] base: Change power/wakeup output from "" to "unsupported" if wakeup feature isn't supported by a device

On Monday, 7 of January 2008, Yi Yang wrote:
> On Mon, 2008-01-07 at 02:57 +0100, Rafael J. Wysocki wrote:
> > On Monday, 7 of January 2008, Yi Yang wrote:
> > > On Fri, 2008-01-04 at 08:09 -0800, David Brownell wrote:
> > > > > > This patch changes empty output to "unsupported" in order that a user knows
> > > > > > wakeup feature isn't supported by this device when he/she
> > > > > > 'cat /sys/devices/.../power/wakeup', please consider to apply,
> > > > > > thanks.
> > > > >
> > > > > What about simply removing "wakuep" file if wakeup is not supported?
> > > >
> > > > It may not *stay* unsupported, so I think changing it in either
> > > > of those permanent ways would be confusing/misleading.
> > > >
> > > > For example, USB devices have multiple states ... minimally, an
> > > > unconfigured state, and a configured state. Some have multiple
> > > > configurations. Only configured states can be wakeup-capable.
> > > > So a given device might have three states, but support wakeup
> > > > except in one of them...
> > > If so, we can change "unsupported" to "unconfigurable" or "inoperable"
> > > which can cover the states "unconfigured", "unsupported" and other
> > > unknown states. :-).
> >
> > The main problem with your patch is that is changes a documented behavior
> > visible by the user space. That should be done very cautiously.
> Which applications are using this interface? if they assume "\n" means
> the unconfigurable or unavailable state, this is very fragile.

I can't give you examples right now, but since the behavior has been
documented for quite some time, it's safe to assume there are some unless
proven otherwise.

Also, fragile or not, it is the user space's right to rely on documented
behavior of the kernel.

Thanks,
Rafael

2008-01-23 08:16:17

by Yi Yang

[permalink] [raw]
Subject: [PATCH 2.6.24-rc8] cpufreq: fix obvious condition statement error

The function __cpufreq_set_policy in file drivers/cpufreq/cpufreq.c
has a very obvious error:

if (policy->min > data->min && policy->min > policy->max) {
ret = -EINVAL;
goto error_out;
}

This condtion statement is wrong because it returns -EINVAL only if
policy->min is greater than policy->max (in this case,
"policy->min > data->min" is true for ever.). In fact, it should
return -EINVAL as well if policy->max is less than data->min.

The correct condition should be:

if (policy->min > data->max || policy->max < data->min) {

The following test result testifies the above conclusion:

Before applying this patch:

[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
2394000 1596000
[root@yangyi-dev /]# echo 1596000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1596000
[root@yangyi-dev /]# echo "2000000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1596000
[root@yangyi-dev /]# echo "0" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@yangyi-dev /]# echo "1595000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@yangyi-dev /]#


After applying this patch:

[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
2394000 1596000
[root@yangyi-dev /]# echo 1596000 > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1596000
[root@localhost /]# echo "2000000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
-bash: echo: write error: Invalid argument
[root@localhost /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
1596000
[root@localhost /]# echo "0" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-bash: echo: write error: Invalid argument
[root@localhost /]# echo "1595000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
-bash: echo: write error: Invalid argument
[root@localhost /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
1596000
[root@localhost /]# echo "1596000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@localhost /]# echo "2394000" > /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
[root@localhost /]# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq
2394000
[root@localhost /]


Signed-off-by: Yi Yang <[email protected]>
---
cpufreq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/cpufreq/cpufreq.c 2008-01-23 05:02:28.000000000 +0800
+++ b/drivers/cpufreq/cpufreq.c 2008-01-23 06:11:21.000000000 +0800
@@ -1608,7 +1608,7 @@ static int __cpufreq_set_policy(struct c
memcpy(&policy->cpuinfo, &data->cpuinfo,
sizeof(struct cpufreq_cpuinfo));

- if (policy->min > data->min && policy->min > policy->max) {
+ if (policy->min > data->max || policy->max < data->min) {
ret = -EINVAL;
goto error_out;
}

2008-01-29 08:27:53

by Yi Yang

[permalink] [raw]
Subject: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

Current cpuid module will create a char device for every logical cpu,
when a user cats /dev/cpu/*/cpuid, he/she will enter a limitless loop,
the root cause is that cpuid module doesn't decide wether a cpuid level
is valid, it just uses an offset to denote cpuid level and take it to
cpuid instruction, cpuid instruction will ignore it and return some data
specific to cpu model, cpuid doesn't an error return value because it is
void type. So cpuid module will execute cpuid continuously and return
data although most of data make no sense.

This patch tries to add a sysfs interface for cpuid, users can see all the
available cpuid levels, specify a specific level and get cpuid corresponding
to this cpuid level.

For every logical cpu, this patch will create a cpuid directory under
/sys/devices/system/cpu/cpu*/, there are three entries under cpuid:

avail_levels cur_level cur_cpuid

A user can get all the available cpuid levels from avail_levels, he/she can
set one available cpuid level to cur_level, then he/she can get cpuid from
cur_cpuid, cur_cpuid corresponds to cur_level.

This patch uses sysfs to avoid limitless loop and provide more flexible
interface for cpuid, please consider to merge to -mm tree in order to test.

Here are some example output:
[root@yangyi-dev /]# ls /sys/devices/system/cpu/cpu0/cpuid
avail_levels cur_cpuid cur_level
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/avail_levels
0x00000000
0x00000001
0x00000002
0x00000003
0x00000004
0x00000005
0x00000006
0x00000007
0x00000008
0x00000009
0x0000000a
0x80000000
0x80000001
0x80000002
0x80000003
0x80000004
0x80000005
0x80000006
0x80000007
0x80000008
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/cur_level
0x00000000
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/cur_cpuid
0x0000000a 0x756e6547 0x6c65746e 0x49656e69
[root@yangyi-dev /]# echo 0x80000007 > /sys/devices/system/cpu/cpu0/cpuid/cur_level
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/cur_cpuid
0x00000000 0x00000000 0x00000000 0x00000000
[root@yangyi-dev /]# echo 0x80000001 > /sys/devices/system/cpu/cpu0/cpuid/cur_level
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/cur_cpuid
0x00000000 0x00000000 0x00000001 0x20100000
[root@yangyi-dev /]# ls /sys/devices/system/cpu/cpu1/cpuid
avail_levels cur_cpuid cur_level
[root@yangyi-dev /]#

Signed-off-by: Yi Yang <[email protected]>
---
cpuid.c | 233 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 232 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/cpuid.c 2008-01-28 01:28:48.000000000 +0800
+++ b/arch/x86/kernel/cpuid.c 2008-01-29 06:27:15.000000000 +0800
@@ -175,6 +175,228 @@ static struct notifier_block __cpuinitda
.notifier_call = cpuid_class_cpu_callback,
};

+struct cpuid_info {
+ int cpu;
+ unsigned int cur_level;
+ unsigned int min_level;
+ unsigned int max_level;
+ unsigned int min_ext_level;
+ unsigned int max_ext_level;
+ struct kobject kobj;
+};
+
+struct cpuid_attr {
+ struct attribute attr;
+ ssize_t (*show)(struct cpuid_info *, char *);
+ ssize_t (*store)(struct cpuid_info *, const char *, size_t count);
+};
+
+static struct cpuid_info cpuid_infos[NR_CPUS];
+
+static cpumask_t cpu_map = CPU_MASK_NONE;
+
+static ssize_t show_cur_level(struct cpuid_info *cpuid_info_p, char *buf)
+{
+ return sprintf(buf, "0x%08x\n", cpuid_info_p->cur_level);
+}
+
+static ssize_t store_cur_level(struct cpuid_info *cpuid_info_p,
+ const char *buf, size_t count)
+{
+ char *p;
+ unsigned int val;
+ size_t len = 13;
+ char tmpbuf[16];
+
+ if ((count > len) || (count <= 0))
+ return -EINVAL;
+
+ len = count;
+ if (buf[count-1] == '\n')
+ len--;
+
+ memcpy(tmpbuf, buf, len);
+ tmpbuf[len] = '\0';
+ val = simple_strtoul(tmpbuf, &p, 0);
+
+ if (*p != '\0')
+ return -EINVAL;
+ if (((val >= cpuid_info_p->min_level)
+ && (val <= cpuid_info_p->max_level))
+ || ((val >= cpuid_info_p->min_ext_level)
+ && (val <= cpuid_info_p->max_ext_level))) {
+ cpuid_info_p->cur_level = val;
+ return count;
+ } else
+ return -EINVAL;
+}
+
+static ssize_t show_avail_levels(struct cpuid_info *cpuid_info_p, char *buf)
+{
+ unsigned int level;
+ ssize_t len = 0;
+
+ level = cpuid_info_p->min_level;
+ while (level <= cpuid_info_p->max_level) {
+ len += sprintf(buf + len, "0x%08x\n", level);
+ level++;
+ }
+
+ level = cpuid_info_p->min_ext_level;
+ while (level <= cpuid_info_p->max_ext_level) {
+ len += sprintf(buf + len, "0x%08x\n", level);
+ level++;
+ }
+ return len;
+}
+
+static ssize_t show_cur_cpuid(struct cpuid_info *cpuid_info_p, char *buf)
+{
+ u32 data[4];
+
+ do_cpuid(cpuid_info_p->cpu, cpuid_info_p->cur_level, data);
+ return sprintf(buf, "0x%08x 0x%08x 0x%08x 0x%08x\n",
+ data[0], data[1], data[2], data[3]);
+}
+
+static struct cpuid_attr cur_level =
+ __ATTR(cur_level, 0600, show_cur_level, store_cur_level);
+static struct cpuid_attr avail_levels =
+ __ATTR(avail_levels, 0400, show_avail_levels, NULL);
+static struct cpuid_attr cur_cpuid =
+ __ATTR(cur_cpuid, 0400, show_cur_cpuid, NULL);
+
+static struct attribute *default_attrs[] = {
+ &cur_level.attr,
+ &avail_levels.attr,
+ &cur_cpuid.attr,
+ NULL
+};
+#define to_cpuid_info(k) container_of(k, struct cpuid_info, kobj)
+#define to_cpuid_attr(a) container_of(a, struct cpuid_attr, attr)
+
+static ssize_t check_cpu(struct cpuid_info *cpuid_info_p)
+{
+ int cpu = cpuid_info_p->cpu;
+ struct cpuinfo_x86 *c = &cpu_data(cpu);
+
+ if (cpu >= NR_CPUS || !cpu_online(cpu))
+ return -ENXIO; /* No such CPU */
+ if (c->cpuid_level < 0)
+ return -EIO; /* CPUID not supported */
+
+ return 0;
+}
+
+static ssize_t cpuid_show(struct kobject *kobj, struct attribute *attr,
+ char *buf)
+{
+ ssize_t ret;
+ struct cpuid_attr *fattr = to_cpuid_attr(attr);
+ struct cpuid_info *cpuid_info_p = to_cpuid_info(kobj);
+
+ ret = check_cpu(cpuid_info_p);
+ if (ret != 0)
+ return ret;
+
+ return fattr->show(cpuid_info_p, buf);
+}
+
+static ssize_t cpuid_store(struct kobject *kobj, struct attribute *attr,
+ const char *buf, size_t count)
+{
+ struct cpuid_attr *fattr = to_cpuid_attr(attr);
+ struct cpuid_info *cpuid_info_p = to_cpuid_info(kobj);
+ ssize_t ret;
+
+ ret = check_cpu(cpuid_info_p);
+ if (ret != 0)
+ return ret;
+
+ ret = fattr->store ? fattr->store(cpuid_info_p, buf, count) : 0;
+ return ret;
+}
+
+static void cpuid_sysfs_release(struct kobject *kobj)
+{
+}
+
+static struct sysfs_ops cpuid_sysfs_ops = {
+ .show = cpuid_show,
+ .store = cpuid_store,
+};
+
+static struct kobj_type ktype_cpuid = {
+ .sysfs_ops = &cpuid_sysfs_ops,
+ .default_attrs = default_attrs,
+ .release = cpuid_sysfs_release,
+};
+
+static __cpuinit int create_cpuid_sysfs(int cpu)
+{
+ struct sys_device *cpu_sys_dev = NULL;
+ u32 data[4];
+ int retval;
+
+ cpu_sys_dev = get_cpu_sysdev(cpu);
+ if (cpu_sys_dev == NULL)
+ return -1;
+
+ cpuid_infos[cpu].kobj.parent = &cpu_sys_dev->kobj;
+ kobject_set_name(&(cpuid_infos[cpu].kobj), "%s", "cpuid");
+ cpuid_infos[cpu].kobj.ktype = &ktype_cpuid;
+
+ do_cpuid(cpu, 0, data);
+ cpuid_infos[cpu].cpu = cpu;
+ cpuid_infos[cpu].cur_level = 0;
+ cpuid_infos[cpu].min_level = 0;
+ cpuid_infos[cpu].max_level = data[0];
+
+ do_cpuid(cpu, 0x80000000, data);
+ cpuid_infos[cpu].min_ext_level = 0x80000000;
+ cpuid_infos[cpu].max_ext_level = data[0];
+
+ retval = kobject_register(&(cpuid_infos[cpu].kobj));
+ if (retval < 0) {
+ cpu_clear(cpu, cpu_map);
+ return retval;
+ }
+ cpu_set(cpu, cpu_map);
+ return 0;
+}
+
+static void remove_cpuid_sysfs(int cpu)
+{
+ if (cpu_isset(cpu, cpu_map)) {
+ cpu_clear(cpu, cpu_map);
+ kobject_unregister(&(cpuid_infos[cpu].kobj));
+ memset(&cpuid_infos[cpu], 0, sizeof(struct cpuid_info));
+ }
+}
+
+static int __cpuinit cpuid_sysfs_cpu_callback(struct notifier_block *nfb,
+ unsigned long action,
+ void *hcpu)
+{
+ unsigned int cpu = (unsigned long)hcpu;
+
+ switch (action) {
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ create_cpuid_sysfs(cpu);
+ break;
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ remove_cpuid_sysfs(cpu);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __cpuinitdata cpuid_sysfs_cpu_notifier = {
+ .notifier_call = cpuid_sysfs_cpu_callback,
+};
+
static int __init cpuid_init(void)
{
int i, err = 0;
@@ -195,8 +417,13 @@ static int __init cpuid_init(void)
err = cpuid_device_create(i);
if (err != 0)
goto out_class;
+
+ err = create_cpuid_sysfs(i);
+ if (err != 0)
+ goto out_class;
}
register_hotcpu_notifier(&cpuid_class_cpu_notifier);
+ register_hotcpu_notifier(&cpuid_sysfs_cpu_notifier);

err = 0;
goto out;
@@ -205,6 +432,7 @@ out_class:
i = 0;
for_each_online_cpu(i) {
cpuid_device_destroy(i);
+ remove_cpuid_sysfs(i);
}
class_destroy(cpuid_class);
out_chrdev:
@@ -217,11 +445,14 @@ static void __exit cpuid_exit(void)
{
int cpu = 0;

- for_each_online_cpu(cpu)
+ for_each_online_cpu(cpu) {
cpuid_device_destroy(cpu);
+ remove_cpuid_sysfs(cpu);
+ }
class_destroy(cpuid_class);
unregister_chrdev(CPUID_MAJOR, "cpu/cpuid");
unregister_hotcpu_notifier(&cpuid_class_cpu_notifier);
+ unregister_hotcpu_notifier(&cpuid_sysfs_cpu_notifier);
}

module_init(cpuid_init);

2008-01-29 08:44:42

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

> +
> +static struct notifier_block __cpuinitdata cpuid_sysfs_cpu_notifier = {
> + .notifier_call = cpuid_sysfs_cpu_callback,
> +};
Data is annotated _cpuintidata

but

> +
Data is annotated _cpuintidata

> @@ -217,11 +445,14 @@ static void __exit cpuid_exit(void)
> {
> int cpu = 0;
>
> - for_each_online_cpu(cpu)
> + for_each_online_cpu(cpu) {
> cpuid_device_destroy(cpu);
> + remove_cpuid_sysfs(cpu);
> + }
> class_destroy(cpuid_class);
> unregister_chrdev(CPUID_MAJOR, "cpu/cpuid");
> unregister_hotcpu_notifier(&cpuid_class_cpu_notifier);
> + unregister_hotcpu_notifier(&cpuid_sysfs_cpu_notifier);

used in an __exit function.

You should have seen a Section mismatch warning for this.
The right fix is to annotate the cpuid_sysfs_cpu_notifier
with __initdata_refok (soon to be named __refdata)
Or even better to declare it const and use _refconst.

Sam

2008-01-29 15:54:43

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

Yi Yang wrote:
> Current cpuid module will create a char device for every logical cpu,
> when a user cats /dev/cpu/*/cpuid, he/she will enter a limitless loop,
> the root cause is that cpuid module doesn't decide wether a cpuid level
> is valid, it just uses an offset to denote cpuid level and take it to
> cpuid instruction, cpuid instruction will ignore it and return some data
> specific to cpu model, cpuid doesn't an error return value because it is
> void type. So cpuid module will execute cpuid continuously and return
> data although most of data make no sense.
>
> This patch tries to add a sysfs interface for cpuid, users can see all the
> available cpuid levels, specify a specific level and get cpuid corresponding
> to this cpuid level.
>
> For every logical cpu, this patch will create a cpuid directory under
> /sys/devices/system/cpu/cpu*/, there are three entries under cpuid:
>
> avail_levels cur_level cur_cpuid
>
> A user can get all the available cpuid levels from avail_levels, he/she can
> set one available cpuid level to cur_level, then he/she can get cpuid from
> cur_cpuid, cur_cpuid corresponds to cur_level.
>
> This patch uses sysfs to avoid limitless loop and provide more flexible
> interface for cpuid, please consider to merge to -mm tree in order to test.

This is broken.

Triple broken.

It's broken, because it doesn't take into account the fact that Intel
broke CPUID level 4 and made it "repeating" (neither did the cpuid char
device, because it predated the Intel braindamage; I've had a patch for
it privately for a while, but didn't push it upstream because paravirt
broke it royally and I wanted the situation to settle down.)

It's broken, because the algorithm used to determine valid CPUID levels
is incorrect; it fails to recognize any CPUID levels other than the main
Intel and AMD ones, e.g. the Transmeta 0x8086xxxx (and sometimes more)
and VIA 0xc000xxxx levels.

It's broken, because it is better for the userspace extractor to have
this logic than to stuff it into the kernel, where it sits hogging
unswappable memory at all times.

-hpa

2008-01-30 03:04:29

by Yi Yang

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

On Tue, 2008-01-29 at 07:51 -0800, H. Peter Anvin wrote:
> Yi Yang wrote:
> > Current cpuid module will create a char device for every logical cpu,
> > when a user cats /dev/cpu/*/cpuid, he/she will enter a limitless loop,
> > the root cause is that cpuid module doesn't decide wether a cpuid level
> > is valid, it just uses an offset to denote cpuid level and take it to
> > cpuid instruction, cpuid instruction will ignore it and return some data
> > specific to cpu model, cpuid doesn't an error return value because it is
> > void type. So cpuid module will execute cpuid continuously and return
> > data although most of data make no sense.
> >
> > This patch tries to add a sysfs interface for cpuid, users can see all the
> > available cpuid levels, specify a specific level and get cpuid corresponding
> > to this cpuid level.
> >
> > For every logical cpu, this patch will create a cpuid directory under
> > /sys/devices/system/cpu/cpu*/, there are three entries under cpuid:
> >
> > avail_levels cur_level cur_cpuid
> >
> > A user can get all the available cpuid levels from avail_levels, he/she can
> > set one available cpuid level to cur_level, then he/she can get cpuid from
> > cur_cpuid, cur_cpuid corresponds to cur_level.
> >
> > This patch uses sysfs to avoid limitless loop and provide more flexible
> > interface for cpuid, please consider to merge to -mm tree in order to test.
>
> This is broken.
>
> Triple broken.
>
> It's broken, because it doesn't take into account the fact that Intel
> broke CPUID level 4 and made it "repeating" (neither did the cpuid char
> device, because it predated the Intel braindamage; I've had a patch for
> it privately for a while, but didn't push it upstream because paravirt
> broke it royally and I wanted the situation to settle down.)
>
> It's broken, because the algorithm used to determine valid CPUID levels
> is incorrect; it fails to recognize any CPUID levels other than the main
> Intel and AMD ones, e.g. the Transmeta 0x8086xxxx (and sometimes more)
> and VIA 0xc000xxxx levels.
Thank you for pointing out these issues, i think we can let users input
any cpuid level and output the corresponding cpuid, in this way we can
avoid to consider cpu differences and left this to userspace. We can
also consider all the x86 platforms to do cpuid for every one.

>
> It's broken, because it is better for the userspace extractor to have
> this logic than to stuff it into the kernel, where it sits hogging
> unswappable memory at all times.
It seems not to be very appropriate to let user space consider hardware
details. /proc/cpuinfo should be an example to justify this.

Is there any user application using /dev/cpu/*/cpuid? if no, i think it
is feasible to provide an interface in the kernel.

I noticed an application cpu-z on Windows, maybe we can clone it on
Linux.
>
> -hpa

2008-01-30 06:10:45

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

Yi Yang wrote:
>>
>> It's broken, because it doesn't take into account the fact that Intel
>> broke CPUID level 4 and made it "repeating" (neither did the cpuid char
>> device, because it predated the Intel braindamage; I've had a patch for
>> it privately for a while, but didn't push it upstream because paravirt
>> broke it royally and I wanted the situation to settle down.)
>>
>> It's broken, because the algorithm used to determine valid CPUID levels
>> is incorrect; it fails to recognize any CPUID levels other than the main
>> Intel and AMD ones, e.g. the Transmeta 0x8086xxxx (and sometimes more)
>> and VIA 0xc000xxxx levels.
> Thank you for pointing out these issues, i think we can let users input
> any cpuid level and output the corresponding cpuid, in this way we can
> avoid to consider cpu differences and left this to userspace. We can
> also consider all the x86 platforms to do cpuid for every one.
>
>> It's broken, because it is better for the userspace extractor to have
>> this logic than to stuff it into the kernel, where it sits hogging
>> unswappable memory at all times.
> It seems not to be very appropriate to let user space consider hardware
> details. /proc/cpuinfo should be an example to justify this.

/proc/cpuinfo represents what the kernel needs to know, so it reflects
the kernel's interpretation of CPUID. There is no reason to interpret
things in the kernel that the kernel doesn't need.

> Is there any user application using /dev/cpu/*/cpuid? if no, i think it
> is feasible to provide an interface in the kernel.

Yes. It's called x86info, I believe.

-hpa

2008-01-30 07:08:28

by Yi Yang

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

On Tue, 2008-01-29 at 07:51 -0800, H. Peter Anvin wrote:
> Yi Yang wrote:
> > Current cpuid module will create a char device for every logical cpu,
> > when a user cats /dev/cpu/*/cpuid, he/she will enter a limitless loop,
> > the root cause is that cpuid module doesn't decide wether a cpuid level
> > is valid, it just uses an offset to denote cpuid level and take it to
> > cpuid instruction, cpuid instruction will ignore it and return some data
> >
> > This patch uses sysfs to avoid limitless loop and provide more flexible
> > interface for cpuid, please consider to merge to -mm tree in order to test.
>
> This is broken.
>
> Triple broken.
>
> It's broken, because it doesn't take into account the fact that Intel
> broke CPUID level 4 and made it "repeating" (neither did the cpuid char
> device, because it predated the Intel braindamage; I've had a patch for
> it privately for a while, but didn't push it upstream because paravirt
> broke it royally and I wanted the situation to settle down.)
level 4 doesn't result in repeating on Intel CPU, cpuid module sets
file offset to level, so cat /dev/cpu/*/cpuid will run cpuid instruction
continuously.

2008-01-30 07:18:17

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

Yi Yang wrote:
>>
>> It's broken, because it doesn't take into account the fact that Intel
>> broke CPUID level 4 and made it "repeating" (neither did the cpuid char
>> device, because it predated the Intel braindamage; I've had a patch for
>> it privately for a while, but didn't push it upstream because paravirt
>> broke it royally and I wanted the situation to settle down.)

> level 4 doesn't result in repeating on Intel CPU, cpuid module sets
> file offset to level, so cat /dev/cpu/*/cpuid will run cpuid instruction
> continuously.

The issue is that Intel suddenly made CPUID ECX-sensitive, which there
is no way to represent.

As far as cat /dev/cpu/*/cpuid, that's a user error.

-hpa

2008-01-30 07:24:46

by Yi Yang

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

On Tue, 2008-01-29 at 23:17 -0800, H. Peter Anvin wrote:
> Yi Yang wrote:
> >>
> >> It's broken, because it doesn't take into account the fact that Intel
> >> broke CPUID level 4 and made it "repeating" (neither did the cpuid char
> >> device, because it predated the Intel braindamage; I've had a patch for
> >> it privately for a while, but didn't push it upstream because paravirt
> >> broke it royally and I wanted the situation to settle down.)
>
> > level 4 doesn't result in repeating on Intel CPU, cpuid module sets
> > file offset to level, so cat /dev/cpu/*/cpuid will run cpuid instruction
> > continuously.
>
> The issue is that Intel suddenly made CPUID ECX-sensitive, which there
> is no way to represent.
Function cpuid has reset ecx to 0 immediate before calling to __cpuid,
so this shouldn't be a problem now.

in include/asm-x86/processor_32.h
/*
* Generic CPUID function
* clear %ecx since some cpus (Cyrix MII) do not set or clear %ecx
* resulting in stale register contents being returned.
*/
static inline void cpuid(unsigned int op,
unsigned int *eax, unsigned int *ebx,
unsigned int *ecx, unsigned int *edx)
{
*eax = op;
*ecx = 0;
__cpuid(eax, ebx, ecx, edx);
}
>
> As far as cat /dev/cpu/*/cpuid, that's a user error.
>
> -hpa
>

2008-01-30 07:34:07

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

Yi Yang wrote:
> Function cpuid has reset ecx to 0 immediate before calling to __cpuid,
> so this shouldn't be a problem now.

Unless, of course, you want to get to the information for the higher
CPUID levels.

The easiest way to fix that would be to use cpuid_count() and let
/dev/cpu/*/cpuid take the %ecx value in the high half of the offset.
That would have minimal impact on the interface.

-hpa

2008-01-30 07:35:33

by Yi Yang

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

On Tue, 2008-01-29 at 09:44 +0100, Sam Ravnborg wrote:
> > +
> > +static struct notifier_block __cpuinitdata cpuid_sysfs_cpu_notifier = {
> > + .notifier_call = cpuid_sysfs_cpu_callback,
> > +};
> Data is annotated _cpuintidata
>
> but
>
> > +
> Data is annotated _cpuintidata
>
> > @@ -217,11 +445,14 @@ static void __exit cpuid_exit(void)
> > {
> > int cpu = 0;
> >
> > - for_each_online_cpu(cpu)
> > + for_each_online_cpu(cpu) {
> > cpuid_device_destroy(cpu);
> > + remove_cpuid_sysfs(cpu);
> > + }
> > class_destroy(cpuid_class);
> > unregister_chrdev(CPUID_MAJOR, "cpu/cpuid");
> > unregister_hotcpu_notifier(&cpuid_class_cpu_notifier);
> > + unregister_hotcpu_notifier(&cpuid_sysfs_cpu_notifier);
>
> used in an __exit function.
>
> You should have seen a Section mismatch warning for this.
> The right fix is to annotate the cpuid_sysfs_cpu_notifier
> with __initdata_refok (soon to be named __refdata)
> Or even better to declare it const and use _refconst.
I think __cpuinitdata is different from __initdata, i have tested it
by insmod, rmmod, echo 0/1 > /sys/devices/system/cpu/cpu1/online
repeatly, it hasn't any issue.

>
> Sam

2008-01-30 07:49:01

by Sam Ravnborg

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

On Wed, Jan 30, 2008 at 06:22:43AM +0800, Yi Yang wrote:
> On Tue, 2008-01-29 at 09:44 +0100, Sam Ravnborg wrote:
> > > +
> > > +static struct notifier_block __cpuinitdata cpuid_sysfs_cpu_notifier = {
> > > + .notifier_call = cpuid_sysfs_cpu_callback,
> > > +};
> > Data is annotated _cpuintidata
> >
> > but
> >
> > > +
> > Data is annotated _cpuintidata
> >
> > > @@ -217,11 +445,14 @@ static void __exit cpuid_exit(void)
> > > {
> > > int cpu = 0;
> > >
> > > - for_each_online_cpu(cpu)
> > > + for_each_online_cpu(cpu) {
> > > cpuid_device_destroy(cpu);
> > > + remove_cpuid_sysfs(cpu);
> > > + }
> > > class_destroy(cpuid_class);
> > > unregister_chrdev(CPUID_MAJOR, "cpu/cpuid");
> > > unregister_hotcpu_notifier(&cpuid_class_cpu_notifier);
> > > + unregister_hotcpu_notifier(&cpuid_sysfs_cpu_notifier);
> >
> > used in an __exit function.
> >
> > You should have seen a Section mismatch warning for this.
> > The right fix is to annotate the cpuid_sysfs_cpu_notifier
> > with __initdata_refok (soon to be named __refdata)
> > Or even better to declare it const and use _refconst.
> I think __cpuinitdata is different from __initdata, i have tested it
> by insmod, rmmod, echo 0/1 > /sys/devices/system/cpu/cpu1/online
> repeatly, it hasn't any issue.

__cpuinit & _cpuinitdata have over time been used for
different purposes:
a) To annotate code/data used in the init path and that in the
non HOTPLUG_CPU case can be discarded after init.
b) To annotate code/data used in the 'core' HOTPLUG_CPU
functionality that isonly in use if HOTPLUG_CPU='y'


The b) usage is questionable and the annotation
of cpuid_sysfs_cpu_notifier beongs in the b) category.

The correct solution would be to factor out the 'core'
HOTPLUG_CPU=y code to a set of separate files and used to
usual mechanishm in the Makefile to select when to include
this code in the kernel.

The improved section mismatch checks by modpost has just
brought this issue to the attention and now you add code
that does the wrong thing it is being discussed.

Sam

2008-01-30 08:56:57

by Yi Yang

[permalink] [raw]
Subject: [RFC PATCH 2.6.24] x86: add sysfs interface for cpuid module, try 2

This patch is try 2, it should cover VIA/Cyrix and Transmeta Crusoe, please
help to test it if anybody has VIA/Cyrix and Transmeta Crusoe. This patch just
makes users get cpuid raw data more conveniently by
/sys/devices/syste/cpu/cpu*/cpuid/* without using any userspace application.

Current cpuid module will create a char device for every logical cpu,
when a user cats /dev/cpu/*/cpuid, he/she will enter a limitless loop,
the root cause is that cpuid module doesn't decide wether a cpuid level
is valid, it just uses an offset to denote cpuid level and take it to
cpuid instruction, cpuid instruction will ignore it and return some data
specific to cpu model, cpuid doesn't an error return value because it is
void type. So cpuid module will execute cpuid continuously and return
data although most of data make no sense.

This patch tries to add a sysfs interface for cpuid, users can see all the
available cpuid levels, specify a specific level and get cpuid corresponding
to this cpuid level.

For every logical cpu, this patch will create a cpuid directory under
/sys/devices/system/cpu/cpu*/, there are three entries under cpuid:

avail_levels cur_level cur_cpuid

A user can get all the available cpuid levels from avail_levels, he/she can
set one available cpuid level to cur_level, then he/she can get cpuid from
cur_cpuid, cur_cpuid corresponds to cur_level.

Here are some example output:
[root@yangyi-dev /]# ls /sys/devices/system/cpu/cpu0/cpuid
avail_levels cur_cpuid cur_level
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/avail_levels
0x00000000
0x00000001
0x00000002
0x00000003
0x00000004
0x00000005
0x00000006
0x00000007
0x00000008
0x00000009
0x0000000a
0x80000000
0x80000001
0x80000002
0x80000003
0x80000004
0x80000005
0x80000006
0x80000007
0x80000008
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/cur_level
0x00000000
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/cur_cpuid
0x0000000a 0x756e6547 0x6c65746e 0x49656e69
[root@yangyi-dev /]# echo 0x80000007 > /sys/devices/system/cpu/cpu0/cpuid/cur_level
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/cur_cpuid
0x00000000 0x00000000 0x00000000 0x00000000
[root@yangyi-dev /]# echo 0x80000001 > /sys/devices/system/cpu/cpu0/cpuid/cur_level
[root@yangyi-dev /]# cat /sys/devices/system/cpu/cpu0/cpuid/cur_cpuid
0x00000000 0x00000000 0x00000001 0x20100000
[root@yangyi-dev /]# ls /sys/devices/system/cpu/cpu1/cpuid
avail_levels cur_cpuid cur_level
[root@yangyi-dev /]#

Signed-off-by: Yi Yang <[email protected]>
---
cpuid.c | 255 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 254 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/cpuid.c 2008-01-28 01:28:48.000000000 +0800
+++ b/arch/x86/kernel/cpuid.c 2008-01-30 07:31:40.000000000 +0800
@@ -175,6 +175,250 @@ static struct notifier_block __cpuinitda
.notifier_call = cpuid_class_cpu_callback,
};

+struct cpuid_info {
+ int cpu;
+ unsigned int cur_level;
+ unsigned int min_level;
+ unsigned int max_level;
+ unsigned int min_ext_level;
+ unsigned int max_ext_level;
+ struct kobject kobj;
+};
+
+struct cpuid_attr {
+ struct attribute attr;
+ ssize_t (*show)(struct cpuid_info *, char *);
+ ssize_t (*store)(struct cpuid_info *, const char *, size_t count);
+};
+
+static struct cpuid_info cpuid_infos[NR_CPUS];
+
+static cpumask_t cpu_map = CPU_MASK_NONE;
+
+#define A32(a, b, c, d) (((d) << 24) + ((c) << 16) + ((b) << 8) + (a))
+
+static int is_centaur(u32 *data)
+{
+ return data[1] == A32('C', 'e', 'n', 't') &&
+ data[3] == A32('a', 'u', 'r', 'H') &&
+ data[2] == A32('a', 'u', 'l', 's');
+}
+
+static int is_transmeta(u32 *data)
+{
+ return data[1] == A32('G', 'e', 'n', 'u') &&
+ data[3] == A32('i', 'n', 'e', 'T') &&
+ data[2] == A32('M', 'x', '8', '6');
+}
+
+static ssize_t show_cur_level(struct cpuid_info *cpuid_info_p, char *buf)
+{
+ return sprintf(buf, "0x%08x\n", cpuid_info_p->cur_level);
+}
+
+static ssize_t store_cur_level(struct cpuid_info *cpuid_info_p,
+ const char *buf, size_t count)
+{
+ char *p;
+ unsigned int val;
+ size_t len = 13;
+ char tmpbuf[16];
+
+ if ((count > len) || (count <= 0))
+ return -EINVAL;
+
+ len = count;
+ if (buf[count-1] == '\n')
+ len--;
+
+ memcpy(tmpbuf, buf, len);
+ tmpbuf[len] = '\0';
+ val = simple_strtoul(tmpbuf, &p, 0);
+
+ if (*p != '\0')
+ return -EINVAL;
+ if (((val >= cpuid_info_p->min_level)
+ && (val <= cpuid_info_p->max_level))
+ || ((val >= cpuid_info_p->min_ext_level)
+ && (val <= cpuid_info_p->max_ext_level))) {
+ cpuid_info_p->cur_level = val;
+ return count;
+ } else
+ return -EINVAL;
+}
+
+static ssize_t show_avail_levels(struct cpuid_info *cpuid_info_p, char *buf)
+{
+ unsigned int level;
+ ssize_t len = 0;
+
+ level = cpuid_info_p->min_level;
+ while (level <= cpuid_info_p->max_level) {
+ len += sprintf(buf + len, "0x%08x\n", level);
+ level++;
+ }
+
+ level = cpuid_info_p->min_ext_level;
+ while (level <= cpuid_info_p->max_ext_level) {
+ len += sprintf(buf + len, "0x%08x\n", level);
+ level++;
+ }
+ return len;
+}
+
+static ssize_t show_cur_cpuid(struct cpuid_info *cpuid_info_p, char *buf)
+{
+ u32 data[4];
+
+ do_cpuid(cpuid_info_p->cpu, cpuid_info_p->cur_level, data);
+ return sprintf(buf, "0x%08x 0x%08x 0x%08x 0x%08x\n",
+ data[0], data[1], data[2], data[3]);
+}
+
+static struct cpuid_attr cur_level =
+ __ATTR(cur_level, 0600, show_cur_level, store_cur_level);
+static struct cpuid_attr avail_levels =
+ __ATTR(avail_levels, 0400, show_avail_levels, NULL);
+static struct cpuid_attr cur_cpuid =
+ __ATTR(cur_cpuid, 0400, show_cur_cpuid, NULL);
+
+static struct attribute *default_attrs[] = {
+ &cur_level.attr,
+ &avail_levels.attr,
+ &cur_cpuid.attr,
+ NULL
+};
+#define to_cpuid_info(k) container_of(k, struct cpuid_info, kobj)
+#define to_cpuid_attr(a) container_of(a, struct cpuid_attr, attr)
+
+static ssize_t check_cpu(struct cpuid_info *cpuid_info_p)
+{
+ int cpu = cpuid_info_p->cpu;
+ struct cpuinfo_x86 *c = &cpu_data(cpu);
+
+ if (cpu >= NR_CPUS || !cpu_online(cpu))
+ return -ENXIO; /* No such CPU */
+ if (c->cpuid_level < 0)
+ return -EIO; /* CPUID not supported */
+
+ return 0;
+}
+
+static ssize_t cpuid_show(struct kobject *kobj, struct attribute *attr,
+ char *buf)
+{
+ ssize_t ret;
+ struct cpuid_attr *fattr = to_cpuid_attr(attr);
+ struct cpuid_info *cpuid_info_p = to_cpuid_info(kobj);
+
+ ret = check_cpu(cpuid_info_p);
+ if (ret != 0)
+ return ret;
+
+ return fattr->show(cpuid_info_p, buf);
+}
+
+static ssize_t cpuid_store(struct kobject *kobj, struct attribute *attr,
+ const char *buf, size_t count)
+{
+ struct cpuid_attr *fattr = to_cpuid_attr(attr);
+ struct cpuid_info *cpuid_info_p = to_cpuid_info(kobj);
+ ssize_t ret;
+
+ ret = check_cpu(cpuid_info_p);
+ if (ret != 0)
+ return ret;
+
+ ret = fattr->store ? fattr->store(cpuid_info_p, buf, count) : 0;
+ return ret;
+}
+
+static void cpuid_sysfs_release(struct kobject *kobj)
+{
+}
+
+static struct sysfs_ops cpuid_sysfs_ops = {
+ .show = cpuid_show,
+ .store = cpuid_store,
+};
+
+static struct kobj_type ktype_cpuid = {
+ .sysfs_ops = &cpuid_sysfs_ops,
+ .default_attrs = default_attrs,
+ .release = cpuid_sysfs_release,
+};
+
+static __init_refok int create_cpuid_sysfs(int cpu)
+{
+ struct sys_device *cpu_sys_dev = NULL;
+ u32 data[4];
+ int retval;
+ u32 ext_level = 0x80000000;
+
+ cpu_sys_dev = get_cpu_sysdev(cpu);
+ if (cpu_sys_dev == NULL)
+ return -1;
+
+ cpuid_infos[cpu].kobj.parent = &cpu_sys_dev->kobj;
+ kobject_set_name(&(cpuid_infos[cpu].kobj), "%s", "cpuid");
+ cpuid_infos[cpu].kobj.ktype = &ktype_cpuid;
+
+ do_cpuid(cpu, 0, data);
+ cpuid_infos[cpu].cpu = cpu;
+ cpuid_infos[cpu].cur_level = 0;
+ cpuid_infos[cpu].min_level = 0;
+ cpuid_infos[cpu].max_level = data[0];
+
+ if (is_centaur(data))
+ ext_level = 0xc0000000;
+ else if (is_transmeta(data))
+ ext_level = 0x80860000;
+
+ do_cpuid(cpu, ext_level, data);
+ cpuid_infos[cpu].min_ext_level = ext_level;
+ cpuid_infos[cpu].max_ext_level = data[0];
+
+ retval = kobject_register(&(cpuid_infos[cpu].kobj));
+ if (retval < 0) {
+ cpu_clear(cpu, cpu_map);
+ return retval;
+ }
+ cpu_set(cpu, cpu_map);
+ return 0;
+}
+
+static void remove_cpuid_sysfs(int cpu)
+{
+ if (cpu_isset(cpu, cpu_map)) {
+ cpu_clear(cpu, cpu_map);
+ kobject_unregister(&(cpuid_infos[cpu].kobj));
+ memset(&cpuid_infos[cpu], 0, sizeof(struct cpuid_info));
+ }
+}
+
+static int __init_refok cpuid_sysfs_cpu_callback(struct notifier_block *nfb,
+ unsigned long action,
+ void *hcpu)
+{
+ unsigned int cpu = (unsigned long)hcpu;
+
+ switch (action) {
+ case CPU_ONLINE:
+ case CPU_ONLINE_FROZEN:
+ create_cpuid_sysfs(cpu);
+ break;
+ case CPU_DEAD:
+ case CPU_DEAD_FROZEN:
+ remove_cpuid_sysfs(cpu);
+ break;
+ }
+ return NOTIFY_OK;
+}
+
+static struct notifier_block __initdata_refok cpuid_sysfs_cpu_notifier = {
+ .notifier_call = cpuid_sysfs_cpu_callback,
+};
+
static int __init cpuid_init(void)
{
int i, err = 0;
@@ -195,8 +439,13 @@ static int __init cpuid_init(void)
err = cpuid_device_create(i);
if (err != 0)
goto out_class;
+
+ err = create_cpuid_sysfs(i);
+ if (err != 0)
+ goto out_class;
}
register_hotcpu_notifier(&cpuid_class_cpu_notifier);
+ register_hotcpu_notifier(&cpuid_sysfs_cpu_notifier);

err = 0;
goto out;
@@ -205,6 +454,7 @@ out_class:
i = 0;
for_each_online_cpu(i) {
cpuid_device_destroy(i);
+ remove_cpuid_sysfs(i);
}
class_destroy(cpuid_class);
out_chrdev:
@@ -217,11 +467,14 @@ static void __exit cpuid_exit(void)
{
int cpu = 0;

- for_each_online_cpu(cpu)
+ for_each_online_cpu(cpu) {
cpuid_device_destroy(cpu);
+ remove_cpuid_sysfs(cpu);
+ }
class_destroy(cpuid_class);
unregister_chrdev(CPUID_MAJOR, "cpu/cpuid");
unregister_hotcpu_notifier(&cpuid_class_cpu_notifier);
+ unregister_hotcpu_notifier(&cpuid_sysfs_cpu_notifier);
}

module_init(cpuid_init);

2008-01-30 17:36:48

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [RFC PATCH 2.6.24] x86: add sysfs interface for cpuid module, try 2

Yi Yang wrote:
> This patch is try 2, it should cover VIA/Cyrix and Transmeta Crusoe, please
> help to test it if anybody has VIA/Cyrix and Transmeta Crusoe. This patch just
> makes users get cpuid raw data more conveniently by
> /sys/devices/syste/cpu/cpu*/cpuid/* without using any userspace application.

Still completely useless garbage.

-hpa

2008-01-31 10:34:28

by Yi Yang

[permalink] [raw]
Subject: [PATCH 2.6.24] Add new string functions real_strtoul and change kernel params to use them

Currently, for every sysfs node, the callers will be responsible for
implementing store operation, so many many callers are doing duplicate
things to validate input, they have the same mistakes because they are
calling simple_strtol/ul/ll/ull, especially for module params, they are
just numeric, but you can echo such values as 0x1234xxx, 07777888 and
1234aaa, for these cases, module params store operation just ignores
successive invalid char and converts prefix part to a numeric although
input is actually invalid.

This patch tries to fix the aforementioned issues and implements real_strtox
serial functions, kernel/params.c uses them to strictly validate input,
so module params will reject such values as 0x1234xxxx and returns an error:

write error: Invalid argument

Any modules which export numeric sysfs node can use real_strtox instead of
simple_strtox to reject any invalid input. Please consider to merge to -mm tree
in order to test.

Here are some test results:

Before applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#


After applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#


Signed-off-by: Yi Yang <[email protected]>
---
include/linux/kernel.h | 4 ++++
kernel/params.c | 20 ++++++++++----------
lib/vsprintf.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 60 insertions(+), 10 deletions(-)

--- a/include/linux/kernel.h 2008-01-31 00:41:46.000000000 +0800
+++ b/include/linux/kernel.h 2008-01-31 01:12:33.000000000 +0800
@@ -141,6 +141,10 @@ extern unsigned long simple_strtoul(cons
extern long simple_strtol(const char *,char **,unsigned int);
extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
extern long long simple_strtoll(const char *,char **,unsigned int);
+extern int real_strtoul(const char *, unsigned int, unsigned long *);
+extern int real_strtol(const char *, unsigned int, long *);
+extern int real_strtoull(const char *, unsigned int, unsigned long long *);
+extern int real_strtoll(const char *, unsigned int, long long *);
extern int sprintf(char * buf, const char * fmt, ...)
__attribute__ ((format (printf, 2, 3)));
extern int vsprintf(char *buf, const char *, va_list)
--- a/lib/vsprintf.c 2008-01-30 08:13:03.000000000 +0800
+++ b/lib/vsprintf.c 2008-01-31 05:19:31.000000000 +0800
@@ -126,6 +126,52 @@ long long simple_strtoll(const char *cp,
return simple_strtoull(cp,endp,base);
}

+#define define_real_strtoux(type, valtype) \
+int real_strtou##type(const char *cp, unsigned int base, valtype *res) \
+{ \
+ char *tail; \
+ valtype val; \
+ size_t len; \
+ \
+ *res = 0; \
+ len = strlen(cp); \
+ if (len == 0) \
+ return -EINVAL; \
+ \
+ val = simple_strtoul(cp, &tail, base); \
+ if ((*tail == '\0') || \
+ (len == (size_t)(tail - cp) + 1) && (*tail == '\n')) { \
+ *res = val; \
+ return 0; \
+ } \
+ \
+ return -EINVAL; \
+} \
+
+#define define_real_strtox(type, valtype) \
+int real_strto##type(const char *cp, unsigned int base, valtype *res) \
+{ \
+ int ret; \
+ if (*cp == '-') { \
+ ret = real_strtou##type(cp+1, base, res); \
+ if (ret != 0) \
+ *res = -(*res); \
+ } else \
+ ret = real_strtou##type(cp+1, base, res); \
+ \
+ return ret; \
+} \
+
+define_real_strtoux(l, unsigned long)
+define_real_strtox(l, long)
+define_real_strtoux(ll, unsigned long long)
+define_real_strtox(ll, long long)
+
+EXPORT_SYMBOL(real_strtoul);
+EXPORT_SYMBOL(real_strtol);
+EXPORT_SYMBOL(real_strtoll);
+EXPORT_SYMBOL(real_strtoull);
+
static int skip_atoi(const char **s)
{
int i=0;
--- a/kernel/params.c 2008-01-31 00:44:44.000000000 +0800
+++ b/kernel/params.c 2008-01-31 01:11:51.000000000 +0800
@@ -180,12 +180,12 @@ int parse_args(const char *name,
#define STANDARD_PARAM_DEF(name, type, format, tmptype, strtolfn) \
int param_set_##name(const char *val, struct kernel_param *kp) \
{ \
- char *endp; \
tmptype l; \
+ int ret; \
\
if (!val) return -EINVAL; \
- l = strtolfn(val, &endp, 0); \
- if (endp == val || ((type)l != l)) \
+ ret = strtolfn(val, 0, &l); \
+ if (ret == -EINVAL || ((type)l != l)) \
return -EINVAL; \
*((type *)kp->arg) = l; \
return 0; \
@@ -195,13 +195,13 @@ int parse_args(const char *name,
return sprintf(buffer, format, *((type *)kp->arg)); \
}

-STANDARD_PARAM_DEF(byte, unsigned char, "%c", unsigned long, simple_strtoul);
-STANDARD_PARAM_DEF(short, short, "%hi", long, simple_strtol);
-STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, simple_strtoul);
-STANDARD_PARAM_DEF(int, int, "%i", long, simple_strtol);
-STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, simple_strtoul);
-STANDARD_PARAM_DEF(long, long, "%li", long, simple_strtol);
-STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, simple_strtoul);
+STANDARD_PARAM_DEF(byte, unsigned char, "%c", unsigned long, real_strtoul);
+STANDARD_PARAM_DEF(short, short, "%hi", long, real_strtol);
+STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, real_strtoul);
+STANDARD_PARAM_DEF(int, int, "%i", long, real_strtol);
+STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, real_strtoul);
+STANDARD_PARAM_DEF(long, long, "%li", long, real_strtol);
+STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, real_strtoul);

int param_set_charp(const char *val, struct kernel_param *kp)
{

2008-01-31 17:05:58

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] Add new string functions real_strtoul and change kernel params to use them

On Thu, 31 Jan 2008 09:18:22 +0800 Yi Yang wrote:

> Currently, for every sysfs node, the callers will be responsible for
> implementing store operation, so many many callers are doing duplicate
> things to validate input, they have the same mistakes because they are
> calling simple_strtol/ul/ll/ull, especially for module params, they are
> just numeric, but you can echo such values as 0x1234xxx, 07777888 and
> 1234aaa, for these cases, module params store operation just ignores
> successive invalid char and converts prefix part to a numeric although
> input is actually invalid.
>
> This patch tries to fix the aforementioned issues and implements real_strtox
> serial functions, kernel/params.c uses them to strictly validate input,
> so module params will reject such values as 0x1234xxxx and returns an error:

How about a prefix of safe_ or strict_ or something other than real_?
real_ sounds too much like a real number function string parser.

> write error: Invalid argument
>
> Any modules which export numeric sysfs node can use real_strtox instead of
> simple_strtox to reject any invalid input. Please consider to merge to -mm tree
> in order to test.
>
> Here are some test results:
>
> Before applying this patch:
>
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]#
>
>
> After applying this patch:
>
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
> -bash: echo: write error: Invalid argument
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
> -bash: echo: write error: Invalid argument
> [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
> -bash: echo: write error: Invalid argument
> [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
> -bash: echo: write error: Invalid argument
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]#
>
>
> Signed-off-by: Yi Yang <[email protected]>
> ---
> include/linux/kernel.h | 4 ++++
> kernel/params.c | 20 ++++++++++----------
> lib/vsprintf.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 60 insertions(+), 10 deletions(-)
>
> --- a/include/linux/kernel.h 2008-01-31 00:41:46.000000000 +0800
> +++ b/include/linux/kernel.h 2008-01-31 01:12:33.000000000 +0800
> @@ -141,6 +141,10 @@ extern unsigned long simple_strtoul(cons
> extern long simple_strtol(const char *,char **,unsigned int);
> extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
> extern long long simple_strtoll(const char *,char **,unsigned int);
> +extern int real_strtoul(const char *, unsigned int, unsigned long *);
> +extern int real_strtol(const char *, unsigned int, long *);
> +extern int real_strtoull(const char *, unsigned int, unsigned long long *);
> +extern int real_strtoll(const char *, unsigned int, long long *);
> extern int sprintf(char * buf, const char * fmt, ...)
> __attribute__ ((format (printf, 2, 3)));
> extern int vsprintf(char *buf, const char *, va_list)
> --- a/lib/vsprintf.c 2008-01-30 08:13:03.000000000 +0800
> +++ b/lib/vsprintf.c 2008-01-31 05:19:31.000000000 +0800
> @@ -126,6 +126,52 @@ long long simple_strtoll(const char *cp,
> return simple_strtoull(cp,endp,base);
> }
>
> +#define define_real_strtoux(type, valtype) \
> +int real_strtou##type(const char *cp, unsigned int base, valtype *res) \
> +{ \
> + char *tail; \
> + valtype val; \
> + size_t len; \
> + \
> + *res = 0; \
> + len = strlen(cp); \
> + if (len == 0) \
> + return -EINVAL; \
> + \
> + val = simple_strtoul(cp, &tail, base); \
> + if ((*tail == '\0') || \
> + (len == (size_t)(tail - cp) + 1) && (*tail == '\n')) { \
> + *res = val; \
> + return 0; \
> + } \
> + \
> + return -EINVAL; \
> +} \
> +
> +#define define_real_strtox(type, valtype) \
> +int real_strto##type(const char *cp, unsigned int base, valtype *res) \
> +{ \
> + int ret; \
> + if (*cp == '-') { \
> + ret = real_strtou##type(cp+1, base, res); \
> + if (ret != 0) \
> + *res = -(*res); \
> + } else \
> + ret = real_strtou##type(cp+1, base, res); \
> + \
> + return ret; \
> +} \
> +
> +define_real_strtoux(l, unsigned long)
> +define_real_strtox(l, long)
> +define_real_strtoux(ll, unsigned long long)
> +define_real_strtox(ll, long long)
> +
> +EXPORT_SYMBOL(real_strtoul);
> +EXPORT_SYMBOL(real_strtol);
> +EXPORT_SYMBOL(real_strtoll);
> +EXPORT_SYMBOL(real_strtoull);
> +
> static int skip_atoi(const char **s)
> {
> int i=0;
> --- a/kernel/params.c 2008-01-31 00:44:44.000000000 +0800
> +++ b/kernel/params.c 2008-01-31 01:11:51.000000000 +0800
> @@ -180,12 +180,12 @@ int parse_args(const char *name,
> #define STANDARD_PARAM_DEF(name, type, format, tmptype, strtolfn) \
> int param_set_##name(const char *val, struct kernel_param *kp) \
> { \
> - char *endp; \
> tmptype l; \
> + int ret; \
> \
> if (!val) return -EINVAL; \
> - l = strtolfn(val, &endp, 0); \
> - if (endp == val || ((type)l != l)) \
> + ret = strtolfn(val, 0, &l); \
> + if (ret == -EINVAL || ((type)l != l)) \
> return -EINVAL; \
> *((type *)kp->arg) = l; \
> return 0; \
> @@ -195,13 +195,13 @@ int parse_args(const char *name,
> return sprintf(buffer, format, *((type *)kp->arg)); \
> }
>
> -STANDARD_PARAM_DEF(byte, unsigned char, "%c", unsigned long, simple_strtoul);
> -STANDARD_PARAM_DEF(short, short, "%hi", long, simple_strtol);
> -STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, simple_strtoul);
> -STANDARD_PARAM_DEF(int, int, "%i", long, simple_strtol);
> -STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, simple_strtoul);
> -STANDARD_PARAM_DEF(long, long, "%li", long, simple_strtol);
> -STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, simple_strtoul);
> +STANDARD_PARAM_DEF(byte, unsigned char, "%c", unsigned long, real_strtoul);
> +STANDARD_PARAM_DEF(short, short, "%hi", long, real_strtol);
> +STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, real_strtoul);
> +STANDARD_PARAM_DEF(int, int, "%i", long, real_strtol);
> +STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, real_strtoul);
> +STANDARD_PARAM_DEF(long, long, "%li", long, real_strtol);
> +STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, real_strtoul);
>
> int param_set_charp(const char *val, struct kernel_param *kp)
> {
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>


---
~Randy

2008-02-01 02:48:22

by Randy Dunlap

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] Add new string functions real_strtoul and change kernel params to use them

On Fri, 01 Feb 2008 00:30:17 +0800 Yi Yang wrote:

> On Thu, 2008-01-31 at 09:03 -0800, Randy Dunlap wrote:
> > On Thu, 31 Jan 2008 09:18:22 +0800 Yi Yang wrote:
> >
> > > Currently, for every sysfs node, the callers will be responsible for
> > > implementing store operation, so many many callers are doing duplicate
> > > things to validate input, they have the same mistakes because they are
> > > calling simple_strtol/ul/ll/ull, especially for module params, they are
> > > just numeric, but you can echo such values as 0x1234xxx, 07777888 and
> > > 1234aaa, for these cases, module params store operation just ignores
> > > successive invalid char and converts prefix part to a numeric although
> > > input is actually invalid.
> > >
> > > This patch tries to fix the aforementioned issues and implements real_strtox
> > > serial functions, kernel/params.c uses them to strictly validate input,
> > > so module params will reject such values as 0x1234xxxx and returns an error:
> >
> > How about a prefix of safe_ or strict_ or something other than real_?
> > real_ sounds too much like a real number function string parser.
> >
> I named it as strict_ at the beginning, but it results in some alignment
> issues checkpatch.pl will always warn, i don't know if warnings will
> make the patch out of the door.
>
> In kernel/params.c, STANDARD_PARAM_DEF(a function definition macro) will
> be over 80 chars, is it correct coding style to split it to two lines?

Yes, if it can be done cleanly. Otherwise just leave it long (IMHO).

---
~Randy

2008-02-01 01:42:19

by Yi Yang

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] Add new string functions real_strtoul and change kernel params to use them

On Thu, 2008-01-31 at 09:03 -0800, Randy Dunlap wrote:
> On Thu, 31 Jan 2008 09:18:22 +0800 Yi Yang wrote:
>
> > Currently, for every sysfs node, the callers will be responsible for
> > implementing store operation, so many many callers are doing duplicate
> > things to validate input, they have the same mistakes because they are
> > calling simple_strtol/ul/ll/ull, especially for module params, they are
> > just numeric, but you can echo such values as 0x1234xxx, 07777888 and
> > 1234aaa, for these cases, module params store operation just ignores
> > successive invalid char and converts prefix part to a numeric although
> > input is actually invalid.
> >
> > This patch tries to fix the aforementioned issues and implements real_strtox
> > serial functions, kernel/params.c uses them to strictly validate input,
> > so module params will reject such values as 0x1234xxxx and returns an error:
>
> How about a prefix of safe_ or strict_ or something other than real_?
> real_ sounds too much like a real number function string parser.
>
I named it as strict_ at the beginning, but it results in some alignment
issues checkpatch.pl will always warn, i don't know if warnings will
make the patch out of the door.

In kernel/params.c, STANDARD_PARAM_DEF(a function definition macro) will
be over 80 chars, is it correct coding style to split it to two lines?

2008-02-01 07:38:33

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] Add new string functions strict_strto* and convert kernel params to use them, take 2

On Fri, 01 Feb 2008 06:11:43 +0800 Yi Yang <[email protected]> wrote:

> This patch is a resend, it changes previous name "real_" to "strict_"
> according to Randy Dunlap's feedback. Please consider to apply. thanks.
>
> Currently, for every sysfs node, the callers will be responsible for
> implementing store operation, so many many callers are doing duplicate
> things to validate input, they have the same mistakes because they are
> calling simple_strtol/ul/ll/uul, especially for module params, they are
> just numeric, but you can echo such values as 0x1234xxx, 07777888 and
> 1234aaa, for these cases, module params store operation just ignores
> succesive invalid char and converts prefix part to a numeric although
> input is acctually invalid.
>
> This patch tries to fix the aforementioned issues and implements strict_strtox
> serial functions, kernel/params.c uses them to strictly validate input,
> so module params will reject such values as 0x1234xxxx and returns an error:
>
> write error: Invalid argument
>
> Any modules which export numeric sysfs node can use strict_strtox instead of
> simple_strtox to reject any invalid input.
>
> Here are some test results:
>
> Before applying this patch:
>
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]#
>
>
> After applying this patch:
>
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
> -bash: echo: write error: Invalid argument
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
> -bash: echo: write error: Invalid argument
> [root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
> -bash: echo: write error: Invalid argument
> [root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
> -bash: echo: write error: Invalid argument
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
> [root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
> 4096
> [root@yangyi-dev /]#
>

Oh dear. So if someone has an existing script which does

echo 0x1000g > /sys/module/e1000/parameters/copybreak

we just broke their setup.

This is what happens when we screw up kernel interfaces: we should remain
bug-for-bug compatible with earlier releases. argh, we suck :(

I'm inclined to merge it anyway - everyone already knows we suck, and
not many people will be affected and they suck too.

2008-02-01 07:45:23

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] Add new string functions strict_strto* and convert kernel params to use them, take 2

On Fri, 01 Feb 2008 06:11:43 +0800 Yi Yang <[email protected]> wrote:

> --- a/include/linux/kernel.h 2008-01-31 00:41:46.000000000 +0800
> --- b/include/linux/kernel.h 2008-02-01 04:30:49.000000000 +0800
> @@ -141,6 +141,10 @@ extern unsigned long simple_strtoul(cons
> extern long simple_strtol(const char *,char **,unsigned int);
> extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
> extern long long simple_strtoll(const char *,char **,unsigned int);
> +extern int strict_strtoul(const char *, unsigned int, unsigned long *);
> +extern int strict_strtol(const char *, unsigned int, long *);
> +extern int strict_strtoull(const char *, unsigned int, unsigned long long *);
> +extern int strict_strtoll(const char *, unsigned int, long long *);
> extern int sprintf(char * buf, const char * fmt, ...)
> __attribute__ ((format (printf, 2, 3)));
> extern int vsprintf(char *buf, const char *, va_list)
> --- a/lib/vsprintf.c 2008-01-30 08:13:03.000000000 +0800
> --- b/lib/vsprintf.c 2008-02-01 04:33:27.000000000 +0800
> @@ -126,6 +126,52 @@ long long simple_strtoll(const char *cp,
> return simple_strtoull(cp,endp,base);
> }
>
> +#define define_strict_strtoux(type, valtype) \
> +int strict_strtou##type(const char *cp, unsigned int base, valtype *res)\
> +{ \
> + char *tail; \
> + valtype val; \
> + size_t len; \
> + \
> + *res = 0; \
> + len = strlen(cp); \
> + if (len == 0) \
> + return -EINVAL; \
> + \
> + val = simple_strtoul(cp, &tail, base); \
> + if ((*tail == '\0') || \
> + (len == (size_t)(tail - cp) + 1) && (*tail == '\n')) { \
> + *res = val; \
> + return 0; \
> + } \
> + \
> + return -EINVAL; \
> +} \
> +
> +#define define_strict_strtox(type, valtype) \
> +int strict_strto##type(const char *cp, unsigned int base, valtype *res) \
> +{ \
> + int ret; \
> + if (*cp == '-') { \
> + ret = strict_strtou##type(cp+1, base, res); \
> + if (ret != 0) \
> + *res = -(*res); \
> + } else \
> + ret = strict_strtou##type(cp+1, base, res); \
> + \
> + return ret; \
> +} \
> +
> +define_strict_strtoux(l, unsigned long)
> +define_strict_strtox(l, long)
> +define_strict_strtoux(ll, unsigned long long)
> +define_strict_strtox(ll, long long)
> +
> +EXPORT_SYMBOL(strict_strtoul);
> +EXPORT_SYMBOL(strict_strtol);
> +EXPORT_SYMBOL(strict_strtoll);
> +EXPORT_SYMBOL(strict_strtoull);

These are global, exported-to-modules library functions. Documentation is
compulsory (as far as I'm concerned).

Could you please add a comment block in there which at least explains how
they differ from the other strto* functions?

2008-02-01 07:47:27

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] Add new string functions strict_strto* and convert kernel params to use them, take 2

On Fri, 01 Feb 2008 06:11:43 +0800 Yi Yang <[email protected]> wrote:

> --- a/include/linux/kernel.h 2008-01-31 00:41:46.000000000 +0800
> --- b/include/linux/kernel.h 2008-02-01 04:30:49.000000000 +0800

This isn't a patch. I wonder how that happened?

2008-02-01 09:16:59

by Yi Yang

[permalink] [raw]
Subject: [PATCH 2.6.24] Add new string functions strict_strto* and convert kernel params to use them, take 3

Andrew, i'm really very very sorry for those mistakes, here is the latest, it
adds documentation for every new function, please use it to replace the patch
in -mm tree you added just now:

add-new-string-functions-strict_strto-and-convert-kernel-params-to-use-them.patch

Thank you, Andrew and Randy.

Currently, for every sysfs node, the callers will be responsible for
implementing store operation, so many many callers are doing duplicate
things to validate input, they have the same mistakes because they are
calling simple_strtol/ul/ll/uul, especially for module params, they are
just numeric, but you can echo such values as 0x1234xxx, 07777888 and
1234aaa, for these cases, module params store operation just ignores
succesive invalid char and converts prefix part to a numeric although
input is acctually invalid.

This patch tries to fix the aforementioned issues and implements strict_strtox
serial functions, kernel/params.c uses them to strictly validate input,
so module params will reject such values as 0x1234xxxx and returns an error:

write error: Invalid argument

Any modules which export numeric sysfs node can use strict_strtox instead of
simple_strtox to reject any invalid input. Please consider to merge to -mm tree
in order to test.

Here are some test results:

Before applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#


After applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#


Signed-off-by: Yi Yang <[email protected]>
---
include/linux/kernel.h | 4 +
kernel/params.c | 20 +++----
lib/vsprintf.c | 123 +++++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 137 insertions(+), 10 deletions(-)

--- a/include/linux/kernel.h 2008-01-31 00:41:46.000000000 +0800
+++ b/include/linux/kernel.h 2008-02-01 04:30:49.000000000 +0800
@@ -141,6 +141,10 @@ extern unsigned long simple_strtoul(cons
extern long simple_strtol(const char *,char **,unsigned int);
extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
extern long long simple_strtoll(const char *,char **,unsigned int);
+extern int strict_strtoul(const char *, unsigned int, unsigned long *);
+extern int strict_strtol(const char *, unsigned int, long *);
+extern int strict_strtoull(const char *, unsigned int, unsigned long long *);
+extern int strict_strtoll(const char *, unsigned int, long long *);
extern int sprintf(char * buf, const char * fmt, ...)
__attribute__ ((format (printf, 2, 3)));
extern int vsprintf(char *buf, const char *, va_list)
--- a/lib/vsprintf.c 2008-01-30 08:13:03.000000000 +0800
+++ b/lib/vsprintf.c 2008-02-01 07:36:57.000000000 +0800
@@ -126,6 +126,129 @@ long long simple_strtoll(const char *cp,
return simple_strtoull(cp,endp,base);
}

+
+/**
+ * strict_strtoul - convert a string to an unsigned long strictly
+ * @cp: The string to be converted
+ * @base: The number base to use
+ * @res: The converted result value
+ *
+ * strict_strtoul converts a string to an unsigned long only if the
+ * string is really an unsigned long string, any string containing
+ * any invalid char at the tail will be rejected and -EINVAL is returned,
+ * only a newline char at the tail is acceptible because people generally
+ * change a module parameter in the following way:
+ *
+ * echo 1024 > /sys/module/e1000/parameters/copybreak
+ *
+ * echo will append a newline to the tail.
+ *
+ * It returns 0 if conversion is successful and *res is set to the converted
+ * value, otherwise it returns -EINVAL and *res is set to 0.
+ *
+ * simple_strtoul just ignores the successive invalid characters and
+ * return the converted value of prefix part of the string.
+ */
+int strict_strtoul(const char *cp, unsigned int base, unsigned long *res);
+
+/**
+ * strict_strtol - convert a string to a long strictly
+ * @cp: The string to be converted
+ * @base: The number base to use
+ * @res: The converted result value
+ *
+ * strict_strtol is similiar to strict_strtoul, but it allows the first
+ * character of a string is '-'.
+ *
+ * It returns 0 if conversion is successful and *res is set to the converted
+ * value, otherwise it returns -EINVAL and *res is set to 0.
+ */
+int strict_strtol(const char *cp, unsigned int base, long *res);
+
+/**
+ * strict_strtoull - convert a string to an unsigned long long strictly
+ * @cp: The string to be converted
+ * @base: The number base to use
+ * @res: The converted result value
+ *
+ * strict_strtoull converts a string to an unsigned long long only if the
+ * string is really an unsigned long long string, any string containing
+ * any invalid char at the tail will be rejected and -EINVAL is returned,
+ * only a newline char at the tail is acceptible because people generally
+ * change a module parameter in the following way:
+ *
+ * echo 1024 > /sys/module/e1000/parameters/copybreak
+ *
+ * echo will append a newline to the tail of the string.
+ *
+ * It returns 0 if conversion is successful and *res is set to the converted
+ * value, otherwise it returns -EINVAL and *res is set to 0.
+ *
+ * simple_strtoull just ignores the successive invalid characters and
+ * return the converted value of prefix part of the string.
+ */
+int strict_strtoull(const char *cp, unsigned int base, unsigned long long *res);
+
+/**
+ * strict_strtoll - convert a string to a long long strictly
+ * @cp: The string to be converted
+ * @base: The number base to use
+ * @res: The converted result value
+ *
+ * strict_strtoll is similiar to strict_strtoull, but it allows the first
+ * character of a string is '-'.
+ *
+ * It returns 0 if conversion is successful and *res is set to the converted
+ * value, otherwise it returns -EINVAL and *res is set to 0.
+ */
+int strict_strtoll(const char *cp, unsigned int base, long long *res);
+
+#define define_strict_strtoux(type, valtype) \
+int strict_strtou##type(const char *cp, unsigned int base, valtype *res)\
+{ \
+ char *tail; \
+ valtype val; \
+ size_t len; \
+ \
+ *res = 0; \
+ len = strlen(cp); \
+ if (len == 0) \
+ return -EINVAL; \
+ \
+ val = simple_strtoul(cp, &tail, base); \
+ if ((*tail == '\0') || \
+ (len == (size_t)(tail - cp) + 1) && (*tail == '\n')) { \
+ *res = val; \
+ return 0; \
+ } \
+ \
+ return -EINVAL; \
+} \
+
+#define define_strict_strtox(type, valtype) \
+int strict_strto##type(const char *cp, unsigned int base, valtype *res) \
+{ \
+ int ret; \
+ if (*cp == '-') { \
+ ret = strict_strtou##type(cp+1, base, res); \
+ if (ret != 0) \
+ *res = -(*res); \
+ } else \
+ ret = strict_strtou##type(cp+1, base, res); \
+ \
+ return ret; \
+} \
+
+define_strict_strtoux(l, unsigned long)
+define_strict_strtox(l, long)
+define_strict_strtoux(ll, unsigned long long)
+define_strict_strtox(ll, long long)
+
+EXPORT_SYMBOL(strict_strtoul);
+EXPORT_SYMBOL(strict_strtol);
+EXPORT_SYMBOL(strict_strtoll);
+EXPORT_SYMBOL(strict_strtoull);
+
static int skip_atoi(const char **s)
{
int i=0;
--- a/kernel/params.c 2008-01-31 00:44:44.000000000 +0800
+++ b/kernel/params.c 2008-02-01 04:32:12.000000000 +0800
@@ -180,12 +180,12 @@ int parse_args(const char *name,
#define STANDARD_PARAM_DEF(name, type, format, tmptype, strtolfn) \
int param_set_##name(const char *val, struct kernel_param *kp) \
{ \
- char *endp; \
tmptype l; \
+ int ret; \
\
if (!val) return -EINVAL; \
- l = strtolfn(val, &endp, 0); \
- if (endp == val || ((type)l != l)) \
+ ret = strtolfn(val, 0, &l); \
+ if (ret == -EINVAL || ((type)l != l)) \
return -EINVAL; \
*((type *)kp->arg) = l; \
return 0; \
@@ -195,13 +195,13 @@ int parse_args(const char *name,
return sprintf(buffer, format, *((type *)kp->arg)); \
}

-STANDARD_PARAM_DEF(byte, unsigned char, "%c", unsigned long, simple_strtoul);
-STANDARD_PARAM_DEF(short, short, "%hi", long, simple_strtol);
-STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, simple_strtoul);
-STANDARD_PARAM_DEF(int, int, "%i", long, simple_strtol);
-STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, simple_strtoul);
-STANDARD_PARAM_DEF(long, long, "%li", long, simple_strtol);
-STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, simple_strtoul);
+STANDARD_PARAM_DEF(byte, unsigned char, "%c", unsigned long, strict_strtoul);
+STANDARD_PARAM_DEF(short, short, "%hi", long, strict_strtol);
+STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, strict_strtoul);
+STANDARD_PARAM_DEF(int, int, "%i", long, strict_strtol);
+STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, strict_strtoul);
+STANDARD_PARAM_DEF(long, long, "%li", long, strict_strtol);
+STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, strict_strtoul);

int param_set_charp(const char *val, struct kernel_param *kp)
{

2008-02-01 16:31:28

by dean gaudet

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

why do we need another kernel cpuid reading method when sched_setaffinity
exists and cpuid is available in ring3?

-dean

2008-02-01 17:57:30

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

dean gaudet wrote:
> why do we need another kernel cpuid reading method when sched_setaffinity
> exists and cpuid is available in ring3?

Because /dev/cpu/*/cpuid:

a) predates sched_setaffinity by quite a few years
b) is already widely used
c) doesn't have issues with relative priorities (using sched_setaffinity
can easily wait forever.)
d) can be accessed by unprivileged processes given appropriate device
info.

-hpa

2008-02-01 17:59:28

by H. Peter Anvin

[permalink] [raw]
Subject: Re: [PATCH 2.6.24] x86: add sysfs interface for cpuid module

H. Peter Anvin wrote:
> dean gaudet wrote:
>> why do we need another kernel cpuid reading method when
>> sched_setaffinity exists and cpuid is available in ring3?
>
> Because /dev/cpu/*/cpuid:
>
> a) predates sched_setaffinity by quite a few years
> b) is already widely used
> c) doesn't have issues with relative priorities (using sched_setaffinity
> can easily wait forever.)
> d) can be accessed by unprivileged processes given appropriate device
> info.
>

(a) and (b) obviously doesn't apply to the proposed sysfs hack; nor does
it give the administrator the control given by (d). The sysfs hack is
pointless.

-hpa

2008-02-15 08:58:04

by Yi Yang

[permalink] [raw]
Subject: [PATCH 2.6.25-rc1] cpufreq: fix cpufreq policy refcount imbalance

When one cpu is set to offline, the caller process will hang, according to
the trace data, the problem lies in the refcount error in cpufreq driver,
cpufreq_cpu_callback will wait for completion policy->kobj_unregister
which is nerver completed because a refcount error in function
__cpufreq_remove_dev in file driver/cpufreq/cpufreq.c results in not
calling kobject release method.

In driver/cpufreq/cpufreq.c, the refcount of data->kobj isn't 1 when it
will be unregistered, this problem didn't exist in 2.6.24 and earlier.

The root cause is kobject API switch, kobject_init_and_add and kobject_put
replace older kobject_register and kobject_unregister in 2.6.25-rc1,
compared to 2.6.24, kobject_unregister is deleted in function
__cpufreq_remove_dev but it isn't replaced with kobject_put.

This patch adds kobject_put to balance refcount. I noticed Greg suggests
it will fix a power-off issue to remove kobject_get statement block, but i
think that isn't the best way because those code block has existed very long
and it is helpful because the successive statements are invoking relevant
data.


Signed-off-by: Yi Yang <[email protected]>
---
cpufreq.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/drivers/cpufreq/cpufreq.c 2008-02-15 04:41:29.000000000 +0800
+++ b/drivers/cpufreq/cpufreq.c 2008-02-15 06:56:56.000000000 +0800
@@ -1057,12 +1057,17 @@ static int __cpufreq_remove_dev (struct

unlock_policy_rwsem_write(cpu);

+ /* it matches the previous kobject_get */
kobject_put(&data->kobj);

/* we need to make sure that the underlying kobj is actually
* not referenced anymore by anybody before we proceed with
* unloading.
*/
+
+ /* unregister data->kobj, it matches kobject_init_and_add */
+ kobject_put(&data->kobj);
+
dprintk("waiting for dropping of refcount\n");
wait_for_completion(&data->kobj_unregister);
dprintk("wait complete\n");

2008-02-15 09:09:45

by Yi Yang

[permalink] [raw]
Subject: [PATCH 2.6.25-rc1] cpufreq: fix cpufreq policy refcount imbalance

When one cpu is set to offline, the caller process will hang, according to
the trace data, the problem lies in the refcount error in cpufreq driver,
cpufreq_cpu_callback will wait for completion policy->kobj_unregister
which is nerver completed because a refcount error in function
__cpufreq_remove_dev in file driver/cpufreq/cpufreq.c results in not
calling kobject release method.

In driver/cpufreq/cpufreq.c, the refcount of data->kobj isn't 1 when it
will be unregistered, this problem didn't exist in 2.6.24 and earlier.

The root cause is kobject API switch, kobject_init_and_add and kobject_put
replace older kobject_register and kobject_unregister in 2.6.25-rc1,
compared to 2.6.24, kobject_unregister is deleted in function
__cpufreq_remove_dev but it isn't replaced with kobject_put.

This patch adds kobject_put to balance refcount. I noticed Greg suggests
it will fix a power-off issue to remove kobject_get statement block, but i
think that isn't the best way because those code block has existed very long
and it is helpful because the successive statements are invoking relevant
data.


Signed-off-by: Yi Yang <[email protected]>
---
cpufreq.c | 5 +++++
1 file changed, 5 insertions(+)

--- a/drivers/cpufreq/cpufreq.c 2008-02-15 04:41:29.000000000 +0800
+++ b/drivers/cpufreq/cpufreq.c 2008-02-15 06:56:56.000000000 +0800
@@ -1057,12 +1057,17 @@ static int __cpufreq_remove_dev (struct

unlock_policy_rwsem_write(cpu);

+ /* it matches the previous kobject_get */
kobject_put(&data->kobj);

/* we need to make sure that the underlying kobj is actually
* not referenced anymore by anybody before we proceed with
* unloading.
*/
+
+ /* unregister data->kobj, it matches kobject_init_and_add */
+ kobject_put(&data->kobj);
+
dprintk("waiting for dropping of refcount\n");
wait_for_completion(&data->kobj_unregister);
dprintk("wait complete\n");

2008-02-15 15:53:10

by Alan Stern

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH 2.6.25-rc1] cpufreq: fix cpufreq policy refcount imbalance

On Fri, 15 Feb 2008, Yi Yang wrote:

> This patch adds kobject_put to balance refcount. I noticed Greg suggests
> it will fix a power-off issue to remove kobject_get statement block, but i
> think that isn't the best way because those code block has existed very long
> and it is helpful because the successive statements are invoking relevant
> data.

Are you referring to this section of code (before the region affected
by your patch)?

if (!kobject_get(&data->kobj)) {
spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
cpufreq_debug_enable_ratelimit();
unlock_policy_rwsem_write(cpu);
return -EFAULT;
}

Greg is correct that the kobject_get() here is useless and should be
removed. kobject_get() never returns NULL unless its argument is NULL.
Since &data->kobj can never be NULL, the "if" test will never fail.
Hence there's no point in making the test at all.

The fact that a section of code has existed for a long time doesn't
mean that it is right. :-)

Furthermore, there's no reason to do the kobject_get(). Holding 2
references to a kobject is no better than holding just 1 reference.
Assuming you know that the kobject is still registered, then you also
know that there is already a reference to it. So you have no reason to
take an additional reference.

Alan Stern

2008-02-15 18:26:41

by Greg KH

[permalink] [raw]
Subject: Re: [linux-pm] [PATCH 2.6.25-rc1] cpufreq: fix cpufreq policy refcount imbalance

On Fri, Feb 15, 2008 at 10:52:51AM -0500, Alan Stern wrote:
> On Fri, 15 Feb 2008, Yi Yang wrote:
>
> > This patch adds kobject_put to balance refcount. I noticed Greg suggests
> > it will fix a power-off issue to remove kobject_get statement block, but i
> > think that isn't the best way because those code block has existed very long
> > and it is helpful because the successive statements are invoking relevant
> > data.
>
> Are you referring to this section of code (before the region affected
> by your patch)?
>
> if (!kobject_get(&data->kobj)) {
> spin_unlock_irqrestore(&cpufreq_driver_lock, flags);
> cpufreq_debug_enable_ratelimit();
> unlock_policy_rwsem_write(cpu);
> return -EFAULT;
> }
>
> Greg is correct that the kobject_get() here is useless and should be
> removed. kobject_get() never returns NULL unless its argument is NULL.
> Since &data->kobj can never be NULL, the "if" test will never fail.
> Hence there's no point in making the test at all.
>
> The fact that a section of code has existed for a long time doesn't
> mean that it is right. :-)
>
> Furthermore, there's no reason to do the kobject_get(). Holding 2
> references to a kobject is no better than holding just 1 reference.
> Assuming you know that the kobject is still registered, then you also
> know that there is already a reference to it. So you have no reason to
> take an additional reference.

There's the additional problem that this second reference count is never
dropped, causing a bug :)

thanks,

greg k-h

2008-02-15 21:07:32

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH 2.6.25-rc1] cpufreq: fix cpufreq policy refcount imbalance

On Fri, Feb 15, 2008 at 07:48:41AM +0800, Yi Yang wrote:
> When one cpu is set to offline, the caller process will hang, according to
> the trace data, the problem lies in the refcount error in cpufreq driver,
> cpufreq_cpu_callback will wait for completion policy->kobj_unregister
> which is nerver completed because a refcount error in function
> __cpufreq_remove_dev in file driver/cpufreq/cpufreq.c results in not
> calling kobject release method.
>
> In driver/cpufreq/cpufreq.c, the refcount of data->kobj isn't 1 when it
> will be unregistered, this problem didn't exist in 2.6.24 and earlier.
>
> The root cause is kobject API switch, kobject_init_and_add and kobject_put
> replace older kobject_register and kobject_unregister in 2.6.25-rc1,
> compared to 2.6.24, kobject_unregister is deleted in function
> __cpufreq_remove_dev but it isn't replaced with kobject_put.
>
> This patch adds kobject_put to balance refcount. I noticed Greg suggests
> it will fix a power-off issue to remove kobject_get statement block, but i
> think that isn't the best way because those code block has existed very long
> and it is helpful because the successive statements are invoking relevant
> data.
>
>
> Signed-off-by: Yi Yang <[email protected]>

No, the additional kobject_get() needs to be removed. I posted a patch
for this last night, and so did someone else earlier at:
http://lkml.org/lkml/2008/2/8/342

this patch should not be added, I'll get the other one in instead.

thanks,

greg k-h

2008-02-25 10:07:31

by Yi Yang

[permalink] [raw]
Subject: [PATCH 2.6.25-rc3] cpuidle: fix cpuidle time and usage overflow

cpuidle C-state sysfs node time and usage are very easy to overflow because
they are all of unsigned int type, time will overflow within about two hours,
usage will take longer time to overflow, but they are increasing for ever.

This patch will convert them to unsigned long long.


Signed-off-by: Yi Yang <[email protected]>
---
drivers/cpuidle/cpuidle.c | 2 +-
drivers/cpuidle/sysfs.c | 10 ++++++++--
include/linux/cpuidle.h | 4 ++--

--- a/include/linux/cpuidle.h 2008-02-25 02:31:26.000000000 -0500
+++ b/include/linux/cpuidle.h 2008-02-25 04:30:24.000000000 -0500
@@ -38,8 +38,8 @@ struct cpuidle_state {
unsigned int power_usage; /* in mW */
unsigned int target_residency; /* in US */

- unsigned int usage;
- unsigned int time; /* in US */
+ unsigned long long usage;
+ unsigned long long time; /* in US */

int (*enter) (struct cpuidle_device *dev,
struct cpuidle_state *state);
--- a/drivers/cpuidle/cpuidle.c 2008-02-25 02:37:14.000000000 -0500
+++ b/drivers/cpuidle/cpuidle.c 2008-02-25 04:29:19.000000000 -0500
@@ -67,7 +67,7 @@ static void cpuidle_idle_call(void)
/* enter the state and update stats */
dev->last_residency = target_state->enter(dev, target_state);
dev->last_state = target_state;
- target_state->time += dev->last_residency;
+ target_state->time += (unsigned long long)dev->last_residency;
target_state->usage++;

/* give the governor an opportunity to reflect on the outcome */
--- a/drivers/cpuidle/sysfs.c 2008-02-25 02:33:14.000000000 -0500
+++ b/drivers/cpuidle/sysfs.c 2008-02-25 03:10:50.000000000 -0500
@@ -218,6 +218,12 @@ static ssize_t show_state_##_name(struct
return sprintf(buf, "%u\n", state->_name);\
}

+#define define_show_state_ull_function(_name) \
+static ssize_t show_state_##_name(struct cpuidle_state *state, char *buf) \
+{ \
+ return sprintf(buf, "%llu\n", state->_name);\
+}
+
#define define_show_state_str_function(_name) \
static ssize_t show_state_##_name(struct cpuidle_state *state, char *buf) \
{ \
@@ -228,8 +234,8 @@ static ssize_t show_state_##_name(struct

define_show_state_function(exit_latency)
define_show_state_function(power_usage)
-define_show_state_function(usage)
-define_show_state_function(time)
+define_show_state_ull_function(usage)
+define_show_state_ull_function(time)
define_show_state_str_function(name)
define_show_state_str_function(desc)


2008-02-25 10:16:29

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 2.6.25-rc3] cpuidle: fix cpuidle time and usage overflow


* Yi Yang <[email protected]> wrote:

> cpuidle C-state sysfs node time and usage are very easy to overflow
> because they are all of unsigned int type, time will overflow within
> about two hours, usage will take longer time to overflow, but they are
> increasing for ever.
>
> This patch will convert them to unsigned long long.

what happens if such an overflow happens - any particular regression or
other misbehavior, or just funny looking stats in sysfs?

Ingo

2008-02-25 10:26:21

by Yi Yang

[permalink] [raw]
Subject: Re: [PATCH 2.6.25-rc3] cpuidle: fix cpuidle time and usage overflow

On Mon, 2008-02-25 at 11:15 +0100, Ingo Molnar wrote:
> * Yi Yang <[email protected]> wrote:
>
> > cpuidle C-state sysfs node time and usage are very easy to overflow
> > because they are all of unsigned int type, time will overflow within
> > about two hours, usage will take longer time to overflow, but they are
> > increasing for ever.
> >
> > This patch will convert them to unsigned long long.
>
> what happens if such an overflow happens - any particular regression or
> other misbehavior, or just funny looking stats in sysfs?
They are just stats info in sysfs, cpuidle's behaviors don't depend on
them. I didn't notice any regression or misbehaviors.

>
> Ingo
> -
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-02-01 07:23:46

by Yi Yang

[permalink] [raw]
Subject: [PATCH 2.6.24] Add new string functions strict_strto* and convert kernel params to use them, take 2

This patch is a resend, it changes previous name "real_" to "strict_"
according to Randy Dunlap's feedback. Please consider to apply. thanks.

Currently, for every sysfs node, the callers will be responsible for
implementing store operation, so many many callers are doing duplicate
things to validate input, they have the same mistakes because they are
calling simple_strtol/ul/ll/uul, especially for module params, they are
just numeric, but you can echo such values as 0x1234xxx, 07777888 and
1234aaa, for these cases, module params store operation just ignores
succesive invalid char and converts prefix part to a numeric although
input is acctually invalid.

This patch tries to fix the aforementioned issues and implements strict_strtox
serial functions, kernel/params.c uses them to strictly validate input,
so module params will reject such values as 0x1234xxxx and returns an error:

write error: Invalid argument

Any modules which export numeric sysfs node can use strict_strtox instead of
simple_strtox to reject any invalid input.

Here are some test results:

Before applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#


After applying this patch:

[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000g > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo 0x1000gggggggg > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# echo 0100008 > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# echo 010000aaaaa > /sys/module/e1000/parameters/copybreak
-bash: echo: write error: Invalid argument
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]# echo -n 4096 > /sys/module/e1000/parameters/copybreak
[root@yangyi-dev /]# cat /sys/module/e1000/parameters/copybreak
4096
[root@yangyi-dev /]#


Signed-off-by: Yi Yang <[email protected]>
---
include/linux/kernel.h | 4 ++++
kernel/params.c | 20 ++++++++++----------
lib/vsprintf.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 60 insertions(+), 10 deletions(-)

--- a/include/linux/kernel.h 2008-01-31 00:41:46.000000000 +0800
--- b/include/linux/kernel.h 2008-02-01 04:30:49.000000000 +0800
@@ -141,6 +141,10 @@ extern unsigned long simple_strtoul(cons
extern long simple_strtol(const char *,char **,unsigned int);
extern unsigned long long simple_strtoull(const char *,char **,unsigned int);
extern long long simple_strtoll(const char *,char **,unsigned int);
+extern int strict_strtoul(const char *, unsigned int, unsigned long *);
+extern int strict_strtol(const char *, unsigned int, long *);
+extern int strict_strtoull(const char *, unsigned int, unsigned long long *);
+extern int strict_strtoll(const char *, unsigned int, long long *);
extern int sprintf(char * buf, const char * fmt, ...)
__attribute__ ((format (printf, 2, 3)));
extern int vsprintf(char *buf, const char *, va_list)
--- a/lib/vsprintf.c 2008-01-30 08:13:03.000000000 +0800
--- b/lib/vsprintf.c 2008-02-01 04:33:27.000000000 +0800
@@ -126,6 +126,52 @@ long long simple_strtoll(const char *cp,
return simple_strtoull(cp,endp,base);
}

+#define define_strict_strtoux(type, valtype) \
+int strict_strtou##type(const char *cp, unsigned int base, valtype *res)\
+{ \
+ char *tail; \
+ valtype val; \
+ size_t len; \
+ \
+ *res = 0; \
+ len = strlen(cp); \
+ if (len == 0) \
+ return -EINVAL; \
+ \
+ val = simple_strtoul(cp, &tail, base); \
+ if ((*tail == '\0') || \
+ (len == (size_t)(tail - cp) + 1) && (*tail == '\n')) { \
+ *res = val; \
+ return 0; \
+ } \
+ \
+ return -EINVAL; \
+} \
+
+#define define_strict_strtox(type, valtype) \
+int strict_strto##type(const char *cp, unsigned int base, valtype *res) \
+{ \
+ int ret; \
+ if (*cp == '-') { \
+ ret = strict_strtou##type(cp+1, base, res); \
+ if (ret != 0) \
+ *res = -(*res); \
+ } else \
+ ret = strict_strtou##type(cp+1, base, res); \
+ \
+ return ret; \
+} \
+
+define_strict_strtoux(l, unsigned long)
+define_strict_strtox(l, long)
+define_strict_strtoux(ll, unsigned long long)
+define_strict_strtox(ll, long long)
+
+EXPORT_SYMBOL(strict_strtoul);
+EXPORT_SYMBOL(strict_strtol);
+EXPORT_SYMBOL(strict_strtoll);
+EXPORT_SYMBOL(strict_strtoull);
+
static int skip_atoi(const char **s)
{
int i=0;
--- a/kernel/params.c 2008-01-31 00:44:44.000000000 +0800
--- b/kernel/params.c 2008-02-01 04:32:12.000000000 +0800
@@ -180,12 +180,12 @@ int parse_args(const char *name,
#define STANDARD_PARAM_DEF(name, type, format, tmptype, strtolfn) \
int param_set_##name(const char *val, struct kernel_param *kp) \
{ \
- char *endp; \
tmptype l; \
+ int ret; \
\
if (!val) return -EINVAL; \
- l = strtolfn(val, &endp, 0); \
- if (endp == val || ((type)l != l)) \
+ ret = strtolfn(val, 0, &l); \
+ if (ret == -EINVAL || ((type)l != l)) \
return -EINVAL; \
*((type *)kp->arg) = l; \
return 0; \
@@ -195,13 +195,13 @@ int parse_args(const char *name,
return sprintf(buffer, format, *((type *)kp->arg)); \
}

-STANDARD_PARAM_DEF(byte, unsigned char, "%c", unsigned long, simple_strtoul);
-STANDARD_PARAM_DEF(short, short, "%hi", long, simple_strtol);
-STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, simple_strtoul);
-STANDARD_PARAM_DEF(int, int, "%i", long, simple_strtol);
-STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, simple_strtoul);
-STANDARD_PARAM_DEF(long, long, "%li", long, simple_strtol);
-STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, simple_strtoul);
+STANDARD_PARAM_DEF(byte, unsigned char, "%c", unsigned long, strict_strtoul);
+STANDARD_PARAM_DEF(short, short, "%hi", long, strict_strtol);
+STANDARD_PARAM_DEF(ushort, unsigned short, "%hu", unsigned long, strict_strtoul);
+STANDARD_PARAM_DEF(int, int, "%i", long, strict_strtol);
+STANDARD_PARAM_DEF(uint, unsigned int, "%u", unsigned long, strict_strtoul);
+STANDARD_PARAM_DEF(long, long, "%li", long, strict_strtol);
+STANDARD_PARAM_DEF(ulong, unsigned long, "%lu", unsigned long, strict_strtoul);

int param_set_charp(const char *val, struct kernel_param *kp)
{

2008-03-26 05:04:19

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 2.6.25-rc3] cpuidle: fix cpuidle time and usage overflow

applied

thanks,
-len

On Sunday 24 February 2008, Yi Yang wrote:
> cpuidle C-state sysfs node time and usage are very easy to overflow because
> they are all of unsigned int type, time will overflow within about two hours,
> usage will take longer time to overflow, but they are increasing for ever.
>
> This patch will convert them to unsigned long long.
>
>
> Signed-off-by: Yi Yang <[email protected]>
> ---
> drivers/cpuidle/cpuidle.c | 2 +-
> drivers/cpuidle/sysfs.c | 10 ++++++++--
> include/linux/cpuidle.h | 4 ++--
>
> --- a/include/linux/cpuidle.h 2008-02-25 02:31:26.000000000 -0500
> +++ b/include/linux/cpuidle.h 2008-02-25 04:30:24.000000000 -0500
> @@ -38,8 +38,8 @@ struct cpuidle_state {
> unsigned int power_usage; /* in mW */
> unsigned int target_residency; /* in US */
>
> - unsigned int usage;
> - unsigned int time; /* in US */
> + unsigned long long usage;
> + unsigned long long time; /* in US */
>
> int (*enter) (struct cpuidle_device *dev,
> struct cpuidle_state *state);
> --- a/drivers/cpuidle/cpuidle.c 2008-02-25 02:37:14.000000000 -0500
> +++ b/drivers/cpuidle/cpuidle.c 2008-02-25 04:29:19.000000000 -0500
> @@ -67,7 +67,7 @@ static void cpuidle_idle_call(void)
> /* enter the state and update stats */
> dev->last_residency = target_state->enter(dev, target_state);
> dev->last_state = target_state;
> - target_state->time += dev->last_residency;
> + target_state->time += (unsigned long long)dev->last_residency;
> target_state->usage++;
>
> /* give the governor an opportunity to reflect on the outcome */
> --- a/drivers/cpuidle/sysfs.c 2008-02-25 02:33:14.000000000 -0500
> +++ b/drivers/cpuidle/sysfs.c 2008-02-25 03:10:50.000000000 -0500
> @@ -218,6 +218,12 @@ static ssize_t show_state_##_name(struct
> return sprintf(buf, "%u\n", state->_name);\
> }
>
> +#define define_show_state_ull_function(_name) \
> +static ssize_t show_state_##_name(struct cpuidle_state *state, char *buf) \
> +{ \
> + return sprintf(buf, "%llu\n", state->_name);\
> +}
> +
> #define define_show_state_str_function(_name) \
> static ssize_t show_state_##_name(struct cpuidle_state *state, char *buf) \
> { \
> @@ -228,8 +234,8 @@ static ssize_t show_state_##_name(struct
>
> define_show_state_function(exit_latency)
> define_show_state_function(power_usage)
> -define_show_state_function(usage)
> -define_show_state_function(time)
> +define_show_state_ull_function(usage)
> +define_show_state_ull_function(time)
> define_show_state_str_function(name)
> define_show_state_str_function(desc)
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>