From: Fenghua Yu <[email protected]>
In current coretemp driver, when a CPU in dev_list is hot-removed, although its
HT sibling is still running, its core sensor is gone and not available to user
level application any more.
When a CPU is hot-removed, its core sensor should be still available to upper
level application as long as the hot-removed CPU's HT sibling is still running.
A core sensor is invisible to user level only when all of siblings in a core are
hot-removed.
This patch fixes this issue.
Signed-off-by: Fenghua Yu <[email protected]>
---
drivers/hwmon/coretemp.c | 25 ++++++++++++++++++++++---
1 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index c070c97..2257cc4 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -418,7 +418,7 @@ struct pdev_entry {
static LIST_HEAD(pdev_list);
static DEFINE_MUTEX(pdev_list_mutex);
-static int __cpuinit coretemp_device_add(unsigned int cpu)
+static int coretemp_device_add(unsigned int cpu)
{
int err;
struct platform_device *pdev;
@@ -483,15 +483,34 @@ exit:
static void coretemp_device_remove(unsigned int cpu)
{
struct pdev_entry *p, *n;
- mutex_lock(&pdev_list_mutex);
+#ifdef CONFIG_SMP
+ int s;
+#endif
+
list_for_each_entry_safe(p, n, &pdev_list, list) {
if (p->cpu == cpu) {
+ mutex_lock(&pdev_list_mutex);
platform_device_unregister(p->pdev);
list_del(&p->list);
kfree(p);
+ mutex_unlock(&pdev_list_mutex);
+
+#ifdef CONFIG_SMP
+ /*
+ * Add removed CPU's HT sibling to dev_list.
+ * If there is no sibling available, the core sensor
+ * is invisiable to user space any more.
+ */
+ for_each_cpu(s, cpu_sibling_mask(cpu)) {
+ if (s != cpu) {
+ coretemp_device_add(s);
+ break;
+ }
+ }
+#endif
+ return;
}
}
- mutex_unlock(&pdev_list_mutex);
}
static int __cpuinit coretemp_cpu_callback(struct notifier_block *nfb,
--
1.6.0.3
From: Fenghua Yu <[email protected]>
Remove compilation warnings of unused variables p and n in coretemp_init().
Signed-off-by: Fenghua Yu <[email protected]>
---
drivers/hwmon/coretemp.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)
diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index c070c97..de81111 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -518,7 +518,6 @@ static struct notifier_block coretemp_cpu_notifier __refdata = {
static int __init coretemp_init(void)
{
int i, err = -ENODEV;
- struct pdev_entry *p, *n;
/* quick check if we run Intel */
if (cpu_data(0).x86_vendor != X86_VENDOR_INTEL)
--
1.6.0.3
From: Fenghua Yu <[email protected]>
When a sibling is added to dev_list after a cpu is hot-removed, pdev_list_mutex
has been locked already. But pkgtemp_device_add() tries to lock pdev_list_mutex
again. This is incorrect. The patch fixes this issue.
The patch also removes __cpuinit for pkgtemp_device_add() to avoid section
mismatch warning.
Signed-off-by: Fenghua Yu <[email protected]>
---
drivers/hwmon/pkgtemp.c | 18 +++++++++++++-----
1 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/drivers/hwmon/pkgtemp.c b/drivers/hwmon/pkgtemp.c
index 74157fc..928a016 100644
--- a/drivers/hwmon/pkgtemp.c
+++ b/drivers/hwmon/pkgtemp.c
@@ -276,7 +276,7 @@ struct pdev_entry {
static LIST_HEAD(pdev_list);
static DEFINE_MUTEX(pdev_list_mutex);
-static int __cpuinit pkgtemp_device_add(unsigned int cpu)
+static int pkgtemp_device_add(unsigned int cpu)
{
int err;
struct platform_device *pdev;
@@ -341,26 +341,34 @@ static void pkgtemp_device_remove(unsigned int cpu)
{
struct pdev_entry *p, *n;
unsigned int i;
- int err;
- mutex_lock(&pdev_list_mutex);
list_for_each_entry_safe(p, n, &pdev_list, list) {
if (p->cpu != cpu)
continue;
+ mutex_lock(&pdev_list_mutex);
platform_device_unregister(p->pdev);
list_del(&p->list);
kfree(p);
+ mutex_unlock(&pdev_list_mutex);
+ /*
+ * Select one of removed cpu's siblings to represent sensor
+ * for this package.
+ * If there is no more running sibling in a package, the
+ * package sensor for this package is not available to user
+ * space any more.
+ */
for_each_cpu(i, cpu_core_mask(cpu)) {
+ int err;
+
if (i != cpu) {
err = pkgtemp_device_add(i);
if (!err)
break;
}
}
- break;
+ return;
}
- mutex_unlock(&pdev_list_mutex);
}
static int __cpuinit pkgtemp_cpu_callback(struct notifier_block *nfb,
--
1.6.0.3
Hi Fenghua,
On Wed, 18 Aug 2010 15:53:46 -0700, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> Remove compilation warnings of unused variables p and n in coretemp_init().
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> drivers/hwmon/coretemp.c | 1 -
> 1 files changed, 0 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index c070c97..de81111 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -518,7 +518,6 @@ static struct notifier_block coretemp_cpu_notifier __refdata = {
> static int __init coretemp_init(void)
> {
> int i, err = -ENODEV;
> - struct pdev_entry *p, *n;
>
> /* quick check if we run Intel */
> if (cpu_data(0).x86_vendor != X86_VENDOR_INTEL)
Thanks for the fix, but it is already in my tree:
ftp://ftp.kernel.org/pub/linux/kernel/people/jdelvare/linux-2.6/jdelvare-hwmon/hwmon-coretemp-fix-warning.patch
I'll send it to Linus as part as my next batch of hwmon fixes for
2.6.36.
--
Jean Delvare
On Wed, Aug 18, 2010 at 06:53:45PM -0400, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> In current coretemp driver, when a CPU in dev_list is hot-removed, although its
> HT sibling is still running, its core sensor is gone and not available to user
> level application any more.
>
> When a CPU is hot-removed, its core sensor should be still available to upper
> level application as long as the hot-removed CPU's HT sibling is still running.
> A core sensor is invisible to user level only when all of siblings in a core are
> hot-removed.
>
Isn't that just a short term (ie a couple of ms or even us) race condition
until the sibling is removed as well ? If so, why bother ?
Thanks,
Guenter
On 08/19/2010 12:24 AM, Jean Delvare wrote:
>
> Thanks for the fix, but it is already in my tree:
> ftp://ftp.kernel.org/pub/linux/kernel/people/jdelvare/linux-2.6/jdelvare-hwmon/hwmon-coretemp-fix-warning.patch
>
> I'll send it to Linus as part as my next batch of hwmon fixes for
> 2.6.36.
>
Hi Jean,
Are you overall hwmon maintainer now? That job is currently listed as
Orphan in the MAINTAINERS file.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
On Thu, 19 Aug 2010 12:04:06 -0700, H. Peter Anvin wrote:
> On 08/19/2010 12:24 AM, Jean Delvare wrote:
> >
> > Thanks for the fix, but it is already in my tree:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/jdelvare/linux-2.6/jdelvare-hwmon/hwmon-coretemp-fix-warning.patch
> >
> > I'll send it to Linus as part as my next batch of hwmon fixes for
> > 2.6.36.
>
> Hi Jean,
>
> Are you overall hwmon maintainer now? That job is currently listed as
> Orphan in the MAINTAINERS file.
MAINTAINERS is correct. I do not have the time, nor the will, to be the
single official maintainer of the hwmon subsystem. I would be happy to
be a co-maintainer, but nobody stepped in to co-maintain with me.
Meanwhile, I'm doing my best to off-load some of the hwmon work from
Andrew, but that doesn't make me THE maintainer of hwmon.
--
Jean Delvare
On 08/19/2010 02:26 PM, Jean Delvare wrote:
>
> MAINTAINERS is correct. I do not have the time, nor the will, to be the
> single official maintainer of the hwmon subsystem. I would be happy to
> be a co-maintainer, but nobody stepped in to co-maintain with me.
>
> Meanwhile, I'm doing my best to off-load some of the hwmon work from
> Andrew, but that doesn't make me THE maintainer of hwmon.
>
OK. I was simply wondering if Fenghua should push his coretemp tree
through you rather than direct to Linus.
-hpa
On Thu, 19 Aug 2010 14:28:11 -0700, H. Peter Anvin wrote:
> On 08/19/2010 02:26 PM, Jean Delvare wrote:
> >
> > MAINTAINERS is correct. I do not have the time, nor the will, to be the
> > single official maintainer of the hwmon subsystem. I would be happy to
> > be a co-maintainer, but nobody stepped in to co-maintain with me.
> >
> > Meanwhile, I'm doing my best to off-load some of the hwmon work from
> > Andrew, but that doesn't make me THE maintainer of hwmon.
> >
>
> OK. I was simply wondering if Fenghua should push his coretemp tree
> through you rather than direct to Linus.
Well, it would certainly make sense for Fenghua to send the patches to
me (or better, to the lm-sensors list) and/or Andrew. I doubt Linus is
willing to review random driver patches on a regular basis, especially
when they don't fix anything critical.
--
Jean Delvare
On Thu, Aug 19, 2010 at 05:26:55PM -0400, Jean Delvare wrote:
> On Thu, 19 Aug 2010 12:04:06 -0700, H. Peter Anvin wrote:
> > On 08/19/2010 12:24 AM, Jean Delvare wrote:
> > >
> > > Thanks for the fix, but it is already in my tree:
> > > ftp://ftp.kernel.org/pub/linux/kernel/people/jdelvare/linux-2.6/jdelvare-hwmon/hwmon-coretemp-fix-warning.patch
> > >
> > > I'll send it to Linus as part as my next batch of hwmon fixes for
> > > 2.6.36.
> >
> > Hi Jean,
> >
> > Are you overall hwmon maintainer now? That job is currently listed as
> > Orphan in the MAINTAINERS file.
>
> MAINTAINERS is correct. I do not have the time, nor the will, to be the
> single official maintainer of the hwmon subsystem. I would be happy to
> be a co-maintainer, but nobody stepped in to co-maintain with me.
>
I can step in if needed. I don't want to step on anyone's feet, though,
and I would not want to be the sole maintainer, at least not yet - I just
don't know the code well enough.
Guenter
Hi Fenghua,
On Wed, 18 Aug 2010 15:53:45 -0700, Fenghua Yu wrote:
> From: Fenghua Yu <[email protected]>
>
> In current coretemp driver, when a CPU in dev_list is hot-removed, although its
> HT sibling is still running, its core sensor is gone and not available to user
> level application any more.
>
> When a CPU is hot-removed, its core sensor should be still available to upper
> level application as long as the hot-removed CPU's HT sibling is still running.
> A core sensor is invisible to user level only when all of siblings in a core are
> hot-removed.
Good point. I admit I didn't think about this scenario when fixing the
duplicate HT entries. I thought both hyperthreads would go away at the
same time, but since then I learned that individual HT can be removed
using the sysfs "online" attributes.
That being said, I'm curious if this is really a problem in practice?
Why would one disable only one hyperthread on a given core? I can't
think of a real-world scenario.
I don't mean to suggest that we don't have to fix the problem. I'm
simply trying to figure out how fast we need to fix it, and whether the
fix is worth adding to the stable kernel series or not.
>
> This patch fixes this issue.
>
> Signed-off-by: Fenghua Yu <[email protected]>
> ---
> drivers/hwmon/coretemp.c | 25 ++++++++++++++++++++++---
> 1 files changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
> index c070c97..2257cc4 100644
> --- a/drivers/hwmon/coretemp.c
> +++ b/drivers/hwmon/coretemp.c
> @@ -418,7 +418,7 @@ struct pdev_entry {
> static LIST_HEAD(pdev_list);
> static DEFINE_MUTEX(pdev_list_mutex);
>
> -static int __cpuinit coretemp_device_add(unsigned int cpu)
> +static int coretemp_device_add(unsigned int cpu)
> {
> int err;
> struct platform_device *pdev;
> @@ -483,15 +483,34 @@ exit:
> static void coretemp_device_remove(unsigned int cpu)
> {
> struct pdev_entry *p, *n;
> - mutex_lock(&pdev_list_mutex);
> +#ifdef CONFIG_SMP
> + int s;
> +#endif
> +
> list_for_each_entry_safe(p, n, &pdev_list, list) {
> if (p->cpu == cpu) {
> + mutex_lock(&pdev_list_mutex);
> platform_device_unregister(p->pdev);
> list_del(&p->list);
> kfree(p);
> + mutex_unlock(&pdev_list_mutex);
> +
> +#ifdef CONFIG_SMP
> + /*
> + * Add removed CPU's HT sibling to dev_list.
> + * If there is no sibling available, the core sensor
> + * is invisiable to user space any more.
> + */
> + for_each_cpu(s, cpu_sibling_mask(cpu)) {
> + if (s != cpu) {
> + coretemp_device_add(s);
> + break;
> + }
> + }
> +#endif
> + return;
> }
> }
> - mutex_unlock(&pdev_list_mutex);
> }
I am not convinced by this implementation. See the following comment
sequence:
* * * * *
localhost:/home/khali # sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +47.0°C (high = +90.0°C, crit = +100.0°C)
coretemp-isa-0001
Adapter: ISA adapter
Core 1: +46.0°C (high = +90.0°C, crit = +100.0°C)
coretemp-isa-0002
Adapter: ISA adapter
Core 2: +48.0°C (high = +90.0°C, crit = +100.0°C)
coretemp-isa-0003
Adapter: ISA adapter
Core 3: +46.0°C (high = +90.0°C, crit = +100.0°C)
localhost:/home/khali # echo 0 > /sys/devices/system/cpu/cpu1/online
localhost:/home/khali # sensors
coretemp-isa-0005
Adapter: ISA adapter
Core 1: +48.0°C (high = +90.0°C, crit = +100.0°C)
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +48.0°C (high = +90.0°C, crit = +100.0°C)
coretemp-isa-0002
Adapter: ISA adapter
Core 2: +49.0°C (high = +90.0°C, crit = +100.0°C)
coretemp-isa-0003
Adapter: ISA adapter
Core 3: +48.0°C (high = +90.0°C, crit = +100.0°C)
localhost:/home/khali #
* * * * *
As you can see, the switch of hyperthreads on Core 1 caused hwmon
device coretemp-isa-0001 to be removed and be replaced with
coretemp-isa-0005. There is also a change in the underlying
directories, /sys/class/hwmon/hwmon1/device now points
to /sys/devices/platform/coretemp.5 instead
of /sys/devices/platform/coretemp.1. This has three drawbacks:
1* Configuration statements from /etc/sensors.conf will no longer be
applied.
2* Some monitoring applications may lose their path to the sensors.
Thankfully, libsensors uses hwmon device paths rather than physical
device paths, so the effect should be limited, but other tools (e.g.
the fancontrol script) tend to prefer physical device paths, so they
will break.
3* If you disable several HTs at once, you have no guarantee that the
new hwmon devices will be numbered in the same order as the old hwmon
devices. If you are unlucky and the number changes, then all
libsensors-based applications will start reporting garbage.
I admit that these issues are not critical ones, and are rather
unlikely to happen in the real world, but so is the problem you are
trying to solve in the first place.
Point 1* could be easily solved by changing the way the coretemp device
ID is allocated. Instead of using the CPU ID directly, we would use the
smallest CPU ID amongst all the siblings. This ensures a consistent ID
no matter which sibling is used.
Points 2* and 3*, however, can't be solved without reworking the driver
significantly. I think we should not only skip duplicate HT entries on
driver registration as my naive patch did. We should instead keep track
of them, i.e. all coretemp entries should know the list of CPU entries
they are backed up by, and a coretemp device would be unregistered only
when this list shrinks to zero elements (all HT have been removed.)
As you said you agree to give a try to a rework of the coretemp driver
to keep all related cores into the same hwmon device, I think this
additional constraint might fit well in the new driver design. What do
you think?
--
Jean Delvare
On Fri, Aug 20, 2010 at 01:26:46AM -0700, Jean Delvare wrote:
> Hi Fenghua,
>
> On Wed, 18 Aug 2010 15:53:45 -0700, Fenghua Yu wrote:
> > From: Fenghua Yu <[email protected]>
> > When a CPU is hot-removed, its core sensor should be still available to upper
> > level application as long as the hot-removed CPU's HT sibling is still running.
> > A core sensor is invisible to user level only when all of siblings in a core are
> > hot-removed.
>
> Good point. I admit I didn't think about this scenario when fixing the
> duplicate HT entries. I thought both hyperthreads would go away at the
> same time, but since then I learned that individual HT can be removed
> using the sysfs "online" attributes.
>
> That being said, I'm curious if this is really a problem in practice?
> Why would one disable only one hyperthread on a given core? I can't
> think of a real-world scenario.
Overall we need to keep state integrity for hot-removed CPU and shared core
sensor. Without fixing this issue, we end up with inconsistent system info.
As for usage scenario, I can think of some:
1. Power saving. Management application may offline some threads or all thread
siblings to save power. Image all of HT is disabled during run-time, less power
is consumed with less performance.
2. RAS. A bad thread may be offlined which its sibling is still running. This
could be becaused of logical CPU spcific state e.g. instruction TLB.
>
> I don't mean to suggest that we don't have to fix the problem. I'm
> simply trying to figure out how fast we need to fix it, and whether the
> fix is worth adding to the stable kernel series or not.
>
> As you can see, the switch of hyperthreads on Core 1 caused hwmon
> device coretemp-isa-0001 to be removed and be replaced with
> coretemp-isa-0005. There is also a change in the underlying
> directories, /sys/class/hwmon/hwmon1/device now points
> to /sys/devices/platform/coretemp.5 instead
> of /sys/devices/platform/coretemp.1. This has three drawbacks:
> 1* Configuration statements from /etc/sensors.conf will no longer be
> applied.
> 2* Some monitoring applications may lose their path to the sensors.
> Thankfully, libsensors uses hwmon device paths rather than physical
> device paths, so the effect should be limited, but other tools (e.g.
> the fancontrol script) tend to prefer physical device paths, so they
> will break.
> 3* If you disable several HTs at once, you have no guarantee that the
> new hwmon devices will be numbered in the same order as the old hwmon
> devices. If you are unlucky and the number changes, then all
> libsensors-based applications will start reporting garbage.
>
> I admit that these issues are not critical ones, and are rather
> unlikely to happen in the real world, but so is the problem you are
> trying to solve in the first place.
>
> Point 1* could be easily solved by changing the way the coretemp device
> ID is allocated. Instead of using the CPU ID directly, we would use the
> smallest CPU ID amongst all the siblings. This ensures a consistent ID
> no matter which sibling is used.
>
> Points 2* and 3*, however, can't be solved without reworking the driver
> significantly. I think we should not only skip duplicate HT entries on
> driver registration as my naive patch did. We should instead keep track
> of them, i.e. all coretemp entries should know the list of CPU entries
> they are backed up by, and a coretemp device would be unregistered only
> when this list shrinks to zero elements (all HT have been removed.)
>
> As you said you agree to give a try to a rework of the coretemp driver
> to keep all related cores into the same hwmon device, I think this
> additional constraint might fit well in the new driver design. What do
> you think?
Yes, I agree with you on that. Since I'm rewriting coretemp/pkgtemp, this issue
will be fixed in a new coding.
Thanks.
-Fenghua