2024-03-25 17:55:16

by George Stark

[permalink] [raw]
Subject: [PATCH] iio: dht11: set debug log level for parsing error messages

Protocol parsing errors could happen due to several reasons like noise
environment, heavy load on system etc. If to poll the sensor frequently
and/or for a long period kernel log will become polluted with error
messages if their log level is err (i.e. on by default). Also some types
of those messages already have dbg level so use unified log level for
all such cases.

Signed-off-by: George Stark <[email protected]>
---
I use DHT22 sensor with Raspberry Pi Zero W as a simple home meteo station.
Even if to poll the sensor once per tens of seconds after month or two dmesg
may become full of useless parsing error messages. Anyway those errors are caught
in the user software thru return values.

drivers/iio/humidity/dht11.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iio/humidity/dht11.c b/drivers/iio/humidity/dht11.c
index c97e25448772..e2cbc442177b 100644
--- a/drivers/iio/humidity/dht11.c
+++ b/drivers/iio/humidity/dht11.c
@@ -156,7 +156,7 @@ static int dht11_decode(struct dht11 *dht11, int offset)
dht11->temperature = temp_int * 1000;
dht11->humidity = hum_int * 1000;
} else {
- dev_err(dht11->dev,
+ dev_dbg(dht11->dev,
"Don't know how to decode data: %d %d %d %d\n",
hum_int, hum_dec, temp_int, temp_dec);
return -EIO;
@@ -239,7 +239,7 @@ static int dht11_read_raw(struct iio_dev *iio_dev,
#endif

if (ret == 0 && dht11->num_edges < DHT11_EDGES_PER_READ - 1) {
- dev_err(dht11->dev, "Only %d signal edges detected\n",
+ dev_dbg(dht11->dev, "Only %d signal edges detected\n",
dht11->num_edges);
ret = -ETIMEDOUT;
}
--
2.25.1



2024-03-25 19:00:49

by Harald Geyer

[permalink] [raw]
Subject: Re: [PATCH] iio: dht11: set debug log level for parsing error messages

Hi George!

I'm torn on this:

Am Montag, dem 25.03.2024 um 19:54 +0300 schrieb George Stark:
> Protocol parsing errors could happen due to several reasons like
> noise
> environment, heavy load on system etc. If to poll the sensor
> frequently
> and/or for a long period kernel log will become polluted with error
> messages if their log level is err (i.e. on by default).

Yes, these error are often recoverable. (As are many other HW errors,
that typically are logged. Eg USB bus resets due to EMI)

However they are still genuine errors of the HW.

> Also some types
> of those messages already have dbg level so use unified log level for
> all such cases.

My take so far has been: Debug level messages are for debugging the
code (ie adding/testing support of new device variants etc). Users
aren't expected to know about or enable debug output. OTOH anything
actually going wrong is an error and should be logged as such.

The idea is, that these messages help users understand issues with
their HW (like too long cables, broken cables etc). But it is true,
that they will slowly accumulate in many real world scenarios without
anything being truly wrong.

I don't consider the dmesg buffer being rotated after a month or two a
bug. But I suppose this is a corner case. I'll happily accept whatever
Jonathan thinks is reasonable.

Best regards,
Harald


> Signed-off-by: George Stark <[email protected]>
> ---
> I use DHT22 sensor with Raspberry Pi Zero W as a simple home meteo
> station.
> Even if to poll the sensor once per tens of seconds after month or
> two dmesg
> may become full of useless parsing error messages. Anyway those
> errors are caught
> in the user software thru return values.
>
>  drivers/iio/humidity/dht11.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/iio/humidity/dht11.c
> b/drivers/iio/humidity/dht11.c
> index c97e25448772..e2cbc442177b 100644
> --- a/drivers/iio/humidity/dht11.c
> +++ b/drivers/iio/humidity/dht11.c
> @@ -156,7 +156,7 @@ static int dht11_decode(struct dht11 *dht11, int
> offset)
>                 dht11->temperature = temp_int * 1000;
>                 dht11->humidity = hum_int * 1000;
>         } else {
> -               dev_err(dht11->dev,
> +               dev_dbg(dht11->dev,
>                         "Don't know how to decode data: %d %d %d
> %d\n",
>                         hum_int, hum_dec, temp_int, temp_dec);
>                 return -EIO;
> @@ -239,7 +239,7 @@ static int dht11_read_raw(struct iio_dev
> *iio_dev,
>  #endif
>
>                 if (ret == 0 && dht11->num_edges <
> DHT11_EDGES_PER_READ - 1) {
> -                       dev_err(dht11->dev, "Only %d signal edges
> detected\n",
> +                       dev_dbg(dht11->dev, "Only %d signal edges
> detected\n",
>                                 dht11->num_edges);
>                         ret = -ETIMEDOUT;
>                 }
> --
> 2.25.1
>


2024-03-25 20:19:30

by George Stark

[permalink] [raw]
Subject: Re: [PATCH] iio: dht11: set debug log level for parsing error messages

Hello Harald

Thanks for the review.


On 3/25/24 21:48, Harald Geyer wrote:
> Hi George!
>
> I'm torn on this:
>
> Am Montag, dem 25.03.2024 um 19:54 +0300 schrieb George Stark:
>> Protocol parsing errors could happen due to several reasons like
>> noise
>> environment, heavy load on system etc. If to poll the sensor
>> frequently
>> and/or for a long period kernel log will become polluted with error
>> messages if their log level is err (i.e. on by default).
>
> Yes, these error are often recoverable. (As are many other HW errors,
> that typically are logged. Eg USB bus resets due to EMI)
>
> However they are still genuine errors of the HW.
>
>> Also some types
>> of those messages already have dbg level so use unified log level for
>> all such cases.
>
> My take so far has been: Debug level messages are for debugging the
> code (ie adding/testing support of new device variants etc). Users
> aren't expected to know about or enable debug output. OTOH anything
> actually going wrong is an error and should be logged as such.
>
> The idea is, that these messages help users understand issues with
> their HW (like too long cables, broken cables etc). But it is true,
> that they will slowly accumulate in many real world scenarios without
> anything being truly wrong.

I agree with you that it's very convenient to just take a look to dmesg
and see device connection problems at once. But unlike e.g. usb user has
to actually start reading sensor to perform communication and read
errors will be propagated to the userspace and could be noticed \
handled.

Anyway I believe we should use uniform approach for read errors -
currently in the driver there're already dbg messages:

"lost synchronisation at edge %d\n"
"invalid checksum\n"

I changed log level from err to dbg for the messages:

"Only %d signal edges detected\n"
"Don't know how to decode data: %d %d %d %d\n"

They all are from a single callback and say the same thing -
communication problem.

If we make all those messages as errors it'd be great to have mechanism
to disable them e.g. thru module parameter or somehow without rebuilding
kernel. Those errors can be bypassed by increasing read rate.

>
> I don't consider the dmesg buffer being rotated after a month or two a
> bug. But I suppose this is a corner case. I'll happily accept whatever
> Jonathan thinks is reasonable.
>
> Best regards,
> Harald
>
>
>> Signed-off-by: George Stark <[email protected]>
>> ---
>> I use DHT22 sensor with Raspberry Pi Zero W as a simple home meteo
>> station.
>> Even if to poll the sensor once per tens of seconds after month or
>> two dmesg
>> may become full of useless parsing error messages. Anyway those
>> errors are caught
>> in the user software thru return values.
>>
>>  drivers/iio/humidity/dht11.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/iio/humidity/dht11.c
>> b/drivers/iio/humidity/dht11.c
>> index c97e25448772..e2cbc442177b 100644
>> --- a/drivers/iio/humidity/dht11.c
>> +++ b/drivers/iio/humidity/dht11.c
>> @@ -156,7 +156,7 @@ static int dht11_decode(struct dht11 *dht11, int
>> offset)
>>                 dht11->temperature = temp_int * 1000;
>>                 dht11->humidity = hum_int * 1000;
>>         } else {
>> -               dev_err(dht11->dev,
>> +               dev_dbg(dht11->dev,
>>                         "Don't know how to decode data: %d %d %d
>> %d\n",
>>                         hum_int, hum_dec, temp_int, temp_dec);
>>                 return -EIO;
>> @@ -239,7 +239,7 @@ static int dht11_read_raw(struct iio_dev
>> *iio_dev,
>>  #endif
>>
>>                 if (ret == 0 && dht11->num_edges <
>> DHT11_EDGES_PER_READ - 1) {
>> -                       dev_err(dht11->dev, "Only %d signal edges
>> detected\n",
>> +                       dev_dbg(dht11->dev, "Only %d signal edges
>> detected\n",
>>                                 dht11->num_edges);
>>                         ret = -ETIMEDOUT;
>>                 }
>> --
>> 2.25.1
>>
>

--
Best regards
George

2024-03-25 20:50:15

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH] iio: dht11: set debug log level for parsing error messages

On Mon, 25 Mar 2024 23:18:32 +0300
George Stark <[email protected]> wrote:

> Hello Harald
>
> Thanks for the review.
>
>
> On 3/25/24 21:48, Harald Geyer wrote:
> > Hi George!
> >
> > I'm torn on this:
> >
> > Am Montag, dem 25.03.2024 um 19:54 +0300 schrieb George Stark:
> >> Protocol parsing errors could happen due to several reasons like
> >> noise
> >> environment, heavy load on system etc. If to poll the sensor
> >> frequently
> >> and/or for a long period kernel log will become polluted with error
> >> messages if their log level is err (i.e. on by default).
> >
> > Yes, these error are often recoverable. (As are many other HW errors,
> > that typically are logged. Eg USB bus resets due to EMI)
> >
> > However they are still genuine errors of the HW.
> >
> >> Also some types
> >> of those messages already have dbg level so use unified log level for
> >> all such cases.
> >
> > My take so far has been: Debug level messages are for debugging the
> > code (ie adding/testing support of new device variants etc). Users
> > aren't expected to know about or enable debug output. OTOH anything
> > actually going wrong is an error and should be logged as such.
> >
> > The idea is, that these messages help users understand issues with
> > their HW (like too long cables, broken cables etc). But it is true,
> > that they will slowly accumulate in many real world scenarios without
> > anything being truly wrong.
>
> I agree with you that it's very convenient to just take a look to dmesg
> and see device connection problems at once. But unlike e.g. usb user has
> to actually start reading sensor to perform communication and read
> errors will be propagated to the userspace and could be noticed \
> handled.
>
> Anyway I believe we should use uniform approach for read errors -
> currently in the driver there're already dbg messages:
>
> "lost synchronisation at edge %d\n"
> "invalid checksum\n"
>
> I changed log level from err to dbg for the messages:
>
> "Only %d signal edges detected\n"
> "Don't know how to decode data: %d %d %d %d\n"
>
> They all are from a single callback and say the same thing -
> communication problem.
>
> If we make all those messages as errors it'd be great to have mechanism
> to disable them e.g. thru module parameter or somehow without rebuilding
> kernel. Those errors can be bypassed by increasing read rate.

Don't use a parameter for this. dev_dbg and dynamic debug is the
way to go if this is the choice available.

If Harald is comfortable with the change given this discussion, I'll pick
up this patch.

Jonathan

>
> >
> > I don't consider the dmesg buffer being rotated after a month or two a
> > bug. But I suppose this is a corner case. I'll happily accept whatever
> > Jonathan thinks is reasonable.
> >
> > Best regards,
> > Harald
> >
> >
> >> Signed-off-by: George Stark <[email protected]>
> >> ---
> >> I use DHT22 sensor with Raspberry Pi Zero W as a simple home meteo
> >> station.
> >> Even if to poll the sensor once per tens of seconds after month or
> >> two dmesg
> >> may become full of useless parsing error messages. Anyway those
> >> errors are caught
> >> in the user software thru return values.
> >>
> >>  drivers/iio/humidity/dht11.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/iio/humidity/dht11.c
> >> b/drivers/iio/humidity/dht11.c
> >> index c97e25448772..e2cbc442177b 100644
> >> --- a/drivers/iio/humidity/dht11.c
> >> +++ b/drivers/iio/humidity/dht11.c
> >> @@ -156,7 +156,7 @@ static int dht11_decode(struct dht11 *dht11, int
> >> offset)
> >>                 dht11->temperature = temp_int * 1000;
> >>                 dht11->humidity = hum_int * 1000;
> >>         } else {
> >> -               dev_err(dht11->dev,
> >> +               dev_dbg(dht11->dev,
> >>                         "Don't know how to decode data: %d %d %d
> >> %d\n",
> >>                         hum_int, hum_dec, temp_int, temp_dec);
> >>                 return -EIO;
> >> @@ -239,7 +239,7 @@ static int dht11_read_raw(struct iio_dev
> >> *iio_dev,
> >>  #endif
> >>
> >>                 if (ret == 0 && dht11->num_edges <
> >> DHT11_EDGES_PER_READ - 1) {
> >> -                       dev_err(dht11->dev, "Only %d signal edges
> >> detected\n",
> >> +                       dev_dbg(dht11->dev, "Only %d signal edges
> >> detected\n",
> >>                                 dht11->num_edges);
> >>                         ret = -ETIMEDOUT;
> >>                 }
> >> --
> >> 2.25.1
> >>
> >
>


2024-03-25 23:24:27

by Jonathan Cameron

[permalink] [raw]
Subject: Re: [PATCH] iio: dht11: set debug log level for parsing error messages

On Mon, 25 Mar 2024 19:48:00 +0100
Harald Geyer <[email protected]> wrote:

> Hi George!
>
> I'm torn on this:
>
> Am Montag, dem 25.03.2024 um 19:54 +0300 schrieb George Stark:
> > Protocol parsing errors could happen due to several reasons like
> > noise
> > environment, heavy load on system etc. If to poll the sensor
> > frequently
> > and/or for a long period kernel log will become polluted with error
> > messages if their log level is err (i.e. on by default).
>
> Yes, these error are often recoverable. (As are many other HW errors,
> that typically are logged. Eg USB bus resets due to EMI)
>
> However they are still genuine errors of the HW.
>
> > Also some types
> > of those messages already have dbg level so use unified log level for
> > all such cases.
>
> My take so far has been: Debug level messages are for debugging the
> code (ie adding/testing support of new device variants etc). Users
> aren't expected to know about or enable debug output. OTOH anything
> actually going wrong is an error and should be logged as such.
>
> The idea is, that these messages help users understand issues with
> their HW (like too long cables, broken cables etc). But it is true,
> that they will slowly accumulate in many real world scenarios without
> anything being truly wrong.
>
> I don't consider the dmesg buffer being rotated after a month or two a
> bug. But I suppose this is a corner case. I'll happily accept whatever
> Jonathan thinks is reasonable.
My take:

If the errors are eaten with no user visibility then they should
be logged (errors in interrupt handlers etc) but for errors in
code that returns an error code to the userspace read or similar there
is info that 'something went wrong' available via that then it's fine
to use dev_dbg() with expectation anyone who is looking into issues
can turn them on. However, I defer to driver maintainers on whether
they prefer dev_err() or dev_dbg() for these sorts of cases. Far
as I'm concerned either choice is fine and it's a judgement on
the expected rates of error and impact.

Jonathan

>
> Best regards,
> Harald
>
>
> > Signed-off-by: George Stark <[email protected]>
> > ---
> > I use DHT22 sensor with Raspberry Pi Zero W as a simple home meteo
> > station.
> > Even if to poll the sensor once per tens of seconds after month or
> > two dmesg
> > may become full of useless parsing error messages. Anyway those
> > errors are caught
> > in the user software thru return values.
> >
> >  drivers/iio/humidity/dht11.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/iio/humidity/dht11.c
> > b/drivers/iio/humidity/dht11.c
> > index c97e25448772..e2cbc442177b 100644
> > --- a/drivers/iio/humidity/dht11.c
> > +++ b/drivers/iio/humidity/dht11.c
> > @@ -156,7 +156,7 @@ static int dht11_decode(struct dht11 *dht11, int
> > offset)
> >                 dht11->temperature = temp_int * 1000;
> >                 dht11->humidity = hum_int * 1000;
> >         } else {
> > -               dev_err(dht11->dev,
> > +               dev_dbg(dht11->dev,
> >                         "Don't know how to decode data: %d %d %d
> > %d\n",
> >                         hum_int, hum_dec, temp_int, temp_dec);
> >                 return -EIO;
> > @@ -239,7 +239,7 @@ static int dht11_read_raw(struct iio_dev
> > *iio_dev,
> >  #endif
> >
> >                 if (ret == 0 && dht11->num_edges <
> > DHT11_EDGES_PER_READ - 1) {
> > -                       dev_err(dht11->dev, "Only %d signal edges
> > detected\n",
> > +                       dev_dbg(dht11->dev, "Only %d signal edges
> > detected\n",
> >                                 dht11->num_edges);
> >                         ret = -ETIMEDOUT;
> >                 }
> > --
> > 2.25.1
> >
>


2024-03-26 14:04:57

by Harald Geyer

[permalink] [raw]
Subject: Re: [PATCH] iio: dht11: set debug log level for parsing error messages

Hi George!

Am Montag, dem 25.03.2024 um 23:18 +0300 schrieb George Stark:
> On 3/25/24 21:48, Harald Geyer wrote:
> > Am Montag, dem 25.03.2024 um 19:54 +0300 schrieb George Stark:
> > > Protocol parsing errors could happen due to several reasons like
> > > noise
> > > environment, heavy load on system etc. If to poll the sensor
> > > frequently
> > > and/or for a long period kernel log will become polluted with
> > > error
> > > messages if their log level is err (i.e. on by default).
> >
> > Yes, these error are often recoverable. (As are many other HW
> > errors,
> > that typically are logged. Eg USB bus resets due to EMI)
> >
> > [...]
> >
> > The idea is, that these messages help users understand issues with
> > their HW (like too long cables, broken cables etc). But it is true,
> > that they will slowly accumulate in many real world scenarios
> > without
> > anything being truly wrong.
>
> I agree with you that it's very convenient to just take a look to
> dmesg
> and see device connection problems at once. But unlike e.g. usb user
> has
> to actually start reading sensor to perform communication and read
> errors will be propagated to the userspace and could be noticed \
> handled.

Not really. The log lines contain additional information useful for
understanding the problem with the setup.

> Anyway I believe we should use uniform approach for read errors -
> currently in the driver there're already dbg messages:
>
> "lost synchronisation at edge %d\n"
> "invalid checksum\n"

These errors are usually caused by EMI and there isn't much to do aside
from trying again until we find a time window with less interference.
They are not logged, because in some cases they might be very frequent
and can be handled by the user space client programatically anyway.

> I changed log level from err to dbg for the messages:
>
> "Only %d signal edges detected\n"

This mostly indicates a problem with the setup. Long cable, dead
sensor, high (interrupt) load etc.

Its true that this can happen during normal operation. - Usually when
the system takes too long to enter the irq handler.

But the primary causes are:
1) Your wiring is broken. In this case, the message is immediately
helpful and points you in the right direction. (Only if you understand
the protocol though.)
2) Your sensor is dead or "crashed", which also warrants an error msg
IMO.

The "crashed" case is a bit special. Some chips seem to randomly stop
working after a couple of hours and the only remedy is to power cycle
them. This could be done automatically. - I have the sensor power
supply pin on a GPIO and reset it from userspace in my setup. I tried
to work on a version of the driver some years ago, that would
optionally register with a regulator and manage sensor resets from
within the kernel driver. If this was actually implemented, we could
reduce the logging to cases, where the reset didn't solve the problem.

I stopped working on this, because it would have required changes to
the regulator framework, to be actually useful, and the regulator
maintainers didn't seem to keen about them. However, if you want to
pick this up in an effort to reduce unnecessary error conditions and
messages, I certainly would be happy.

> "Don't know how to decode data: %d %d %d %d\n"

This would indicate a sensor, that uses the same protocol but an
unsupported data format. This is a permanent error and therefor should
be logged IMO.

I guess, if you have a bad readout due to EMI but the checksum
accidentally matches, then you might get this message too. But this
should be a very rare case.

> They all are from a single callback and say the same thing -
> communication problem.

Not really. See above.

> If we make all those messages as errors it'd be great to have
> mechanism
> to disable them e.g. thru module parameter or somehow without
> rebuilding
> kernel.

No. What you try to change is cosmetic at best. It certainly doesn't
justify adding any complexity.

Since Jonathan deferred to my judgment:

As you can see, I did consider the trade-off between useful diagnostics
and spamming the log carefully. So naturally I'm inclined to reject
your proposal unless it solves an actual problem.

Also people still mail me directly with bogus bug reports about the
driver when really they have some issue with their setup. I fear, if we
reduce diagnostics, it will increase that noise.

So I reject your proposed changes, if they are for the sake of
unification. I'm willing to discuss, what the most sensible trade-off
is, but it would need to actually add to the considerations I already
did.

Best regards,
Harald


> Those errors can be bypassed by increasing read rate.
>
> >
> > I don't consider the dmesg buffer being rotated after a month or
> > two a
> > bug. But I suppose this is a corner case. I'll happily accept
> > whatever
> > Jonathan thinks is reasonable.
> >
> > Best regards,
> > Harald
> >
> >
> > > Signed-off-by: George Stark <[email protected]>
> > > ---
> > > I use DHT22 sensor with Raspberry Pi Zero W as a simple home
> > > meteo
> > > station.
> > > Even if to poll the sensor once per tens of seconds after month
> > > or
> > > two dmesg
> > > may become full of useless parsing error messages. Anyway those
> > > errors are caught
> > > in the user software thru return values.
> > >
> > >   drivers/iio/humidity/dht11.c | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/iio/humidity/dht11.c
> > > b/drivers/iio/humidity/dht11.c
> > > index c97e25448772..e2cbc442177b 100644
> > > --- a/drivers/iio/humidity/dht11.c
> > > +++ b/drivers/iio/humidity/dht11.c
> > > @@ -156,7 +156,7 @@ static int dht11_decode(struct dht11 *dht11,
> > > int
> > > offset)
> > >                  dht11->temperature = temp_int * 1000;
> > >                  dht11->humidity = hum_int * 1000;
> > >          } else {
> > > -               dev_err(dht11->dev,
> > > +               dev_dbg(dht11->dev,
> > >                          "Don't know how to decode data: %d %d %d
> > > %d\n",
> > >                          hum_int, hum_dec, temp_int, temp_dec);
> > >                  return -EIO;
> > > @@ -239,7 +239,7 @@ static int dht11_read_raw(struct iio_dev
> > > *iio_dev,
> > >   #endif
> > >
> > >                  if (ret == 0 && dht11->num_edges <
> > > DHT11_EDGES_PER_READ - 1) {
> > > -                       dev_err(dht11->dev, "Only %d signal edges
> > > detected\n",
> > > +                       dev_dbg(dht11->dev, "Only %d signal edges
> > > detected\n",
> > >                                  dht11->num_edges);
> > >                          ret = -ETIMEDOUT;
> > >                  }
> > > --
> > > 2.25.1
> > >
> >
>


2024-04-05 20:09:38

by George Stark

[permalink] [raw]
Subject: Re: [PATCH] iio: dht11: set debug log level for parsing error messages

Hello Harald

On 3/26/24 17:03, Harald Geyer wrote:
> Hi George!
>
> Am Montag, dem 25.03.2024 um 23:18 +0300 schrieb George Stark:
>> On 3/25/24 21:48, Harald Geyer wrote:
>>> Am Montag, dem 25.03.2024 um 19:54 +0300 schrieb George Stark:
>>>> Protocol parsing errors could happen due to several reasons like
>>>> noise
>>>> environment, heavy load on system etc. If to poll the sensor
>>>> frequently
>>>> and/or for a long period kernel log will become polluted with
>>>> error
>>>> messages if their log level is err (i.e. on by default).
>>>
>>> Yes, these error are often recoverable. (As are many other HW
>>> errors,
>>> that typically are logged. Eg USB bus resets due to EMI)
>>>
>>> [...]
>>>
>>> The idea is, that these messages help users understand issues with
>>> their HW (like too long cables, broken cables etc). But it is true,
>>> that they will slowly accumulate in many real world scenarios
>>> without
>>> anything being truly wrong.
>> I agree with you that it's very convenient to just take a look to
>> dmesg
>> and see device connection problems at once. But unlike e.g. usb user
>> has
>> to actually start reading sensor to perform communication and read
>> errors will be propagated to the userspace and could be noticed \
>> handled.
>
> Not really. The log lines contain additional information useful for
> understanding the problem with the setup.
>
>> Anyway I believe we should use uniform approach for read errors -
>> currently in the driver there're already dbg messages:
>>
>> "lost synchronisation at edge %d\n"
>> "invalid checksum\n"
>
> These errors are usually caused by EMI and there isn't much to do aside
> from trying again until we find a time window with less interference.
> They are not logged, because in some cases they might be very frequent
> and can be handled by the user space client programatically anyway.
>
>> I changed log level from err to dbg for the messages:
>>
>> "Only %d signal edges detected\n"
>
> This mostly indicates a problem with the setup. Long cable, dead
> sensor, high (interrupt) load etc.
>
> Its true that this can happen during normal operation. - Usually when
> the system takes too long to enter the irq handler.
Usually there's nothing to do about it. And several late-handled
interrupts can lead to any of 4 errors.

>
> But the primary causes are:
> 1) Your wiring is broken. In this case, the message is immediately
> helpful and points you in the right direction. (Only if you understand
> the protocol though.)
> 2) Your sensor is dead or "crashed", which also warrants an error msg
> IMO.

My sensor works around 3 years 24\7 (with one reboot per month) and
still precise (for home usage at least) - I compared it with several
other home temperature sensors.

I've just checked another sensor in my setup (3.3v, cable 15cm) the
error rate is the same. So it's ether EMI or single-core slow cpu.

>
> The "crashed" case is a bit special. Some chips seem to randomly stop
> working after a couple of hours and the only remedy is to power cycle
> them. This could be done automatically. - I have the sensor power
> supply pin on a GPIO and reset it from userspace in my setup. I tried
> to work on a version of the driver some years ago, that would
> optionally register with a regulator and manage sensor resets from
> within the kernel driver. If this was actually implemented, we could
> reduce the logging to cases, where the reset didn't solve the problem.
>
> I stopped working on this, because it would have required changes to
> the regulator framework, to be actually useful, and the regulator
> maintainers didn't seem to keen about them. However, if you want to
> pick this up in an effort to reduce unnecessary error conditions and
> messages, I certainly would be happy.
>
>> "Don't know how to decode data: %d %d %d %d\n"
>
> This would indicate a sensor, that uses the same protocol but an
> unsupported data format. This is a permanent error and therefor should
> be logged IMO.
>
> I guess, if you have a bad readout due to EMI but the checksum
> accidentally matches, then you might get this message too. But this
> should be a very rare case.

It's not very rare but sure not permanent. It can happen once per
several hundreds / a thousand requests.

>
>> They all are from a single callback and say the same thing -
>> communication problem.
>
> Not really. See above.
>
>> If we make all those messages as errors it'd be great to have
>> mechanism
>> to disable them e.g. thru module parameter or somehow without
>> rebuilding
>> kernel.
>
> No. What you try to change is cosmetic at best. It certainly doesn't
> justify adding any complexity.
>
> Since Jonathan deferred to my judgment:
>
> As you can see, I did consider the trade-off between useful diagnostics
> and spamming the log carefully. So naturally I'm inclined to reject
> your proposal unless it solves an actual problem.
>
> Also people still mail me directly with bogus bug reports about the
> driver when really they have some issue with their setup. I fear, if we
> reduce diagnostics, it will increase that noise.

If you don't ask those people to send dmesg with CONFIG_DEBUG is on you
lost 2 of 4 possible errors and don't see the whole picture. Other
dev_dbg messages in this driver are rather helpful.

Anyway thanks for that very interesting information. It's not obvious at
all if you're unfamiliar with protocol and the driver. Probably it
should be put in some documentation. Without diving into driver's source
code those messages are just noise.
Still I consider all those cases are purely debug. When there no
hardware support for sensor's protocol and the kernel is not realtime
then errors are probable. But often a temp sensor application is
not required to struggle for every measurement. And it's not desirable
to add logrotate to embedded system just because of temp sensor diagnostics.

>
> So I reject your proposed changes, if they are for the sake of
> unification. I'm willing to discuss, what the most sensible trade-off
> is, but it would need to actually add to the considerations I already
> did.
>
> Best regards,
> Harald
>
>
>> Those errors can be bypassed by increasing read rate.
>>
>>>
>>> I don't consider the dmesg buffer being rotated after a month or
>>> two a
>>> bug. But I suppose this is a corner case. I'll happily accept
>>> whatever
>>> Jonathan thinks is reasonable.
>>>
>>> Best regards,
>>> Harald
>>>
>>>
>>>> Signed-off-by: George Stark <[email protected]>
>>>> ---
>>>> I use DHT22 sensor with Raspberry Pi Zero W as a simple home
>>>> meteo
>>>> station.
>>>> Even if to poll the sensor once per tens of seconds after month
>>>> or
>>>> two dmesg
>>>> may become full of useless parsing error messages. Anyway those
>>>> errors are caught
>>>> in the user software thru return values.
>>>>
>>>>   drivers/iio/humidity/dht11.c | 4 ++--
>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/iio/humidity/dht11.c
>>>> b/drivers/iio/humidity/dht11.c
>>>> index c97e25448772..e2cbc442177b 100644
>>>> --- a/drivers/iio/humidity/dht11.c
>>>> +++ b/drivers/iio/humidity/dht11.c
>>>> @@ -156,7 +156,7 @@ static int dht11_decode(struct dht11 *dht11,
>>>> int
>>>> offset)
>>>>                  dht11->temperature = temp_int * 1000;
>>>>                  dht11->humidity = hum_int * 1000;
>>>>          } else {
>>>> -               dev_err(dht11->dev,
>>>> +               dev_dbg(dht11->dev,
>>>>                          "Don't know how to decode data: %d %d %d
>>>> %d\n",
>>>>                          hum_int, hum_dec, temp_int, temp_dec);
>>>>                  return -EIO;
>>>> @@ -239,7 +239,7 @@ static int dht11_read_raw(struct iio_dev
>>>> *iio_dev,
>>>>   #endif
>>>>
>>>>                  if (ret == 0 && dht11->num_edges <
>>>> DHT11_EDGES_PER_READ - 1) {
>>>> -                       dev_err(dht11->dev, "Only %d signal edges
>>>> detected\n",
>>>> +                       dev_dbg(dht11->dev, "Only %d signal edges
>>>> detected\n",
>>>>                                  dht11->num_edges);
>>>>                          ret = -ETIMEDOUT;
>>>>                  }
>>>> --
>>>> 2.25.1
>>>>
>>>
>>
>

--
Best regards
George