2017-06-01 18:48:04

by Benjamin Tissoires

[permalink] [raw]
Subject: [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi,

Sending this as a WIP as it still need a few changes, but it mostly works as
expected (still not fully compliant yet).

So this is based on Lennart's comment in [1]: if the LID state is not reliable,
the kernel should not export the LID switch device as long as we are not sure
about its state.

That is the basic idea, and here are some more general comments:
Lv described the 5 cases in "RFC PATCH v3" regarding the LID switch.
Let me rewrite them here (they are in patch 2):

1. Some platforms send "open" ACPI notification to the OS and the event
arrive before the button driver is resumed;
2. Some platforms send "open" ACPI notification to the OS, but the event
arrives after the button driver is resumed, ex., Samsung N210+;
3. Some platforms never send an "open" ACPI notification to the OS, but
update the cached _LID return value to "open", and this update arrives
before the button driver is resumed;
4. Some platforms never send an "open" ACPI notification to the OS, but
update the cached _LID return value to "open", but this update arrives
after the button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send an "open" ACPI notification to the OS, and
_LID ACPI method returns a value which stays to "close", ex.,
Surface Pro 1.

We we consider that we can mark the LID switch as unreliable and make it
disappear when we are not certain of the state, we can consider cases 1, 2, 3
are solved: cases 1 and 3 are solved when the LID state is reliable (majority
of existing laptops), and case 2 is solved just by marking when the LID is not
reliable. When we go to sleep, we unregister the input node. We wait for
the next ACPI notification to re-export the LID switch input node with the
correct state.

Given that the "close" event is reliable, on platforms where the LID switch is
not reliable for "open", we will get the "close" event when we will start
exporting the switch at the input level.

Note that systemd currently doesn't sync the state when the input node just
appears. This is a systemd bug, and it should not be handled by the kernel
community.

For case 4, we are not aware at the acpi/button.c level when the state is valid.
We can solve this by polling every seconds for let's say 1 min, and if we detect
a change, then we can re-export the input node (this hasn't been implemented
yet). After this delay, we can consider the state as valid and export the input
node with the current reported state in the ACPI.

However, this will conflict with case 5 where the ACPI value reported by
the _LID method can be wrong anytime. We will need to treat this separately
or find some other magic to make cases 4 and 5 compatible.

libinput will help cases 4 and 5 to restore the proper state, but that's
assuming we have exported a wrong state. It might happen in case 5, but
shouldn't in case 4.

Anyway, that is just a WIP which IMO is less hacky than the few other series.
I still need to work on the udev/hwdb rules to have the list of problematic
platforms in hwdb to not have them in the kernel, but that shouldn't be much
of an issue. I also need to work on the polling but I'd like to get some inputs
from Lv, Peter and others before spending too much time on it.

Note: yes, there is a lot of boilerplate for the input handler and for the
reliable state, but I think this simplifies the logic as we are all reliying
on the input stack to filter duplicate events.
One other benefit of this boilerplate is that when libinput changes the LID
state, i915 and nouveau will get notified.

Cheers,
Benjamin


[1] https://github.com/systemd/systemd/issues/2807

Benjamin Tissoires (3):
ACPI: button: extract input creation/destruction helpers
ACPI: button: remove the LID input node when the state is unknown
ACPI: button: Let input filter out the LID events

Lv Zheng (1):
ACPI: button: Fix lid notification locks

drivers/acpi/button.c | 453 +++++++++++++++++++++++++++++++++++---------------
1 file changed, 320 insertions(+), 133 deletions(-)

--
2.9.4


2017-06-01 18:47:08

by Benjamin Tissoires

[permalink] [raw]
Subject: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown

Because of the variation of firmware implementation, there is a chance
the LID state is unknown:
1. Some platforms send "open" ACPI notification to the OS and the event
arrive before the button driver is resumed;
2. Some platforms send "open" ACPI notification to the OS, but the event
arrives after the button driver is resumed, ex., Samsung N210+;
3. Some platforms never send an "open" ACPI notification to the OS, but
update the cached _LID return value to "open", and this update arrives
before the button driver is resumed;
4. Some platforms never send an "open" ACPI notification to the OS, but
update the cached _LID return value to "open", but this update arrives
after the button driver is resumed, ex., Surface Pro 3;
5. Some platforms never send an "open" ACPI notification to the OS, and
_LID ACPI method returns a value which stays to "close", ex.,
Surface Pro 1.

We can mark the unreliable platform (cases 2, 4, 5 above) as such and make
sure we do not export an input node with an unknown state to prevent
suspend loops.

The database of unreliable devices is left to userspace to handle with
a hwdb file and a udev rule.

Note that this patch removes the filtering of duplicate events when
calling blocking_notifier_call_chain(), but this will be addressed in
a following patch.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
drivers/acpi/button.c | 207 ++++++++++++++++++++++++++++++++------------------
1 file changed, 131 insertions(+), 76 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 48bcdca..9ad7604 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -25,6 +25,7 @@
#include <linux/module.h>
#include <linux/init.h>
#include <linux/types.h>
+#include <linux/moduleparam.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <linux/input.h>
@@ -79,6 +80,8 @@ MODULE_DEVICE_TABLE(acpi, button_device_ids);
static int acpi_button_add(struct acpi_device *device);
static int acpi_button_remove(struct acpi_device *device);
static void acpi_button_notify(struct acpi_device *device, u32 event);
+static int acpi_button_add_input(struct acpi_device *device);
+static int acpi_lid_update_reliable(struct acpi_device *device);

#ifdef CONFIG_PM_SLEEP
static int acpi_button_suspend(struct device *dev);
@@ -111,6 +114,8 @@ struct acpi_button {
bool suspended;
};

+static DEFINE_MUTEX(button_input_lock);
+
static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
static struct acpi_device *lid_device;
static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
@@ -119,6 +124,44 @@ static unsigned long lid_report_interval __read_mostly = 500;
module_param(lid_report_interval, ulong, 0644);
MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");

+static bool lid_reliable = true;
+
+static int param_set_lid_reliable(const char *val,
+ const struct kernel_param *kp)
+{
+ bool prev_lid_reliable = lid_reliable;
+ int ret;
+
+ mutex_lock(&button_input_lock);
+
+ ret = param_set_bool(val, kp);
+ if (ret) {
+ mutex_unlock(&button_input_lock);
+ return ret;
+ }
+
+ /*
+ * prevent a loop when we show up the device to userspace because
+ * of an acpi notification, and userspace immediately removes it
+ * by marking it as unreliable when this was already known.
+ */
+ if (lid_device && prev_lid_reliable != lid_reliable) {
+ ret = acpi_lid_update_reliable(lid_device);
+ if (ret)
+ lid_reliable = prev_lid_reliable;
+ }
+
+ mutex_unlock(&button_input_lock);
+ return ret;
+}
+
+static const struct kernel_param_ops lid_reliable_ops = {
+ .get = param_get_bool,
+ .set = param_set_lid_reliable,
+};
+module_param_cb(lid_reliable, &lid_reliable_ops, &lid_reliable, 0644);
+MODULE_PARM_DESC(lid_reliable, "Is the LID switch reliable (true|false)?");
+
/* --------------------------------------------------------------------------
FS Interface (/proc)
-------------------------------------------------------------------------- */
@@ -142,79 +185,22 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
{
struct acpi_button *button = acpi_driver_data(device);
int ret;
- ktime_t next_report;
- bool do_update;
+
+ /* button_input_lock must be held */
+
+ if (!button->input)
+ return 0;

/*
- * In lid_init_state=ignore mode, if user opens/closes lid
- * frequently with "open" missing, and "last_time" is also updated
- * frequently, "close" cannot be delivered to the userspace.
- * So "last_time" is only updated after a timeout or an actual
- * switch.
+ * If the lid is unreliable, always send an "open" event before any
+ * other. The input layer will filter out the extra open if required,
+ * and it will force the close event to be sent.
*/
- if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
- button->last_state != !!state)
- do_update = true;
- else
- do_update = false;
-
- next_report = ktime_add(button->last_time,
- ms_to_ktime(lid_report_interval));
- if (button->last_state == !!state &&
- ktime_after(ktime_get(), next_report)) {
- /* Complain the buggy firmware */
- pr_warn_once("The lid device is not compliant to SW_LID.\n");
+ if (!lid_reliable)
+ input_report_switch(button->input, SW_LID, 0);

- /*
- * Send the unreliable complement switch event:
- *
- * On most platforms, the lid device is reliable. However
- * there are exceptions:
- * 1. Platforms returning initial lid state as "close" by
- * default after booting/resuming:
- * https://bugzilla.kernel.org/show_bug.cgi?id=89211
- * https://bugzilla.kernel.org/show_bug.cgi?id=106151
- * 2. Platforms never reporting "open" events:
- * https://bugzilla.kernel.org/show_bug.cgi?id=106941
- * On these buggy platforms, the usage model of the ACPI
- * lid device actually is:
- * 1. The initial returning value of _LID may not be
- * reliable.
- * 2. The open event may not be reliable.
- * 3. The close event is reliable.
- *
- * But SW_LID is typed as input switch event, the input
- * layer checks if the event is redundant. Hence if the
- * state is not switched, the userspace cannot see this
- * platform triggered reliable event. By inserting a
- * complement switch event, it then is guaranteed that the
- * platform triggered reliable one can always be seen by
- * the userspace.
- */
- if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
- do_update = true;
- /*
- * Do generate complement switch event for "close"
- * as "close" is reliable and wrong "open" won't
- * trigger unexpected behaviors.
- * Do not generate complement switch event for
- * "open" as "open" is not reliable and wrong
- * "close" will trigger unexpected behaviors.
- */
- if (!state) {
- input_report_switch(button->input,
- SW_LID, state);
- input_sync(button->input);
- }
- }
- }
- /* Send the platform triggered reliable event */
- if (do_update) {
- input_report_switch(button->input, SW_LID, !state);
- input_sync(button->input);
- button->last_state = !!state;
- button->last_time = ktime_get();
- }
+ input_report_switch(button->input, SW_LID, !state);
+ input_sync(button->input);

if (state)
pm_wakeup_hard_event(&device->dev);
@@ -371,6 +357,21 @@ static int acpi_lid_update_state(struct acpi_device *device)
return acpi_lid_notify_state(device, state);
}

+static int acpi_lid_notify(struct acpi_device *device)
+{
+ struct acpi_button *button = acpi_driver_data(device);
+ int ret;
+
+ mutex_lock(&button_input_lock);
+ if (!button->input)
+ acpi_button_add_input(device);
+ ret = acpi_lid_update_state(device);
+ mutex_unlock(&button_input_lock);
+
+
+ return ret;
+}
+
static void acpi_lid_initialize_state(struct acpi_device *device)
{
switch (lid_init_state) {
@@ -398,7 +399,7 @@ static void acpi_button_notify(struct acpi_device *device, u32 event)
case ACPI_BUTTON_NOTIFY_STATUS:
input = button->input;
if (button->type == ACPI_BUTTON_TYPE_LID) {
- acpi_lid_update_state(device);
+ acpi_lid_notify(device);
} else {
int keycode;

@@ -433,6 +434,16 @@ static int acpi_button_suspend(struct device *dev)
struct acpi_button *button = acpi_driver_data(device);

button->suspended = true;
+
+ if (button->type == ACPI_BUTTON_TYPE_LID) {
+ /*
+ * If lid is marked unreliable, this will have the effect
+ * of unregistering the LID input node
+ */
+ mutex_lock(&button_input_lock);
+ acpi_lid_update_reliable(device);
+ mutex_unlock(&button_input_lock);
+ }
return 0;
}

@@ -442,8 +453,17 @@ static int acpi_button_resume(struct device *dev)
struct acpi_button *button = acpi_driver_data(device);

button->suspended = false;
- if (button->type == ACPI_BUTTON_TYPE_LID)
+ if (button->type == ACPI_BUTTON_TYPE_LID) {
+ /*
+ * If lid is marked reliable, this will have the effect
+ * of registering a new LID input node if none was there
+ */
+ mutex_lock(&button_input_lock);
+ acpi_lid_update_reliable(device);
acpi_lid_initialize_state(device);
+ mutex_unlock(&button_input_lock);
+ }
+
return 0;
}
#endif
@@ -452,6 +472,7 @@ static void acpi_button_remove_input(struct acpi_device *device)
{
struct acpi_button *button = acpi_driver_data(device);

+ /* button_input_lock must be held */
input_unregister_device(button->input);
button->input = NULL;
}
@@ -462,6 +483,8 @@ static int acpi_button_add_input(struct acpi_device *device)
struct input_dev *input;
int error;

+ /* button_input_lock must be held */
+
button->input = input = input_allocate_device();
if (!input) {
error = -ENOMEM;
@@ -500,6 +523,31 @@ static int acpi_button_add_input(struct acpi_device *device)
return error;
}

+static int acpi_lid_update_reliable(struct acpi_device *device)
+{
+ struct acpi_button *button = acpi_driver_data(lid_device);
+ int error;
+
+ /* button_input_lock must be held */
+
+ if (lid_reliable && !button->input) {
+ error = acpi_button_add_input(device);
+ if (error)
+ return error;
+
+ error = acpi_lid_update_state(device);
+ if (error) {
+ acpi_button_remove_input(device);
+ return error;
+ }
+ }
+
+ if (!lid_reliable && button->input)
+ acpi_button_remove_input(device);
+
+ return 0;
+}
+
static int acpi_button_add(struct acpi_device *device)
{
struct acpi_button *button;
@@ -547,12 +595,7 @@ static int acpi_button_add(struct acpi_device *device)

snprintf(button->phys, sizeof(button->phys), "%s/button/input0", hid);

- error = acpi_button_add_input(device);
- if (error)
- goto err_remove_fs;
-
if (button->type == ACPI_BUTTON_TYPE_LID) {
- acpi_lid_initialize_state(device);
/*
* This assumes there's only one lid device, or if there are
* more we only care about the last one...
@@ -560,6 +603,18 @@ static int acpi_button_add(struct acpi_device *device)
lid_device = device;
}

+ if (lid_reliable || button->type != ACPI_BUTTON_TYPE_LID) {
+ error = acpi_button_add_input(device);
+ if (error)
+ goto err_remove_fs;
+
+ if (button->type == ACPI_BUTTON_TYPE_LID) {
+ mutex_lock(&button_input_lock);
+ acpi_lid_initialize_state(device);
+ mutex_unlock(&button_input_lock);
+ }
+ }
+
device_init_wakeup(&device->dev, true);
printk(KERN_INFO PREFIX "%s [%s]\n", name, acpi_device_bid(device));
return 0;
--
2.9.4

2017-06-01 18:47:20

by Benjamin Tissoires

[permalink] [raw]
Subject: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks

From: Lv Zheng <[email protected]>

acpi/button.c now contains the logic to avoid frequently replayed events
which originally was ensured by using blocking notifier.
On the contrary, using a blocking notifier is wrong as it could keep on
returning NOTIFY_DONE, causing events lost.

This patch thus changes lid notification to raw notifier in order not to
have any events lost.

Signed-off-by: Lv Zheng <[email protected]>
Signed-off-by: Benjamin Tissoires <[email protected]>
---
drivers/acpi/button.c | 68 ++++++++++++++++++++-------------------------------
1 file changed, 27 insertions(+), 41 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 03e5981..1927b08 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -114,7 +114,7 @@ struct acpi_button {

static DEFINE_MUTEX(button_input_lock);

-static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
+static RAW_NOTIFIER_HEAD(acpi_lid_notifier);
static struct acpi_device *lid_device;
static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;

@@ -179,14 +179,12 @@ static int acpi_lid_evaluate_state(struct acpi_device *device)
return lid_state ? 1 : 0;
}

-static int acpi_lid_notify_state(struct acpi_device *device, int state)
+static void acpi_lid_notify_state(struct acpi_device *device, int state)
{
struct acpi_button *button = acpi_driver_data(device);

- /* button_input_lock must be held */
-
if (!button->input)
- return 0;
+ return;

/*
* If the lid is unreliable, always send an "open" event before any
@@ -201,8 +199,6 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)

if (state)
pm_wakeup_hard_event(&device->dev);
-
- return 0;
}

/*
@@ -214,28 +210,14 @@ static void acpi_button_lid_events(struct input_handle *handle,
{
const struct input_value *v;
int state = -1;
- int ret;

for (v = vals; v != vals + count; v++) {
switch (v->type) {
case EV_SYN:
- if (v->code == SYN_REPORT && state >= 0) {
- ret = blocking_notifier_call_chain(&acpi_lid_notifier,
+ if (v->code == SYN_REPORT && state >= 0)
+ (void)raw_notifier_call_chain(&acpi_lid_notifier,
state,
lid_device);
- if (ret == NOTIFY_DONE)
- ret = blocking_notifier_call_chain(&acpi_lid_notifier,
- state,
- lid_device);
- if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
- /*
- * It is also regarded as success if
- * the notifier_chain returns NOTIFY_OK
- * or NOTIFY_DONE.
- */
- ret = 0;
- }
- }
break;
case EV_SW:
if (v->code == SW_LID)
@@ -433,13 +415,25 @@ static int acpi_button_remove_fs(struct acpi_device *device)
-------------------------------------------------------------------------- */
int acpi_lid_notifier_register(struct notifier_block *nb)
{
- return blocking_notifier_chain_register(&acpi_lid_notifier, nb);
+ return raw_notifier_chain_register(&acpi_lid_notifier, nb);
}
EXPORT_SYMBOL(acpi_lid_notifier_register);

+static inline int __acpi_lid_notifier_unregister(struct notifier_block *nb,
+ bool sync)
+{
+ int ret;
+
+ ret = raw_notifier_chain_unregister(&acpi_lid_notifier, nb);
+ if (sync)
+ synchronize_rcu();
+
+ return ret;
+}
+
int acpi_lid_notifier_unregister(struct notifier_block *nb)
{
- return blocking_notifier_chain_unregister(&acpi_lid_notifier, nb);
+ return __acpi_lid_notifier_unregister(nb, false);
}
EXPORT_SYMBOL(acpi_lid_notifier_unregister);

@@ -452,40 +446,36 @@ int acpi_lid_open(void)
}
EXPORT_SYMBOL(acpi_lid_open);

-static int acpi_lid_update_state(struct acpi_device *device)
+static void acpi_lid_update_state(struct acpi_device *device)
{
int state;

state = acpi_lid_evaluate_state(device);
if (state < 0)
- return state;
+ return;

- return acpi_lid_notify_state(device, state);
+ acpi_lid_notify_state(device, state);
}

-static int acpi_lid_notify(struct acpi_device *device)
+static void acpi_lid_notify(struct acpi_device *device)
{
struct acpi_button *button = acpi_driver_data(device);
- int ret;

mutex_lock(&button_input_lock);
if (!button->input)
acpi_button_add_input(device);
- ret = acpi_lid_update_state(device);
+ acpi_lid_update_state(device);
mutex_unlock(&button_input_lock);
-
-
- return ret;
}

static void acpi_lid_initialize_state(struct acpi_device *device)
{
switch (lid_init_state) {
case ACPI_BUTTON_LID_INIT_OPEN:
- (void)acpi_lid_notify_state(device, 1);
+ acpi_lid_notify_state(device, 1);
break;
case ACPI_BUTTON_LID_INIT_METHOD:
- (void)acpi_lid_update_state(device);
+ acpi_lid_update_state(device);
break;
case ACPI_BUTTON_LID_INIT_IGNORE:
default:
@@ -641,11 +631,7 @@ static int acpi_lid_update_reliable(struct acpi_device *device)
if (error)
return error;

- error = acpi_lid_update_state(device);
- if (error) {
- acpi_button_remove_input(device);
- return error;
- }
+ acpi_lid_update_state(device);
}

if (!lid_reliable && button->input)
--
2.9.4

2017-06-01 18:47:16

by Benjamin Tissoires

[permalink] [raw]
Subject: [WIP PATCH 3/4] ACPI: button: Let input filter out the LID events

The input stack already filters out the LID events. So instead of
filtering them out at the source, we can hook up after the input
processing and propagate the lid switch events when the input stack
tells us to.

An other benefit is that if userspace (think libinput) "fixes" the lid
switch state by some heuristics, this new state is forwarded to the
listeners in the kernel.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
drivers/acpi/button.c | 156 ++++++++++++++++++++++++++++++++++++++++++++------
1 file changed, 139 insertions(+), 17 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 9ad7604..03e5981 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -109,8 +109,6 @@ struct acpi_button {
struct input_dev *input;
char phys[32]; /* for input device */
unsigned long pushed;
- int last_state;
- ktime_t last_time;
bool suspended;
};

@@ -184,7 +182,6 @@ static int acpi_lid_evaluate_state(struct acpi_device *device)
static int acpi_lid_notify_state(struct acpi_device *device, int state)
{
struct acpi_button *button = acpi_driver_data(device);
- int ret;

/* button_input_lock must be held */

@@ -205,20 +202,129 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
if (state)
pm_wakeup_hard_event(&device->dev);

- ret = blocking_notifier_call_chain(&acpi_lid_notifier, state, device);
- if (ret == NOTIFY_DONE)
- ret = blocking_notifier_call_chain(&acpi_lid_notifier, state,
- device);
- if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
- /*
- * It is also regarded as success if the notifier_chain
- * returns NOTIFY_OK or NOTIFY_DONE.
- */
- ret = 0;
+ return 0;
+}
+
+/*
+ * Pass incoming event to all connected clients.
+ */
+static void acpi_button_lid_events(struct input_handle *handle,
+ const struct input_value *vals,
+ unsigned int count)
+{
+ const struct input_value *v;
+ int state = -1;
+ int ret;
+
+ for (v = vals; v != vals + count; v++) {
+ switch (v->type) {
+ case EV_SYN:
+ if (v->code == SYN_REPORT && state >= 0) {
+ ret = blocking_notifier_call_chain(&acpi_lid_notifier,
+ state,
+ lid_device);
+ if (ret == NOTIFY_DONE)
+ ret = blocking_notifier_call_chain(&acpi_lid_notifier,
+ state,
+ lid_device);
+ if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
+ /*
+ * It is also regarded as success if
+ * the notifier_chain returns NOTIFY_OK
+ * or NOTIFY_DONE.
+ */
+ ret = 0;
+ }
+ }
+ break;
+ case EV_SW:
+ if (v->code == SW_LID)
+ state = !v->value;
+ break;
+ }
}
- return ret;
}

+static int acpi_button_lid_connect(struct input_handler *handler,
+ struct input_dev *dev,
+ const struct input_device_id *id)
+{
+ struct input_handle *handle;
+ int error;
+
+ handle = kzalloc(sizeof(struct input_handle), GFP_KERNEL);
+ if (!handle)
+ return -ENOMEM;
+
+ handle->dev = dev;
+ handle->handler = handler;
+ handle->name = "acpi-button-lid";
+
+ error = input_register_handle(handle);
+ if (error) {
+ dev_err(&lid_device->dev, "Error installing input handle\n");
+ goto err_free;
+ }
+
+ error = input_open_device(handle);
+ if (error) {
+ dev_err(&lid_device->dev, "Failed to open input device\n");
+ goto err_unregister;
+ }
+
+ return 0;
+
+ err_unregister:
+ input_unregister_handle(handle);
+ err_free:
+ kfree(handle);
+ return error;
+}
+
+static void acpi_button_lid_disconnect(struct input_handle *handle)
+{
+ input_close_device(handle);
+ input_unregister_handle(handle);
+ kfree(handle);
+}
+
+bool acpi_button_lid_match(struct input_handler *handler,
+ struct input_dev *dev)
+{
+ struct acpi_button *button;
+
+ if (!lid_device)
+ return false;
+
+ button = acpi_driver_data(lid_device);
+
+ if (dev != button->input)
+ return false;
+
+ return true;
+}
+
+static const struct input_device_id acpi_button_lid_ids[] = {
+ {
+ .flags = INPUT_DEVICE_ID_MATCH_EVBIT,
+ .evbit = { BIT_MASK(EV_SW) },
+ .swbit = { BIT_MASK(SW_LID) },
+ },
+ { },
+};
+
+MODULE_DEVICE_TABLE(input, acpi_button_lid_ids);
+
+static struct input_handler acpi_button_lid_handler = {
+ .match = acpi_button_lid_match,
+ .connect = acpi_button_lid_connect,
+ .disconnect = acpi_button_lid_disconnect,
+ .events = acpi_button_lid_events,
+ .name = "acpi-lid-callchain",
+ .id_table = acpi_button_lid_ids,
+};
+
+
static int acpi_button_state_seq_show(struct seq_file *seq, void *offset)
{
struct acpi_device *device = seq->private;
@@ -581,8 +687,6 @@ static int acpi_button_add(struct acpi_device *device)
strcpy(name, ACPI_BUTTON_DEVICE_NAME_LID);
sprintf(class, "%s/%s",
ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
- button->last_state = !!acpi_lid_evaluate_state(device);
- button->last_time = ktime_get();
} else {
printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
error = -ENODEV;
@@ -674,4 +778,22 @@ module_param_call(lid_init_state,
NULL, 0644);
MODULE_PARM_DESC(lid_init_state, "Behavior for reporting LID initial state");

-module_acpi_driver(acpi_button_driver);
+static int __init acpi_button_init(void)
+{
+ int error;
+
+ error = input_register_handler(&acpi_button_lid_handler);
+ if (error)
+ return error;
+
+ return acpi_bus_register_driver(&acpi_button_driver);
+}
+
+static void __exit acpi_button_exit(void)
+{
+ acpi_bus_unregister_driver(&acpi_button_driver);
+ input_unregister_handler(&acpi_button_lid_handler);
+}
+
+module_init(acpi_button_init);
+module_exit(acpi_button_exit);
--
2.9.4

2017-06-01 18:48:34

by Benjamin Tissoires

[permalink] [raw]
Subject: [WIP PATCH 1/4] ACPI: button: extract input creation/destruction helpers

When the LID switch ACPI implementation is unreliable, we might want
to remove the device when we are not sure about the state. This should
prevent any suspend/resume loops given that in that case, there will be
no more LID switch input node.

This patch prepares the dynamic creation/destruction of the input node.

Signed-off-by: Benjamin Tissoires <[email protected]>
---
drivers/acpi/button.c | 90 ++++++++++++++++++++++++++++++++-------------------
1 file changed, 57 insertions(+), 33 deletions(-)

diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
index 9ad8cdb..48bcdca 100644
--- a/drivers/acpi/button.c
+++ b/drivers/acpi/button.c
@@ -448,10 +448,61 @@ static int acpi_button_resume(struct device *dev)
}
#endif

+static void acpi_button_remove_input(struct acpi_device *device)
+{
+ struct acpi_button *button = acpi_driver_data(device);
+
+ input_unregister_device(button->input);
+ button->input = NULL;
+}
+
+static int acpi_button_add_input(struct acpi_device *device)
+{
+ struct acpi_button *button = acpi_driver_data(device);
+ struct input_dev *input;
+ int error;
+
+ button->input = input = input_allocate_device();
+ if (!input) {
+ error = -ENOMEM;
+ goto err;
+ }
+
+ input->name = acpi_device_name(device);
+ input->phys = button->phys;
+ input->id.bustype = BUS_HOST;
+ input->id.product = button->type;
+ input->dev.parent = &device->dev;
+
+ switch (button->type) {
+ case ACPI_BUTTON_TYPE_POWER:
+ input_set_capability(input, EV_KEY, KEY_POWER);
+ break;
+
+ case ACPI_BUTTON_TYPE_SLEEP:
+ input_set_capability(input, EV_KEY, KEY_SLEEP);
+ break;
+
+ case ACPI_BUTTON_TYPE_LID:
+ input_set_capability(input, EV_SW, SW_LID);
+ break;
+ }
+
+ error = input_register_device(input);
+ if (error)
+ goto err;
+
+ return 0;
+
+ err:
+ input_free_device(input);
+ button->input = NULL;
+ return error;
+}
+
static int acpi_button_add(struct acpi_device *device)
{
struct acpi_button *button;
- struct input_dev *input;
const char *hid = acpi_device_hid(device);
char *name, *class;
int error;
@@ -462,12 +513,6 @@ static int acpi_button_add(struct acpi_device *device)

device->driver_data = button;

- button->input = input = input_allocate_device();
- if (!input) {
- error = -ENOMEM;
- goto err_free_button;
- }
-
name = acpi_device_name(device);
class = acpi_device_class(device);

@@ -493,38 +538,19 @@ static int acpi_button_add(struct acpi_device *device)
} else {
printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
error = -ENODEV;
- goto err_free_input;
+ goto err_free_button;
}

error = acpi_button_add_fs(device);
if (error)
- goto err_free_input;
+ goto err_free_button;

snprintf(button->phys, sizeof(button->phys), "%s/button/input0", hid);

- input->name = name;
- input->phys = button->phys;
- input->id.bustype = BUS_HOST;
- input->id.product = button->type;
- input->dev.parent = &device->dev;
-
- switch (button->type) {
- case ACPI_BUTTON_TYPE_POWER:
- input_set_capability(input, EV_KEY, KEY_POWER);
- break;
-
- case ACPI_BUTTON_TYPE_SLEEP:
- input_set_capability(input, EV_KEY, KEY_SLEEP);
- break;
-
- case ACPI_BUTTON_TYPE_LID:
- input_set_capability(input, EV_SW, SW_LID);
- break;
- }
-
- error = input_register_device(input);
+ error = acpi_button_add_input(device);
if (error)
goto err_remove_fs;
+
if (button->type == ACPI_BUTTON_TYPE_LID) {
acpi_lid_initialize_state(device);
/*
@@ -540,8 +566,6 @@ static int acpi_button_add(struct acpi_device *device)

err_remove_fs:
acpi_button_remove_fs(device);
- err_free_input:
- input_free_device(input);
err_free_button:
kfree(button);
return error;
@@ -552,7 +576,7 @@ static int acpi_button_remove(struct acpi_device *device)
struct acpi_button *button = acpi_driver_data(device);

acpi_button_remove_fs(device);
- input_unregister_device(button->input);
+ acpi_button_remove_input(device);
kfree(button);
return 0;
}
--
2.9.4

2017-06-01 21:44:03

by Bastien Nocera

[permalink] [raw]
Subject: Re: [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Thu, 2017-06-01 at 20:46 +0200, Benjamin Tissoires wrote:
> Hi,
>
> Sending this as a WIP as it still need a few changes, but it mostly
> works as
> expected (still not fully compliant yet).
>
> So this is based on Lennart's comment in [1]: if the LID state is not
> reliable,
> the kernel should not export the LID switch device as long as we are
> not sure
> about its state.
>
> That is the basic idea, and here are some more general comments:
> Lv described the 5 cases in "RFC PATCH v3" regarding the LID switch.
> Let me rewrite them here (they are in patch 2):
>
> 1. Some platforms send "open" ACPI notification to the OS and the
> event
> arrive before the button driver is resumed;
> 2. Some platforms send "open" ACPI notification to the OS, but the
> event
> arrives after the button driver is resumed, ex., Samsung N210+;
> 3. Some platforms never send an "open" ACPI notification to the OS,
> but
> update the cached _LID return value to "open", and this update
> arrives
> before the button driver is resumed;
> 4. Some platforms never send an "open" ACPI notification to the OS,
> but
> update the cached _LID return value to "open", but this update
> arrives
> after the button driver is resumed, ex., Surface Pro 3;
> 5. Some platforms never send an "open" ACPI notification to the OS,
> and
> _LID ACPI method returns a value which stays to "close", ex.,
> Surface Pro 1.

In which case does the Surface 3 lie? I believe we still needed your
"gpiolib-acpi: make sure we trigger the events at least once" patch to
make that one work.

Cheers

2017-06-02 07:24:36

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Jun 01 2017 or thereabouts, Bastien Nocera wrote:
> On Thu, 2017-06-01 at 20:46 +0200, Benjamin Tissoires wrote:
> > Hi,
> >
> > Sending this as a WIP as it still need a few changes, but it mostly
> > works as
> > expected (still not fully compliant yet).
> >
> > So this is based on Lennart's comment in [1]: if the LID state is not
> > reliable,
> > the kernel should not export the LID switch device as long as we are
> > not sure
> > about its state.
> >
> > That is the basic idea, and here are some more general comments:
> > Lv described the 5 cases in "RFC PATCH v3" regarding the LID switch.
> > Let me rewrite them here (they are in patch 2):
> >
> > 1. Some platforms send "open" ACPI notification to the OS and the
> > event
> > arrive before the button driver is resumed;
> > 2. Some platforms send "open" ACPI notification to the OS, but the
> > event
> > arrives after the button driver is resumed, ex., Samsung N210+;
> > 3. Some platforms never send an "open" ACPI notification to the OS,
> > but
> > update the cached _LID return value to "open", and this update
> > arrives
> > before the button driver is resumed;
> > 4. Some platforms never send an "open" ACPI notification to the OS,
> > but
> > update the cached _LID return value to "open", but this update
> > arrives
> > after the button driver is resumed, ex., Surface Pro 3;
> > 5. Some platforms never send an "open" ACPI notification to the OS,
> > and
> > _LID ACPI method returns a value which stays to "close", ex.,
> > Surface Pro 1.
>
> In which case does the Surface 3 lie? I believe we still needed your
> "gpiolib-acpi: make sure we trigger the events at least once" patch to
> make that one work.

Well, the surface 3 is using a different driver for the LID switch
(surface3-wmi). From what I can remember, we don't need this patch when
using this driver (which is why I did not submitted it further). But I
might be wrong...

Cheers,
Benjamin

2017-06-05 02:26:14

by Zheng, Lv

[permalink] [raw]
Subject: RE: [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi, Benjamin

> From: Benjamin Tissoires [mailto:[email protected]]
> Subject: [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> Hi,
>
> Sending this as a WIP as it still need a few changes, but it mostly works as
> expected (still not fully compliant yet).
>
> So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> the kernel should not export the LID switch device as long as we are not sure
> about its state.
>
> That is the basic idea, and here are some more general comments:
> Lv described the 5 cases in "RFC PATCH v3" regarding the LID switch.
> Let me rewrite them here (they are in patch 2):
>
> 1. Some platforms send "open" ACPI notification to the OS and the event
> arrive before the button driver is resumed;
> 2. Some platforms send "open" ACPI notification to the OS, but the event
> arrives after the button driver is resumed, ex., Samsung N210+;
> 3. Some platforms never send an "open" ACPI notification to the OS, but
> update the cached _LID return value to "open", and this update arrives
> before the button driver is resumed;
> 4. Some platforms never send an "open" ACPI notification to the OS, but
> update the cached _LID return value to "open", but this update arrives
> after the button driver is resumed, ex., Surface Pro 3;
> 5. Some platforms never send an "open" ACPI notification to the OS, and
> _LID ACPI method returns a value which stays to "close", ex.,
> Surface Pro 1.
>
> We we consider that we can mark the LID switch as unreliable and make it
> disappear when we are not certain of the state, we can consider cases 1, 2, 3
> are solved:

I have concerns with case 2.

> cases 1 and 3 are solved when the LID state is reliable (majority
> of existing laptops),

Agreed.

> and case 2 is solved just by marking when the LID is not
> reliable. When we go to sleep, we unregister the input node. We wait for
> the next ACPI notification to re-export the LID switch input node with the
> correct state.

According to the test, both case 2,4,5 have already been solved in systemd.
So we needn't do anything in kernel.

If you still want to improve in acpi button.
IMO, for case 2, 4, we really have chance to improve.

For example, we could just add a timer right after resume.
And before it is timed out, if we can see the BIOS notify, we delete the timer.
And after it is timed out, we report "lid init value" to input layer.

> Given that the "close" event is reliable, on platforms where the LID switch is
> not reliable for "open", we will get the "close" event when we will start
> exporting the switch at the input level.
>
> Note that systemd currently doesn't sync the state when the input node just
> appears. This is a systemd bug, and it should not be handled by the kernel
> community.

According to the test, systemd should be ok now.
Why do we need to change it again?

>
> For case 4, we are not aware at the acpi/button.c level when the state is valid.
> We can solve this by polling every seconds for let's say 1 min, and if we detect
> a change, then we can re-export the input node (this hasn't been implemented
> yet). After this delay, we can consider the state as valid and export the input
> node with the current reported state in the ACPI.

Looks similar as the timer solution mentioned above.

>
> However, this will conflict with case 5 where the ACPI value reported by
> the _LID method can be wrong anytime. We will need to treat this separately
> or find some other magic to make cases 4 and 5 compatible.

Case 5 is not compliant to SW_LID anyway.
However it works well with latest systemd.
Maybe we should just let it be and wait for further user request.

>
> libinput will help cases 4 and 5 to restore the proper state, but that's
> assuming we have exported a wrong state. It might happen in case 5, but
> shouldn't in case 4.

IMO, if we improved case 2,4, libinput should only help to handle case 5.
Which is entirely not SW_LID compliant.

Thanks
Lv

>
> Anyway, that is just a WIP which IMO is less hacky than the few other series.
> I still need to work on the udev/hwdb rules to have the list of problematic
> platforms in hwdb to not have them in the kernel, but that shouldn't be much
> of an issue. I also need to work on the polling but I'd like to get some inputs
> from Lv, Peter and others before spending too much time on it.
>
> Note: yes, there is a lot of boilerplate for the input handler and for the
> reliable state, but I think this simplifies the logic as we are all reliying
> on the input stack to filter duplicate events.
> One other benefit of this boilerplate is that when libinput changes the LID
> state, i915 and nouveau will get notified.
>
> Cheers,
> Benjamin
>
>
> [1] https://github.com/systemd/systemd/issues/2807
>
> Benjamin Tissoires (3):
> ACPI: button: extract input creation/destruction helpers
> ACPI: button: remove the LID input node when the state is unknown
> ACPI: button: Let input filter out the LID events
>
> Lv Zheng (1):
> ACPI: button: Fix lid notification locks
>
> drivers/acpi/button.c | 453 +++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 320 insertions(+), 133 deletions(-)
>
> --
> 2.9.4

2017-06-05 03:19:49

by Zheng, Lv

[permalink] [raw]
Subject: RE: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown

Hi, Benjamin

> From: Benjamin Tissoires [mailto:[email protected]]
> Subject: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown
>
> Because of the variation of firmware implementation, there is a chance
> the LID state is unknown:
> 1. Some platforms send "open" ACPI notification to the OS and the event
> arrive before the button driver is resumed;
> 2. Some platforms send "open" ACPI notification to the OS, but the event
> arrives after the button driver is resumed, ex., Samsung N210+;
> 3. Some platforms never send an "open" ACPI notification to the OS, but
> update the cached _LID return value to "open", and this update arrives
> before the button driver is resumed;
> 4. Some platforms never send an "open" ACPI notification to the OS, but
> update the cached _LID return value to "open", but this update arrives
> after the button driver is resumed, ex., Surface Pro 3;
> 5. Some platforms never send an "open" ACPI notification to the OS, and
> _LID ACPI method returns a value which stays to "close", ex.,
> Surface Pro 1.
>
> We can mark the unreliable platform (cases 2, 4, 5 above) as such and make
> sure we do not export an input node with an unknown state to prevent
> suspend loops.
>
> The database of unreliable devices is left to userspace to handle with
> a hwdb file and a udev rule.
>
> Note that this patch removes the filtering of duplicate events when
> calling blocking_notifier_call_chain(), but this will be addressed in
> a following patch.
>
> Signed-off-by: Benjamin Tissoires <[email protected]>
> ---
> drivers/acpi/button.c | 207 ++++++++++++++++++++++++++++++++------------------
> 1 file changed, 131 insertions(+), 76 deletions(-)
>
> diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> index 48bcdca..9ad7604 100644
> --- a/drivers/acpi/button.c
> +++ b/drivers/acpi/button.c
> @@ -25,6 +25,7 @@
> #include <linux/module.h>
> #include <linux/init.h>
> #include <linux/types.h>
> +#include <linux/moduleparam.h>
> #include <linux/proc_fs.h>
> #include <linux/seq_file.h>
> #include <linux/input.h>
> @@ -79,6 +80,8 @@ MODULE_DEVICE_TABLE(acpi, button_device_ids);
> static int acpi_button_add(struct acpi_device *device);
> static int acpi_button_remove(struct acpi_device *device);
> static void acpi_button_notify(struct acpi_device *device, u32 event);
> +static int acpi_button_add_input(struct acpi_device *device);
> +static int acpi_lid_update_reliable(struct acpi_device *device);
>
> #ifdef CONFIG_PM_SLEEP
> static int acpi_button_suspend(struct device *dev);
> @@ -111,6 +114,8 @@ struct acpi_button {
> bool suspended;
> };
>
> +static DEFINE_MUTEX(button_input_lock);
> +
> static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
> static struct acpi_device *lid_device;
> static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
> @@ -119,6 +124,44 @@ static unsigned long lid_report_interval __read_mostly = 500;
> module_param(lid_report_interval, ulong, 0644);
> MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");
>
> +static bool lid_reliable = true;
> +
> +static int param_set_lid_reliable(const char *val,
> + const struct kernel_param *kp)
> +{
> + bool prev_lid_reliable = lid_reliable;
> + int ret;
> +
> + mutex_lock(&button_input_lock);
> +
> + ret = param_set_bool(val, kp);
> + if (ret) {
> + mutex_unlock(&button_input_lock);
> + return ret;
> + }
> +
> + /*
> + * prevent a loop when we show up the device to userspace because
> + * of an acpi notification, and userspace immediately removes it
> + * by marking it as unreliable when this was already known.
> + */
> + if (lid_device && prev_lid_reliable != lid_reliable) {
> + ret = acpi_lid_update_reliable(lid_device);
> + if (ret)
> + lid_reliable = prev_lid_reliable;
> + }
> +
> + mutex_unlock(&button_input_lock);
> + return ret;
> +}
> +
> +static const struct kernel_param_ops lid_reliable_ops = {
> + .get = param_get_bool,
> + .set = param_set_lid_reliable,
> +};
> +module_param_cb(lid_reliable, &lid_reliable_ops, &lid_reliable, 0644);
> +MODULE_PARM_DESC(lid_reliable, "Is the LID switch reliable (true|false)?");
> +
> /* --------------------------------------------------------------------------
> FS Interface (/proc)
> -------------------------------------------------------------------------- */
> @@ -142,79 +185,22 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
> {
> struct acpi_button *button = acpi_driver_data(device);
> int ret;
> - ktime_t next_report;
> - bool do_update;
> +
> + /* button_input_lock must be held */
> +
> + if (!button->input)
> + return 0;
>
> /*
> - * In lid_init_state=ignore mode, if user opens/closes lid
> - * frequently with "open" missing, and "last_time" is also updated
> - * frequently, "close" cannot be delivered to the userspace.
> - * So "last_time" is only updated after a timeout or an actual
> - * switch.
> + * If the lid is unreliable, always send an "open" event before any
> + * other. The input layer will filter out the extra open if required,
> + * and it will force the close event to be sent.
> */
> - if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
> - button->last_state != !!state)
> - do_update = true;
> - else
> - do_update = false;
> -
> - next_report = ktime_add(button->last_time,
> - ms_to_ktime(lid_report_interval));
> - if (button->last_state == !!state &&
> - ktime_after(ktime_get(), next_report)) {
> - /* Complain the buggy firmware */
> - pr_warn_once("The lid device is not compliant to SW_LID.\n");
> + if (!lid_reliable)
> + input_report_switch(button->input, SW_LID, 0);
>
> - /*
> - * Send the unreliable complement switch event:
> - *
> - * On most platforms, the lid device is reliable. However
> - * there are exceptions:
> - * 1. Platforms returning initial lid state as "close" by
> - * default after booting/resuming:
> - * https://bugzilla.kernel.org/show_bug.cgi?id=89211
> - * https://bugzilla.kernel.org/show_bug.cgi?id=106151
> - * 2. Platforms never reporting "open" events:
> - * https://bugzilla.kernel.org/show_bug.cgi?id=106941
> - * On these buggy platforms, the usage model of the ACPI
> - * lid device actually is:
> - * 1. The initial returning value of _LID may not be
> - * reliable.
> - * 2. The open event may not be reliable.
> - * 3. The close event is reliable.
> - *
> - * But SW_LID is typed as input switch event, the input
> - * layer checks if the event is redundant. Hence if the
> - * state is not switched, the userspace cannot see this
> - * platform triggered reliable event. By inserting a
> - * complement switch event, it then is guaranteed that the
> - * platform triggered reliable one can always be seen by
> - * the userspace.
> - */
> - if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
> - do_update = true;
> - /*
> - * Do generate complement switch event for "close"
> - * as "close" is reliable and wrong "open" won't
> - * trigger unexpected behaviors.
> - * Do not generate complement switch event for
> - * "open" as "open" is not reliable and wrong
> - * "close" will trigger unexpected behaviors.
> - */
> - if (!state) {
> - input_report_switch(button->input,
> - SW_LID, state);
> - input_sync(button->input);
> - }
> - }
> - }

My dell latitude 6430u test platform sends multiple Notify(lid) before suspend and after resume.
This is because the aml table puts many Notify(LID, 0x80) in various control methods.
And not one of them but multiple of them will be invoked by various OS drivers during suspend/resume period.
I think this is not an isolated platform that will invoke multiple redundant "Notify(lid)".

Fortunately, the lid state for the multiple notify(lid) should be same as the first "Notify(lid)".
I suppose this is why SW_LID is invented, as it can really filter such redundant events.
And user space finally can only see 1 "close" event.

But unconditionally prepending "open" before all "close" events surely can break the logic by
delivering multiple "close" events to the user space.

Another issue is, for case 5, when we use button.lid_init_state=method.
Unconditionally prepending "open" before driver initiated "close" event
sent due to acpi_lid_initialize_state(), we will see suspend/resume cycles.

Thus if we consider both cases, we should:
1. put a frequency check to filter possible redundant events.
2. distinguish AML "Notify" call and button driver initiated lid notification.
And we may reach a same result like the following:
https://patchwork.kernel.org/patch/9756467/


> - /* Send the platform triggered reliable event */
> - if (do_update) {
> - input_report_switch(button->input, SW_LID, !state);
> - input_sync(button->input);
> - button->last_state = !!state;
> - button->last_time = ktime_get();
> - }
> + input_report_switch(button->input, SW_LID, !state);
> + input_sync(button->input);
>
> if (state)
> pm_wakeup_hard_event(&device->dev);
> @@ -371,6 +357,21 @@ static int acpi_lid_update_state(struct acpi_device *device)
> return acpi_lid_notify_state(device, state);
> }
>
> +static int acpi_lid_notify(struct acpi_device *device)
> +{
> + struct acpi_button *button = acpi_driver_data(device);
> + int ret;
> +
> + mutex_lock(&button_input_lock);
> + if (!button->input)
> + acpi_button_add_input(device);
> + ret = acpi_lid_update_state(device);
> + mutex_unlock(&button_input_lock);
> +
> +
> + return ret;
> +}
> +
> static void acpi_lid_initialize_state(struct acpi_device *device)
> {
> switch (lid_init_state) {
> @@ -398,7 +399,7 @@ static void acpi_button_notify(struct acpi_device *device, u32 event)
> case ACPI_BUTTON_NOTIFY_STATUS:
> input = button->input;
> if (button->type == ACPI_BUTTON_TYPE_LID) {
> - acpi_lid_update_state(device);
> + acpi_lid_notify(device);
> } else {
> int keycode;
>
> @@ -433,6 +434,16 @@ static int acpi_button_suspend(struct device *dev)
> struct acpi_button *button = acpi_driver_data(device);
>
> button->suspended = true;
> +
> + if (button->type == ACPI_BUTTON_TYPE_LID) {
> + /*
> + * If lid is marked unreliable, this will have the effect
> + * of unregistering the LID input node
> + */
> + mutex_lock(&button_input_lock);
> + acpi_lid_update_reliable(device);
> + mutex_unlock(&button_input_lock);
> + }
> return 0;
> }
>
> @@ -442,8 +453,17 @@ static int acpi_button_resume(struct device *dev)
> struct acpi_button *button = acpi_driver_data(device);
>
> button->suspended = false;
> - if (button->type == ACPI_BUTTON_TYPE_LID)
> + if (button->type == ACPI_BUTTON_TYPE_LID) {
> + /*
> + * If lid is marked reliable, this will have the effect
> + * of registering a new LID input node if none was there
> + */
> + mutex_lock(&button_input_lock);
> + acpi_lid_update_reliable(device);
> acpi_lid_initialize_state(device);
> + mutex_unlock(&button_input_lock);
> + }
> +
> return 0;
> }
> #endif
> @@ -452,6 +472,7 @@ static void acpi_button_remove_input(struct acpi_device *device)
> {
> struct acpi_button *button = acpi_driver_data(device);
>
> + /* button_input_lock must be held */
> input_unregister_device(button->input);
> button->input = NULL;
> }
> @@ -462,6 +483,8 @@ static int acpi_button_add_input(struct acpi_device *device)
> struct input_dev *input;
> int error;
>
> + /* button_input_lock must be held */
> +
> button->input = input = input_allocate_device();
> if (!input) {
> error = -ENOMEM;
> @@ -500,6 +523,31 @@ static int acpi_button_add_input(struct acpi_device *device)
> return error;
> }
>
> +static int acpi_lid_update_reliable(struct acpi_device *device)
> +{
> + struct acpi_button *button = acpi_driver_data(lid_device);
> + int error;
> +
> + /* button_input_lock must be held */
> +
> + if (lid_reliable && !button->input) {
> + error = acpi_button_add_input(device);
> + if (error)
> + return error;
> +
> + error = acpi_lid_update_state(device);
> + if (error) {
> + acpi_button_remove_input(device);
> + return error;
> + }
> + }
> +
> + if (!lid_reliable && button->input)
> + acpi_button_remove_input(device);
> +
> + return 0;
> +}
> +
> static int acpi_button_add(struct acpi_device *device)
> {
> struct acpi_button *button;
> @@ -547,12 +595,7 @@ static int acpi_button_add(struct acpi_device *device)
>
> snprintf(button->phys, sizeof(button->phys), "%s/button/input0", hid);
>
> - error = acpi_button_add_input(device);
> - if (error)
> - goto err_remove_fs;
> -
> if (button->type == ACPI_BUTTON_TYPE_LID) {
> - acpi_lid_initialize_state(device);
> /*
> * This assumes there's only one lid device, or if there are
> * more we only care about the last one...
> @@ -560,6 +603,18 @@ static int acpi_button_add(struct acpi_device *device)
> lid_device = device;
> }
>
> + if (lid_reliable || button->type != ACPI_BUTTON_TYPE_LID) {
> + error = acpi_button_add_input(device);
> + if (error)
> + goto err_remove_fs;
> +
> + if (button->type == ACPI_BUTTON_TYPE_LID) {
> + mutex_lock(&button_input_lock);
> + acpi_lid_initialize_state(device);
> + mutex_unlock(&button_input_lock);
> + }
> + }
> +
> device_init_wakeup(&device->dev, true);
> printk(KERN_INFO PREFIX "%s [%s]\n", name, acpi_device_bid(device));
> return 0;

This is another major differences between your proposal and mine.

First of all, I think it should be in a separate patch.

Second, I have concerns related to such a change:
I can see that, you are trying to address a problem that:
The input layer requires a determined initial SW_LID state while ACPI button driver cannot offer.
So by adding/removing input node, you can introduce a tristate SW_LID input node.
However I doubt if this is necessary and can solve real issues, as:
systemd now works fine with button driver for all cases,
only desktop managers should be changed to be compliant to case 2/4/5
(or if we improve by adding a timer, they should only be changed to be compliant to case 5).
And doing this won't help desktop managers to be able to work fine with case 5.
So this finally may only remove ACPI SW_LID support from Surface Pro 1.
While with latest systemd, closing lid on that platform can correctly triggers suspend.
And no suspend/resume cycles should be seen.
So why do we need to remove this feature (ACPI SW_LID) from Surface Pro 1?
If you really want to propose an ABI change for user space.
Why don't you do this in input layer by defining SW_LID as tristate?

Thanks and best regards
Lv

2017-06-05 03:33:32

by Zheng, Lv

[permalink] [raw]
Subject: RE: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks

Hi,

> From: Benjamin Tissoires [mailto:[email protected]]
> Subject: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks
>
> From: Lv Zheng <[email protected]>
>
> acpi/button.c now contains the logic to avoid frequently replayed events
> which originally was ensured by using blocking notifier.
> On the contrary, using a blocking notifier is wrong as it could keep on
> returning NOTIFY_DONE, causing events lost.
>
> This patch thus changes lid notification to raw notifier in order not to
> have any events lost.

This patch is on top of the following:
https://patchwork.kernel.org/patch/9756467/
where button driver implements a frequency check and
thus is capable of filtering redundant events itself:
I saw you have deleted it from PATCH 02.
So this patch is not applicable now.

Is input layer capable of filtering redundant events.
I saw you unconditionally prepend "open" before "close",
which may make input layer incapable of filtering redundant close events.

If input layer is capable of filtering redundant events,
why don't you:
1. drop this commit;
2. remove all ACPI lid notifier APIs;
3. change lid notifier callers to register notification via input layer?

Thanks and best regards
Lv

>
> Signed-off-by: Lv Zheng <[email protected]>
> Signed-off-by: Benjamin Tissoires <[email protected]>
> ---
> drivers/acpi/button.c | 68 ++++++++++++++++++++-------------------------------
> 1 file changed, 27 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> index 03e5981..1927b08 100644
> --- a/drivers/acpi/button.c
> +++ b/drivers/acpi/button.c
> @@ -114,7 +114,7 @@ struct acpi_button {
>
> static DEFINE_MUTEX(button_input_lock);
>
> -static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
> +static RAW_NOTIFIER_HEAD(acpi_lid_notifier);
> static struct acpi_device *lid_device;
> static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
>
> @@ -179,14 +179,12 @@ static int acpi_lid_evaluate_state(struct acpi_device *device)
> return lid_state ? 1 : 0;
> }
>
> -static int acpi_lid_notify_state(struct acpi_device *device, int state)
> +static void acpi_lid_notify_state(struct acpi_device *device, int state)
> {
> struct acpi_button *button = acpi_driver_data(device);
>
> - /* button_input_lock must be held */
> -
> if (!button->input)
> - return 0;
> + return;
>
> /*
> * If the lid is unreliable, always send an "open" event before any
> @@ -201,8 +199,6 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
>
> if (state)
> pm_wakeup_hard_event(&device->dev);
> -
> - return 0;
> }
>
> /*
> @@ -214,28 +210,14 @@ static void acpi_button_lid_events(struct input_handle *handle,
> {
> const struct input_value *v;
> int state = -1;
> - int ret;
>
> for (v = vals; v != vals + count; v++) {
> switch (v->type) {
> case EV_SYN:
> - if (v->code == SYN_REPORT && state >= 0) {
> - ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> + if (v->code == SYN_REPORT && state >= 0)
> + (void)raw_notifier_call_chain(&acpi_lid_notifier,
> state,
> lid_device);
> - if (ret == NOTIFY_DONE)
> - ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> - state,
> - lid_device);
> - if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> - /*
> - * It is also regarded as success if
> - * the notifier_chain returns NOTIFY_OK
> - * or NOTIFY_DONE.
> - */
> - ret = 0;
> - }
> - }
> break;
> case EV_SW:
> if (v->code == SW_LID)
> @@ -433,13 +415,25 @@ static int acpi_button_remove_fs(struct acpi_device *device)
> -------------------------------------------------------------------------- */
> int acpi_lid_notifier_register(struct notifier_block *nb)
> {
> - return blocking_notifier_chain_register(&acpi_lid_notifier, nb);
> + return raw_notifier_chain_register(&acpi_lid_notifier, nb);
> }
> EXPORT_SYMBOL(acpi_lid_notifier_register);
>
> +static inline int __acpi_lid_notifier_unregister(struct notifier_block *nb,
> + bool sync)
> +{
> + int ret;
> +
> + ret = raw_notifier_chain_unregister(&acpi_lid_notifier, nb);
> + if (sync)
> + synchronize_rcu();
> +
> + return ret;
> +}
> +
> int acpi_lid_notifier_unregister(struct notifier_block *nb)
> {
> - return blocking_notifier_chain_unregister(&acpi_lid_notifier, nb);
> + return __acpi_lid_notifier_unregister(nb, false);
> }
> EXPORT_SYMBOL(acpi_lid_notifier_unregister);
>
> @@ -452,40 +446,36 @@ int acpi_lid_open(void)
> }
> EXPORT_SYMBOL(acpi_lid_open);
>
> -static int acpi_lid_update_state(struct acpi_device *device)
> +static void acpi_lid_update_state(struct acpi_device *device)
> {
> int state;
>
> state = acpi_lid_evaluate_state(device);
> if (state < 0)
> - return state;
> + return;
>
> - return acpi_lid_notify_state(device, state);
> + acpi_lid_notify_state(device, state);
> }
>
> -static int acpi_lid_notify(struct acpi_device *device)
> +static void acpi_lid_notify(struct acpi_device *device)
> {
> struct acpi_button *button = acpi_driver_data(device);
> - int ret;
>
> mutex_lock(&button_input_lock);
> if (!button->input)
> acpi_button_add_input(device);
> - ret = acpi_lid_update_state(device);
> + acpi_lid_update_state(device);
> mutex_unlock(&button_input_lock);
> -
> -
> - return ret;
> }
>
> static void acpi_lid_initialize_state(struct acpi_device *device)
> {
> switch (lid_init_state) {
> case ACPI_BUTTON_LID_INIT_OPEN:
> - (void)acpi_lid_notify_state(device, 1);
> + acpi_lid_notify_state(device, 1);
> break;
> case ACPI_BUTTON_LID_INIT_METHOD:
> - (void)acpi_lid_update_state(device);
> + acpi_lid_update_state(device);
> break;
> case ACPI_BUTTON_LID_INIT_IGNORE:
> default:
> @@ -641,11 +631,7 @@ static int acpi_lid_update_reliable(struct acpi_device *device)
> if (error)
> return error;
>
> - error = acpi_lid_update_state(device);
> - if (error) {
> - acpi_button_remove_input(device);
> - return error;
> - }
> + acpi_lid_update_state(device);
> }
>
> if (!lid_reliable && button->input)
> --
> 2.9.4

2017-06-05 04:28:19

by Zheng, Lv

[permalink] [raw]
Subject: RE: [WIP PATCH 3/4] ACPI: button: Let input filter out the LID events

Hi, Benjamin

> From: Benjamin Tissoires [mailto:[email protected]]
> Subject: [WIP PATCH 3/4] ACPI: button: Let input filter out the LID events
>
> The input stack already filters out the LID events. So instead of
> filtering them out at the source, we can hook up after the input
> processing and propagate the lid switch events when the input stack
> tells us to.
>
> An other benefit is that if userspace (think libinput) "fixes" the lid
> switch state by some heuristics, this new state is forwarded to the
> listeners in the kernel.

See my comments to PATCH 4.
IMO, it sounds better that
1. ACPI lid works as a driver of SW_LID, and
2. i915 registers notification (the only user) via input layer.
So it looks i915 rather than button driver should call input_register_handler().
And input layer may help to provide a simplified interface for drivers to register key notifications.

Thanks and best regards
Lv

>
> Signed-off-by: Benjamin Tissoires <[email protected]>
> ---
> drivers/acpi/button.c | 156 ++++++++++++++++++++++++++++++++++++++++++++------
> 1 file changed, 139 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> index 9ad7604..03e5981 100644
> --- a/drivers/acpi/button.c
> +++ b/drivers/acpi/button.c
> @@ -109,8 +109,6 @@ struct acpi_button {
> struct input_dev *input;
> char phys[32]; /* for input device */
> unsigned long pushed;
> - int last_state;
> - ktime_t last_time;
> bool suspended;
> };
>
> @@ -184,7 +182,6 @@ static int acpi_lid_evaluate_state(struct acpi_device *device)
> static int acpi_lid_notify_state(struct acpi_device *device, int state)
> {
> struct acpi_button *button = acpi_driver_data(device);
> - int ret;
>
> /* button_input_lock must be held */
>
> @@ -205,20 +202,129 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
> if (state)
> pm_wakeup_hard_event(&device->dev);
>
> - ret = blocking_notifier_call_chain(&acpi_lid_notifier, state, device);
> - if (ret == NOTIFY_DONE)
> - ret = blocking_notifier_call_chain(&acpi_lid_notifier, state,
> - device);
> - if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> - /*
> - * It is also regarded as success if the notifier_chain
> - * returns NOTIFY_OK or NOTIFY_DONE.
> - */
> - ret = 0;
> + return 0;
> +}
> +
> +/*
> + * Pass incoming event to all connected clients.
> + */
> +static void acpi_button_lid_events(struct input_handle *handle,
> + const struct input_value *vals,
> + unsigned int count)
> +{
> + const struct input_value *v;
> + int state = -1;
> + int ret;
> +
> + for (v = vals; v != vals + count; v++) {
> + switch (v->type) {
> + case EV_SYN:
> + if (v->code == SYN_REPORT && state >= 0) {
> + ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> + state,
> + lid_device);
> + if (ret == NOTIFY_DONE)
> + ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> + state,
> + lid_device);
> + if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> + /*
> + * It is also regarded as success if
> + * the notifier_chain returns NOTIFY_OK
> + * or NOTIFY_DONE.
> + */
> + ret = 0;
> + }
> + }
> + break;
> + case EV_SW:
> + if (v->code == SW_LID)
> + state = !v->value;
> + break;
> + }
> }
> - return ret;
> }
>
> +static int acpi_button_lid_connect(struct input_handler *handler,
> + struct input_dev *dev,
> + const struct input_device_id *id)
> +{
> + struct input_handle *handle;
> + int error;
> +
> + handle = kzalloc(sizeof(struct input_handle), GFP_KERNEL);
> + if (!handle)
> + return -ENOMEM;
> +
> + handle->dev = dev;
> + handle->handler = handler;
> + handle->name = "acpi-button-lid";
> +
> + error = input_register_handle(handle);
> + if (error) {
> + dev_err(&lid_device->dev, "Error installing input handle\n");
> + goto err_free;
> + }
> +
> + error = input_open_device(handle);
> + if (error) {
> + dev_err(&lid_device->dev, "Failed to open input device\n");
> + goto err_unregister;
> + }
> +
> + return 0;
> +
> + err_unregister:
> + input_unregister_handle(handle);
> + err_free:
> + kfree(handle);
> + return error;
> +}
> +
> +static void acpi_button_lid_disconnect(struct input_handle *handle)
> +{
> + input_close_device(handle);
> + input_unregister_handle(handle);
> + kfree(handle);
> +}
> +
> +bool acpi_button_lid_match(struct input_handler *handler,
> + struct input_dev *dev)
> +{
> + struct acpi_button *button;
> +
> + if (!lid_device)
> + return false;
> +
> + button = acpi_driver_data(lid_device);
> +
> + if (dev != button->input)
> + return false;
> +
> + return true;
> +}
> +
> +static const struct input_device_id acpi_button_lid_ids[] = {
> + {
> + .flags = INPUT_DEVICE_ID_MATCH_EVBIT,
> + .evbit = { BIT_MASK(EV_SW) },
> + .swbit = { BIT_MASK(SW_LID) },
> + },
> + { },
> +};
> +
> +MODULE_DEVICE_TABLE(input, acpi_button_lid_ids);
> +
> +static struct input_handler acpi_button_lid_handler = {
> + .match = acpi_button_lid_match,
> + .connect = acpi_button_lid_connect,
> + .disconnect = acpi_button_lid_disconnect,
> + .events = acpi_button_lid_events,
> + .name = "acpi-lid-callchain",
> + .id_table = acpi_button_lid_ids,
> +};
> +
> +
> static int acpi_button_state_seq_show(struct seq_file *seq, void *offset)
> {
> struct acpi_device *device = seq->private;
> @@ -581,8 +687,6 @@ static int acpi_button_add(struct acpi_device *device)
> strcpy(name, ACPI_BUTTON_DEVICE_NAME_LID);
> sprintf(class, "%s/%s",
> ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
> - button->last_state = !!acpi_lid_evaluate_state(device);
> - button->last_time = ktime_get();
> } else {
> printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
> error = -ENODEV;
> @@ -674,4 +778,22 @@ module_param_call(lid_init_state,
> NULL, 0644);
> MODULE_PARM_DESC(lid_init_state, "Behavior for reporting LID initial state");
>
> -module_acpi_driver(acpi_button_driver);
> +static int __init acpi_button_init(void)
> +{
> + int error;
> +
> + error = input_register_handler(&acpi_button_lid_handler);
> + if (error)
> + return error;
> +
> + return acpi_bus_register_driver(&acpi_button_driver);
> +}
> +
> +static void __exit acpi_button_exit(void)
> +{
> + acpi_bus_unregister_driver(&acpi_button_driver);
> + input_unregister_handler(&acpi_button_lid_handler);
> +}
> +
> +module_init(acpi_button_init);
> +module_exit(acpi_button_exit);
> --
> 2.9.4

2017-06-06 10:22:31

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown

Hi Lv,

On Jun 05 2017 or thereabouts, Zheng, Lv wrote:
> Hi, Benjamin
>
> > From: Benjamin Tissoires [mailto:[email protected]]
> > Subject: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown
> >
> > Because of the variation of firmware implementation, there is a chance
> > the LID state is unknown:
> > 1. Some platforms send "open" ACPI notification to the OS and the event
> > arrive before the button driver is resumed;
> > 2. Some platforms send "open" ACPI notification to the OS, but the event
> > arrives after the button driver is resumed, ex., Samsung N210+;
> > 3. Some platforms never send an "open" ACPI notification to the OS, but
> > update the cached _LID return value to "open", and this update arrives
> > before the button driver is resumed;
> > 4. Some platforms never send an "open" ACPI notification to the OS, but
> > update the cached _LID return value to "open", but this update arrives
> > after the button driver is resumed, ex., Surface Pro 3;
> > 5. Some platforms never send an "open" ACPI notification to the OS, and
> > _LID ACPI method returns a value which stays to "close", ex.,
> > Surface Pro 1.
> >
> > We can mark the unreliable platform (cases 2, 4, 5 above) as such and make
> > sure we do not export an input node with an unknown state to prevent
> > suspend loops.
> >
> > The database of unreliable devices is left to userspace to handle with
> > a hwdb file and a udev rule.
> >
> > Note that this patch removes the filtering of duplicate events when
> > calling blocking_notifier_call_chain(), but this will be addressed in
> > a following patch.
> >
> > Signed-off-by: Benjamin Tissoires <[email protected]>
> > ---
> > drivers/acpi/button.c | 207 ++++++++++++++++++++++++++++++++------------------
> > 1 file changed, 131 insertions(+), 76 deletions(-)
> >
> > diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> > index 48bcdca..9ad7604 100644
> > --- a/drivers/acpi/button.c
> > +++ b/drivers/acpi/button.c
> > @@ -25,6 +25,7 @@
> > #include <linux/module.h>
> > #include <linux/init.h>
> > #include <linux/types.h>
> > +#include <linux/moduleparam.h>
> > #include <linux/proc_fs.h>
> > #include <linux/seq_file.h>
> > #include <linux/input.h>
> > @@ -79,6 +80,8 @@ MODULE_DEVICE_TABLE(acpi, button_device_ids);
> > static int acpi_button_add(struct acpi_device *device);
> > static int acpi_button_remove(struct acpi_device *device);
> > static void acpi_button_notify(struct acpi_device *device, u32 event);
> > +static int acpi_button_add_input(struct acpi_device *device);
> > +static int acpi_lid_update_reliable(struct acpi_device *device);
> >
> > #ifdef CONFIG_PM_SLEEP
> > static int acpi_button_suspend(struct device *dev);
> > @@ -111,6 +114,8 @@ struct acpi_button {
> > bool suspended;
> > };
> >
> > +static DEFINE_MUTEX(button_input_lock);
> > +
> > static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
> > static struct acpi_device *lid_device;
> > static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
> > @@ -119,6 +124,44 @@ static unsigned long lid_report_interval __read_mostly = 500;
> > module_param(lid_report_interval, ulong, 0644);
> > MODULE_PARM_DESC(lid_report_interval, "Interval (ms) between lid key events");
> >
> > +static bool lid_reliable = true;
> > +
> > +static int param_set_lid_reliable(const char *val,
> > + const struct kernel_param *kp)
> > +{
> > + bool prev_lid_reliable = lid_reliable;
> > + int ret;
> > +
> > + mutex_lock(&button_input_lock);
> > +
> > + ret = param_set_bool(val, kp);
> > + if (ret) {
> > + mutex_unlock(&button_input_lock);
> > + return ret;
> > + }
> > +
> > + /*
> > + * prevent a loop when we show up the device to userspace because
> > + * of an acpi notification, and userspace immediately removes it
> > + * by marking it as unreliable when this was already known.
> > + */
> > + if (lid_device && prev_lid_reliable != lid_reliable) {
> > + ret = acpi_lid_update_reliable(lid_device);
> > + if (ret)
> > + lid_reliable = prev_lid_reliable;
> > + }
> > +
> > + mutex_unlock(&button_input_lock);
> > + return ret;
> > +}
> > +
> > +static const struct kernel_param_ops lid_reliable_ops = {
> > + .get = param_get_bool,
> > + .set = param_set_lid_reliable,
> > +};
> > +module_param_cb(lid_reliable, &lid_reliable_ops, &lid_reliable, 0644);
> > +MODULE_PARM_DESC(lid_reliable, "Is the LID switch reliable (true|false)?");
> > +
> > /* --------------------------------------------------------------------------
> > FS Interface (/proc)
> > -------------------------------------------------------------------------- */
> > @@ -142,79 +185,22 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
> > {
> > struct acpi_button *button = acpi_driver_data(device);
> > int ret;
> > - ktime_t next_report;
> > - bool do_update;
> > +
> > + /* button_input_lock must be held */
> > +
> > + if (!button->input)
> > + return 0;
> >
> > /*
> > - * In lid_init_state=ignore mode, if user opens/closes lid
> > - * frequently with "open" missing, and "last_time" is also updated
> > - * frequently, "close" cannot be delivered to the userspace.
> > - * So "last_time" is only updated after a timeout or an actual
> > - * switch.
> > + * If the lid is unreliable, always send an "open" event before any
> > + * other. The input layer will filter out the extra open if required,
> > + * and it will force the close event to be sent.
> > */
> > - if (lid_init_state != ACPI_BUTTON_LID_INIT_IGNORE ||
> > - button->last_state != !!state)
> > - do_update = true;
> > - else
> > - do_update = false;
> > -
> > - next_report = ktime_add(button->last_time,
> > - ms_to_ktime(lid_report_interval));
> > - if (button->last_state == !!state &&
> > - ktime_after(ktime_get(), next_report)) {
> > - /* Complain the buggy firmware */
> > - pr_warn_once("The lid device is not compliant to SW_LID.\n");
> > + if (!lid_reliable)
> > + input_report_switch(button->input, SW_LID, 0);
> >
> > - /*
> > - * Send the unreliable complement switch event:
> > - *
> > - * On most platforms, the lid device is reliable. However
> > - * there are exceptions:
> > - * 1. Platforms returning initial lid state as "close" by
> > - * default after booting/resuming:
> > - * https://bugzilla.kernel.org/show_bug.cgi?id=89211
> > - * https://bugzilla.kernel.org/show_bug.cgi?id=106151
> > - * 2. Platforms never reporting "open" events:
> > - * https://bugzilla.kernel.org/show_bug.cgi?id=106941
> > - * On these buggy platforms, the usage model of the ACPI
> > - * lid device actually is:
> > - * 1. The initial returning value of _LID may not be
> > - * reliable.
> > - * 2. The open event may not be reliable.
> > - * 3. The close event is reliable.
> > - *
> > - * But SW_LID is typed as input switch event, the input
> > - * layer checks if the event is redundant. Hence if the
> > - * state is not switched, the userspace cannot see this
> > - * platform triggered reliable event. By inserting a
> > - * complement switch event, it then is guaranteed that the
> > - * platform triggered reliable one can always be seen by
> > - * the userspace.
> > - */
> > - if (lid_init_state == ACPI_BUTTON_LID_INIT_IGNORE) {
> > - do_update = true;
> > - /*
> > - * Do generate complement switch event for "close"
> > - * as "close" is reliable and wrong "open" won't
> > - * trigger unexpected behaviors.
> > - * Do not generate complement switch event for
> > - * "open" as "open" is not reliable and wrong
> > - * "close" will trigger unexpected behaviors.
> > - */
> > - if (!state) {
> > - input_report_switch(button->input,
> > - SW_LID, state);
> > - input_sync(button->input);
> > - }
> > - }
> > - }
>
> My dell latitude 6430u test platform sends multiple Notify(lid) before suspend and after resume.

Does this platform requires the not lid_reliable check as per this
series? Because if it doesn't, then we should not care.

> This is because the aml table puts many Notify(LID, 0x80) in various control methods.
> And not one of them but multiple of them will be invoked by various OS drivers during suspend/resume period.
> I think this is not an isolated platform that will invoke multiple redundant "Notify(lid)".
>
> Fortunately, the lid state for the multiple notify(lid) should be same as the first "Notify(lid)".
> I suppose this is why SW_LID is invented, as it can really filter such redundant events.
> And user space finally can only see 1 "close" event.
>
> But unconditionally prepending "open" before all "close" events surely can break the logic by
> delivering multiple "close" events to the user space.

That doesn't matter. What matters is the state of the switch, not the
event. So if user space receives (in case we marked the switch as not
reliable) several close events, all user space will do is realize that
the state is still closed and will act accordingly.

>
> Another issue is, for case 5, when we use button.lid_init_state=method.
> Unconditionally prepending "open" before driver initiated "close" event
> sent due to acpi_lid_initialize_state(), we will see suspend/resume cycles.

Case 5 is broken anyway and needs to be handled specially. It was not
targeted in this WIP series.

>
> Thus if we consider both cases, we should:
> 1. put a frequency check to filter possible redundant events.

This doesn't work and should be avoided. The state of the input switch
is known to the input layer only, and given there are spinlocks, you can
not know if the state is actually the one you expected beforehand.

You can however add frequency checks in the input handler, but that
would assume the input layer is not doing its job properly and so should
be avoided.

> 2. distinguish AML "Notify" call and button driver initiated lid notification.

Again, we don't care if the "event" comes from ACPI, the driver itself or
user space (libinput). All that matters is the current state of the
input node switch, that needs to match the physical world at any time.

> And we may reach a same result like the following:
> https://patchwork.kernel.org/patch/9756467/
>
>
> > - /* Send the platform triggered reliable event */
> > - if (do_update) {
> > - input_report_switch(button->input, SW_LID, !state);
> > - input_sync(button->input);
> > - button->last_state = !!state;
> > - button->last_time = ktime_get();
> > - }
> > + input_report_switch(button->input, SW_LID, !state);
> > + input_sync(button->input);
> >
> > if (state)
> > pm_wakeup_hard_event(&device->dev);
> > @@ -371,6 +357,21 @@ static int acpi_lid_update_state(struct acpi_device *device)
> > return acpi_lid_notify_state(device, state);
> > }
> >
> > +static int acpi_lid_notify(struct acpi_device *device)
> > +{
> > + struct acpi_button *button = acpi_driver_data(device);
> > + int ret;
> > +
> > + mutex_lock(&button_input_lock);
> > + if (!button->input)
> > + acpi_button_add_input(device);
> > + ret = acpi_lid_update_state(device);
> > + mutex_unlock(&button_input_lock);
> > +
> > +
> > + return ret;
> > +}
> > +
> > static void acpi_lid_initialize_state(struct acpi_device *device)
> > {
> > switch (lid_init_state) {
> > @@ -398,7 +399,7 @@ static void acpi_button_notify(struct acpi_device *device, u32 event)
> > case ACPI_BUTTON_NOTIFY_STATUS:
> > input = button->input;
> > if (button->type == ACPI_BUTTON_TYPE_LID) {
> > - acpi_lid_update_state(device);
> > + acpi_lid_notify(device);
> > } else {
> > int keycode;
> >
> > @@ -433,6 +434,16 @@ static int acpi_button_suspend(struct device *dev)
> > struct acpi_button *button = acpi_driver_data(device);
> >
> > button->suspended = true;
> > +
> > + if (button->type == ACPI_BUTTON_TYPE_LID) {
> > + /*
> > + * If lid is marked unreliable, this will have the effect
> > + * of unregistering the LID input node
> > + */
> > + mutex_lock(&button_input_lock);
> > + acpi_lid_update_reliable(device);
> > + mutex_unlock(&button_input_lock);
> > + }
> > return 0;
> > }
> >
> > @@ -442,8 +453,17 @@ static int acpi_button_resume(struct device *dev)
> > struct acpi_button *button = acpi_driver_data(device);
> >
> > button->suspended = false;
> > - if (button->type == ACPI_BUTTON_TYPE_LID)
> > + if (button->type == ACPI_BUTTON_TYPE_LID) {
> > + /*
> > + * If lid is marked reliable, this will have the effect
> > + * of registering a new LID input node if none was there
> > + */
> > + mutex_lock(&button_input_lock);
> > + acpi_lid_update_reliable(device);
> > acpi_lid_initialize_state(device);
> > + mutex_unlock(&button_input_lock);
> > + }
> > +
> > return 0;
> > }
> > #endif
> > @@ -452,6 +472,7 @@ static void acpi_button_remove_input(struct acpi_device *device)
> > {
> > struct acpi_button *button = acpi_driver_data(device);
> >
> > + /* button_input_lock must be held */
> > input_unregister_device(button->input);
> > button->input = NULL;
> > }
> > @@ -462,6 +483,8 @@ static int acpi_button_add_input(struct acpi_device *device)
> > struct input_dev *input;
> > int error;
> >
> > + /* button_input_lock must be held */
> > +
> > button->input = input = input_allocate_device();
> > if (!input) {
> > error = -ENOMEM;
> > @@ -500,6 +523,31 @@ static int acpi_button_add_input(struct acpi_device *device)
> > return error;
> > }
> >
> > +static int acpi_lid_update_reliable(struct acpi_device *device)
> > +{
> > + struct acpi_button *button = acpi_driver_data(lid_device);
> > + int error;
> > +
> > + /* button_input_lock must be held */
> > +
> > + if (lid_reliable && !button->input) {
> > + error = acpi_button_add_input(device);
> > + if (error)
> > + return error;
> > +
> > + error = acpi_lid_update_state(device);
> > + if (error) {
> > + acpi_button_remove_input(device);
> > + return error;
> > + }
> > + }
> > +
> > + if (!lid_reliable && button->input)
> > + acpi_button_remove_input(device);
> > +
> > + return 0;
> > +}
> > +
> > static int acpi_button_add(struct acpi_device *device)
> > {
> > struct acpi_button *button;
> > @@ -547,12 +595,7 @@ static int acpi_button_add(struct acpi_device *device)
> >
> > snprintf(button->phys, sizeof(button->phys), "%s/button/input0", hid);
> >
> > - error = acpi_button_add_input(device);
> > - if (error)
> > - goto err_remove_fs;
> > -
> > if (button->type == ACPI_BUTTON_TYPE_LID) {
> > - acpi_lid_initialize_state(device);
> > /*
> > * This assumes there's only one lid device, or if there are
> > * more we only care about the last one...
> > @@ -560,6 +603,18 @@ static int acpi_button_add(struct acpi_device *device)
> > lid_device = device;
> > }
> >
> > + if (lid_reliable || button->type != ACPI_BUTTON_TYPE_LID) {
> > + error = acpi_button_add_input(device);
> > + if (error)
> > + goto err_remove_fs;
> > +
> > + if (button->type == ACPI_BUTTON_TYPE_LID) {
> > + mutex_lock(&button_input_lock);
> > + acpi_lid_initialize_state(device);
> > + mutex_unlock(&button_input_lock);
> > + }
> > + }
> > +
> > device_init_wakeup(&device->dev, true);
> > printk(KERN_INFO PREFIX "%s [%s]\n", name, acpi_device_bid(device));
> > return 0;
>
> This is another major differences between your proposal and mine.
>
> First of all, I think it should be in a separate patch.

Well, that's already a patch on its own :/

>
> Second, I have concerns related to such a change:
> I can see that, you are trying to address a problem that:
> The input layer requires a determined initial SW_LID state while ACPI button driver cannot offer.
> So by adding/removing input node, you can introduce a tristate SW_LID input node.

You can put it that way. I prefer putting it: "when we export the LID
switch input node, you are guaranteed to have the proper state".

> However I doubt if this is necessary and can solve real issues, as:
> systemd now works fine with button driver for all cases,

I do not care about systemd or the suspend lopps introduced by systemd.
All I care is that the kernel provides correct behavior. If systemd can
work around some issues we see because we are too lazy to fix them in
the kernel (this is not a personal attack, sometimes being lazy is the
right solution), fine. But the current state of this driver doesn't
follow the specification of the input subsystem on some platform, and
this is what this series fixes.

> only desktop managers should be changed to be compliant to case 2/4/5

As long as the kernel lies, we should not even remotely envision asking
user space to change.

> (or if we improve by adding a timer, they should only be changed to be compliant to case 5).
> And doing this won't help desktop managers to be able to work fine with case 5.
> So this finally may only remove ACPI SW_LID support from Surface Pro 1.

I haven't decided how to solve case 5 completely. So please do not take
this case into account yet.

> While with latest systemd, closing lid on that platform can correctly triggers suspend.
> And no suspend/resume cycles should be seen.
> So why do we need to remove this feature (ACPI SW_LID) from Surface Pro 1?

Well, with this change, if you mark the surface pro 1 as unreliable, the
event will be forwarded to user space correctly. Just that there is a
systemd bug in which the state is not synced when the device appears.

So yes, it'll expose a new bug in userspace, but given the blacklist of
unreliable LID switches will be handled in userspace, user space can
make sure this will not show up before systemd can handle it.

> If you really want to propose an ABI change for user space.
> Why don't you do this in input layer by defining SW_LID as tristate?

Because this proposal is not a kABI change, and the kABI change you
propose is just not doable. We have too many users of EV_SW to be able
to say that we change the semantic. A solution would be to add a new
EV_* event, but we don't really need.

Cheers,
Benjamin

>
> Thanks and best regards
> Lv

2017-06-06 10:29:21

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks

On Jun 05 2017 or thereabouts, Zheng, Lv wrote:
> Hi,
>
> > From: Benjamin Tissoires [mailto:[email protected]]
> > Subject: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks
> >
> > From: Lv Zheng <[email protected]>
> >
> > acpi/button.c now contains the logic to avoid frequently replayed events
> > which originally was ensured by using blocking notifier.
> > On the contrary, using a blocking notifier is wrong as it could keep on
> > returning NOTIFY_DONE, causing events lost.
> >
> > This patch thus changes lid notification to raw notifier in order not to
> > have any events lost.
>
> This patch is on top of the following:
> https://patchwork.kernel.org/patch/9756467/
> where button driver implements a frequency check and
> thus is capable of filtering redundant events itself:
> I saw you have deleted it from PATCH 02.
> So this patch is not applicable now.

I actually rebased it in this series. I kept your SoB line given that
the idea came from you and the resulting patch was rather similar (only
one hunk differs, but the meaning is the same).

>
> Is input layer capable of filtering redundant events.

I don't think it does, and it should not. If an event is emitted, it has
to be forwarded. However, the logic of the protocol makes that the only
state that matters is when an EV_SYN is emitted. So if a SW_LID 0 then 1
is sent between the 2 EV_SYN, and the state was 1 before, from a
protocol point of view it's a no-op.

> I saw you unconditionally prepend "open" before "close",
> which may make input layer incapable of filtering redundant close events.

Again, we don't care about events. We care about states, and those are
only emitted when the lid is marked as non reliable.

>
> If input layer is capable of filtering redundant events,
> why don't you:
> 1. drop this commit;
> 2. remove all ACPI lid notifier APIs;
> 3. change lid notifier callers to register notification via input layer?

Having the i915 driver listening to the input events is actually a good
solution. Let me think about it a little bit more and I'll come back.

Cheers,
Benjamin

>
> Thanks and best regards
> Lv
>
> >
> > Signed-off-by: Lv Zheng <[email protected]>
> > Signed-off-by: Benjamin Tissoires <[email protected]>
> > ---
> > drivers/acpi/button.c | 68 ++++++++++++++++++++-------------------------------
> > 1 file changed, 27 insertions(+), 41 deletions(-)
> >
> > diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> > index 03e5981..1927b08 100644
> > --- a/drivers/acpi/button.c
> > +++ b/drivers/acpi/button.c
> > @@ -114,7 +114,7 @@ struct acpi_button {
> >
> > static DEFINE_MUTEX(button_input_lock);
> >
> > -static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
> > +static RAW_NOTIFIER_HEAD(acpi_lid_notifier);
> > static struct acpi_device *lid_device;
> > static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
> >
> > @@ -179,14 +179,12 @@ static int acpi_lid_evaluate_state(struct acpi_device *device)
> > return lid_state ? 1 : 0;
> > }
> >
> > -static int acpi_lid_notify_state(struct acpi_device *device, int state)
> > +static void acpi_lid_notify_state(struct acpi_device *device, int state)
> > {
> > struct acpi_button *button = acpi_driver_data(device);
> >
> > - /* button_input_lock must be held */
> > -
> > if (!button->input)
> > - return 0;
> > + return;
> >
> > /*
> > * If the lid is unreliable, always send an "open" event before any
> > @@ -201,8 +199,6 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
> >
> > if (state)
> > pm_wakeup_hard_event(&device->dev);
> > -
> > - return 0;
> > }
> >
> > /*
> > @@ -214,28 +210,14 @@ static void acpi_button_lid_events(struct input_handle *handle,
> > {
> > const struct input_value *v;
> > int state = -1;
> > - int ret;
> >
> > for (v = vals; v != vals + count; v++) {
> > switch (v->type) {
> > case EV_SYN:
> > - if (v->code == SYN_REPORT && state >= 0) {
> > - ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> > + if (v->code == SYN_REPORT && state >= 0)
> > + (void)raw_notifier_call_chain(&acpi_lid_notifier,
> > state,
> > lid_device);
> > - if (ret == NOTIFY_DONE)
> > - ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> > - state,
> > - lid_device);
> > - if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> > - /*
> > - * It is also regarded as success if
> > - * the notifier_chain returns NOTIFY_OK
> > - * or NOTIFY_DONE.
> > - */
> > - ret = 0;
> > - }
> > - }
> > break;
> > case EV_SW:
> > if (v->code == SW_LID)
> > @@ -433,13 +415,25 @@ static int acpi_button_remove_fs(struct acpi_device *device)
> > -------------------------------------------------------------------------- */
> > int acpi_lid_notifier_register(struct notifier_block *nb)
> > {
> > - return blocking_notifier_chain_register(&acpi_lid_notifier, nb);
> > + return raw_notifier_chain_register(&acpi_lid_notifier, nb);
> > }
> > EXPORT_SYMBOL(acpi_lid_notifier_register);
> >
> > +static inline int __acpi_lid_notifier_unregister(struct notifier_block *nb,
> > + bool sync)
> > +{
> > + int ret;
> > +
> > + ret = raw_notifier_chain_unregister(&acpi_lid_notifier, nb);
> > + if (sync)
> > + synchronize_rcu();
> > +
> > + return ret;
> > +}
> > +
> > int acpi_lid_notifier_unregister(struct notifier_block *nb)
> > {
> > - return blocking_notifier_chain_unregister(&acpi_lid_notifier, nb);
> > + return __acpi_lid_notifier_unregister(nb, false);
> > }
> > EXPORT_SYMBOL(acpi_lid_notifier_unregister);
> >
> > @@ -452,40 +446,36 @@ int acpi_lid_open(void)
> > }
> > EXPORT_SYMBOL(acpi_lid_open);
> >
> > -static int acpi_lid_update_state(struct acpi_device *device)
> > +static void acpi_lid_update_state(struct acpi_device *device)
> > {
> > int state;
> >
> > state = acpi_lid_evaluate_state(device);
> > if (state < 0)
> > - return state;
> > + return;
> >
> > - return acpi_lid_notify_state(device, state);
> > + acpi_lid_notify_state(device, state);
> > }
> >
> > -static int acpi_lid_notify(struct acpi_device *device)
> > +static void acpi_lid_notify(struct acpi_device *device)
> > {
> > struct acpi_button *button = acpi_driver_data(device);
> > - int ret;
> >
> > mutex_lock(&button_input_lock);
> > if (!button->input)
> > acpi_button_add_input(device);
> > - ret = acpi_lid_update_state(device);
> > + acpi_lid_update_state(device);
> > mutex_unlock(&button_input_lock);
> > -
> > -
> > - return ret;
> > }
> >
> > static void acpi_lid_initialize_state(struct acpi_device *device)
> > {
> > switch (lid_init_state) {
> > case ACPI_BUTTON_LID_INIT_OPEN:
> > - (void)acpi_lid_notify_state(device, 1);
> > + acpi_lid_notify_state(device, 1);
> > break;
> > case ACPI_BUTTON_LID_INIT_METHOD:
> > - (void)acpi_lid_update_state(device);
> > + acpi_lid_update_state(device);
> > break;
> > case ACPI_BUTTON_LID_INIT_IGNORE:
> > default:
> > @@ -641,11 +631,7 @@ static int acpi_lid_update_reliable(struct acpi_device *device)
> > if (error)
> > return error;
> >
> > - error = acpi_lid_update_state(device);
> > - if (error) {
> > - acpi_button_remove_input(device);
> > - return error;
> > - }
> > + acpi_lid_update_state(device);
> > }
> >
> > if (!lid_reliable && button->input)
> > --
> > 2.9.4
>

2017-06-06 10:31:42

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [WIP PATCH 3/4] ACPI: button: Let input filter out the LID events

On Jun 05 2017 or thereabouts, Zheng, Lv wrote:
> Hi, Benjamin
>
> > From: Benjamin Tissoires [mailto:[email protected]]
> > Subject: [WIP PATCH 3/4] ACPI: button: Let input filter out the LID events
> >
> > The input stack already filters out the LID events. So instead of
> > filtering them out at the source, we can hook up after the input
> > processing and propagate the lid switch events when the input stack
> > tells us to.
> >
> > An other benefit is that if userspace (think libinput) "fixes" the lid
> > switch state by some heuristics, this new state is forwarded to the
> > listeners in the kernel.
>
> See my comments to PATCH 4.
> IMO, it sounds better that
> 1. ACPI lid works as a driver of SW_LID, and
> 2. i915 registers notification (the only user) via input layer.
> So it looks i915 rather than button driver should call input_register_handler().
> And input layer may help to provide a simplified interface for drivers to register key notifications.

Sounds good.

Dmitry, would a simplified API for other drivers to listen to input
events be something you would agree on?

Cheers,
Benjamin

>
> Thanks and best regards
> Lv
>
> >
> > Signed-off-by: Benjamin Tissoires <[email protected]>
> > ---
> > drivers/acpi/button.c | 156 ++++++++++++++++++++++++++++++++++++++++++++------
> > 1 file changed, 139 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> > index 9ad7604..03e5981 100644
> > --- a/drivers/acpi/button.c
> > +++ b/drivers/acpi/button.c
> > @@ -109,8 +109,6 @@ struct acpi_button {
> > struct input_dev *input;
> > char phys[32]; /* for input device */
> > unsigned long pushed;
> > - int last_state;
> > - ktime_t last_time;
> > bool suspended;
> > };
> >
> > @@ -184,7 +182,6 @@ static int acpi_lid_evaluate_state(struct acpi_device *device)
> > static int acpi_lid_notify_state(struct acpi_device *device, int state)
> > {
> > struct acpi_button *button = acpi_driver_data(device);
> > - int ret;
> >
> > /* button_input_lock must be held */
> >
> > @@ -205,20 +202,129 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
> > if (state)
> > pm_wakeup_hard_event(&device->dev);
> >
> > - ret = blocking_notifier_call_chain(&acpi_lid_notifier, state, device);
> > - if (ret == NOTIFY_DONE)
> > - ret = blocking_notifier_call_chain(&acpi_lid_notifier, state,
> > - device);
> > - if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> > - /*
> > - * It is also regarded as success if the notifier_chain
> > - * returns NOTIFY_OK or NOTIFY_DONE.
> > - */
> > - ret = 0;
> > + return 0;
> > +}
> > +
> > +/*
> > + * Pass incoming event to all connected clients.
> > + */
> > +static void acpi_button_lid_events(struct input_handle *handle,
> > + const struct input_value *vals,
> > + unsigned int count)
> > +{
> > + const struct input_value *v;
> > + int state = -1;
> > + int ret;
> > +
> > + for (v = vals; v != vals + count; v++) {
> > + switch (v->type) {
> > + case EV_SYN:
> > + if (v->code == SYN_REPORT && state >= 0) {
> > + ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> > + state,
> > + lid_device);
> > + if (ret == NOTIFY_DONE)
> > + ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> > + state,
> > + lid_device);
> > + if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> > + /*
> > + * It is also regarded as success if
> > + * the notifier_chain returns NOTIFY_OK
> > + * or NOTIFY_DONE.
> > + */
> > + ret = 0;
> > + }
> > + }
> > + break;
> > + case EV_SW:
> > + if (v->code == SW_LID)
> > + state = !v->value;
> > + break;
> > + }
> > }
> > - return ret;
> > }
> >
> > +static int acpi_button_lid_connect(struct input_handler *handler,
> > + struct input_dev *dev,
> > + const struct input_device_id *id)
> > +{
> > + struct input_handle *handle;
> > + int error;
> > +
> > + handle = kzalloc(sizeof(struct input_handle), GFP_KERNEL);
> > + if (!handle)
> > + return -ENOMEM;
> > +
> > + handle->dev = dev;
> > + handle->handler = handler;
> > + handle->name = "acpi-button-lid";
> > +
> > + error = input_register_handle(handle);
> > + if (error) {
> > + dev_err(&lid_device->dev, "Error installing input handle\n");
> > + goto err_free;
> > + }
> > +
> > + error = input_open_device(handle);
> > + if (error) {
> > + dev_err(&lid_device->dev, "Failed to open input device\n");
> > + goto err_unregister;
> > + }
> > +
> > + return 0;
> > +
> > + err_unregister:
> > + input_unregister_handle(handle);
> > + err_free:
> > + kfree(handle);
> > + return error;
> > +}
> > +
> > +static void acpi_button_lid_disconnect(struct input_handle *handle)
> > +{
> > + input_close_device(handle);
> > + input_unregister_handle(handle);
> > + kfree(handle);
> > +}
> > +
> > +bool acpi_button_lid_match(struct input_handler *handler,
> > + struct input_dev *dev)
> > +{
> > + struct acpi_button *button;
> > +
> > + if (!lid_device)
> > + return false;
> > +
> > + button = acpi_driver_data(lid_device);
> > +
> > + if (dev != button->input)
> > + return false;
> > +
> > + return true;
> > +}
> > +
> > +static const struct input_device_id acpi_button_lid_ids[] = {
> > + {
> > + .flags = INPUT_DEVICE_ID_MATCH_EVBIT,
> > + .evbit = { BIT_MASK(EV_SW) },
> > + .swbit = { BIT_MASK(SW_LID) },
> > + },
> > + { },
> > +};
> > +
> > +MODULE_DEVICE_TABLE(input, acpi_button_lid_ids);
> > +
> > +static struct input_handler acpi_button_lid_handler = {
> > + .match = acpi_button_lid_match,
> > + .connect = acpi_button_lid_connect,
> > + .disconnect = acpi_button_lid_disconnect,
> > + .events = acpi_button_lid_events,
> > + .name = "acpi-lid-callchain",
> > + .id_table = acpi_button_lid_ids,
> > +};
> > +
> > +
> > static int acpi_button_state_seq_show(struct seq_file *seq, void *offset)
> > {
> > struct acpi_device *device = seq->private;
> > @@ -581,8 +687,6 @@ static int acpi_button_add(struct acpi_device *device)
> > strcpy(name, ACPI_BUTTON_DEVICE_NAME_LID);
> > sprintf(class, "%s/%s",
> > ACPI_BUTTON_CLASS, ACPI_BUTTON_SUBCLASS_LID);
> > - button->last_state = !!acpi_lid_evaluate_state(device);
> > - button->last_time = ktime_get();
> > } else {
> > printk(KERN_ERR PREFIX "Unsupported hid [%s]\n", hid);
> > error = -ENODEV;
> > @@ -674,4 +778,22 @@ module_param_call(lid_init_state,
> > NULL, 0644);
> > MODULE_PARM_DESC(lid_init_state, "Behavior for reporting LID initial state");
> >
> > -module_acpi_driver(acpi_button_driver);
> > +static int __init acpi_button_init(void)
> > +{
> > + int error;
> > +
> > + error = input_register_handler(&acpi_button_lid_handler);
> > + if (error)
> > + return error;
> > +
> > + return acpi_bus_register_driver(&acpi_button_driver);
> > +}
> > +
> > +static void __exit acpi_button_exit(void)
> > +{
> > + acpi_bus_unregister_driver(&acpi_button_driver);
> > + input_unregister_handler(&acpi_button_lid_handler);
> > +}
> > +
> > +module_init(acpi_button_init);
> > +module_exit(acpi_button_exit);
> > --
> > 2.9.4
>

2017-06-07 01:28:04

by Peter Hutterer

[permalink] [raw]
Subject: Re: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown

On Tue, Jun 06, 2017 at 12:22:09PM +0200, Benjamin Tissoires wrote:
[...]
> > This is because the aml table puts many Notify(LID, 0x80) in various control methods.
> > And not one of them but multiple of them will be invoked by various OS drivers during suspend/resume period.
> > I think this is not an isolated platform that will invoke multiple redundant "Notify(lid)".
> >
> > Fortunately, the lid state for the multiple notify(lid) should be same as the first "Notify(lid)".
> > I suppose this is why SW_LID is invented, as it can really filter such redundant events.
> > And user space finally can only see 1 "close" event.
> >
> > But unconditionally prepending "open" before all "close" events surely can break the logic by
> > delivering multiple "close" events to the user space.
>
> That doesn't matter. What matters is the state of the switch, not the
> event. So if user space receives (in case we marked the switch as not
> reliable) several close events, all user space will do is realize that
> the state is still closed and will act accordingly.

[...]

> Again, we don't care if the "event" comes from ACPI, the driver itself or
> user space (libinput). All that matters is the current state of the
> input node switch, that needs to match the physical world at any time.

as extra comment on those two: you cannot guarantee when e.g. libinput
checks the state. it may start up after the kernel has updated the EV_SW
state, it may close and re-open a device without notice (disabling a
device in X has that effect for example). So the EV_SW/SW_LID events are
nice to have, but we really rely on the *state* of the switch more than the
events.

Cheers,
Peter

2017-06-07 07:59:00

by Lennart Poettering

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:

> Hi,
>
> Sending this as a WIP as it still need a few changes, but it mostly works as
> expected (still not fully compliant yet).
>
> So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> the kernel should not export the LID switch device as long as we are not sure
> about its state.

Ah nice! I (obviously) like this approach.

> Note that systemd currently doesn't sync the state when the input node just
> appears. This is a systemd bug, and it should not be handled by the kernel
> community.

Uh if this is borked, we should indeed fix this in systemd. Is there
already a systemd github bug about this? If not, please create one,
and we'll look into it!

Thanks for working on this,

Lennart

--
Lennart Poettering, Red Hat

2017-06-07 09:47:38

by Zheng, Lv

[permalink] [raw]
Subject: RE: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks

Hi, Benjamin

> From: Benjamin Tissoires [mailto:[email protected]]
> Subject: Re: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks
>
> On Jun 05 2017 or thereabouts, Zheng, Lv wrote:
> > Hi,
> >
> > > From: Benjamin Tissoires [mailto:[email protected]]
> > > Subject: [WIP PATCH 4/4] ACPI: button: Fix lid notification locks
> > >
> > > From: Lv Zheng <[email protected]>
> > >
> > > acpi/button.c now contains the logic to avoid frequently replayed events
> > > which originally was ensured by using blocking notifier.
> > > On the contrary, using a blocking notifier is wrong as it could keep on
> > > returning NOTIFY_DONE, causing events lost.
> > >
> > > This patch thus changes lid notification to raw notifier in order not to
> > > have any events lost.
> >
> > This patch is on top of the following:
> > https://patchwork.kernel.org/patch/9756467/
> > where button driver implements a frequency check and
> > thus is capable of filtering redundant events itself:
> > I saw you have deleted it from PATCH 02.
> > So this patch is not applicable now.
>
> I actually rebased it in this series. I kept your SoB line given that
> the idea came from you and the resulting patch was rather similar (only
> one hunk differs, but the meaning is the same).
>
> >
> > Is input layer capable of filtering redundant events.
>
> I don't think it does, and it should not. If an event is emitted, it has
> to be forwarded. However, the logic of the protocol makes that the only
> state that matters is when an EV_SYN is emitted. So if a SW_LID 0 then 1
> is sent between the 2 EV_SYN, and the state was 1 before, from a
> protocol point of view it's a no-op.
>
> > I saw you unconditionally prepend "open" before "close",
> > which may make input layer incapable of filtering redundant close events.
>
> Again, we don't care about events. We care about states, and those are
> only emitted when the lid is marked as non reliable.
>
> >
> > If input layer is capable of filtering redundant events,
> > why don't you:
> > 1. drop this commit;
> > 2. remove all ACPI lid notifier APIs;
> > 3. change lid notifier callers to register notification via input layer?
>
> Having the i915 driver listening to the input events is actually a good
> solution. Let me think about it a little bit more and I'll come back.

OK, then I'll drop the frequency check mechanism and drop patch 4/5.

Cheers,
Lv

>
> Cheers,
> Benjamin
>
> >
> > Thanks and best regards
> > Lv
> >
> > >
> > > Signed-off-by: Lv Zheng <[email protected]>
> > > Signed-off-by: Benjamin Tissoires <[email protected]>
> > > ---
> > > drivers/acpi/button.c | 68 ++++++++++++++++++++-------------------------------
> > > 1 file changed, 27 insertions(+), 41 deletions(-)
> > >
> > > diff --git a/drivers/acpi/button.c b/drivers/acpi/button.c
> > > index 03e5981..1927b08 100644
> > > --- a/drivers/acpi/button.c
> > > +++ b/drivers/acpi/button.c
> > > @@ -114,7 +114,7 @@ struct acpi_button {
> > >
> > > static DEFINE_MUTEX(button_input_lock);
> > >
> > > -static BLOCKING_NOTIFIER_HEAD(acpi_lid_notifier);
> > > +static RAW_NOTIFIER_HEAD(acpi_lid_notifier);
> > > static struct acpi_device *lid_device;
> > > static u8 lid_init_state = ACPI_BUTTON_LID_INIT_METHOD;
> > >
> > > @@ -179,14 +179,12 @@ static int acpi_lid_evaluate_state(struct acpi_device *device)
> > > return lid_state ? 1 : 0;
> > > }
> > >
> > > -static int acpi_lid_notify_state(struct acpi_device *device, int state)
> > > +static void acpi_lid_notify_state(struct acpi_device *device, int state)
> > > {
> > > struct acpi_button *button = acpi_driver_data(device);
> > >
> > > - /* button_input_lock must be held */
> > > -
> > > if (!button->input)
> > > - return 0;
> > > + return;
> > >
> > > /*
> > > * If the lid is unreliable, always send an "open" event before any
> > > @@ -201,8 +199,6 @@ static int acpi_lid_notify_state(struct acpi_device *device, int state)
> > >
> > > if (state)
> > > pm_wakeup_hard_event(&device->dev);
> > > -
> > > - return 0;
> > > }
> > >
> > > /*
> > > @@ -214,28 +210,14 @@ static void acpi_button_lid_events(struct input_handle *handle,
> > > {
> > > const struct input_value *v;
> > > int state = -1;
> > > - int ret;
> > >
> > > for (v = vals; v != vals + count; v++) {
> > > switch (v->type) {
> > > case EV_SYN:
> > > - if (v->code == SYN_REPORT && state >= 0) {
> > > - ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> > > + if (v->code == SYN_REPORT && state >= 0)
> > > + (void)raw_notifier_call_chain(&acpi_lid_notifier,
> > > state,
> > > lid_device);
> > > - if (ret == NOTIFY_DONE)
> > > - ret = blocking_notifier_call_chain(&acpi_lid_notifier,
> > > - state,
> > > - lid_device);
> > > - if (ret == NOTIFY_DONE || ret == NOTIFY_OK) {
> > > - /*
> > > - * It is also regarded as success if
> > > - * the notifier_chain returns NOTIFY_OK
> > > - * or NOTIFY_DONE.
> > > - */
> > > - ret = 0;
> > > - }
> > > - }
> > > break;
> > > case EV_SW:
> > > if (v->code == SW_LID)
> > > @@ -433,13 +415,25 @@ static int acpi_button_remove_fs(struct acpi_device *device)
> > > -------------------------------------------------------------------------- */
> > > int acpi_lid_notifier_register(struct notifier_block *nb)
> > > {
> > > - return blocking_notifier_chain_register(&acpi_lid_notifier, nb);
> > > + return raw_notifier_chain_register(&acpi_lid_notifier, nb);
> > > }
> > > EXPORT_SYMBOL(acpi_lid_notifier_register);
> > >
> > > +static inline int __acpi_lid_notifier_unregister(struct notifier_block *nb,
> > > + bool sync)
> > > +{
> > > + int ret;
> > > +
> > > + ret = raw_notifier_chain_unregister(&acpi_lid_notifier, nb);
> > > + if (sync)
> > > + synchronize_rcu();
> > > +
> > > + return ret;
> > > +}
> > > +
> > > int acpi_lid_notifier_unregister(struct notifier_block *nb)
> > > {
> > > - return blocking_notifier_chain_unregister(&acpi_lid_notifier, nb);
> > > + return __acpi_lid_notifier_unregister(nb, false);
> > > }
> > > EXPORT_SYMBOL(acpi_lid_notifier_unregister);
> > >
> > > @@ -452,40 +446,36 @@ int acpi_lid_open(void)
> > > }
> > > EXPORT_SYMBOL(acpi_lid_open);
> > >
> > > -static int acpi_lid_update_state(struct acpi_device *device)
> > > +static void acpi_lid_update_state(struct acpi_device *device)
> > > {
> > > int state;
> > >
> > > state = acpi_lid_evaluate_state(device);
> > > if (state < 0)
> > > - return state;
> > > + return;
> > >
> > > - return acpi_lid_notify_state(device, state);
> > > + acpi_lid_notify_state(device, state);
> > > }
> > >
> > > -static int acpi_lid_notify(struct acpi_device *device)
> > > +static void acpi_lid_notify(struct acpi_device *device)
> > > {
> > > struct acpi_button *button = acpi_driver_data(device);
> > > - int ret;
> > >
> > > mutex_lock(&button_input_lock);
> > > if (!button->input)
> > > acpi_button_add_input(device);
> > > - ret = acpi_lid_update_state(device);
> > > + acpi_lid_update_state(device);
> > > mutex_unlock(&button_input_lock);
> > > -
> > > -
> > > - return ret;
> > > }
> > >
> > > static void acpi_lid_initialize_state(struct acpi_device *device)
> > > {
> > > switch (lid_init_state) {
> > > case ACPI_BUTTON_LID_INIT_OPEN:
> > > - (void)acpi_lid_notify_state(device, 1);
> > > + acpi_lid_notify_state(device, 1);
> > > break;
> > > case ACPI_BUTTON_LID_INIT_METHOD:
> > > - (void)acpi_lid_update_state(device);
> > > + acpi_lid_update_state(device);
> > > break;
> > > case ACPI_BUTTON_LID_INIT_IGNORE:
> > > default:
> > > @@ -641,11 +631,7 @@ static int acpi_lid_update_reliable(struct acpi_device *device)
> > > if (error)
> > > return error;
> > >
> > > - error = acpi_lid_update_state(device);
> > > - if (error) {
> > > - acpi_button_remove_input(device);
> > > - return error;
> > > - }
> > > + acpi_lid_update_state(device);
> > > }
> > >
> > > if (!lid_reliable && button->input)
> > > --
> > > 2.9.4
> >

2017-06-07 09:57:08

by Zheng, Lv

[permalink] [raw]
Subject: RE: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown

Hi, Benjamin

> From: [email protected] [mailto:[email protected]] On Behalf Of Benjamin
> Tissoires
> Subject: Re: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown
>
> Hi Lv,
>
> On Jun 05 2017 or thereabouts, Zheng, Lv wrote:
> > Hi, Benjamin
> >
> > > From: Benjamin Tissoires [mailto:[email protected]]
> > > Subject: [WIP PATCH 2/4] ACPI: button: remove the LID input node when the state is unknown
>
> > My dell latitude 6430u test platform sends multiple Notify(lid) before suspend and after resume.
>
> Does this platform requires the not lid_reliable check as per this
> series? Because if it doesn't, then we should not care.

No need to mark lid_reliable.

> > This is because the aml table puts many Notify(LID, 0x80) in various control methods.
> > And not one of them but multiple of them will be invoked by various OS drivers during suspend/resume
> period.
> > I think this is not an isolated platform that will invoke multiple redundant "Notify(lid)".
> >
> > Fortunately, the lid state for the multiple notify(lid) should be same as the first "Notify(lid)".
> > I suppose this is why SW_LID is invented, as it can really filter such redundant events.
> > And user space finally can only see 1 "close" event.
> >
> > But unconditionally prepending "open" before all "close" events surely can break the logic by
> > delivering multiple "close" events to the user space.
>
> That doesn't matter. What matters is the state of the switch, not the
> event. So if user space receives (in case we marked the switch as not
> reliable) several close events, all user space will do is realize that
> the state is still closed and will act accordingly.

OK, I tried to address this here:
https://patchwork.kernel.org/patch/9771121/

> > Another issue is, for case 5, when we use button.lid_init_state=method.
> > Unconditionally prepending "open" before driver initiated "close" event
> > sent due to acpi_lid_initialize_state(), we will see suspend/resume cycles.
>
> Case 5 is broken anyway and needs to be handled specially. It was not
> targeted in this WIP series.

It was addressed by button.lid_init_state=open and newer systemd.
It's not broken any more.


> > Thus if we consider both cases, we should:
> > 1. put a frequency check to filter possible redundant events.
>
> This doesn't work and should be avoided. The state of the input switch
> is known to the input layer only, and given there are spinlocks, you can
> not know if the state is actually the one you expected beforehand.
>
> You can however add frequency checks in the input handler, but that
> would assume the input layer is not doing its job properly and so should
> be avoided.

OK, I dropped frequency check mechanism.

> > 2. distinguish AML "Notify" call and button driver initiated lid notification.
>
> Again, we don't care if the "event" comes from ACPI, the driver itself or
> user space (libinput). All that matters is the current state of the
> input node switch, that needs to match the physical world at any time.

That depends on the final test result.
However I managed to get systemd working with case 2,4 using this commit:
https://patchwork.kernel.org/patch/9771121/

> > This is another major differences between your proposal and mine.
> >
> > First of all, I think it should be in a separate patch.
>
> Well, that's already a patch on its own :/
>
> >
> > Second, I have concerns related to such a change:
> > I can see that, you are trying to address a problem that:
> > The input layer requires a determined initial SW_LID state while ACPI button driver cannot offer.
> > So by adding/removing input node, you can introduce a tristate SW_LID input node.
>
> You can put it that way. I prefer putting it: "when we export the LID
> switch input node, you are guaranteed to have the proper state".
>
> > However I doubt if this is necessary and can solve real issues, as:
> > systemd now works fine with button driver for all cases,
>
> I do not care about systemd or the suspend lopps introduced by systemd.
> All I care is that the kernel provides correct behavior. If systemd can
> work around some issues we see because we are too lazy to fix them in
> the kernel (this is not a personal attack, sometimes being lazy is the
> right solution), fine. But the current state of this driver doesn't
> follow the specification of the input subsystem on some platform, and
> this is what this series fixes.

No, we just can see if case 5 is properly addressed (like current systemd 233 does),
we don't need to do anything.
So if you still insist to fix systemd 229, I just post an RFC:
https://patchwork.kernel.org/patch/9771121/

> > only desktop managers should be changed to be compliant to case 2/4/5
>
> As long as the kernel lies, we should not even remotely envision asking
> user space to change.

By reverting back to "method" mode and fixing "method" mode, there is no regression.
What if we just stay to see what will happen next with MS Surface Pro 1 docking station support.
Maybe it is minor.

> > (or if we improve by adding a timer, they should only be changed to be compliant to case 5).
> > And doing this won't help desktop managers to be able to work fine with case 5.
> > So this finally may only remove ACPI SW_LID support from Surface Pro 1.
>
> I haven't decided how to solve case 5 completely. So please do not take
> this case into account yet.
>
> > While with latest systemd, closing lid on that platform can correctly triggers suspend.
> > And no suspend/resume cycles should be seen.
> > So why do we need to remove this feature (ACPI SW_LID) from Surface Pro 1?
>
> Well, with this change, if you mark the surface pro 1 as unreliable, the
> event will be forwarded to user space correctly. Just that there is a
> systemd bug in which the state is not synced when the device appears.
>
> So yes, it'll expose a new bug in userspace, but given the blacklist of
> unreliable LID switches will be handled in userspace, user space can
> make sure this will not show up before systemd can handle it.

IMO, we don't need to remove input node, button.lid_init_state=ignore is developed for the same purpose.
And it works perfectly with systemd 233. So we can stay with old solution.
So hope this can replace your aggressive change:
https://patchwork.kernel.org/patch/9771119/

> > If you really want to propose an ABI change for user space.
> > Why don't you do this in input layer by defining SW_LID as tristate?
>
> Because this proposal is not a kABI change, and the kABI change you
> propose is just not doable. We have too many users of EV_SW to be able
> to say that we change the semantic. A solution would be to add a new
> EV_* event, but we don't really need.

Cheers,
Lv

2017-06-13 10:06:31

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi,

[Sorry for the delay, I have been sidetracked from this]

On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:
>
> > Hi,
> >
> > Sending this as a WIP as it still need a few changes, but it mostly works as
> > expected (still not fully compliant yet).
> >
> > So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> > the kernel should not export the LID switch device as long as we are not sure
> > about its state.
>
> Ah nice! I (obviously) like this approach.

Heh. Now I just need to convince Lv that it's the right approach.

>
> > Note that systemd currently doesn't sync the state when the input node just
> > appears. This is a systemd bug, and it should not be handled by the kernel
> > community.
>
> Uh if this is borked, we should indeed fix this in systemd. Is there
> already a systemd github bug about this? If not, please create one,
> and we'll look into it!

I don't think there is. I haven't raised it yet because I am not so sure
this will not break again those worthless unreliable LID, and if we play
whack a mole between the kernel and user space, things are going to be
nasty. So I'd rather have this fixed in systemd along with the
unreliable LID switch knowledge, so we are sure that the kernel behaves
the way we expect it to be.

>
> Thanks for working on this,
>

No worries.

Cheers,
Benjamin

2017-06-14 23:50:28

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi,

> From: Lennart Poettering [mailto:[email protected]]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:
>
> > Hi,
> >
> > Sending this as a WIP as it still need a few changes, but it mostly works as
> > expected (still not fully compliant yet).
> >
> > So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> > the kernel should not export the LID switch device as long as we are not sure
> > about its state.
>
> Ah nice! I (obviously) like this approach.
>
> > Note that systemd currently doesn't sync the state when the input node just
> > appears. This is a systemd bug, and it should not be handled by the kernel
> > community.
>
> Uh if this is borked, we should indeed fix this in systemd. Is there
> already a systemd github bug about this? If not, please create one,
> and we'll look into it!

This is not my opinion.
My opinion is as follows.

We confirmed Ubuntu shipped systemd (version 229) with "reliable|unreliable" platforms.
We can see 2 problems:
1. LID_OPEN cannot cancel an on-going suspend sequence
After boot, if user space receives "LID_CLOSE" key event,
systemd may not suspend the platform right after seeing the event,
it may suspend the platform several seconds later.
This is not a problem.

The problem is, if "LID_OPEN" is sent within this deferring period,
Systemd doesn't cancel previously scheduled "suspend".
And the platform may be suspended with lid opened.
Then users need to close and re-open the lid to wake the system up.
Causing another "LID_CLOSE/LID_OPEN" sequence delivered to the user space after resume.
Users then can see a suspend/resume loop.
This problem can even be seen on "reliable" platforms.
It can be easily triggered by user actions.
2. Need explicit LID_OPEN to stay woken-up
After boot, systemd seems to be wait for a significant "LID_OPEN".
If it cannot see a "LID_OPEN" within several seconds,
it suspends the platform.
So if a platform doesn't send "LID_OPEN" or fails to send "LID_OPEN" within this period.
Users then can see a suspend/resume loop.

However we've tested with github cloned systemd (version 233).
The 2 problems seem to have been fixed.
It works well with current ACPI button driver,
but you need to boot linux kernel with button.lid_init_state=ignore.
I don't know the story of the improvement.
Systemd developers should know that better than me.

So IMO, systemd needn't do any further improvement.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

But the kernel button driver implements several "lid_init_state" modes.
It appears "method" mode is determined to be the default mode.
Thus we need to do:
1. improve button driver "method" mode to make systemd 233 work well with it.
2. determine if we need to improve button driver to make it work well with systemd 229.

Thanks and best regards
Lv

2017-06-15 02:56:32

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi, Benjamin

> From: Benjamin Tissoires [mailto:[email protected]]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> Hi,
>
> [Sorry for the delay, I have been sidetracked from this]
>
> On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:
> >
> > > Hi,
> > >
> > > Sending this as a WIP as it still need a few changes, but it mostly works as
> > > expected (still not fully compliant yet).
> > >
> > > So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> > > the kernel should not export the LID switch device as long as we are not sure
> > > about its state.
> >
> > Ah nice! I (obviously) like this approach.
>
> Heh. Now I just need to convince Lv that it's the right approach.

I feel we don't have big conflicts.
And I already took part of your idea into this patchset:
https://patchwork.kernel.org/patch/9771121/
https://patchwork.kernel.org/patch/9771119/
I tested my surface pros with Ubuntu, they are working as expected.

> > > Note that systemd currently doesn't sync the state when the input node just
> > > appears. This is a systemd bug, and it should not be handled by the kernel
> > > community.
> >
> > Uh if this is borked, we should indeed fix this in systemd. Is there
> > already a systemd github bug about this? If not, please create one,
> > and we'll look into it!
>
> I don't think there is. I haven't raised it yet because I am not so sure
> this will not break again those worthless unreliable LID, and if we play
> whack a mole between the kernel and user space, things are going to be
> nasty. So I'd rather have this fixed in systemd along with the
> unreliable LID switch knowledge, so we are sure that the kernel behaves
> the way we expect it to be.

This is my feeling:
We needn't go that far.
We can interpret "input node appears" into "default input node state".
That's what you want for acpi button driver - we now defaults to "method" mode.

What's your opinion?

Thanks
Lv

2017-06-15 06:49:44

by Peter Hutterer

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Thu, Jun 15, 2017 at 02:52:57AM +0000, Zheng, Lv wrote:
> Hi, Benjamin
>
> > From: Benjamin Tissoires [mailto:[email protected]]
> > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> >
> > Hi,
> >
> > [Sorry for the delay, I have been sidetracked from this]
> >
> > On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > > On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:
> > >
> > > > Hi,
> > > >
> > > > Sending this as a WIP as it still need a few changes, but it mostly works as
> > > > expected (still not fully compliant yet).
> > > >
> > > > So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> > > > the kernel should not export the LID switch device as long as we are not sure
> > > > about its state.
> > >
> > > Ah nice! I (obviously) like this approach.
> >
> > Heh. Now I just need to convince Lv that it's the right approach.
>
> I feel we don't have big conflicts.
> And I already took part of your idea into this patchset:
> https://patchwork.kernel.org/patch/9771121/
> https://patchwork.kernel.org/patch/9771119/
> I tested my surface pros with Ubuntu, they are working as expected.
>
> > > > Note that systemd currently doesn't sync the state when the input node just
> > > > appears. This is a systemd bug, and it should not be handled by the kernel
> > > > community.
> > >
> > > Uh if this is borked, we should indeed fix this in systemd. Is there
> > > already a systemd github bug about this? If not, please create one,
> > > and we'll look into it!
> >
> > I don't think there is. I haven't raised it yet because I am not so sure
> > this will not break again those worthless unreliable LID, and if we play
> > whack a mole between the kernel and user space, things are going to be
> > nasty. So I'd rather have this fixed in systemd along with the
> > unreliable LID switch knowledge, so we are sure that the kernel behaves
> > the way we expect it to be.
>
> This is my feeling:
> We needn't go that far.
> We can interpret "input node appears" into "default input node state".

Sorry, can you clarify this bit please? I'm not sure what you mean here.
Note that there's an unknown amount of time between "device node appearing
in the system" and when a userspace process actually opens it and looks at
its state. By then, the node may have changed state again.

Cheers,
Peter

> That's what you want for acpi button driver - we now defaults to "method" mode.
>
> What's your opinion?
>
> Thanks
> Lv
>

2017-06-15 07:34:15

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi, Peter

> From: Peter Hutterer [mailto:[email protected]]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> On Thu, Jun 15, 2017 at 02:52:57AM +0000, Zheng, Lv wrote:
> > Hi, Benjamin
> >
> > > From: Benjamin Tissoires [mailto:[email protected]]
> > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> > >
> > > Hi,
> > >
> > > [Sorry for the delay, I have been sidetracked from this]
> > >
> > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > > > On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Sending this as a WIP as it still need a few changes, but it mostly works as
> > > > > expected (still not fully compliant yet).
> > > > >
> > > > > So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> > > > > the kernel should not export the LID switch device as long as we are not sure
> > > > > about its state.
> > > >
> > > > Ah nice! I (obviously) like this approach.
> > >
> > > Heh. Now I just need to convince Lv that it's the right approach.
> >
> > I feel we don't have big conflicts.
> > And I already took part of your idea into this patchset:
> > https://patchwork.kernel.org/patch/9771121/
> > https://patchwork.kernel.org/patch/9771119/
> > I tested my surface pros with Ubuntu, they are working as expected.
> >
> > > > > Note that systemd currently doesn't sync the state when the input node just
> > > > > appears. This is a systemd bug, and it should not be handled by the kernel
> > > > > community.
> > > >
> > > > Uh if this is borked, we should indeed fix this in systemd. Is there
> > > > already a systemd github bug about this? If not, please create one,
> > > > and we'll look into it!
> > >
> > > I don't think there is. I haven't raised it yet because I am not so sure
> > > this will not break again those worthless unreliable LID, and if we play
> > > whack a mole between the kernel and user space, things are going to be
> > > nasty. So I'd rather have this fixed in systemd along with the
> > > unreliable LID switch knowledge, so we are sure that the kernel behaves
> > > the way we expect it to be.
> >
> > This is my feeling:
> > We needn't go that far.
> > We can interpret "input node appears" into "default input node state".
>
> Sorry, can you clarify this bit please? I'm not sure what you mean here.
> Note that there's an unknown amount of time between "device node appearing
> in the system" and when a userspace process actually opens it and looks at
> its state. By then, the node may have changed state again.

We can see:
"logind" has already implemented a timeout, and will not respond lid state
unless it can be stable within this timeout period.
I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?

I feel "removing the input node for a period where its state is not trustful"
is technically identical to this mechanism.

Cheers,
Lv

>
> Cheers,
> Peter
>
> > That's what you want for acpi button driver - we now defaults to "method" mode.
> >
> > What's your opinion?
> >
> > Thanks
> > Lv
> >

2017-06-15 07:58:21

by Peter Hutterer

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Thu, Jun 15, 2017 at 07:33:58AM +0000, Zheng, Lv wrote:
> Hi, Peter
>
> > From: Peter Hutterer [mailto:[email protected]]
> > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> >
> > On Thu, Jun 15, 2017 at 02:52:57AM +0000, Zheng, Lv wrote:
> > > Hi, Benjamin
> > >
> > > > From: Benjamin Tissoires [mailto:[email protected]]
> > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> > > >
> > > > Hi,
> > > >
> > > > [Sorry for the delay, I have been sidetracked from this]
> > > >
> > > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > > > > On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Sending this as a WIP as it still need a few changes, but it mostly works as
> > > > > > expected (still not fully compliant yet).
> > > > > >
> > > > > > So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> > > > > > the kernel should not export the LID switch device as long as we are not sure
> > > > > > about its state.
> > > > >
> > > > > Ah nice! I (obviously) like this approach.
> > > >
> > > > Heh. Now I just need to convince Lv that it's the right approach.
> > >
> > > I feel we don't have big conflicts.
> > > And I already took part of your idea into this patchset:
> > > https://patchwork.kernel.org/patch/9771121/
> > > https://patchwork.kernel.org/patch/9771119/
> > > I tested my surface pros with Ubuntu, they are working as expected.
> > >
> > > > > > Note that systemd currently doesn't sync the state when the input node just
> > > > > > appears. This is a systemd bug, and it should not be handled by the kernel
> > > > > > community.
> > > > >
> > > > > Uh if this is borked, we should indeed fix this in systemd. Is there
> > > > > already a systemd github bug about this? If not, please create one,
> > > > > and we'll look into it!
> > > >
> > > > I don't think there is. I haven't raised it yet because I am not so sure
> > > > this will not break again those worthless unreliable LID, and if we play
> > > > whack a mole between the kernel and user space, things are going to be
> > > > nasty. So I'd rather have this fixed in systemd along with the
> > > > unreliable LID switch knowledge, so we are sure that the kernel behaves
> > > > the way we expect it to be.
> > >
> > > This is my feeling:
> > > We needn't go that far.
> > > We can interpret "input node appears" into "default input node state".
> >
> > Sorry, can you clarify this bit please? I'm not sure what you mean here.
> > Note that there's an unknown amount of time between "device node appearing
> > in the system" and when a userspace process actually opens it and looks at
> > its state. By then, the node may have changed state again.
>
> We can see:
> "logind" has already implemented a timeout, and will not respond lid state
> unless it can be stable within this timeout period.
> I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?
>
> I feel "removing the input node for a period where its state is not trustful"
> is technically identical to this mechanism.

but you'd be making kernel policy based on one userspace implementation.
e.g. libinput doesn't have a timeout period, it assumes the state is
correct when an input node is present.

Cheers,
Peter

2017-06-16 05:37:29

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi,

> From: [email protected] [mailto:[email protected]] On Behalf Of Peter
> Hutterer
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> On Thu, Jun 15, 2017 at 07:33:58AM +0000, Zheng, Lv wrote:
> > Hi, Peter
> >
> > > From: Peter Hutterer [mailto:[email protected]]
> > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> > >
> > > On Thu, Jun 15, 2017 at 02:52:57AM +0000, Zheng, Lv wrote:
> > > > Hi, Benjamin
> > > >
> > > > > From: Benjamin Tissoires [mailto:[email protected]]
> > > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> > > > >
> > > > > Hi,
> > > > >
> > > > > [Sorry for the delay, I have been sidetracked from this]
> > > > >
> > > > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > > > > > On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > Sending this as a WIP as it still need a few changes, but it mostly works as
> > > > > > > expected (still not fully compliant yet).
> > > > > > >
> > > > > > > So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> > > > > > > the kernel should not export the LID switch device as long as we are not sure
> > > > > > > about its state.
> > > > > >
> > > > > > Ah nice! I (obviously) like this approach.
> > > > >
> > > > > Heh. Now I just need to convince Lv that it's the right approach.
> > > >
> > > > I feel we don't have big conflicts.
> > > > And I already took part of your idea into this patchset:
> > > > https://patchwork.kernel.org/patch/9771121/
> > > > https://patchwork.kernel.org/patch/9771119/
> > > > I tested my surface pros with Ubuntu, they are working as expected.
> > > >
> > > > > > > Note that systemd currently doesn't sync the state when the input node just
> > > > > > > appears. This is a systemd bug, and it should not be handled by the kernel
> > > > > > > community.
> > > > > >
> > > > > > Uh if this is borked, we should indeed fix this in systemd. Is there
> > > > > > already a systemd github bug about this? If not, please create one,
> > > > > > and we'll look into it!
> > > > >
> > > > > I don't think there is. I haven't raised it yet because I am not so sure
> > > > > this will not break again those worthless unreliable LID, and if we play
> > > > > whack a mole between the kernel and user space, things are going to be
> > > > > nasty. So I'd rather have this fixed in systemd along with the
> > > > > unreliable LID switch knowledge, so we are sure that the kernel behaves
> > > > > the way we expect it to be.
> > > >
> > > > This is my feeling:
> > > > We needn't go that far.
> > > > We can interpret "input node appears" into "default input node state".
> > >
> > > Sorry, can you clarify this bit please? I'm not sure what you mean here.
> > > Note that there's an unknown amount of time between "device node appearing
> > > in the system" and when a userspace process actually opens it and looks at
> > > its state. By then, the node may have changed state again.
> >
> > We can see:
> > "logind" has already implemented a timeout, and will not respond lid state
> > unless it can be stable within this timeout period.
> > I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?
> >
> > I feel "removing the input node for a period where its state is not trustful"
> > is technically identical to this mechanism.
>
> but you'd be making kernel policy based on one userspace implementation.
> e.g. libinput doesn't have a timeout period, it assumes the state is
> correct when an input node is present.

Do you see practical issues?
If not, should we avoid over-engineering at this moment?

After resume, SW_LID state could remain unreliable "close" for a while.
But that's just a kind of delay happens in all computing programs.
I suppose all power managing programs have already handled that.
I confirmed no breakage for systemd 233.
For systemd 229, it cannot handle it well due to bugs.
But my latest patch series has worked the bug around.
So I don't see any breakage related to post-resume incorrect state period.
Do you see problems that my tests haven't covered?

So I wonder if you mean:
After boot, button driver should create input node right before sending first input report.
Is this exactly what you want me to improve?
If so, please also let me know if you have seen real issues related to this?

Cheers,
Lv

2017-06-16 07:23:34

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Jun 16 2017 or thereabouts, Zheng, Lv wrote:
> Hi,
>
> > From: [email protected] [mailto:[email protected]] On Behalf Of Peter
> > Hutterer
> > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> >
> > On Thu, Jun 15, 2017 at 07:33:58AM +0000, Zheng, Lv wrote:
> > > Hi, Peter
> > >
> > > > From: Peter Hutterer [mailto:[email protected]]
> > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> > > >
> > > > On Thu, Jun 15, 2017 at 02:52:57AM +0000, Zheng, Lv wrote:
> > > > > Hi, Benjamin
> > > > >
> > > > > > From: Benjamin Tissoires [mailto:[email protected]]
> > > > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > [Sorry for the delay, I have been sidetracked from this]
> > > > > >
> > > > > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > > > > > > On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > Sending this as a WIP as it still need a few changes, but it mostly works as
> > > > > > > > expected (still not fully compliant yet).
> > > > > > > >
> > > > > > > > So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> > > > > > > > the kernel should not export the LID switch device as long as we are not sure
> > > > > > > > about its state.
> > > > > > >
> > > > > > > Ah nice! I (obviously) like this approach.
> > > > > >
> > > > > > Heh. Now I just need to convince Lv that it's the right approach.
> > > > >
> > > > > I feel we don't have big conflicts.
> > > > > And I already took part of your idea into this patchset:
> > > > > https://patchwork.kernel.org/patch/9771121/
> > > > > https://patchwork.kernel.org/patch/9771119/
> > > > > I tested my surface pros with Ubuntu, they are working as expected.
> > > > >
> > > > > > > > Note that systemd currently doesn't sync the state when the input node just
> > > > > > > > appears. This is a systemd bug, and it should not be handled by the kernel
> > > > > > > > community.
> > > > > > >
> > > > > > > Uh if this is borked, we should indeed fix this in systemd. Is there
> > > > > > > already a systemd github bug about this? If not, please create one,
> > > > > > > and we'll look into it!
> > > > > >
> > > > > > I don't think there is. I haven't raised it yet because I am not so sure
> > > > > > this will not break again those worthless unreliable LID, and if we play
> > > > > > whack a mole between the kernel and user space, things are going to be
> > > > > > nasty. So I'd rather have this fixed in systemd along with the
> > > > > > unreliable LID switch knowledge, so we are sure that the kernel behaves
> > > > > > the way we expect it to be.
> > > > >
> > > > > This is my feeling:
> > > > > We needn't go that far.
> > > > > We can interpret "input node appears" into "default input node state".
> > > >
> > > > Sorry, can you clarify this bit please? I'm not sure what you mean here.
> > > > Note that there's an unknown amount of time between "device node appearing
> > > > in the system" and when a userspace process actually opens it and looks at
> > > > its state. By then, the node may have changed state again.
> > >
> > > We can see:
> > > "logind" has already implemented a timeout, and will not respond lid state
> > > unless it can be stable within this timeout period.
> > > I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?
> > >
> > > I feel "removing the input node for a period where its state is not trustful"
> > > is technically identical to this mechanism.
> >
> > but you'd be making kernel policy based on one userspace implementation.
> > e.g. libinput doesn't have a timeout period, it assumes the state is
> > correct when an input node is present.
>
> Do you see practical issues?

Yes, libinput can't rely on the LID switch information to disable
touchpads/touchscreens that are potentially sending false positive.

> If not, should we avoid over-engineering at this moment?

It's not over-engineering. You are changing the specification of the
input node EV_SW event. And if systemd-whatever-version works currently
with your current patch, as long as you do not stick to the protocol
specification, systemd-whatever-version+N can break this. They will
be legitimate to do so because the kernel is not following the protocol.

>
> After resume, SW_LID state could remain unreliable "close" for a while.

This is not an option. It is not part of the protocol, having an
unreliable state.

> But that's just a kind of delay happens in all computing programs.
> I suppose all power managing programs have already handled that.
> I confirmed no breakage for systemd 233.
> For systemd 229, it cannot handle it well due to bugs.
> But my latest patch series has worked the bug around.
> So I don't see any breakage related to post-resume incorrect state period.
> Do you see problems that my tests haven't covered?

The problems are that you are not following the protocol. And if systemd
233 works around it, that's good, but systemd is not the only listener
of the LID switch input node, and you are still breaking those by
refusing to follow the specification of the evdev protocol.

>
> So I wonder if you mean:
> After boot, button driver should create input node right before sending first input report.
> Is this exactly what you want me to improve?

No, Peter doesn't want you to improve anything (neither do I). The
series I sent here as a WIP already does that: when the state is
unknown, the input node disappears (or is not presented to user-space at
all).

The good point of that is that *all* user space clients know how to
behave when there is no LID switch input node (that's a pretty common
use case, the desktop workstation).
But none but maybe systemd 233 can handle some transient state where the
LID switch reports garbage.

So really, there is nothing to add there, let me finish both the kernel
part and raise the appropriate bugs in systemd (or send PR) to have a
fully finished future proof solution.

Cheers,
Benjamin

> If so, please also let me know if you have seen real issues related to this?
>
> Cheers,
> Lv

2017-06-16 07:46:04

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi, Benjamin

Let me just say something one more time.

> From: Benjamin Tissoires [mailto:[email protected]]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> On Jun 16 2017 or thereabouts, Zheng, Lv wrote:
> > Hi,
> >
> > > From: [email protected] [mailto:[email protected]] On Behalf Of
> Peter
> > > Hutterer
> > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> > >
> > > On Thu, Jun 15, 2017 at 07:33:58AM +0000, Zheng, Lv wrote:
> > > > Hi, Peter
> > > >
> > > > > From: Peter Hutterer [mailto:[email protected]]
> > > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> > > > >
> > > > > On Thu, Jun 15, 2017 at 02:52:57AM +0000, Zheng, Lv wrote:
> > > > > > Hi, Benjamin
> > > > > >
> > > > > > > From: Benjamin Tissoires [mailto:[email protected]]
> > > > > > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by
> ACPI
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > [Sorry for the delay, I have been sidetracked from this]
> > > > > > >
> > > > > > > On Jun 07 2017 or thereabouts, Lennart Poettering wrote:
> > > > > > > > On Thu, 01.06.17 20:46, Benjamin Tissoires ([email protected]) wrote:
> > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > Sending this as a WIP as it still need a few changes, but it mostly works as
> > > > > > > > > expected (still not fully compliant yet).
> > > > > > > > >
> > > > > > > > > So this is based on Lennart's comment in [1]: if the LID state is not reliable,
> > > > > > > > > the kernel should not export the LID switch device as long as we are not sure
> > > > > > > > > about its state.
> > > > > > > >
> > > > > > > > Ah nice! I (obviously) like this approach.
> > > > > > >
> > > > > > > Heh. Now I just need to convince Lv that it's the right approach.
> > > > > >
> > > > > > I feel we don't have big conflicts.
> > > > > > And I already took part of your idea into this patchset:
> > > > > > https://patchwork.kernel.org/patch/9771121/
> > > > > > https://patchwork.kernel.org/patch/9771119/
> > > > > > I tested my surface pros with Ubuntu, they are working as expected.
> > > > > >
> > > > > > > > > Note that systemd currently doesn't sync the state when the input node just
> > > > > > > > > appears. This is a systemd bug, and it should not be handled by the kernel
> > > > > > > > > community.
> > > > > > > >
> > > > > > > > Uh if this is borked, we should indeed fix this in systemd. Is there
> > > > > > > > already a systemd github bug about this? If not, please create one,
> > > > > > > > and we'll look into it!
> > > > > > >
> > > > > > > I don't think there is. I haven't raised it yet because I am not so sure
> > > > > > > this will not break again those worthless unreliable LID, and if we play
> > > > > > > whack a mole between the kernel and user space, things are going to be
> > > > > > > nasty. So I'd rather have this fixed in systemd along with the
> > > > > > > unreliable LID switch knowledge, so we are sure that the kernel behaves
> > > > > > > the way we expect it to be.
> > > > > >
> > > > > > This is my feeling:
> > > > > > We needn't go that far.
> > > > > > We can interpret "input node appears" into "default input node state".
> > > > >
> > > > > Sorry, can you clarify this bit please? I'm not sure what you mean here.
> > > > > Note that there's an unknown amount of time between "device node appearing
> > > > > in the system" and when a userspace process actually opens it and looks at
> > > > > its state. By then, the node may have changed state again.
> > > >
> > > > We can see:
> > > > "logind" has already implemented a timeout, and will not respond lid state
> > > > unless it can be stable within this timeout period.
> > > > I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?
> > > >
> > > > I feel "removing the input node for a period where its state is not trustful"
> > > > is technically identical to this mechanism.
> > >
> > > but you'd be making kernel policy based on one userspace implementation.
> > > e.g. libinput doesn't have a timeout period, it assumes the state is
> > > correct when an input node is present.
> >
> > Do you see practical issues?
>
> Yes, libinput can't rely on the LID switch information to disable
> touchpads/touchscreens that are potentially sending false positive.

"potential" doesn't mean "practical", right?
After applying my last version.
There are no false-positives IMO.
There are only delays for the reliable key events.
^^^^^^
While the "delay" is very common in computing world.

> > After resume, SW_LID state could remain unreliable "close" for a while.
>
> This is not an option. It is not part of the protocol, having an
> unreliable state.
>
> > But that's just a kind of delay happens in all computing programs.
> > I suppose all power managing programs have already handled that.
> > I confirmed no breakage for systemd 233.
> > For systemd 229, it cannot handle it well due to bugs.
> > But my latest patch series has worked the bug around.
> > So I don't see any breakage related to post-resume incorrect state period.
> > Do you see problems that my tests haven't covered?
>
> The problems are that you are not following the protocol. And if systemd
> 233 works around it, that's good, but systemd is not the only listener
> of the LID switch input node, and you are still breaking those by
> refusing to follow the specification of the evdev protocol.

As you are talking about protocol, let me just ask once.

In computing world,
1. delay is very common
There are bus turnaround, network turnaround, ...
Even measurement itself has delay described by Shannon sampling.
Should the delay be a part of the protocol?
2. programs are acting according to rules (we call state machines)
States are only determined after measurement (like "quantum states")
I have Schroedinger's cat in my mind.
Events are determined as they always occur after measurement to trigger "quantum jumps".
So for EV_SW protocol,
Should programs rely on the reliable "quantum jumps",
Or should programs rely on the unreliable "quantum states"?

I think most UI programs care no about state stored in the input node,
they only receive events raised from the input node.
Why should the kernel export a fade-in/fade-out input node to the UI programs and ask them to change?

The only program that cares about the state stored in the input node is libinput.
So why should every user program be changed to make libinput easier?

Cheers,
Lv

2017-06-16 08:10:07

by Benjamin Tissoires

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Jun 16 2017 or thereabouts, Zheng, Lv wrote:
> Hi, Benjamin
>
> Let me just say something one more time.
>
> > From: Benjamin Tissoires [mailto:[email protected]]

[snip]
> > > > >
> > > > > We can see:
> > > > > "logind" has already implemented a timeout, and will not respond lid state
> > > > > unless it can be stable within this timeout period.
> > > > > I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?
> > > > >
> > > > > I feel "removing the input node for a period where its state is not trustful"
> > > > > is technically identical to this mechanism.
> > > >
> > > > but you'd be making kernel policy based on one userspace implementation.
> > > > e.g. libinput doesn't have a timeout period, it assumes the state is
> > > > correct when an input node is present.
> > >
> > > Do you see practical issues?
> >
> > Yes, libinput can't rely on the LID switch information to disable
> > touchpads/touchscreens that are potentially sending false positive.
>
> "potential" doesn't mean "practical", right?

I was using potential to say that some actual devices are sending
rightful states, while others are not (we already named them a lot in
those countless threads). So potential here is from a user space
perspective where you are not sure if the state is reliable or not
(given we currently don't have this information about reliability).

> After applying my last version.
> There are no false-positives IMO.
> There are only delays for the reliable key events.
> ^^^^^^
> While the "delay" is very common in computing world.

No, if there is a delay, there is a false positive, because the initial
state is wrong with respect to the device physical state.

>
> > > After resume, SW_LID state could remain unreliable "close" for a while.
> >
> > This is not an option. It is not part of the protocol, having an
> > unreliable state.
> >
> > > But that's just a kind of delay happens in all computing programs.
> > > I suppose all power managing programs have already handled that.
> > > I confirmed no breakage for systemd 233.
> > > For systemd 229, it cannot handle it well due to bugs.
> > > But my latest patch series has worked the bug around.
> > > So I don't see any breakage related to post-resume incorrect state period.
> > > Do you see problems that my tests haven't covered?
> >
> > The problems are that you are not following the protocol. And if systemd
> > 233 works around it, that's good, but systemd is not the only listener
> > of the LID switch input node, and you are still breaking those by
> > refusing to follow the specification of the evdev protocol.
>
> As you are talking about protocol, let me just ask once.
>
> In computing world,
> 1. delay is very common
> There are bus turnaround, network turnaround, ...
> Even measurement itself has delay described by Shannon sampling.
> Should the delay be a part of the protocol?

Please, you are either trolling or just kidding. If there are delays in
the "computing world", these has to be handled by the kernel, and not
exported to the user space if the kernel protocol says that the state is
reliable.

> 2. programs are acting according to rules (we call state machines)
> States are only determined after measurement (like "quantum states")
> I have Schroedinger's cat in my mind.
> Events are determined as they always occur after measurement to trigger "quantum jumps".
> So for EV_SW protocol,
> Should programs rely on the reliable "quantum jumps",
> Or should programs rely on the unreliable "quantum states"?

No comments, this won't get us anywhere.

>
> I think most UI programs care no about state stored in the input node,
> they only receive events raised from the input node.

Bullshit. When you launch such a program, you need to query the state
because you won't receive the event that happened way before the launch.

> Why should the kernel export a fade-in/fade-out input node to the UI programs and ask them to change?
>
> The only program that cares about the state stored in the input node is libinput.
> So why should every user program be changed to make libinput easier?

No, all program that listen to LID switches input nodes care about the
state. We already told you that, you just don't listen:

- systemd cares about the state as it does polling on the input node in
case it misses an event
- libinput cares about the state as previously mentioned
- gnome-setting-daemons cares about the state given it decides whether
or not lighting up the monitors depending on the state. And if you
relaunch the daemon, it'll query the state to decide what is the best
arrangement for the screens
- KDE should do the same (as it is not a daemon that can catch any
events)

And I really don't know why I am telling you that. The rule is simple:
there is a kernel protocol, which is a contract between the kernel and
the user space. With such "delays" this acpi/button.c driver doesn't
follow the protocol. So it's a bug. End of the story.

If you want to have a transient state with delay or such, feel free to
write down a protocol for such tri-state switch, but I doubt this will
be accepted by the input maintainer.

Cheers,
Benjamin

>
> Cheers,
> Lv

2017-06-16 08:53:36

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi,

> From: Benjamin Tissoires [mailto:[email protected]]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> On Jun 16 2017 or thereabouts, Zheng, Lv wrote:
> > Hi, Benjamin
> >
> > Let me just say something one more time.
> >
> > > From: Benjamin Tissoires [mailto:[email protected]]
>
> [snip]
> > > > > >
> > > > > > We can see:
> > > > > > "logind" has already implemented a timeout, and will not respond lid state
> > > > > > unless it can be stable within this timeout period.
> > > > > > I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?
> > > > > >
> > > > > > I feel "removing the input node for a period where its state is not trustful"
> > > > > > is technically identical to this mechanism.
> > > > >
> > > > > but you'd be making kernel policy based on one userspace implementation.
> > > > > e.g. libinput doesn't have a timeout period, it assumes the state is
> > > > > correct when an input node is present.
> > > >
> > > > Do you see practical issues?
> > >
> > > Yes, libinput can't rely on the LID switch information to disable
> > > touchpads/touchscreens that are potentially sending false positive.
> >
> > "potential" doesn't mean "practical", right?
>
> I was using potential to say that some actual devices are sending
> rightful states, while others are not (we already named them a lot in
> those countless threads). So potential here is from a user space
> perspective where you are not sure if the state is reliable or not
> (given we currently don't have this information about reliability).
>
> > After applying my last version.
> > There are no false-positives IMO.
> > There are only delays for the reliable key events.
> > ^^^^^^
> > While the "delay" is very common in computing world.
>
> No, if there is a delay, there is a false positive, because the initial
> state is wrong with respect to the device physical state.
>
> >
> > > > After resume, SW_LID state could remain unreliable "close" for a while.
> > >
> > > This is not an option. It is not part of the protocol, having an
> > > unreliable state.
> > >
> > > > But that's just a kind of delay happens in all computing programs.
> > > > I suppose all power managing programs have already handled that.
> > > > I confirmed no breakage for systemd 233.
> > > > For systemd 229, it cannot handle it well due to bugs.
> > > > But my latest patch series has worked the bug around.
> > > > So I don't see any breakage related to post-resume incorrect state period.
> > > > Do you see problems that my tests haven't covered?
> > >
> > > The problems are that you are not following the protocol. And if systemd
> > > 233 works around it, that's good, but systemd is not the only listener
> > > of the LID switch input node, and you are still breaking those by
> > > refusing to follow the specification of the evdev protocol.
> >
> > As you are talking about protocol, let me just ask once.
> >
> > In computing world,
> > 1. delay is very common
> > There are bus turnaround, network turnaround, ...
> > Even measurement itself has delay described by Shannon sampling.
> > Should the delay be a part of the protocol?
>
> Please, you are either trolling or just kidding. If there are delays in
> the "computing world", these has to be handled by the kernel, and not
> exported to the user space if the kernel protocol says that the state is
> reliable.
>
> > 2. programs are acting according to rules (we call state machines)
> > States are only determined after measurement (like "quantum states")
> > I have Schroedinger's cat in my mind.
> > Events are determined as they always occur after measurement to trigger "quantum jumps".
> > So for EV_SW protocol,
> > Should programs rely on the reliable "quantum jumps",
> > Or should programs rely on the unreliable "quantum states"?
>
> No comments, this won't get us anywhere.
>
> >
> > I think most UI programs care no about state stored in the input node,
> > they only receive events raised from the input node.
>
> Bullshit. When you launch such a program, you need to query the state
> because you won't receive the event that happened way before the launch.
>
> > Why should the kernel export a fade-in/fade-out input node to the UI programs and ask them to change?
> >
> > The only program that cares about the state stored in the input node is libinput.
> > So why should every user program be changed to make libinput easier?
>
> No, all program that listen to LID switches input nodes care about the
> state. We already told you that, you just don't listen:
>
> - systemd cares about the state as it does polling on the input node in
> case it misses an event
> - libinput cares about the state as previously mentioned
> - gnome-setting-daemons cares about the state given it decides whether
> or not lighting up the monitors depending on the state. And if you
> relaunch the daemon, it'll query the state to decide what is the best
> arrangement for the screens

Let's consider this case with delay:
After resume, gnome-setting-daemon queries SW_LID and got "close".
Then it lights up the wrong monitors.
Then I believe "open" will be delivered to it several seconds later.
Should gnome-setting-daemon light-up correct monitors this time?
So it just looks like user programs behave with a delay accordingly because of the "platform turnaround" delay.

> - KDE should do the same (as it is not a daemon that can catch any
> events)

Cheers,
Lv

2017-06-16 09:06:45

by Bastien Nocera

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI



> On 16 Jun 2017, at 10:53, Zheng, Lv <[email protected]> wrote:
>
> Hi,
>
>> From: Benjamin Tissoires [mailto:[email protected]]
>> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>>
>>> On Jun 16 2017 or thereabouts, Zheng, Lv wrote:
>>> Hi, Benjamin
>>>
>>> Let me just say something one more time.
>>>
>>>> From: Benjamin Tissoires [mailto:[email protected]]
>>
>> [snip]
>>>>>>>
>>>>>>> We can see:
>>>>>>> "logind" has already implemented a timeout, and will not respond lid state
>>>>>>> unless it can be stable within this timeout period.
>>>>>>> I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?
>>>>>>>
>>>>>>> I feel "removing the input node for a period where its state is not trustful"
>>>>>>> is technically identical to this mechanism.
>>>>>>
>>>>>> but you'd be making kernel policy based on one userspace implementation.
>>>>>> e.g. libinput doesn't have a timeout period, it assumes the state is
>>>>>> correct when an input node is present.
>>>>>
>>>>> Do you see practical issues?
>>>>
>>>> Yes, libinput can't rely on the LID switch information to disable
>>>> touchpads/touchscreens that are potentially sending false positive.
>>>
>>> "potential" doesn't mean "practical", right?
>>
>> I was using potential to say that some actual devices are sending
>> rightful states, while others are not (we already named them a lot in
>> those countless threads). So potential here is from a user space
>> perspective where you are not sure if the state is reliable or not
>> (given we currently don't have this information about reliability).
>>
>>> After applying my last version.
>>> There are no false-positives IMO.
>>> There are only delays for the reliable key events.
>>> ^^^^^^
>>> While the "delay" is very common in computing world.
>>
>> No, if there is a delay, there is a false positive, because the initial
>> state is wrong with respect to the device physical state.
>>
>>>
>>>>> After resume, SW_LID state could remain unreliable "close" for a while.
>>>>
>>>> This is not an option. It is not part of the protocol, having an
>>>> unreliable state.
>>>>
>>>>> But that's just a kind of delay happens in all computing programs.
>>>>> I suppose all power managing programs have already handled that.
>>>>> I confirmed no breakage for systemd 233.
>>>>> For systemd 229, it cannot handle it well due to bugs.
>>>>> But my latest patch series has worked the bug around.
>>>>> So I don't see any breakage related to post-resume incorrect state period.
>>>>> Do you see problems that my tests haven't covered?
>>>>
>>>> The problems are that you are not following the protocol. And if systemd
>>>> 233 works around it, that's good, but systemd is not the only listener
>>>> of the LID switch input node, and you are still breaking those by
>>>> refusing to follow the specification of the evdev protocol.
>>>
>>> As you are talking about protocol, let me just ask once.
>>>
>>> In computing world,
>>> 1. delay is very common
>>> There are bus turnaround, network turnaround, ...
>>> Even measurement itself has delay described by Shannon sampling.
>>> Should the delay be a part of the protocol?
>>
>> Please, you are either trolling or just kidding. If there are delays in
>> the "computing world", these has to be handled by the kernel, and not
>> exported to the user space if the kernel protocol says that the state is
>> reliable.
>>
>>> 2. programs are acting according to rules (we call state machines)
>>> States are only determined after measurement (like "quantum states")
>>> I have Schroedinger's cat in my mind.
>>> Events are determined as they always occur after measurement to trigger "quantum jumps".
>>> So for EV_SW protocol,
>>> Should programs rely on the reliable "quantum jumps",
>>> Or should programs rely on the unreliable "quantum states"?
>>
>> No comments, this won't get us anywhere.
>>
>>>
>>> I think most UI programs care no about state stored in the input node,
>>> they only receive events raised from the input node.
>>
>> Bullshit. When you launch such a program, you need to query the state
>> because you won't receive the event that happened way before the launch.
>>
>>> Why should the kernel export a fade-in/fade-out input node to the UI programs and ask them to change?
>>>
>>> The only program that cares about the state stored in the input node is libinput.
>>> So why should every user program be changed to make libinput easier?
>>
>> No, all program that listen to LID switches input nodes care about the
>> state. We already told you that, you just don't listen:
>>
>> - systemd cares about the state as it does polling on the input node in
>> case it misses an event
>> - libinput cares about the state as previously mentioned
>> - gnome-setting-daemons cares about the state given it decides whether
>> or not lighting up the monitors depending on the state. And if you
>> relaunch the daemon, it'll query the state to decide what is the best
>> arrangement for the screens
>
> Let's consider this case with delay:
> After resume, gnome-setting-daemon queries SW_LID and got "close".
> Then it lights up the wrong monitors.
> Then I believe "open" will be delivered to it several seconds later.
> Should gnome-setting-daemon light-up correct monitors this time?
> So it just looks like user programs behave with a delay accordingly because of the "platform turnaround" delay.

If you implement it in such a way that GNOME settings daemon behaves weirdly, you'll get my revert request in the mail. Do. Not. Ever. Lie.

>
>> - KDE should do the same (as it is not a daemon that can catch any
>> events)
>
> Cheers,
> Lv
> ¢éì¹»®&Þ~º&¶¬–+-±éݶ¥Šw®žË›±Êâmébžìbž›­Š{ayºʇڙë,j­¢f£¢·hš‹àz¹®w¥¢¸ ¢·¦j:+v‰¨ŠwèjØm¶Ÿÿ¾«‘êçzZ+ƒùšŽŠÝ¢j"ú!¶i

2017-06-16 16:33:02

by Lennart Poettering

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Fri, 16.06.17 11:06, Bastien Nocera ([email protected]) wrote:

> > Let's consider this case with delay:
> > After resume, gnome-setting-daemon queries SW_LID and got "close".
> > Then it lights up the wrong monitors.
> > Then I believe "open" will be delivered to it several seconds later.
> > Should gnome-setting-daemon light-up correct monitors this time?
> > So it just looks like user programs behave with a delay accordingly because of the "platform turnaround" delay.
>
> If you implement it in such a way that GNOME settings daemon behaves weirdly, you'll get my revert request in the mail. Do. Not. Ever. Lie.

Just to mention this:

the reason logind applies the timeout and doesn't immediately react to
lid changes is to be friendly to users, if they quickly close and
reopen the lid. It's not supposed to be a work-around around broken
input drivers.

I am very sure that input drivers shouldn't lie to userspace. If you
don't know the state of the switch, then you don#t know it, and should
clarify that to userspace somehow.

Lennart

--
Lennart Poettering, Red Hat

2017-06-19 01:43:59

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi,

> From: Bastien Nocera [mailto:[email protected]]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
>
>
> > On 16 Jun 2017, at 10:53, Zheng, Lv <[email protected]> wrote:
> >
> > Hi,
> >
> >> From: Benjamin Tissoires [mailto:[email protected]]
> >> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
> >>
> >>> On Jun 16 2017 or thereabouts, Zheng, Lv wrote:
> >>> Hi, Benjamin
> >>>
> >>> Let me just say something one more time.
> >>>
> >>>> From: Benjamin Tissoires [mailto:[email protected]]
> >>
> >> [snip]
> >>>>>>>
> >>>>>>> We can see:
> >>>>>>> "logind" has already implemented a timeout, and will not respond lid state
> >>>>>>> unless it can be stable within this timeout period.
> >>>>>>> I'm not an expert of logind, maybe this is because of "HoldOffTimeoutSec"?
> >>>>>>>
> >>>>>>> I feel "removing the input node for a period where its state is not trustful"
> >>>>>>> is technically identical to this mechanism.
> >>>>>>
> >>>>>> but you'd be making kernel policy based on one userspace implementation.
> >>>>>> e.g. libinput doesn't have a timeout period, it assumes the state is
> >>>>>> correct when an input node is present.
> >>>>>
> >>>>> Do you see practical issues?
> >>>>
> >>>> Yes, libinput can't rely on the LID switch information to disable
> >>>> touchpads/touchscreens that are potentially sending false positive.
> >>>
> >>> "potential" doesn't mean "practical", right?
> >>
> >> I was using potential to say that some actual devices are sending
> >> rightful states, while others are not (we already named them a lot in
> >> those countless threads). So potential here is from a user space
> >> perspective where you are not sure if the state is reliable or not
> >> (given we currently don't have this information about reliability).
> >>
> >>> After applying my last version.
> >>> There are no false-positives IMO.
> >>> There are only delays for the reliable key events.
> >>> ^^^^^^
> >>> While the "delay" is very common in computing world.
> >>
> >> No, if there is a delay, there is a false positive, because the initial
> >> state is wrong with respect to the device physical state.
> >>
> >>>
> >>>>> After resume, SW_LID state could remain unreliable "close" for a while.
> >>>>
> >>>> This is not an option. It is not part of the protocol, having an
> >>>> unreliable state.
> >>>>
> >>>>> But that's just a kind of delay happens in all computing programs.
> >>>>> I suppose all power managing programs have already handled that.
> >>>>> I confirmed no breakage for systemd 233.
> >>>>> For systemd 229, it cannot handle it well due to bugs.
> >>>>> But my latest patch series has worked the bug around.
> >>>>> So I don't see any breakage related to post-resume incorrect state period.
> >>>>> Do you see problems that my tests haven't covered?
> >>>>
> >>>> The problems are that you are not following the protocol. And if systemd
> >>>> 233 works around it, that's good, but systemd is not the only listener
> >>>> of the LID switch input node, and you are still breaking those by
> >>>> refusing to follow the specification of the evdev protocol.
> >>>
> >>> As you are talking about protocol, let me just ask once.
> >>>
> >>> In computing world,
> >>> 1. delay is very common
> >>> There are bus turnaround, network turnaround, ...
> >>> Even measurement itself has delay described by Shannon sampling.
> >>> Should the delay be a part of the protocol?
> >>
> >> Please, you are either trolling or just kidding. If there are delays in
> >> the "computing world", these has to be handled by the kernel, and not
> >> exported to the user space if the kernel protocol says that the state is
> >> reliable.
> >>
> >>> 2. programs are acting according to rules (we call state machines)
> >>> States are only determined after measurement (like "quantum states")
> >>> I have Schroedinger's cat in my mind.
> >>> Events are determined as they always occur after measurement to trigger "quantum jumps".
> >>> So for EV_SW protocol,
> >>> Should programs rely on the reliable "quantum jumps",
> >>> Or should programs rely on the unreliable "quantum states"?
> >>
> >> No comments, this won't get us anywhere.
> >>
> >>>
> >>> I think most UI programs care no about state stored in the input node,
> >>> they only receive events raised from the input node.
> >>
> >> Bullshit. When you launch such a program, you need to query the state
> >> because you won't receive the event that happened way before the launch.
> >>
> >>> Why should the kernel export a fade-in/fade-out input node to the UI programs and ask them to
> change?
> >>>
> >>> The only program that cares about the state stored in the input node is libinput.
> >>> So why should every user program be changed to make libinput easier?
> >>
> >> No, all program that listen to LID switches input nodes care about the
> >> state. We already told you that, you just don't listen:
> >>
> >> - systemd cares about the state as it does polling on the input node in
> >> case it misses an event
> >> - libinput cares about the state as previously mentioned
> >> - gnome-setting-daemons cares about the state given it decides whether
> >> or not lighting up the monitors depending on the state. And if you
> >> relaunch the daemon, it'll query the state to decide what is the best
> >> arrangement for the screens
> >
> > Let's consider this case with delay:
> > After resume, gnome-setting-daemon queries SW_LID and got "close".
> > Then it lights up the wrong monitors.
> > Then I believe "open" will be delivered to it several seconds later.
> > Should gnome-setting-daemon light-up correct monitors this time?
> > So it just looks like user programs behave with a delay accordingly because of the "platform
> turnaround" delay.
>
> If you implement it in such a way that GNOME settings daemon behaves weirdly, you'll get my revert
> request in the mail. Do. Not. Ever. Lie.

First, I don't know what should be reverted...
I have 2 solutions here for review, and Benjamin has 1.
And none of them has been upstreamed.
We are just discussing.
However we need to get 1 of them upstreamed in next cycle.

I think users won't startup gnome-setting-daemon right after resume.
It should have already been started.

There is only 1 platform may see delayed state update after resume.
Let's see if there is a practical issue.
1. Before suspend, the "lid state" is "close", and
2. After resume, the state might remain "close" for a while
Since libinput won't deliver close to userspace,
and gnome-setting-daemon listens to key switches, there is no wrong behavior.
3. Then after several seconds, "open" arrives.
gnome-setting-daemon re-arrange monitors and screen layouts in response to the new event.
So there is no problem. IMO, there is no need to improve for post-resume case.

Users will just startup gnome-setting-daemon once after boot.
And it's likely that when it is started, the state is correct.
IMO, there might be a chance to improve for post-boot case using Benjamin's approach.

Thanks
Lv


>
> >
> >> - KDE should do the same (as it is not a daemon that can catch any
> >> events)
> >
> > Cheers,
> > Lv
> > ¢éì¹»®&Þ~º&¶¬–+-±éݶ¥Šw®žË›±Êâmébžìbž›­Š{ayºʇڙë,j
­¢f£¢·hš‹àz¹®w¥¢¸
>
> ¢·¦j:+v‰¨ŠwèjØm¶Ÿÿ¾
«‘êçzZ+ƒùšŽŠÝ¢j"ú!¶i


2017-06-19 02:17:00

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi, Lennart

> From: Lennart Poettering [mailto:[email protected]]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> On Fri, 16.06.17 11:06, Bastien Nocera ([email protected]) wrote:
>
> > > Let's consider this case with delay:
> > > After resume, gnome-setting-daemon queries SW_LID and got "close".
> > > Then it lights up the wrong monitors.
> > > Then I believe "open" will be delivered to it several seconds later.
> > > Should gnome-setting-daemon light-up correct monitors this time?
> > > So it just looks like user programs behave with a delay accordingly because of the "platform
> turnaround" delay.
> >
> > If you implement it in such a way that GNOME settings daemon behaves weirdly, you'll get my revert
> request in the mail. Do. Not. Ever. Lie.
>
> Just to mention this:
>
> the reason logind applies the timeout and doesn't immediately react to
> lid changes is to be friendly to users, if they quickly close and
> reopen the lid. It's not supposed to be a work-around around broken
> input drivers.

I see, it's same reason for button driver to prepare "lid_report_interval".

I think all old user reports are meaningless to us.
At that time, we found 2 problems in systemd (version below 229):
1. If no "open" event received after resume, systemd suspends the platform.
2. If an "open" event received after a "close" event, the suspend cannot be
cancelled, systemd still suspends the platform.
It looks the 2 problems are 1 single issue that has already been fixed in
recent systemd (I confirmed that this has been fixed in 233).
It's hard for a kernel driver to work these 2 problems around.

>
> I am very sure that input drivers shouldn't lie to userspace. If you
> don't know the state of the switch, then you don#t know it, and should
> clarify that to userspace somehow.

Without considering "Surface Pro 1" case which requires a quirk anyway.

For my version 2 solution, for all other platforms, there is no "lie".
There is only a delay, and it's likely there is only 1 platform suffering
from such a delay.

Considering a platform that suffers from such a delay:
Before the platform sends the "open" event, the old cached state is "close".
And input layer automatically filters redundant "close".
Thus systemd won't see any event after resume.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Then after several seconds, (can be configured by HoldoffTimeoutSec),
The lid status is turned into correct and systemd can see an "open".
So there won't be a problem.
I can tell you that I tested systemd 229/233, no problem can be seen with
version 2 solution on all those platforms.

Since everything works, I mean why we need to change the ACPI driven
SW_LID into a "fade-in/out" input node.

On the contrary:
1. I feel the delay is common:
If an HID device is built on top of USBIP, there is always delays
(several seconds as network turnaround) for its SW_xxx keys if any.
So do we need to change all HID device drivers to export SW keys into
fade-in/out style just because the underlying transport layer may change?
For the case we are discussing, it's just the underlying transport layer
is the platform hardware/firmware and some of them have a huge delay.
2. I feel the delay is inevitable:
If kernel must ensure to resume userspace after determining the wakeup
reason and after the related wakeup source hardware or firmware driver
has synchronized the states. It then will be a long-time-consuming
suspend/resume cycle and cannot meet the fast-idle-entry/exit
requirements of the modern idle platforms. And even worse thing is,
for most of the hardware/firmware drivers, they don't even know that
the hardware/firmware driven by them are the waking the platform up.

I feel it's too early to say that we need such a big change.
We can wait and see if there are any further use cases requiring us to
handle before making such a big change.

Cheers,
Lv

2017-06-19 22:09:03

by Bastien Nocera

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Mon, 2017-06-19 at 01:43 +0000, Zheng, Lv wrote:
> <snip>
> >
> > If you implement it in such a way that GNOME settings daemon
> > behaves weirdly, you'll get my revert
> > request in the mail. Do. Not. Ever. Lie.
>
> First, I don't know what should be reverted...
> I have 2 solutions here for review, and Benjamin has 1.
> And none of them has been upstreamed.
> We are just discussing.

The discussion is getting tiring quite frankly. We've been over this
for nearly a year now, and with no end in sight.

> However we need to get 1 of them upstreamed in next cycle.
>
> I think users won't startup gnome-setting-daemon right after resume.
> It should have already been started.
>
> There is only 1 platform may see delayed state update after resume.
> Let's see if there is a practical issue.
> 1. Before suspend, the "lid state" is "close", and
> 2. After resume, the state might remain "close" for a while
> Since libinput won't deliver close to userspace,
> and gnome-setting-daemon listens to key switches, there is no
> wrong behavior.

It doesn't. It listens to UPower, which tells user-space whether there
is a lid switch, and whether it's opened or closed.

> 3. Then after several seconds, "open" arrives.
> gnome-setting-daemon re-arrange monitors and screen layouts in
> response to the new event.

Just how is anyone supposed to know that there is an event coming?

> So there is no problem. IMO, there is no need to improve for post-
> resume case.
>
> Users will just startup gnome-setting-daemon once after boot.
> And it's likely that when it is started, the state is correct.

You cannot rely on when gnome-settings-daemon will be started to make
*any* decision. Certainly not decisions on how the kernel should
behave.

> IMO, there might be a chance to improve for post-boot case using
> Benjamin's approach.

2017-06-20 02:45:22

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi,

> From: Bastien Nocera [mailto:[email protected]]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> On Mon, 2017-06-19 at 01:43 +0000, Zheng, Lv wrote:
> > <snip>
> > >
> > > If you implement it in such a way that GNOME settings daemon
> > > behaves weirdly, you'll get my revert
> > > request in the mail. Do. Not. Ever. Lie.
> >
> > First, I don't know what should be reverted...
> > I have 2 solutions here for review, and Benjamin has 1.
> > And none of them has been upstreamed.
> > We are just discussing.
>
> The discussion is getting tiring quite frankly. We've been over this
> for nearly a year now, and with no end in sight.

We have concerns to introduce too complicated logics to such a
simple button driver especially the logics are related to platform
firmware, input ABI and user space behaviors.

I understand the situation.
Anyway this shouldn't be a big deal.
Let's prepare a smarter series to collect all fixes and solutions
with runtime configurables and get that to the end users.
So that we can figure out which is the simplest solution.

But before that, let me ask several questions about gnome-setting-deamon.

>
> > However we need to get 1 of them upstreamed in next cycle.
> >
> > I think users won't startup gnome-setting-daemon right after resume.
> > It should have already been started.
> >
> > There is only 1 platform may see delayed state update after resume.
> > Let's see if there is a practical issue.
> > 1. Before suspend, the "lid state" is "close", and
> > 2. After resume, the state might remain "close" for a while
> > Since libinput won't deliver close to userspace,
> > and gnome-setting-daemon listens to key switches, there is no
> > wrong behavior.
>
> It doesn't. It listens to UPower, which tells user-space whether there
> is a lid switch, and whether it's opened or closed.

Thanks for the information.
However I don't see differences here.

>
> > 3. Then after several seconds, "open" arrives.
> > gnome-setting-daemon re-arrange monitors and screen layouts in
> > response to the new event.
>
> Just how is anyone supposed to know that there is an event coming?

Will UPower deliver EV_SW key events to gnome-setting-daemon?

>
> > So there is no problem. IMO, there is no need to improve for post-
> > resume case.
> >
> > Users will just startup gnome-setting-daemon once after boot.
> > And it's likely that when it is started, the state is correct.
>
> You cannot rely on when gnome-settings-daemon will be started to make
> *any* decision. Certainly not decisions on how the kernel should
> behave.

My bad wording, I just meant:
When gnome-settings-daemon is started is not related to what we are
discussing.

Do you want to fix regressions?
Or you want to fix new issues on recent platforms?
If you want to fix regressions, I think Benjamin has submitted a revision
to use old method mode, there shouldn't be regressions for
gnome-settings-daemon.

What else we want to do is to fix regressions related to systemd when
we go back to default method mode. Since there is no issue with systemd
233 and after just applying a small change, systemd 229 can also be
worked around, I mean dynamically add/remove input node is not strictly
required for achieving our purposes.

But if you want to fix new issues on new platforms, we can discuss
further and determine which program should be changed and which program
is the best candidate to stop all problems - the ACPI button driver or
the user space.

Cheers
Lv

2017-06-21 10:23:18

by Bastien Nocera

[permalink] [raw]
Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

On Tue, 2017-06-20 at 02:45 +0000, Zheng, Lv wrote:
> Hi,
>
> > From: Bastien Nocera [mailto:[email protected]]
> > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable
> > LID switch exported by ACPI
> >
> > On Mon, 2017-06-19 at 01:43 +0000, Zheng, Lv wrote:
> > > <snip>
> > > >
> > > > If you implement it in such a way that GNOME settings daemon
> > > > behaves weirdly, you'll get my revert
> > > > request in the mail. Do. Not. Ever. Lie.
> > >
> > > First, I don't know what should be reverted...
> > > I have 2 solutions here for review, and Benjamin has 1.
> > > And none of them has been upstreamed.
> > > We are just discussing.
> >
> > The discussion is getting tiring quite frankly. We've been over
> > this
> > for nearly a year now, and with no end in sight.
>
> We have concerns to introduce too complicated logics to such a
> simple button driver especially the logics are related to platform
> firmware, input ABI and user space behaviors.
>
> I understand the situation.
> Anyway this shouldn't be a big deal.
> Let's prepare a smarter series to collect all fixes and solutions
> with runtime configurables and get that to the end users.
> So that we can figure out which is the simplest solution.
>
> But before that, let me ask several questions about gnome-setting-
> deamon.
>
> >
> > > However we need to get 1 of them upstreamed in next cycle.
> > >
> > > I think users won't startup gnome-setting-daemon right after
> > > resume.
> > > It should have already been started.
> > >
> > > There is only 1 platform may see delayed state update after
> > > resume.
> > > Let's see if there is a practical issue.
> > > 1. Before suspend, the "lid state" is "close", and
> > > 2. After resume, the state might remain "close" for a while
> > > Since libinput won't deliver close to userspace,
> > > and gnome-setting-daemon listens to key switches, there is no
> > > wrong behavior.
> >
> > It doesn't. It listens to UPower, which tells user-space whether
> > there
> > is a lid switch, and whether it's opened or closed.
>
> Thanks for the information.
> However I don't see differences here.
>
> >
> > > 3. Then after several seconds, "open" arrives.
> > > gnome-setting-daemon re-arrange monitors and screen layouts in
> > > response to the new event.
> >
> > Just how is anyone supposed to know that there is an event coming?
>
> Will UPower deliver EV_SW key events to gnome-setting-daemon?
>
> >
> > > So there is no problem. IMO, there is no need to improve for
> > > post-
> > > resume case.
> > >
> > > Users will just startup gnome-setting-daemon once after boot.
> > > And it's likely that when it is started, the state is correct.
> >
> > You cannot rely on when gnome-settings-daemon will be started to
> > make
> > *any* decision. Certainly not decisions on how the kernel should
> > behave.
>
> My bad wording, I just meant:
> When gnome-settings-daemon is started is not related to what we are
> discussing.
>
> Do you want to fix regressions?
> Or you want to fix new issues on recent platforms?
> If you want to fix regressions, I think Benjamin has submitted a
> revision
> to use old method mode, there shouldn't be regressions for
> gnome-settings-daemon.
>
> What else we want to do is to fix regressions related to systemd when
> we go back to default method mode. Since there is no issue with
> systemd
> 233 and after just applying a small change, systemd 229 can also be
> worked around, I mean dynamically add/remove input node is not
> strictly
> required for achieving our purposes.
>
> But if you want to fix new issues on new platforms, we can discuss
> further and determine which program should be changed and which
> program
> is the best candidate to stop all problems - the ACPI button driver
> or
> the user space.

I'm happy with Benjamin's patches which don't introduce any
dependencies on new user-space, and don't rely on undocumented
heuristics. What was the API is still the API.

2017-06-22 03:16:20

by Zheng, Lv

[permalink] [raw]
Subject: RE: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI

Hi,

> From: Bastien Nocera [mailto:[email protected]]
> Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable LID switch exported by ACPI
>
> On Tue, 2017-06-20 at 02:45 +0000, Zheng, Lv wrote:
> > Hi,
> >
> > > From: Bastien Nocera [mailto:[email protected]]
> > > Subject: Re: [systemd-devel] [WIP PATCH 0/4] Rework the unreliable
> > > LID switch exported by ACPI
> > >
> > > On Mon, 2017-06-19 at 01:43 +0000, Zheng, Lv wrote:
> > > > <snip>
> > > > >
> > > > > If you implement it in such a way that GNOME settings daemon
> > > > > behaves weirdly, you'll get my revert
> > > > > request in the mail. Do. Not. Ever. Lie.
> > > >
> > > > First, I don't know what should be reverted...
> > > > I have 2 solutions here for review, and Benjamin has 1.
> > > > And none of them has been upstreamed.
> > > > We are just discussing.
> > >
> > > The discussion is getting tiring quite frankly. We've been over
> > > this
> > > for nearly a year now, and with no end in sight.
> >
> > We have concerns to introduce too complicated logics to such a
> > simple button driver especially the logics are related to platform
> > firmware, input ABI and user space behaviors.
> >
> > I understand the situation.
> > Anyway this shouldn't be a big deal.
> > Let's prepare a smarter series to collect all fixes and solutions
> > with runtime configurables and get that to the end users.
> > So that we can figure out which is the simplest solution.
> >
> > But before that, let me ask several questions about gnome-setting-
> > deamon.
> >
> > >
> > > > However we need to get 1 of them upstreamed in next cycle.
> > > >
> > > > I think users won't startup gnome-setting-daemon right after
> > > > resume.
> > > > It should have already been started.
> > > >
> > > > There is only 1 platform may see delayed state update after
> > > > resume.
> > > > Let's see if there is a practical issue.
> > > > 1. Before suspend, the "lid state" is "close", and
> > > > 2. After resume, the state might remain "close" for a while
> > > > Since libinput won't deliver close to userspace,
> > > > and gnome-setting-daemon listens to key switches, there is no
> > > > wrong behavior.
> > >
> > > It doesn't. It listens to UPower, which tells user-space whether
> > > there
> > > is a lid switch, and whether it's opened or closed.
> >
> > Thanks for the information.
> > However I don't see differences here.
> >
> > >
> > > > 3. Then after several seconds, "open" arrives.
> > > > gnome-setting-daemon re-arrange monitors and screen layouts in
> > > > response to the new event.
> > >
> > > Just how is anyone supposed to know that there is an event coming?
> >
> > Will UPower deliver EV_SW key events to gnome-setting-daemon?
> >
> > >
> > > > So there is no problem. IMO, there is no need to improve for
> > > > post-
> > > > resume case.
> > > >
> > > > Users will just startup gnome-setting-daemon once after boot.
> > > > And it's likely that when it is started, the state is correct.
> > >
> > > You cannot rely on when gnome-settings-daemon will be started to
> > > make
> > > *any* decision. Certainly not decisions on how the kernel should
> > > behave.
> >
> > My bad wording, I just meant:
> > When gnome-settings-daemon is started is not related to what we are
> > discussing.
> >
> > Do you want to fix regressions?
> > Or you want to fix new issues on recent platforms?
> > If you want to fix regressions, I think Benjamin has submitted a
> > revision
> > to use old method mode, there shouldn't be regressions for
> > gnome-settings-daemon.
> >
> > What else we want to do is to fix regressions related to systemd when
> > we go back to default method mode. Since there is no issue with
> > systemd
> > 233 and after just applying a small change, systemd 229 can also be
> > worked around, I mean dynamically add/remove input node is not
> > strictly
> > required for achieving our purposes.
> >
> > But if you want to fix new issues on new platforms, we can discuss
> > further and determine which program should be changed and which
> > program
> > is the best candidate to stop all problems - the ACPI button driver
> > or
> > the user space.
>
> I'm happy with Benjamin's patches which don't introduce any
> dependencies on new user-space, and don't rely on undocumented
> heuristics. What was the API is still the API.

I've put them in the new series as you wish.

But what you say here is not the truth.
With Benjamin's patch, userspace still need to be changed to be able
to know an input node is dynamically created during runtime. So it
is still dependent on new user-space, and even newer than systemd
233.
It will be no difference by just asking:
1. systemd to not to rely on "open" events
(which has already been fixed in 233).
2. desktop managers to not to rely on LID state to handle layouts.
(which is what I'm asking to change).
If desktop managers still want to rely on LID state to handle layouts,
kernel drivers still need to be changed to periodically report _LID
state to input layer. And that's beyond what Benjamin's patches can
handle.

Thanks and best regards
Lv