A few issues have been found with device naming for the serial core
controller port device. These issues currently mostly affect the output
for /sys/bus/serial-base/devices, but need to be also fixed to avoid
port addressing issues later on.
Changes since v4:
- Make the sysfs device names nicer with DEVNAME:0.0 naming, the "ctrl"
and "port" prefixes are not needed with the controller id in the name
- Fix Andy's Reviewed-by as noted by Andy
- Generate patches with --patience to make the last patch more readable
Changes since v3:
- Drop unnecessary else on the return path in serial_base_device_init()
as noted by Andy
- Add Andy's Reviewed-by
- Update patch description for port_id instead of port port_id for the
first patch
Changes since v2:
- Fix my email script as it had started to drop linux-serial as noted by
Greg
- Explain why we're changing ctrl_id as requested by Greg
Changes since v1:
- Port id cannot be negative as noted by Jiri
- Controller id cannot be negative as noted by Andy
- Port name is missing the controller instance as noted by Andy
Tony Lindgren (3):
serial: core: Controller id cannot be negative
serial: core: Fix serial core port id to not use port->line
serial: core: Fix serial core controller port name to show controller
id
drivers/tty/serial/8250/8250_core.c | 2 ++
drivers/tty/serial/serial_base_bus.c | 32 +++++++++++++++++-----------
include/linux/serial_core.h | 3 ++-
3 files changed, 24 insertions(+), 13 deletions(-)
--
2.41.0
The serial core port id should be serial core controller specific port
instance, which is not always the port->line index.
For example, 8250 driver maps a number of legacy ports, and when a
hardware specific device driver takes over, we typically have one
driver instance for each port. Let's instead add port->port_id to
keep track serial ports mapped to each serial core controller instance.
Currently this is only a cosmetic issue for the serial core port device
names. The issue can be noticed looking at /sys/bus/serial-base/devices
for example though. Let's fix the issue to avoid port addressing issues
later on.
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
---
drivers/tty/serial/8250/8250_core.c | 2 ++
drivers/tty/serial/serial_base_bus.c | 2 +-
include/linux/serial_core.h | 1 +
3 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -497,6 +497,7 @@ static struct uart_8250_port *serial8250_setup_port(int index)
up = &serial8250_ports[index];
up->port.line = index;
+ up->port.port_id = index;
serial8250_init_port(up);
if (!base_ops)
@@ -1040,6 +1041,7 @@ int serial8250_register_8250_port(const struct uart_8250_port *up)
uart_remove_one_port(&serial8250_reg, &uart->port);
uart->port.ctrl_id = up->port.ctrl_id;
+ uart->port.port_id = up->port.port_id;
uart->port.iobase = up->port.iobase;
uart->port.membase = up->port.membase;
uart->port.irq = up->port.irq;
diff --git a/drivers/tty/serial/serial_base_bus.c b/drivers/tty/serial/serial_base_bus.c
--- a/drivers/tty/serial/serial_base_bus.c
+++ b/drivers/tty/serial/serial_base_bus.c
@@ -136,7 +136,7 @@ struct serial_port_device *serial_base_port_add(struct uart_port *port,
err = serial_base_device_init(port, &port_dev->dev,
&ctrl_dev->dev, &serial_port_type,
serial_base_port_release,
- port->line);
+ port->port_id);
if (err)
goto err_put_device;
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -460,6 +460,7 @@ struct uart_port {
int (*iso7816_config)(struct uart_port *,
struct serial_iso7816 *iso7816);
unsigned int ctrl_id; /* optional serial core controller id */
+ unsigned int port_id; /* optional serial core port id */
unsigned int irq; /* irq number */
unsigned long irqflags; /* irq flags */
unsigned int uartclk; /* base uart clock */
--
2.41.0
We are missing the serial core controller id for the serial core port
name. Let's fix the issue for sane sysfs output, and to avoid issues
addressing serial ports later on.
And as we're now showing the controller id, the "ctrl" and "port" prefix
for the DEVNAME become useless, we can just drop them. Let's standardize on
DEVNAME:0 for controller name, where 0 is the controller id. And
DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
This makes the sysfs output nicer, on qemu for example:
$ ls /sys/bus/serial-base/devices
00:04:0 serial8250:0 serial8250:0.2
00:04:0.0 serial8250:0.1 serial8250:0.3
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Reported-by: Andy Shevchenko <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
---
Andy, I kept your Reviewed-by although I updated the device naming and
description, does the patch still look OK to you?
---
drivers/tty/serial/serial_base_bus.c | 32 +++++++++++++++++-----------
1 file changed, 20 insertions(+), 12 deletions(-)
diff --git a/drivers/tty/serial/serial_base_bus.c b/drivers/tty/serial/serial_base_bus.c
--- a/drivers/tty/serial/serial_base_bus.c
+++ b/drivers/tty/serial/serial_base_bus.c
@@ -19,6 +19,14 @@
static bool serial_base_initialized;
+static const struct device_type serial_ctrl_type = {
+ .name = "ctrl",
+};
+
+static const struct device_type serial_port_type = {
+ .name = "port",
+};
+
static int serial_base_match(struct device *dev, struct device_driver *drv)
{
int len = strlen(drv->name);
@@ -48,7 +56,8 @@ static int serial_base_device_init(struct uart_port *port,
struct device *parent_dev,
const struct device_type *type,
void (*release)(struct device *dev),
- int id)
+ unsigned int ctrl_id,
+ unsigned int port_id)
{
device_initialize(dev);
dev->type = type;
@@ -61,13 +70,16 @@ static int serial_base_device_init(struct uart_port *port,
return -EPROBE_DEFER;
}
- return dev_set_name(dev, "%s.%s.%d", type->name, dev_name(port->dev), id);
+ if (type == &serial_ctrl_type)
+ return dev_set_name(dev, "%s:%d", dev_name(port->dev), ctrl_id);
+
+ if (type == &serial_port_type)
+ return dev_set_name(dev, "%s:%d.%d", dev_name(port->dev),
+ ctrl_id, port_id);
+
+ return -EINVAL;
}
-static const struct device_type serial_ctrl_type = {
- .name = "ctrl",
-};
-
static void serial_base_ctrl_release(struct device *dev)
{
struct serial_ctrl_device *ctrl_dev = to_serial_base_ctrl_device(dev);
@@ -96,7 +108,7 @@ struct serial_ctrl_device *serial_base_ctrl_add(struct uart_port *port,
err = serial_base_device_init(port, &ctrl_dev->dev,
parent, &serial_ctrl_type,
serial_base_ctrl_release,
- port->ctrl_id);
+ port->ctrl_id, 0);
if (err)
goto err_put_device;
@@ -112,10 +124,6 @@ struct serial_ctrl_device *serial_base_ctrl_add(struct uart_port *port,
return ERR_PTR(err);
}
-static const struct device_type serial_port_type = {
- .name = "port",
-};
-
static void serial_base_port_release(struct device *dev)
{
struct serial_port_device *port_dev = to_serial_base_port_device(dev);
@@ -136,7 +144,7 @@ struct serial_port_device *serial_base_port_add(struct uart_port *port,
err = serial_base_device_init(port, &port_dev->dev,
&ctrl_dev->dev, &serial_port_type,
serial_base_port_release,
- port->port_id);
+ port->ctrl_id, port->port_id);
if (err)
goto err_put_device;
--
2.41.0
The controller id cannot be negative. Let's fix the ctrl_id in preparation
for adding port_id to fix the device name.
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Reported-by: Andy Shevchenko <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Signed-off-by: Tony Lindgren <[email protected]>
---
include/linux/serial_core.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -459,7 +459,7 @@ struct uart_port {
struct serial_rs485 *rs485);
int (*iso7816_config)(struct uart_port *,
struct serial_iso7816 *iso7816);
- int ctrl_id; /* optional serial core controller id */
+ unsigned int ctrl_id; /* optional serial core controller id */
unsigned int irq; /* irq number */
unsigned long irqflags; /* irq flags */
unsigned int uartclk; /* base uart clock */
--
2.41.0
On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> We are missing the serial core controller id for the serial core port
> name. Let's fix the issue for sane sysfs output, and to avoid issues
> addressing serial ports later on.
>
> And as we're now showing the controller id, the "ctrl" and "port" prefix
> for the DEVNAME become useless, we can just drop them. Let's standardize on
> DEVNAME:0 for controller name, where 0 is the controller id. And
> DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
>
> This makes the sysfs output nicer, on qemu for example:
>
> $ ls /sys/bus/serial-base/devices
> 00:04:0 serial8250:0 serial8250:0.2
> 00:04:0.0 serial8250:0.1 serial8250:0.3
Hmm... Why 0.0 is absent for serial8250?
Btw, what was before this patch there?
And maybe ls -l will look more informative?
> Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> Reported-by: Andy Shevchenko <[email protected]>
> Reviewed-by: Andy Shevchenko <[email protected]>
> Signed-off-by: Tony Lindgren <[email protected]>
> ---
>
> Andy, I kept your Reviewed-by although I updated the device naming and
> description, does the patch still look OK to you?
Looks okay, but I have a question above.
--
With Best Regards,
Andy Shevchenko
* Andy Shevchenko <[email protected]> [230725 09:07]:
> On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> > We are missing the serial core controller id for the serial core port
> > name. Let's fix the issue for sane sysfs output, and to avoid issues
> > addressing serial ports later on.
> >
> > And as we're now showing the controller id, the "ctrl" and "port" prefix
> > for the DEVNAME become useless, we can just drop them. Let's standardize on
> > DEVNAME:0 for controller name, where 0 is the controller id. And
> > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> >
> > This makes the sysfs output nicer, on qemu for example:
> >
> > $ ls /sys/bus/serial-base/devices
> > 00:04:0 serial8250:0 serial8250:0.2
> > 00:04:0.0 serial8250:0.1 serial8250:0.3
>
> Hmm... Why 0.0 is absent for serial8250?
The serial8250:0.0 port was around initially, and then it's preallocated
slot got taken over by the 00:04:0.0 device. See nr_uarts in 8250_core.c
for what is going on.
> Btw, what was before this patch there?
# ls /sys/bus/serial-base/devices/
ctrl.00:04.0 port.00:04.0 port.serial8250.2
ctrl.serial8250.0 port.serial8250.1 port.serial8250.3
The earlier naming is different format from the DEVNAME:0.0. The sysfs
output is not usable directly for the users for the port addressing we're
discussing.
Sorry I did not notice the different format earier, I noticed only when I
started playing with using the DEVNAME:0.0 style port addressing.
> And maybe ls -l will look more informative?
I've appended qemu output of the ls -l for DEVNAME:0.0 style naming below.
> > Andy, I kept your Reviewed-by although I updated the device naming and
> > description, does the patch still look OK to you?
>
> Looks okay, but I have a question above.
OK best to get the device names right if we're planning to use them :)
Regards,
Tony
8< ------
ls -l /sys/bus/serial-base/devices/
total 0
lrwxrwxrwx 1 root root 0 Jul 25 05:21 00:04:0 -> ../../../devices/pnp0/00:04/00:04:0
lrwxrwxrwx 1 root root 0 Jul 25 05:21 00:04:0.0 -> ../../../devices/pnp0/00:04/00:04:0/00:04:0.0
lrwxrwxrwx 1 root root 0 Jul 25 05:21 serial8250:0 -> ../../../devices/platform/serial8250/serial8250:0
lrwxrwxrwx 1 root root 0 Jul 25 05:21 serial8250:0.1 -> ../../../devices/platform/serial8250/serial8250:0/serial8250:0.1
lrwxrwxrwx 1 root root 0 Jul 25 05:21 serial8250:0.2 -> ../../../devices/platform/serial8250/serial8250:0/serial8250:0.2
lrwxrwxrwx 1 root root 0 Jul 25 05:21 serial8250:0.3 -> ../../../devices/platform/serial8250/serial8250:0/serial8250:0.3
On Tue, Jul 25, 2023 at 12:07:04PM +0300, Andy Shevchenko wrote:
> On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> > We are missing the serial core controller id for the serial core port
> > name. Let's fix the issue for sane sysfs output, and to avoid issues
> > addressing serial ports later on.
> >
> > And as we're now showing the controller id, the "ctrl" and "port" prefix
> > for the DEVNAME become useless, we can just drop them. Let's standardize on
> > DEVNAME:0 for controller name, where 0 is the controller id. And
> > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> >
> > This makes the sysfs output nicer, on qemu for example:
> >
> > $ ls /sys/bus/serial-base/devices
> > 00:04:0 serial8250:0 serial8250:0.2
> > 00:04:0.0 serial8250:0.1 serial8250:0.3
>
> Hmm... Why 0.0 is absent for serial8250?
> Btw, what was before this patch there?
> And maybe ls -l will look more informative?
>
> > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> > Reported-by: Andy Shevchenko <[email protected]>
> > Reviewed-by: Andy Shevchenko <[email protected]>
> > Signed-off-by: Tony Lindgren <[email protected]>
> > ---
> >
> > Andy, I kept your Reviewed-by although I updated the device naming and
> > description, does the patch still look OK to you?
>
> Looks okay, but I have a question above.
Can I get an ack for this if you are ok with these fixes?
thanks,
greg k-h
On Mon, Jul 31, 2023 at 05:14:15PM +0200, Greg Kroah-Hartman wrote:
> On Tue, Jul 25, 2023 at 12:07:04PM +0300, Andy Shevchenko wrote:
> > On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> > > We are missing the serial core controller id for the serial core port
> > > name. Let's fix the issue for sane sysfs output, and to avoid issues
> > > addressing serial ports later on.
> > >
> > > And as we're now showing the controller id, the "ctrl" and "port" prefix
> > > for the DEVNAME become useless, we can just drop them. Let's standardize on
> > > DEVNAME:0 for controller name, where 0 is the controller id. And
> > > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> > >
> > > This makes the sysfs output nicer, on qemu for example:
> > >
> > > $ ls /sys/bus/serial-base/devices
> > > 00:04:0 serial8250:0 serial8250:0.2
> > > 00:04:0.0 serial8250:0.1 serial8250:0.3
> >
> > Hmm... Why 0.0 is absent for serial8250?
> > Btw, what was before this patch there?
> > And maybe ls -l will look more informative?
> >
> > > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> > > Reported-by: Andy Shevchenko <[email protected]>
> > > Reviewed-by: Andy Shevchenko <[email protected]>
> > > Signed-off-by: Tony Lindgren <[email protected]>
> > > ---
> > >
> > > Andy, I kept your Reviewed-by although I updated the device naming and
> > > description, does the patch still look OK to you?
> >
> > Looks okay, but I have a question above.
>
> Can I get an ack for this if you are ok with these fixes?
Sure,
Acked-by: Andy Shevchenko <[email protected]>
--
With Best Regards,
Andy Shevchenko
Hello,
kernel test robot noticed machine hang on:
commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id")
url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452
base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing
patch link: https://lore.kernel.org/all/[email protected]/
patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id
in testcase: boot
compiler: gcc-12
test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]
from serial, we observed last print out is:
[ 15.584772][ T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
[ 15.597328][ T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
[ 15.610326][ T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
[ 15.623375][ T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
[ 15.640145][ T19] intel_rapl_common: Found RAPL domain package
[ 15.655890][ T19] intel_rapl_common: Found RAPL domain dram
[ 15.661983][ T19] intel_rapl_common: package-0:package:long_term locked by BIOS
[ 15.678564][ T19] intel_rapl_common: package-0:package:short_term locked by BIOS
[ 15.695259][ T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
[ 15.713068][ T158] intel_rapl_common: Found RAPL domain package
[ 15.728719][ T158] intel_rapl_common: Found RAPL domain dram
[ 15.734743][ T158] intel_rapl_common: package-1:package:long_term locked by BIOS
[ 15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
[ 15.761297][ T158] intel_rapl_common: package-1:package:short_term locked by BIOS
[ 15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
[ 15.768866][ T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
[ 15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
[ 15.812245][ T1154] raid6: avx2x4 gen() 18060 MB/s
[ 15.834244][ T1154] raid6: avx2x2 gen() 18076 MB/s
[ 15.856244][ T1154] raid6: avx2x1 gen() 13836 MB/s
[ 15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
[ 15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
[ 15.890254][ T1154] raid6: using avx512x2 recovery algorithm
[ 15.897891][ T1154] xor: measuring software checksum speed
[ 15.904013][ T1154] prefetch64-sse : 31308 MB/sec
[ 15.909878][ T1154] generic_sse : 22929 MB/sec
[ 15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
[ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
[ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)
then the machine is just stuck there. (whole dmesg captured from serial is
attached), and the issue is 100% reproducible for this commit.
for parent, we never observed the boot failure.
it looks quite strange to us why this commit could cause this behavior on our
machine. could you help check dmesg, config and kernel command line which is
also captured in dmesg, etc. and guide us if anything need to be updated to be
compatible with this change? Thanks!
To reproduce:
# build kernel
cd linux
cp config-6.5.0-rc2-00003-g4de64f4800a5 .config
make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-12 CC=gcc-12 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* kernel test robot <[email protected]> [230802 08:16]:
> from serial, we observed last print out is:
>
> [ 15.584772][ T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
> [ 15.597328][ T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
> [ 15.610326][ T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
> [ 15.623375][ T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
> [ 15.640145][ T19] intel_rapl_common: Found RAPL domain package
> [ 15.655890][ T19] intel_rapl_common: Found RAPL domain dram
> [ 15.661983][ T19] intel_rapl_common: package-0:package:long_term locked by BIOS
> [ 15.678564][ T19] intel_rapl_common: package-0:package:short_term locked by BIOS
> [ 15.695259][ T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
> [ 15.713068][ T158] intel_rapl_common: Found RAPL domain package
> [ 15.728719][ T158] intel_rapl_common: Found RAPL domain dram
> [ 15.734743][ T158] intel_rapl_common: package-1:package:long_term locked by BIOS
> [ 15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
> [ 15.761297][ T158] intel_rapl_common: package-1:package:short_term locked by BIOS
> [ 15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
> [ 15.768866][ T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
> [ 15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
> [ 15.812245][ T1154] raid6: avx2x4 gen() 18060 MB/s
> [ 15.834244][ T1154] raid6: avx2x2 gen() 18076 MB/s
> [ 15.856244][ T1154] raid6: avx2x1 gen() 13836 MB/s
> [ 15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
> [ 15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
> [ 15.890254][ T1154] raid6: using avx512x2 recovery algorithm
> [ 15.897891][ T1154] xor: measuring software checksum speed
> [ 15.904013][ T1154] prefetch64-sse : 31308 MB/sec
> [ 15.909878][ T1154] generic_sse : 22929 MB/sec
> [ 15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
> [ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
> [ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)
>
>
> then the machine is just stuck there. (whole dmesg captured from serial is
> attached), and the issue is 100% reproducible for this commit.
>
> for parent, we never observed the boot failure.
>
> it looks quite strange to us why this commit could cause this behavior on our
> machine. could you help check dmesg, config and kernel command line which is
> also captured in dmesg, etc. and guide us if anything need to be updated to be
> compatible with this change? Thanks!
Thanks for the report. With the ctrl and port prefixes dropped, I broke
serial_base_match() looks like. As we attempt to continue anyways, things
still mostly work..
Greg, can you please drop the related commit?
It's the following commit:
1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id")
Regards,
Tony
On Wed, Aug 02, 2023 at 12:23:54PM +0300, Tony Lindgren wrote:
> * kernel test robot <[email protected]> [230802 08:16]:
> > from serial, we observed last print out is:
> >
> > [ 15.584772][ T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
> > [ 15.597328][ T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
> > [ 15.610326][ T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
> > [ 15.623375][ T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
> > [ 15.640145][ T19] intel_rapl_common: Found RAPL domain package
> > [ 15.655890][ T19] intel_rapl_common: Found RAPL domain dram
> > [ 15.661983][ T19] intel_rapl_common: package-0:package:long_term locked by BIOS
> > [ 15.678564][ T19] intel_rapl_common: package-0:package:short_term locked by BIOS
> > [ 15.695259][ T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
> > [ 15.713068][ T158] intel_rapl_common: Found RAPL domain package
> > [ 15.728719][ T158] intel_rapl_common: Found RAPL domain dram
> > [ 15.734743][ T158] intel_rapl_common: package-1:package:long_term locked by BIOS
> > [ 15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
> > [ 15.761297][ T158] intel_rapl_common: package-1:package:short_term locked by BIOS
> > [ 15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
> > [ 15.768866][ T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
> > [ 15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
> > [ 15.812245][ T1154] raid6: avx2x4 gen() 18060 MB/s
> > [ 15.834244][ T1154] raid6: avx2x2 gen() 18076 MB/s
> > [ 15.856244][ T1154] raid6: avx2x1 gen() 13836 MB/s
> > [ 15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
> > [ 15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
> > [ 15.890254][ T1154] raid6: using avx512x2 recovery algorithm
> > [ 15.897891][ T1154] xor: measuring software checksum speed
> > [ 15.904013][ T1154] prefetch64-sse : 31308 MB/sec
> > [ 15.909878][ T1154] generic_sse : 22929 MB/sec
> > [ 15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
> > [ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
> > [ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)
> >
> >
> > then the machine is just stuck there. (whole dmesg captured from serial is
> > attached), and the issue is 100% reproducible for this commit.
> >
> > for parent, we never observed the boot failure.
> >
> > it looks quite strange to us why this commit could cause this behavior on our
> > machine. could you help check dmesg, config and kernel command line which is
> > also captured in dmesg, etc. and guide us if anything need to be updated to be
> > compatible with this change? Thanks!
>
> Thanks for the report. With the ctrl and port prefixes dropped, I broke
> serial_base_match() looks like. As we attempt to continue anyways, things
> still mostly work..
>
> Greg, can you please drop the related commit?
>
> It's the following commit:
>
> 1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id")
Please send me a revert, I can't rewrite history in my public branches.
thanks,
greg k-h
* Greg Kroah-Hartman <[email protected]> [230802 09:40]:
> On Wed, Aug 02, 2023 at 12:23:54PM +0300, Tony Lindgren wrote:
> > * kernel test robot <[email protected]> [230802 08:16]:
> > > from serial, we observed last print out is:
> > >
> > > [ 15.584772][ T954] EDAC MC0: Giving out device to module skx_edac controller Skylake Socket#0 IMC#0: DEV 0000:3a:0a.0 (INTERRUPT)
> > > [ 15.597328][ T954] EDAC MC1: Giving out device to module skx_edac controller Skylake Socket#0 IMC#1: DEV 0000:3a:0c.0 (INTERRUPT)
> > > [ 15.610326][ T954] EDAC MC2: Giving out device to module skx_edac controller Skylake Socket#1 IMC#0: DEV 0000:ae:0a.0 (INTERRUPT)
> > > [ 15.623375][ T954] EDAC MC3: Giving out device to module skx_edac controller Skylake Socket#1 IMC#1: DEV 0000:ae:0c.0 (INTERRUPT)
> > > [ 15.640145][ T19] intel_rapl_common: Found RAPL domain package
> > > [ 15.655890][ T19] intel_rapl_common: Found RAPL domain dram
> > > [ 15.661983][ T19] intel_rapl_common: package-0:package:long_term locked by BIOS
> > > [ 15.678564][ T19] intel_rapl_common: package-0:package:short_term locked by BIOS
> > > [ 15.695259][ T19] intel_rapl_common: package-0:dram:long_term locked by BIOS
> > > [ 15.713068][ T158] intel_rapl_common: Found RAPL domain package
> > > [ 15.728719][ T158] intel_rapl_common: Found RAPL domain dram
> > > [ 15.734743][ T158] intel_rapl_common: package-1:package:long_term locked by BIOS
> > > [ 15.745244][ T1154] raid6: avx512x4 gen() 18153 MB/s
> > > [ 15.761297][ T158] intel_rapl_common: package-1:package:short_term locked by BIOS
> > > [ 15.767244][ T1154] raid6: avx512x2 gen() 18130 MB/s
> > > [ 15.768866][ T158] intel_rapl_common: package-1:dram:long_term locked by BIOS
> > > [ 15.790243][ T1154] raid6: avx512x1 gen() 18155 MB/s
> > > [ 15.812245][ T1154] raid6: avx2x4 gen() 18060 MB/s
> > > [ 15.834244][ T1154] raid6: avx2x2 gen() 18076 MB/s
> > > [ 15.856244][ T1154] raid6: avx2x1 gen() 13836 MB/s
> > > [ 15.861474][ T1154] raid6: using algorithm avx512x1 gen() 18155 MB/s
> > > [ 15.884243][ T1154] raid6: .... xor() 27974 MB/s, rmw enabled
> > > [ 15.890254][ T1154] raid6: using avx512x2 recovery algorithm
> > > [ 15.897891][ T1154] xor: measuring software checksum speed
> > > [ 15.904013][ T1154] prefetch64-sse : 31308 MB/sec
> > > [ 15.909878][ T1154] generic_sse : 22929 MB/sec
> > > [ 15.915230][ T1154] xor: using function: prefetch64-sse (31308 MB/sec)
> > > [ 16.042623][ T1154] Btrfs loaded, zoned=no, fsverity=no
> > > [ 16.054593][ T930] BTRFS: device fsid e422031c-19be-42f5-ab4f-be5f306aa6e1 devid 1 transid 39725 /dev/sda2 scanned by systemd-udevd (930)
> > >
> > >
> > > then the machine is just stuck there. (whole dmesg captured from serial is
> > > attached), and the issue is 100% reproducible for this commit.
> > >
> > > for parent, we never observed the boot failure.
> > >
> > > it looks quite strange to us why this commit could cause this behavior on our
> > > machine. could you help check dmesg, config and kernel command line which is
> > > also captured in dmesg, etc. and guide us if anything need to be updated to be
> > > compatible with this change? Thanks!
> >
> > Thanks for the report. With the ctrl and port prefixes dropped, I broke
> > serial_base_match() looks like. As we attempt to continue anyways, things
> > still mostly work..
> >
> > Greg, can you please drop the related commit?
> >
> > It's the following commit:
> >
> > 1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id")
>
> Please send me a revert, I can't rewrite history in my public branches.
OK. The fix might be just to check for device_type in serial_base_match().
Regards,
Tony
* Tony Lindgren <[email protected]> [230802 13:47]:
> OK. The fix might be just to check for device_type in serial_base_match().
FYI fix sent as "[PATCH] serial: core: Fix serial_base_match() after fixing
controller port name" [0].
Regards,
Tony
[0] https://lore.kernel.org/linux-serial/[email protected]/T/#u
On Wed, Aug 02, 2023 at 04:15:28PM +0800, kernel test robot wrote:
> kernel test robot noticed machine hang on:
> commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id")
> url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452
> base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing
> patch link: https://lore.kernel.org/all/[email protected]/
> patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id
>
> in testcase: boot
>
> compiler: gcc-12
> test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
I've also bisected this commit as causing boot hangs on at least the
i.MX8MP-EVK, though most of the boards in my lab and a huge swathe of
those in KernelCI are out:
https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20230802/plan/baseline/
which is having a pretty devastating effect on -next testing.
* Mark Brown <[email protected]> [230802 18:20]:
> On Wed, Aug 02, 2023 at 04:15:28PM +0800, kernel test robot wrote:
>
> > kernel test robot noticed machine hang on:
>
> > commit: 4de64f4800a581e7eeba6392b3b2ce2131195145 ("[PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id")
> > url: https://github.com/intel-lab-lkp/linux/commits/Tony-Lindgren/serial-core-Controller-id-cannot-be-negative/20230725-134452
> > base: https://git.kernel.org/cgit/linux/kernel/git/gregkh/tty.git tty-testing
> > patch link: https://lore.kernel.org/all/[email protected]/
> > patch subject: [PATCH v5 3/3] serial: core: Fix serial core controller port name to show controller id
> >
> > in testcase: boot
> >
> > compiler: gcc-12
> > test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory
>
> I've also bisected this commit as causing boot hangs on at least the
> i.MX8MP-EVK, though most of the boards in my lab and a huge swathe of
> those in KernelCI are out:
>
> https://linux.kernelci.org/test/job/next/branch/master/kernel/next-20230802/plan/baseline/
>
> which is having a pretty devastating effect on -next testing.
Yes sorry about that, I should have noticed it right away. I thought I
still had my test machine set to use the console=DEVNAME:0.0 style
naming and I mostly test over ssh and was just looking at the DEVNAME.
Regards,
Tony
On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> We are missing the serial core controller id for the serial core port
> name. Let's fix the issue for sane sysfs output, and to avoid issues
> addressing serial ports later on.
>
> And as we're now showing the controller id, the "ctrl" and "port" prefix
> for the DEVNAME become useless, we can just drop them. Let's standardize on
> DEVNAME:0 for controller name, where 0 is the controller id. And
> DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
>
> This makes the sysfs output nicer, on qemu for example:
>
> $ ls /sys/bus/serial-base/devices
> 00:04:0 serial8250:0 serial8250:0.2
> 00:04:0.0 serial8250:0.1 serial8250:0.3
>
> Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> Reported-by: Andy Shevchenko <[email protected]>
> Reviewed-by: Andy Shevchenko <[email protected]>
> Signed-off-by: Tony Lindgren <[email protected]>
This patch causes about 50% of my boot tests to fail because the console
is no longer recognized. Reverting this patch fixes the problem.
Bisect log attached.
Guenter
---
# bad: [35245ef82c5b8206d97d0296017df658fd8ea3d2] Merge branch 'for-linux-next-fixes' of git://anongit.freedesktop.org/drm/drm-misc
# good: [5d0c230f1de8c7515b6567d9afba1f196fb4e2f4] Linux 6.5-rc4
git bisect start 'HEAD' 'v6.5-rc4'
# good: [ec0f64d0666ce02114b11efd3df3234f7a3497d8] Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git
git bisect good ec0f64d0666ce02114b11efd3df3234f7a3497d8
# bad: [8eb8b701a263abed01d3fd7e7f1984ef37b02149] Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine.git
git bisect bad 8eb8b701a263abed01d3fd7e7f1984ef37b02149
# good: [f29c3a80b329fbfbf92278c29fdcaafb736e3d01] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git
git bisect good f29c3a80b329fbfbf92278c29fdcaafb736e3d01
# bad: [eddb92c4c656a669c30e17ce934e5eba8c261392] Merge branch 'fixes-togreg' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio.git
git bisect bad eddb92c4c656a669c30e17ce934e5eba8c261392
# good: [6811694eb2f6b7a4e97be2029edc7dd6a39460f8] iio: imu: lsm6dsx: Fix mount matrix retrieval
git bisect good 6811694eb2f6b7a4e97be2029edc7dd6a39460f8
# bad: [1ef2c2df11997b8135f34adcf2c200d3b4aacbe9] serial: core: Fix serial core controller port name to show controller id
git bisect bad 1ef2c2df11997b8135f34adcf2c200d3b4aacbe9
# good: [83c35180abfdfb22f3d7703b0c85ad2d442ed2c5] serial: core: Controller id cannot be negative
git bisect good 83c35180abfdfb22f3d7703b0c85ad2d442ed2c5
# good: [d962de6ae51f9b76ad736220077cda83084090b1] serial: core: Fix serial core port id to not use port->line
git bisect good d962de6ae51f9b76ad736220077cda83084090b1
# first bad commit: [1ef2c2df11997b8135f34adcf2c200d3b4aacbe9] serial: core: Fix serial core controller port name to show controller id
On Thu, Aug 03, 2023 at 03:18:42PM -0700, Guenter Roeck wrote:
> On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
> > We are missing the serial core controller id for the serial core port
> > name. Let's fix the issue for sane sysfs output, and to avoid issues
> > addressing serial ports later on.
> >
> > And as we're now showing the controller id, the "ctrl" and "port" prefix
> > for the DEVNAME become useless, we can just drop them. Let's standardize on
> > DEVNAME:0 for controller name, where 0 is the controller id. And
> > DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
> >
> > This makes the sysfs output nicer, on qemu for example:
> >
> > $ ls /sys/bus/serial-base/devices
> > 00:04:0 serial8250:0 serial8250:0.2
> > 00:04:0.0 serial8250:0.1 serial8250:0.3
> >
> > Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
> > Reported-by: Andy Shevchenko <[email protected]>
> > Reviewed-by: Andy Shevchenko <[email protected]>
> > Signed-off-by: Tony Lindgren <[email protected]>
>
> This patch causes about 50% of my boot tests to fail because the console
> is no longer recognized. Reverting this patch fixes the problem.
> Bisect log attached.
Isn't fix already available?
7d695d83767c ("serial: core: Fix serial_base_match() after fixing controller port name")
(in tty/tty-linus)
--
With Best Regards,
Andy Shevchenko
On 8/3/23 21:20, Andy Shevchenko wrote:
> On Thu, Aug 03, 2023 at 03:18:42PM -0700, Guenter Roeck wrote:
>> On Tue, Jul 25, 2023 at 08:42:12AM +0300, Tony Lindgren wrote:
>>> We are missing the serial core controller id for the serial core port
>>> name. Let's fix the issue for sane sysfs output, and to avoid issues
>>> addressing serial ports later on.
>>>
>>> And as we're now showing the controller id, the "ctrl" and "port" prefix
>>> for the DEVNAME become useless, we can just drop them. Let's standardize on
>>> DEVNAME:0 for controller name, where 0 is the controller id. And
>>> DEVNAME:0.0 for port name, where 0.0 are the controller id and port id.
>>>
>>> This makes the sysfs output nicer, on qemu for example:
>>>
>>> $ ls /sys/bus/serial-base/devices
>>> 00:04:0 serial8250:0 serial8250:0.2
>>> 00:04:0.0 serial8250:0.1 serial8250:0.3
>>>
>>> Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
>>> Reported-by: Andy Shevchenko <[email protected]>
>>> Reviewed-by: Andy Shevchenko <[email protected]>
>>> Signed-off-by: Tony Lindgren <[email protected]>
>>
>> This patch causes about 50% of my boot tests to fail because the console
>> is no longer recognized. Reverting this patch fixes the problem.
>> Bisect log attached.
>
> Isn't fix already available?
> 7d695d83767c ("serial: core: Fix serial_base_match() after fixing controller port name")
>
Yes, hopefully that should fix it. We'll see tonight, when my test bed builds it again.
Guenter