2021-06-26 04:13:42

by Maciej W. Rozycki

[permalink] [raw]
Subject: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

Oxford Semiconductor PCIe (Tornado) serial port devices are driven by a
fixed 62.5MHz clock input derived from the 100MHz PCI Express clock.

We currently drive the device using its default oversampling rate of 16
and the clock prescaler disabled, consequently yielding the baud base of
3906250. This base is inadequate for some of the high-speed baud rates
such as 460800bps, for which the closest rate possible can be obtained
by dividing the baud base by 8, yielding the baud rate of 488281.25bps,
which is off by 5.9638%. This is enough for data communication to break
with the remote end talking actual 460800bps where missed stop bits have
been observed.

We can do better however, by taking advantage of a reduced oversampling
rate, which can be set to any integer value from 4 to 16 inclusive by
programming the TCR register, and by using the clock prescaler, which
can be set to any value from 1 to 63.875 in increments of 0.125 in the
CPR/CPR2 register pair[1][2][3][4]. The prescaler has to be explicitly
enabled though by setting bit 7 in the MCR or otherwise it is bypassed
as if the value of 1 was used.

By using these parameters rates from 15625000bps down to 1bps can be
obtained, with either exact or highly-accurate actual bit rates for
standard and many non-standard rates.

Make use of these features then as follows:

- Set the baud base to 15625000, reflecting the minimum oversampling
rate of 4 with the clock prescaler and divisor both set to 1.

- Set the MCR mask and force bits in the UART template so as to have
MCR[7] always set and then have 8250 core propagate those settings, if
supplied as non-zero, overriding the ALPHA_KLUDGE_MCR default.

- Override the `get_divisor' handler and determine a good combination of
parameters by using a lookup table with predetermined value pairs of
the oversampling rate and the clock prescaler and finding a pair that
divides the input clock such that the quotient, when rounded to the
nearest integer, deviates the least from the exact result. Calculate
the clock divisor accordingly.

Scale the resulting oversampling rate (only by powers of two) if
possible so as to maximise it, reducing the divisor accordingly, and
avoid a divisor overflow for very low baud rates by scaling the
oversampling rate and/or the prescaler even if that causes some
accuracy loss.

Also handle the historic spd_cust feature so as to allow one to set
all the three parameters manually to arbitrary values, by keeping the
low 16 bits for the divisor and then putting TCR in bits 19:16 and
CPR/CPR2 in bits 28:20, sanitising the bit pattern supplied such as
to clamp CPR/CPR2 values between 0.000 and 0.875 inclusive to 1.000.
This preserves compatibility with any existing setups, that is where
requesting a custom divisor that only has any bits set among the low
16 the oversampling rate of 16 and the clock prescaler of 1 will be
used.

Finally abuse the `frac' argument to store the determined bit patterns
for the TCR, CPR and CPR2 registers.

- Override the `set_divisor' handler so as to set the TCR, CPR and CPR2
registers from the `frac' value supplied. Set the divisor as usually.

With the baud base set to 15625000 and the unsigned 16-bit UART_DIV_MAX
limitation imposed by `serial8250_get_baud_rate' standard baud rates
below 300bps become unavailable in the regular way, e.g. the rate of
200bps requires the baud base to be divided by 78125 and that is beyond
the unsigned 16-bit range. The historic spd_cust feature can still be
used to obtain such rates if so required.

Here are the figures for the standard and some non-standard baud rates
(including those quoted in Oxford Semiconductor documentation), giving
the requested rate (r), the actual rate yielded (a) and its deviation
from the requested rate (d), and the values of the oversampling rate
(tcr), the clock prescaler (cpr) and the divisor (div) produced by the
new `get_divisor' handler:

r: 15625000, a: 15625000.00, d: 0.0000%, tcr: 4, cpr: 1.000, div: 1
r: 12500000, a: 12500000.00, d: 0.0000%, tcr: 5, cpr: 1.000, div: 1
r: 10416666, a: 10416666.67, d: 0.0000%, tcr: 6, cpr: 1.000, div: 1
r: 8928571, a: 8928571.43, d: 0.0000%, tcr: 7, cpr: 1.000, div: 1
r: 7812500, a: 7812500.00, d: 0.0000%, tcr: 8, cpr: 1.000, div: 1
r: 4000000, a: 4000000.00, d: 0.0000%, tcr: 5, cpr: 3.125, div: 1
r: 3686400, a: 3676470.59, d: -0.2694%, tcr: 8, cpr: 2.125, div: 1
r: 3500000, a: 3496503.50, d: -0.0999%, tcr: 13, cpr: 1.375, div: 1
r: 3000000, a: 2976190.48, d: -0.7937%, tcr: 14, cpr: 1.500, div: 1
r: 2500000, a: 2500000.00, d: 0.0000%, tcr: 10, cpr: 2.500, div: 1
r: 2000000, a: 2000000.00, d: 0.0000%, tcr: 10, cpr: 3.125, div: 1
r: 1843200, a: 1838235.29, d: -0.2694%, tcr: 16, cpr: 2.125, div: 1
r: 1500000, a: 1492537.31, d: -0.4975%, tcr: 5, cpr: 8.375, div: 1
r: 1152000, a: 1152073.73, d: 0.0064%, tcr: 14, cpr: 3.875, div: 1
r: 921600, a: 919117.65, d: -0.2694%, tcr: 16, cpr: 2.125, div: 2
r: 576000, a: 576036.87, d: 0.0064%, tcr: 14, cpr: 3.875, div: 2
r: 460800, a: 460829.49, d: 0.0064%, tcr: 7, cpr: 3.875, div: 5
r: 230400, a: 230414.75, d: 0.0064%, tcr: 14, cpr: 3.875, div: 5
r: 115200, a: 115207.37, d: 0.0064%, tcr: 14, cpr: 1.250, div: 31
r: 57600, a: 57603.69, d: 0.0064%, tcr: 8, cpr: 3.875, div: 35
r: 38400, a: 38402.46, d: 0.0064%, tcr: 14, cpr: 3.875, div: 30
r: 19200, a: 19201.23, d: 0.0064%, tcr: 8, cpr: 3.875, div: 105
r: 9600, a: 9600.06, d: 0.0006%, tcr: 9, cpr: 1.125, div: 643
r: 4800, a: 4799.98, d: -0.0004%, tcr: 7, cpr: 2.875, div: 647
r: 2400, a: 2400.02, d: 0.0008%, tcr: 9, cpr: 2.250, div: 1286
r: 1200, a: 1200.00, d: 0.0000%, tcr: 14, cpr: 2.875, div: 1294
r: 300, a: 300.00, d: 0.0000%, tcr: 11, cpr: 2.625, div: 7215
r: 200, a: 200.00, d: 0.0000%, tcr: 16, cpr: 1.250, div: 15625
r: 150, a: 150.00, d: 0.0000%, tcr: 13, cpr: 2.250, div: 14245
r: 134, a: 134.00, d: 0.0000%, tcr: 11, cpr: 2.625, div: 16153
r: 110, a: 110.00, d: 0.0000%, tcr: 12, cpr: 1.000, div: 47348
r: 75, a: 75.00, d: 0.0000%, tcr: 4, cpr: 5.875, div: 35461
r: 50, a: 50.00, d: 0.0000%, tcr: 16, cpr: 1.250, div: 62500
r: 25, a: 25.00, d: 0.0000%, tcr: 16, cpr: 2.500, div: 62500
r: 4, a: 4.00, d: 0.0000%, tcr: 16, cpr: 20.000, div: 48828
r: 2, a: 2.00, d: 0.0000%, tcr: 16, cpr: 40.000, div: 48828
r: 1, a: 1.00, d: 0.0000%, tcr: 16, cpr: 63.875, div: 61154

References:

[1] "OXPCIe200 PCI Express Multi-Port Bridge", Oxford Semiconductor,
Inc., DS-0045, 10 Nov 2008, Section "950 Mode", pp. 64-65

[2] "OXPCIe952 PCI Express Bridge to Dual Serial & Parallel Port",
Oxford Semiconductor, Inc., DS-0046, Mar 06 08, Section "950 Mode",
p. 20

[3] "OXPCIe954 PCI Express Bridge to Quad Serial Port", Oxford
Semiconductor, Inc., DS-0047, Feb 08, Section "950 Mode", p. 20

[4] "OXPCIe958 PCI Express Bridge to Octal Serial Port", Oxford
Semiconductor, Inc., DS-0048, Feb 08, Section "950 Mode", p. 20

Signed-off-by: Maciej W. Rozycki <[email protected]>
---
Changes from v1:

- Kill a silly `-Woverflow' warning.
---
drivers/tty/serial/8250/8250.h | 23 ++
drivers/tty/serial/8250/8250_core.c | 6
drivers/tty/serial/8250/8250_pci.c | 331 ++++++++++++++++++++++++++++--------
drivers/tty/serial/8250/8250_port.c | 21 --
4 files changed, 293 insertions(+), 88 deletions(-)

linux-serial-8250-oxsemi-pcie-prescaler.diff
Index: linux-macro-ide-tty/drivers/tty/serial/8250/8250.h
===================================================================
--- linux-macro-ide-tty.orig/drivers/tty/serial/8250/8250.h
+++ linux-macro-ide-tty/drivers/tty/serial/8250/8250.h
@@ -120,6 +120,29 @@ static inline void serial_out(struct uar
up->port.serial_out(&up->port, offset, value);
}

+/*
+ * For the 16C950
+ */
+static inline void serial_icr_write(struct uart_8250_port *up,
+ int offset, int value)
+{
+ serial_out(up, UART_SCR, offset);
+ serial_out(up, UART_ICR, value);
+}
+
+static inline unsigned int serial_icr_read(struct uart_8250_port *up,
+ int offset)
+{
+ unsigned int value;
+
+ serial_icr_write(up, UART_ACR, up->acr | UART_ACR_ICRRD);
+ serial_out(up, UART_SCR, offset);
+ value = serial_in(up, UART_ICR);
+ serial_icr_write(up, UART_ACR, up->acr);
+
+ return value;
+}
+
void serial8250_clear_and_reinit_fifos(struct uart_8250_port *p);

static inline int serial_dl_read(struct uart_8250_port *up)
Index: linux-macro-ide-tty/drivers/tty/serial/8250/8250_core.c
===================================================================
--- linux-macro-ide-tty.orig/drivers/tty/serial/8250/8250_core.c
+++ linux-macro-ide-tty/drivers/tty/serial/8250/8250_core.c
@@ -1011,6 +1011,12 @@ int serial8250_register_8250_port(const
uart->rs485_stop_tx = up->rs485_stop_tx;
uart->dma = up->dma;

+ /* Override ALPHA_KLUDGE_MCR default. */
+ if (up->mcr_mask | up->mcr_force) {
+ uart->mcr_mask = up->mcr_mask;
+ uart->mcr_force = up->mcr_force;
+ }
+
/* Take tx_loadsz from fifosize if it wasn't set separately */
if (uart->port.fifosize && !uart->tx_loadsz)
uart->tx_loadsz = uart->port.fifosize;
Index: linux-macro-ide-tty/drivers/tty/serial/8250/8250_pci.c
===================================================================
--- linux-macro-ide-tty.orig/drivers/tty/serial/8250/8250_pci.c
+++ linux-macro-ide-tty/drivers/tty/serial/8250/8250_pci.c
@@ -1066,6 +1066,203 @@ static int pci_oxsemi_tornado_init(struc
return number_uarts;
}

+/* Tornado-specific constants for the TCR and CPR registers; see below. */
+#define OXSEMI_TORNADO_TCR_MASK 0xf
+#define OXSEMI_TORNADO_CPR_MASK 0x1ff
+#define OXSEMI_TORNADO_CPR_MIN 8
+
+/*
+ * Determine the oversampling rate, the clock prescaler, and the clock
+ * divisor for the requested baud rate. The clock rate is 62.5 MHz,
+ * which is four times the baud base, and the prescaler increments in
+ * steps of 1/8. Therefore to make calculations on integers we need
+ * to use a scaled clock rate, which is the baud base multiplied by 32
+ * (or our assumed UART clock rate multiplied by 2).
+ *
+ * The allowed oversampling rates are from 4 up to 16 inclusive (values
+ * from 0 to 3 inclusive map to 16). Likewise the clock prescaler allows
+ * values between 1.000 and 63.875 inclusive (operation for values from
+ * 0.000 to 0.875 has not been specified). The clock divisor is the usual
+ * unsigned 16-bit integer.
+ *
+ * For the most accurate baud rate we use a table of predetermined
+ * oversampling rates and clock prescalers that records all possible
+ * products of the two parameters in the range from 4 up to 255 inclusive,
+ * and additionally 335 for the 1500000bps rate, with the prescaler scaled
+ * by 8. The table is sorted by the decreasing value of the oversampling
+ * rate and ties are resolved by sorting by the decreasing value of the
+ * product. This way preference is given to higher oversampling rates.
+ *
+ * We iterate over the table and choose the product of an oversampling
+ * rate and a clock prescaler that gives the lowest integer division
+ * result deviation, or if an exact integer divider is found we stop
+ * looking for right away. We do some fixup if the resulting clock
+ * divisor required would be out of its unsigned 16-bit integer range.
+ *
+ * Finally we abuse the supposed fractional part returned to encode the
+ * 4-bit value of the oversampling rate and the 9-bit value of the clock
+ * prescaler which will end up in the TCR and CPR/CPR2 registers.
+ */
+static unsigned int pci_oxsemi_tornado_get_divisor(struct uart_port *port,
+ unsigned int baud,
+ unsigned int *frac)
+{
+ static u8 p[][2] = {
+ { 16, 14, }, { 16, 13, }, { 16, 12, }, { 16, 11, },
+ { 16, 10, }, { 16, 9, }, { 16, 8, }, { 15, 17, },
+ { 15, 16, }, { 15, 15, }, { 15, 14, }, { 15, 13, },
+ { 15, 12, }, { 15, 11, }, { 15, 10, }, { 15, 9, },
+ { 15, 8, }, { 14, 18, }, { 14, 17, }, { 14, 14, },
+ { 14, 13, }, { 14, 12, }, { 14, 11, }, { 14, 10, },
+ { 14, 9, }, { 14, 8, }, { 13, 19, }, { 13, 18, },
+ { 13, 17, }, { 13, 13, }, { 13, 12, }, { 13, 11, },
+ { 13, 10, }, { 13, 9, }, { 13, 8, }, { 12, 19, },
+ { 12, 18, }, { 12, 17, }, { 12, 11, }, { 12, 9, },
+ { 12, 8, }, { 11, 23, }, { 11, 22, }, { 11, 21, },
+ { 11, 20, }, { 11, 19, }, { 11, 18, }, { 11, 17, },
+ { 11, 11, }, { 11, 10, }, { 11, 9, }, { 11, 8, },
+ { 10, 25, }, { 10, 23, }, { 10, 20, }, { 10, 19, },
+ { 10, 17, }, { 10, 10, }, { 10, 9, }, { 10, 8, },
+ { 9, 27, }, { 9, 23, }, { 9, 21, }, { 9, 19, },
+ { 9, 18, }, { 9, 17, }, { 9, 9, }, { 9, 8, },
+ { 8, 31, }, { 8, 29, }, { 8, 23, }, { 8, 19, },
+ { 8, 17, }, { 8, 8, }, { 7, 35, }, { 7, 31, },
+ { 7, 29, }, { 7, 25, }, { 7, 23, }, { 7, 21, },
+ { 7, 19, }, { 7, 17, }, { 7, 15, }, { 7, 14, },
+ { 7, 13, }, { 7, 12, }, { 7, 11, }, { 7, 10, },
+ { 7, 9, }, { 7, 8, }, { 6, 41, }, { 6, 37, },
+ { 6, 31, }, { 6, 29, }, { 6, 23, }, { 6, 19, },
+ { 6, 17, }, { 6, 13, }, { 6, 11, }, { 6, 10, },
+ { 6, 9, }, { 6, 8, }, { 5, 67, }, { 5, 47, },
+ { 5, 43, }, { 5, 41, }, { 5, 37, }, { 5, 31, },
+ { 5, 29, }, { 5, 25, }, { 5, 23, }, { 5, 19, },
+ { 5, 17, }, { 5, 15, }, { 5, 13, }, { 5, 11, },
+ { 5, 10, }, { 5, 9, }, { 5, 8, }, { 4, 61, },
+ { 4, 59, }, { 4, 53, }, { 4, 47, }, { 4, 43, },
+ { 4, 41, }, { 4, 37, }, { 4, 31, }, { 4, 29, },
+ { 4, 23, }, { 4, 19, }, { 4, 17, }, { 4, 13, },
+ { 4, 9, }, { 4, 8, },
+ };
+ /* Scale the quotient for comparison to get the fractional part. */
+ const unsigned int quot_scale = 65536;
+ unsigned int sclk = port->uartclk * 2;
+ unsigned int sdiv = (sclk + (baud / 2)) / baud;
+ unsigned int best_squot;
+ unsigned int squot;
+ unsigned int quot;
+ u16 cpr;
+ u8 tcr;
+ int i;
+
+ /* Old custom speed handling. */
+ if (baud == 38400 && (port->flags & UPF_SPD_MASK) == UPF_SPD_CUST) {
+ unsigned int cust_div = port->custom_divisor;
+
+ quot = cust_div & UART_DIV_MAX;
+ tcr = (cust_div >> 16) & OXSEMI_TORNADO_TCR_MASK;
+ cpr = (cust_div >> 20) & OXSEMI_TORNADO_CPR_MASK;
+ if (cpr < OXSEMI_TORNADO_CPR_MIN)
+ cpr = OXSEMI_TORNADO_CPR_MIN;
+ } else {
+ best_squot = quot_scale;
+ for (i = 0; i < ARRAY_SIZE(p); i++) {
+ unsigned int spre;
+ unsigned int srem;
+ u8 cp;
+ u8 tc;
+
+ tc = p[i][0];
+ cp = p[i][1];
+ spre = tc * cp;
+
+ srem = sdiv % spre;
+ if (srem > spre / 2)
+ srem = spre - srem;
+ squot = (srem * quot_scale + spre / 2) / spre;
+
+ if (srem == 0) {
+ tcr = tc;
+ cpr = cp;
+ quot = sdiv / spre;
+ break;
+ } else if (squot < best_squot) {
+ best_squot = squot;
+ tcr = tc;
+ cpr = cp;
+ quot = (sdiv + spre / 2) / spre;
+ }
+ }
+ while (tcr <= (OXSEMI_TORNADO_TCR_MASK + 1) >> 1 &&
+ quot % 2 == 0) {
+ quot >>= 1;
+ tcr <<= 1;
+ }
+ while (quot > UART_DIV_MAX) {
+ if (tcr <= (OXSEMI_TORNADO_TCR_MASK + 1) >> 1) {
+ quot >>= 1;
+ tcr <<= 1;
+ } else if (cpr <= OXSEMI_TORNADO_CPR_MASK >> 1) {
+ quot >>= 1;
+ cpr <<= 1;
+ } else {
+ quot = quot * cpr / OXSEMI_TORNADO_CPR_MASK;
+ cpr = OXSEMI_TORNADO_CPR_MASK;
+ }
+ }
+ }
+
+ *frac = (cpr << 8) | (tcr & OXSEMI_TORNADO_TCR_MASK);
+ return quot;
+}
+
+/*
+ * Set the oversampling rate in the transmitter clock cycle register (TCR),
+ * the clock prescaler in the clock prescaler register (CPR and CPR2), and
+ * the clock divisor in the divisor latch (DLL and DLM). Note that for
+ * backwards compatibility any write to CPR clears CPR2 and therefore CPR
+ * has to be written first, followed by CPR2, which occupies the location
+ * of CKS used with earlier UART designs.
+ */
+static void pci_oxsemi_tornado_set_divisor(struct uart_port *port,
+ unsigned int baud,
+ unsigned int quot,
+ unsigned int quot_frac)
+{
+ struct uart_8250_port *up = up_to_u8250p(port);
+ u8 cpr2 = quot_frac >> 16;
+ u8 cpr = quot_frac >> 8;
+ u8 tcr = quot_frac;
+
+ serial_icr_write(up, UART_TCR, tcr);
+ serial_icr_write(up, UART_CPR, cpr);
+ serial_icr_write(up, UART_CKS, cpr2);
+ serial8250_do_set_divisor(port, baud, quot, 0);
+}
+
+/*
+ * For Tornado devices we force MCR[7] set for the Divide-by-M N/8 baud rate
+ * generator prescaler (CPR and CPR2). Otherwise no prescaler would be used.
+ */
+static int pci_oxsemi_tornado_setup(struct serial_private *priv,
+ const struct pciserial_board *board,
+ struct uart_8250_port *up, int idx)
+{
+ struct pci_dev *dev = priv->dev;
+
+ /* OxSemi Tornado devices are all 0xCxxx */
+ if (dev->vendor == PCI_VENDOR_ID_OXSEMI &&
+ (dev->device & 0xF000) == 0xC000) {
+ up->port.get_divisor = pci_oxsemi_tornado_get_divisor;
+ up->port.set_divisor = pci_oxsemi_tornado_set_divisor;
+
+ /* Avoid `-Woverflow' warning with ~UART_MCR_CLKSEL. */
+ up->mcr_mask = UART_MCR_CLKSEL ^ 0xffu;
+ up->mcr_force = UART_MCR_CLKSEL;
+ }
+
+ return pci_default_setup(priv, board, up, idx);
+}
+
static int pci_asix_setup(struct serial_private *priv,
const struct pciserial_board *board,
struct uart_8250_port *port, int idx)
@@ -2518,7 +2715,7 @@ static struct pci_serial_quirk pci_seria
.subvendor = PCI_ANY_ID,
.subdevice = PCI_ANY_ID,
.init = pci_oxsemi_tornado_init,
- .setup = pci_default_setup,
+ .setup = pci_oxsemi_tornado_setup,
},
{
.vendor = PCI_VENDOR_ID_MAINPINE,
@@ -2526,7 +2723,7 @@ static struct pci_serial_quirk pci_seria
.subvendor = PCI_ANY_ID,
.subdevice = PCI_ANY_ID,
.init = pci_oxsemi_tornado_init,
- .setup = pci_default_setup,
+ .setup = pci_oxsemi_tornado_setup,
},
{
.vendor = PCI_VENDOR_ID_DIGI,
@@ -2534,7 +2731,7 @@ static struct pci_serial_quirk pci_seria
.subvendor = PCI_SUBVENDOR_ID_IBM,
.subdevice = PCI_ANY_ID,
.init = pci_oxsemi_tornado_init,
- .setup = pci_default_setup,
+ .setup = pci_oxsemi_tornado_setup,
},
{
.vendor = PCI_VENDOR_ID_INTEL,
@@ -2851,7 +3048,7 @@ enum pci_board_num_t {
pbn_b0_2_1843200,
pbn_b0_4_1843200,

- pbn_b0_1_3906250,
+ pbn_b0_1_15625000,

pbn_b0_bt_1_115200,
pbn_b0_bt_2_115200,
@@ -2931,10 +3128,10 @@ enum pci_board_num_t {
pbn_plx_romulus,
pbn_endrun_2_4000000,
pbn_oxsemi,
- pbn_oxsemi_1_3906250,
- pbn_oxsemi_2_3906250,
- pbn_oxsemi_4_3906250,
- pbn_oxsemi_8_3906250,
+ pbn_oxsemi_1_15625000,
+ pbn_oxsemi_2_15625000,
+ pbn_oxsemi_4_15625000,
+ pbn_oxsemi_8_15625000,
pbn_intel_i960,
pbn_sgi_ioc3,
pbn_computone_4,
@@ -3081,10 +3278,10 @@ static struct pciserial_board pci_boards
.uart_offset = 8,
},

- [pbn_b0_1_3906250] = {
+ [pbn_b0_1_15625000] = {
.flags = FL_BASE0,
.num_ports = 1,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 8,
},

@@ -3479,31 +3676,31 @@ static struct pciserial_board pci_boards
.base_baud = 115200,
.uart_offset = 8,
},
- [pbn_oxsemi_1_3906250] = {
+ [pbn_oxsemi_1_15625000] = {
.flags = FL_BASE0,
.num_ports = 1,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 0x200,
.first_offset = 0x1000,
},
- [pbn_oxsemi_2_3906250] = {
+ [pbn_oxsemi_2_15625000] = {
.flags = FL_BASE0,
.num_ports = 2,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 0x200,
.first_offset = 0x1000,
},
- [pbn_oxsemi_4_3906250] = {
+ [pbn_oxsemi_4_15625000] = {
.flags = FL_BASE0,
.num_ports = 4,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 0x200,
.first_offset = 0x1000,
},
- [pbn_oxsemi_8_3906250] = {
+ [pbn_oxsemi_8_15625000] = {
.flags = FL_BASE0,
.num_ports = 8,
- .base_baud = 3906250,
+ .base_baud = 15625000,
.uart_offset = 0x200,
.first_offset = 0x1000,
},
@@ -4510,158 +4707,158 @@ static const struct pci_device_id serial
*/
{ PCI_VENDOR_ID_OXSEMI, 0xc101, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc105, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc11b, /* OXPCIe952 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc11f, /* OXPCIe952 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc120, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc124, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc138, /* OXPCIe952 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc13d, /* OXPCIe952 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc140, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc141, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc144, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc145, /* OXPCIe952 1 Legacy UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_b0_1_3906250 },
+ pbn_b0_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc158, /* OXPCIe952 2 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_2_3906250 },
+ pbn_oxsemi_2_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc15d, /* OXPCIe952 2 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_2_3906250 },
+ pbn_oxsemi_2_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc208, /* OXPCIe954 4 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_4_3906250 },
+ pbn_oxsemi_4_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc20d, /* OXPCIe954 4 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_4_3906250 },
+ pbn_oxsemi_4_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc308, /* OXPCIe958 8 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_8_3906250 },
+ pbn_oxsemi_8_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc30d, /* OXPCIe958 8 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_8_3906250 },
+ pbn_oxsemi_8_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc40b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc40f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc41b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc41f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc42b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc42f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc43b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc43f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc44b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc44f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc45b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc45f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc46b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc46f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc47b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc47f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc48b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc48f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc49b, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc49f, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4ab, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4af, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4bb, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4bf, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4cb, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_OXSEMI, 0xc4cf, /* OXPCIe200 1 Native UART */
PCI_ANY_ID, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
/*
* Mainpine Inc. IQ Express "Rev3" utilizing OxSemi Tornado
*/
{ PCI_VENDOR_ID_MAINPINE, 0x4000, /* IQ Express 1 Port V.34 Super-G3 Fax */
PCI_VENDOR_ID_MAINPINE, 0x4001, 0, 0,
- pbn_oxsemi_1_3906250 },
+ pbn_oxsemi_1_15625000 },
{ PCI_VENDOR_ID_MAINPINE, 0x4000, /* IQ Express 2 Port V.34 Super-G3 Fax */
PCI_VENDOR_ID_MAINPINE, 0x4002, 0, 0,
- pbn_oxsemi_2_3906250 },
+ pbn_oxsemi_2_15625000 },
{ PCI_VENDOR_ID_MAINPINE, 0x4000, /* IQ Express 4 Port V.34 Super-G3 Fax */
PCI_VENDOR_ID_MAINPINE, 0x4004, 0, 0,
- pbn_oxsemi_4_3906250 },
+ pbn_oxsemi_4_15625000 },
{ PCI_VENDOR_ID_MAINPINE, 0x4000, /* IQ Express 8 Port V.34 Super-G3 Fax */
PCI_VENDOR_ID_MAINPINE, 0x4008, 0, 0,
- pbn_oxsemi_8_3906250 },
+ pbn_oxsemi_8_15625000 },

/*
* Digi/IBM PCIe 2-port Async EIA-232 Adapter utilizing OxSemi Tornado
*/
{ PCI_VENDOR_ID_DIGI, PCIE_DEVICE_ID_NEO_2_OX_IBM,
PCI_SUBVENDOR_ID_IBM, PCI_ANY_ID, 0, 0,
- pbn_oxsemi_2_3906250 },
+ pbn_oxsemi_2_15625000 },

/*
* SBS Technologies, Inc. P-Octal and PMC-OCTPRO cards,
Index: linux-macro-ide-tty/drivers/tty/serial/8250/8250_port.c
===================================================================
--- linux-macro-ide-tty.orig/drivers/tty/serial/8250/8250_port.c
+++ linux-macro-ide-tty/drivers/tty/serial/8250/8250_port.c
@@ -529,27 +529,6 @@ serial_port_out_sync(struct uart_port *p
}

/*
- * For the 16C950
- */
-static void serial_icr_write(struct uart_8250_port *up, int offset, int value)
-{
- serial_out(up, UART_SCR, offset);
- serial_out(up, UART_ICR, value);
-}
-
-static unsigned int serial_icr_read(struct uart_8250_port *up, int offset)
-{
- unsigned int value;
-
- serial_icr_write(up, UART_ACR, up->acr | UART_ACR_ICRRD);
- serial_out(up, UART_SCR, offset);
- value = serial_in(up, UART_ICR);
- serial_icr_write(up, UART_ACR, up->acr);
-
- return value;
-}
-
-/*
* FIFO support.
*/
static void serial8250_clear_fifos(struct uart_8250_port *p)


2021-07-12 20:16:25

by andy

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

Sat, Jun 26, 2021 at 06:11:32AM +0200, Maciej W. Rozycki kirjoitti:
> Oxford Semiconductor PCIe (Tornado) serial port devices are driven by a
> fixed 62.5MHz clock input derived from the 100MHz PCI Express clock.
>
> We currently drive the device using its default oversampling rate of 16
> and the clock prescaler disabled, consequently yielding the baud base of
> 3906250. This base is inadequate for some of the high-speed baud rates
> such as 460800bps, for which the closest rate possible can be obtained
> by dividing the baud base by 8, yielding the baud rate of 488281.25bps,
> which is off by 5.9638%. This is enough for data communication to break
> with the remote end talking actual 460800bps where missed stop bits have
> been observed.
>
> We can do better however, by taking advantage of a reduced oversampling
> rate, which can be set to any integer value from 4 to 16 inclusive by
> programming the TCR register, and by using the clock prescaler, which
> can be set to any value from 1 to 63.875 in increments of 0.125 in the
> CPR/CPR2 register pair[1][2][3][4]. The prescaler has to be explicitly
> enabled though by setting bit 7 in the MCR or otherwise it is bypassed
> as if the value of 1 was used.
>
> By using these parameters rates from 15625000bps down to 1bps can be
> obtained, with either exact or highly-accurate actual bit rates for
> standard and many non-standard rates.
>
> Make use of these features then as follows:
>
> - Set the baud base to 15625000, reflecting the minimum oversampling
> rate of 4 with the clock prescaler and divisor both set to 1.
>
> - Set the MCR mask and force bits in the UART template so as to have
> MCR[7] always set and then have 8250 core propagate those settings, if
> supplied as non-zero, overriding the ALPHA_KLUDGE_MCR default.
>
> - Override the `get_divisor' handler and determine a good combination of
> parameters by using a lookup table with predetermined value pairs of
> the oversampling rate and the clock prescaler and finding a pair that
> divides the input clock such that the quotient, when rounded to the
> nearest integer, deviates the least from the exact result. Calculate
> the clock divisor accordingly.
>
> Scale the resulting oversampling rate (only by powers of two) if
> possible so as to maximise it, reducing the divisor accordingly, and
> avoid a divisor overflow for very low baud rates by scaling the
> oversampling rate and/or the prescaler even if that causes some
> accuracy loss.

> Also handle the historic spd_cust feature so as to allow one to set
> all the three parameters manually to arbitrary values, by keeping the
> low 16 bits for the divisor and then putting TCR in bits 19:16 and
> CPR/CPR2 in bits 28:20, sanitising the bit pattern supplied such as
> to clamp CPR/CPR2 values between 0.000 and 0.875 inclusive to 1.000.
> This preserves compatibility with any existing setups, that is where
> requesting a custom divisor that only has any bits set among the low
> 16 the oversampling rate of 16 and the clock prescaler of 1 will be
> used.

Please no. We really would like to get rid of that ugly hack. The BOTHER exists
for ages.

> Finally abuse the `frac' argument to store the determined bit patterns
> for the TCR, CPR and CPR2 registers.
>
> - Override the `set_divisor' handler so as to set the TCR, CPR and CPR2
> registers from the `frac' value supplied. Set the divisor as usually.
>
> With the baud base set to 15625000 and the unsigned 16-bit UART_DIV_MAX
> limitation imposed by `serial8250_get_baud_rate' standard baud rates
> below 300bps become unavailable in the regular way, e.g. the rate of
> 200bps requires the baud base to be divided by 78125 and that is beyond
> the unsigned 16-bit range. The historic spd_cust feature can still be
> used to obtain such rates if so required.
>
> Here are the figures for the standard and some non-standard baud rates
> (including those quoted in Oxford Semiconductor documentation), giving
> the requested rate (r), the actual rate yielded (a) and its deviation
> from the requested rate (d), and the values of the oversampling rate
> (tcr), the clock prescaler (cpr) and the divisor (div) produced by the
> new `get_divisor' handler:
>
> r: 15625000, a: 15625000.00, d: 0.0000%, tcr: 4, cpr: 1.000, div: 1
> r: 12500000, a: 12500000.00, d: 0.0000%, tcr: 5, cpr: 1.000, div: 1
> r: 10416666, a: 10416666.67, d: 0.0000%, tcr: 6, cpr: 1.000, div: 1
> r: 8928571, a: 8928571.43, d: 0.0000%, tcr: 7, cpr: 1.000, div: 1
> r: 7812500, a: 7812500.00, d: 0.0000%, tcr: 8, cpr: 1.000, div: 1
> r: 4000000, a: 4000000.00, d: 0.0000%, tcr: 5, cpr: 3.125, div: 1
> r: 3686400, a: 3676470.59, d: -0.2694%, tcr: 8, cpr: 2.125, div: 1
> r: 3500000, a: 3496503.50, d: -0.0999%, tcr: 13, cpr: 1.375, div: 1
> r: 3000000, a: 2976190.48, d: -0.7937%, tcr: 14, cpr: 1.500, div: 1
> r: 2500000, a: 2500000.00, d: 0.0000%, tcr: 10, cpr: 2.500, div: 1
> r: 2000000, a: 2000000.00, d: 0.0000%, tcr: 10, cpr: 3.125, div: 1
> r: 1843200, a: 1838235.29, d: -0.2694%, tcr: 16, cpr: 2.125, div: 1
> r: 1500000, a: 1492537.31, d: -0.4975%, tcr: 5, cpr: 8.375, div: 1
> r: 1152000, a: 1152073.73, d: 0.0064%, tcr: 14, cpr: 3.875, div: 1
> r: 921600, a: 919117.65, d: -0.2694%, tcr: 16, cpr: 2.125, div: 2
> r: 576000, a: 576036.87, d: 0.0064%, tcr: 14, cpr: 3.875, div: 2
> r: 460800, a: 460829.49, d: 0.0064%, tcr: 7, cpr: 3.875, div: 5
> r: 230400, a: 230414.75, d: 0.0064%, tcr: 14, cpr: 3.875, div: 5
> r: 115200, a: 115207.37, d: 0.0064%, tcr: 14, cpr: 1.250, div: 31
> r: 57600, a: 57603.69, d: 0.0064%, tcr: 8, cpr: 3.875, div: 35
> r: 38400, a: 38402.46, d: 0.0064%, tcr: 14, cpr: 3.875, div: 30
> r: 19200, a: 19201.23, d: 0.0064%, tcr: 8, cpr: 3.875, div: 105
> r: 9600, a: 9600.06, d: 0.0006%, tcr: 9, cpr: 1.125, div: 643
> r: 4800, a: 4799.98, d: -0.0004%, tcr: 7, cpr: 2.875, div: 647
> r: 2400, a: 2400.02, d: 0.0008%, tcr: 9, cpr: 2.250, div: 1286
> r: 1200, a: 1200.00, d: 0.0000%, tcr: 14, cpr: 2.875, div: 1294
> r: 300, a: 300.00, d: 0.0000%, tcr: 11, cpr: 2.625, div: 7215
> r: 200, a: 200.00, d: 0.0000%, tcr: 16, cpr: 1.250, div: 15625
> r: 150, a: 150.00, d: 0.0000%, tcr: 13, cpr: 2.250, div: 14245
> r: 134, a: 134.00, d: 0.0000%, tcr: 11, cpr: 2.625, div: 16153
> r: 110, a: 110.00, d: 0.0000%, tcr: 12, cpr: 1.000, div: 47348
> r: 75, a: 75.00, d: 0.0000%, tcr: 4, cpr: 5.875, div: 35461
> r: 50, a: 50.00, d: 0.0000%, tcr: 16, cpr: 1.250, div: 62500
> r: 25, a: 25.00, d: 0.0000%, tcr: 16, cpr: 2.500, div: 62500
> r: 4, a: 4.00, d: 0.0000%, tcr: 16, cpr: 20.000, div: 48828
> r: 2, a: 2.00, d: 0.0000%, tcr: 16, cpr: 40.000, div: 48828
> r: 1, a: 1.00, d: 0.0000%, tcr: 16, cpr: 63.875, div: 61154
>
> References:
>
> [1] "OXPCIe200 PCI Express Multi-Port Bridge", Oxford Semiconductor,
> Inc., DS-0045, 10 Nov 2008, Section "950 Mode", pp. 64-65
>
> [2] "OXPCIe952 PCI Express Bridge to Dual Serial & Parallel Port",
> Oxford Semiconductor, Inc., DS-0046, Mar 06 08, Section "950 Mode",
> p. 20
>
> [3] "OXPCIe954 PCI Express Bridge to Quad Serial Port", Oxford
> Semiconductor, Inc., DS-0047, Feb 08, Section "950 Mode", p. 20
>
> [4] "OXPCIe958 PCI Express Bridge to Octal Serial Port", Oxford
> Semiconductor, Inc., DS-0048, Feb 08, Section "950 Mode", p. 20

Is it possible to reduce a commit message by shifting some stuff to the
dedicated documentation?

...

> drivers/tty/serial/8250/8250_pci.c | 331 ++++++++++++++++++++++++++++--------

Can we, please, split the quirk driver first as it's done in a lot of examples
(_exar, _mid, _lpss, _...) and then modify it?

...

> +/* Tornado-specific constants for the TCR and CPR registers; see below. */
> +#define OXSEMI_TORNADO_TCR_MASK 0xf
> +#define OXSEMI_TORNADO_CPR_MASK 0x1ff
> +#define OXSEMI_TORNADO_CPR_MIN 8
> +
> +/*
> + * Determine the oversampling rate, the clock prescaler, and the clock
> + * divisor for the requested baud rate. The clock rate is 62.5 MHz,
> + * which is four times the baud base, and the prescaler increments in
> + * steps of 1/8. Therefore to make calculations on integers we need
> + * to use a scaled clock rate, which is the baud base multiplied by 32
> + * (or our assumed UART clock rate multiplied by 2).
> + *
> + * The allowed oversampling rates are from 4 up to 16 inclusive (values
> + * from 0 to 3 inclusive map to 16). Likewise the clock prescaler allows
> + * values between 1.000 and 63.875 inclusive (operation for values from
> + * 0.000 to 0.875 has not been specified). The clock divisor is the usual
> + * unsigned 16-bit integer.
> + *
> + * For the most accurate baud rate we use a table of predetermined
> + * oversampling rates and clock prescalers that records all possible
> + * products of the two parameters in the range from 4 up to 255 inclusive,
> + * and additionally 335 for the 1500000bps rate, with the prescaler scaled
> + * by 8. The table is sorted by the decreasing value of the oversampling
> + * rate and ties are resolved by sorting by the decreasing value of the
> + * product. This way preference is given to higher oversampling rates.
> + *
> + * We iterate over the table and choose the product of an oversampling
> + * rate and a clock prescaler that gives the lowest integer division
> + * result deviation, or if an exact integer divider is found we stop
> + * looking for right away. We do some fixup if the resulting clock
> + * divisor required would be out of its unsigned 16-bit integer range.
> + *
> + * Finally we abuse the supposed fractional part returned to encode the
> + * 4-bit value of the oversampling rate and the 9-bit value of the clock
> + * prescaler which will end up in the TCR and CPR/CPR2 registers.
> + */
> +static unsigned int pci_oxsemi_tornado_get_divisor(struct uart_port *port,
> + unsigned int baud,
> + unsigned int *frac)
> +{
> + static u8 p[][2] = {
> + { 16, 14, }, { 16, 13, }, { 16, 12, }, { 16, 11, },
> + { 16, 10, }, { 16, 9, }, { 16, 8, }, { 15, 17, },
> + { 15, 16, }, { 15, 15, }, { 15, 14, }, { 15, 13, },
> + { 15, 12, }, { 15, 11, }, { 15, 10, }, { 15, 9, },
> + { 15, 8, }, { 14, 18, }, { 14, 17, }, { 14, 14, },
> + { 14, 13, }, { 14, 12, }, { 14, 11, }, { 14, 10, },
> + { 14, 9, }, { 14, 8, }, { 13, 19, }, { 13, 18, },
> + { 13, 17, }, { 13, 13, }, { 13, 12, }, { 13, 11, },
> + { 13, 10, }, { 13, 9, }, { 13, 8, }, { 12, 19, },
> + { 12, 18, }, { 12, 17, }, { 12, 11, }, { 12, 9, },
> + { 12, 8, }, { 11, 23, }, { 11, 22, }, { 11, 21, },
> + { 11, 20, }, { 11, 19, }, { 11, 18, }, { 11, 17, },
> + { 11, 11, }, { 11, 10, }, { 11, 9, }, { 11, 8, },
> + { 10, 25, }, { 10, 23, }, { 10, 20, }, { 10, 19, },
> + { 10, 17, }, { 10, 10, }, { 10, 9, }, { 10, 8, },
> + { 9, 27, }, { 9, 23, }, { 9, 21, }, { 9, 19, },
> + { 9, 18, }, { 9, 17, }, { 9, 9, }, { 9, 8, },
> + { 8, 31, }, { 8, 29, }, { 8, 23, }, { 8, 19, },
> + { 8, 17, }, { 8, 8, }, { 7, 35, }, { 7, 31, },
> + { 7, 29, }, { 7, 25, }, { 7, 23, }, { 7, 21, },
> + { 7, 19, }, { 7, 17, }, { 7, 15, }, { 7, 14, },
> + { 7, 13, }, { 7, 12, }, { 7, 11, }, { 7, 10, },
> + { 7, 9, }, { 7, 8, }, { 6, 41, }, { 6, 37, },
> + { 6, 31, }, { 6, 29, }, { 6, 23, }, { 6, 19, },
> + { 6, 17, }, { 6, 13, }, { 6, 11, }, { 6, 10, },
> + { 6, 9, }, { 6, 8, }, { 5, 67, }, { 5, 47, },
> + { 5, 43, }, { 5, 41, }, { 5, 37, }, { 5, 31, },
> + { 5, 29, }, { 5, 25, }, { 5, 23, }, { 5, 19, },
> + { 5, 17, }, { 5, 15, }, { 5, 13, }, { 5, 11, },
> + { 5, 10, }, { 5, 9, }, { 5, 8, }, { 4, 61, },
> + { 4, 59, }, { 4, 53, }, { 4, 47, }, { 4, 43, },
> + { 4, 41, }, { 4, 37, }, { 4, 31, }, { 4, 29, },
> + { 4, 23, }, { 4, 19, }, { 4, 17, }, { 4, 13, },
> + { 4, 9, }, { 4, 8, },
> + };

Oh l? l?! Please, use rational best approximation algorithm instead (check CONFIG_RATIONAL).


--
With Best Regards,
Andy Shevchenko


2021-07-13 01:53:17

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

Hi Andy,

Something wrong with your "From:" header; I've fixed it up based on a
best guess basis.

On Mon, 12 Jul 2021, [email protected] wrote:

> > Also handle the historic spd_cust feature so as to allow one to set
> > all the three parameters manually to arbitrary values, by keeping the
> > low 16 bits for the divisor and then putting TCR in bits 19:16 and
> > CPR/CPR2 in bits 28:20, sanitising the bit pattern supplied such as
> > to clamp CPR/CPR2 values between 0.000 and 0.875 inclusive to 1.000.
> > This preserves compatibility with any existing setups, that is where
> > requesting a custom divisor that only has any bits set among the low
> > 16 the oversampling rate of 16 and the clock prescaler of 1 will be
> > used.
>
> Please no. We really would like to get rid of that ugly hack. The BOTHER exists
> for ages.

I have actually carefully considered it before submission and:

1. it remains a supported user API with a tool included with contemporary
distributions, and

2. with this device you can't set all the possible whole-number baud
rates let alone UART clock frequencies with the BOTHER API, and

3. it doesn't hurt.

If you'd like to get rid of SPD_CUST, then just do so, but until then I
fail to see a point to have it supported with some devices but not other
ones.

NB if you do get to it, then please consider adding an equally flexible
API too, e.g. for fractional baud rates (134.5bps haha); I won't mind if
it's less hackish though.

> > References:
> >
> > [1] "OXPCIe200 PCI Express Multi-Port Bridge", Oxford Semiconductor,
> > Inc., DS-0045, 10 Nov 2008, Section "950 Mode", pp. 64-65
> >
> > [2] "OXPCIe952 PCI Express Bridge to Dual Serial & Parallel Port",
> > Oxford Semiconductor, Inc., DS-0046, Mar 06 08, Section "950 Mode",
> > p. 20
> >
> > [3] "OXPCIe954 PCI Express Bridge to Quad Serial Port", Oxford
> > Semiconductor, Inc., DS-0047, Feb 08, Section "950 Mode", p. 20
> >
> > [4] "OXPCIe958 PCI Express Bridge to Octal Serial Port", Oxford
> > Semiconductor, Inc., DS-0048, Feb 08, Section "950 Mode", p. 20
>
> Is it possible to reduce a commit message by shifting some stuff to the
> dedicated documentation?

The relevant stuff has been included as comments along with actual code
already, and the rest is the usual submission-time rationale. This will
be the initial source of information when someone studies the history of
this code (`git log').

I don't consider it cast in stone however, so if there's any particular
piece you'd like to see elsewhere, then please point out to me what to
move and where. Or give any guidance other than just: "Rewrite it!"

(Yes I often have troubles figuring out the real intent of some changes
made say 15 years ago that have turned out broken after all those years
and whose change description is simply too terse now that the lore has
been lost.)

> > drivers/tty/serial/8250/8250_pci.c | 331 ++++++++++++++++++++++++++++--------
>
> Can we, please, split the quirk driver first as it's done in a lot of examples
> (_exar, _mid, _lpss, _...) and then modify it?

I have found it unclear where the line is drawn between having support
code included with 8250_pci.c proper and having it split off to a separate
file. All the device-specific files seem to provide complex handling,
well beyond just calculating the clock.

I'll be happy to split it off however (with a suitable preparatory
change) if there is a consensus in favour to doing so.

> > +/*
> > + * Determine the oversampling rate, the clock prescaler, and the clock
> > + * divisor for the requested baud rate. The clock rate is 62.5 MHz,
> > + * which is four times the baud base, and the prescaler increments in
> > + * steps of 1/8. Therefore to make calculations on integers we need
> > + * to use a scaled clock rate, which is the baud base multiplied by 32
> > + * (or our assumed UART clock rate multiplied by 2).
> > + *
> > + * The allowed oversampling rates are from 4 up to 16 inclusive (values
> > + * from 0 to 3 inclusive map to 16). Likewise the clock prescaler allows
> > + * values between 1.000 and 63.875 inclusive (operation for values from
> > + * 0.000 to 0.875 has not been specified). The clock divisor is the usual
> > + * unsigned 16-bit integer.
> > + *
> > + * For the most accurate baud rate we use a table of predetermined
> > + * oversampling rates and clock prescalers that records all possible
> > + * products of the two parameters in the range from 4 up to 255 inclusive,
> > + * and additionally 335 for the 1500000bps rate, with the prescaler scaled
> > + * by 8. The table is sorted by the decreasing value of the oversampling
> > + * rate and ties are resolved by sorting by the decreasing value of the
> > + * product. This way preference is given to higher oversampling rates.
> > + *
> > + * We iterate over the table and choose the product of an oversampling
> > + * rate and a clock prescaler that gives the lowest integer division
> > + * result deviation, or if an exact integer divider is found we stop
> > + * looking for right away. We do some fixup if the resulting clock
> > + * divisor required would be out of its unsigned 16-bit integer range.
> > + *
> > + * Finally we abuse the supposed fractional part returned to encode the
> > + * 4-bit value of the oversampling rate and the 9-bit value of the clock
> > + * prescaler which will end up in the TCR and CPR/CPR2 registers.
> > + */
> > +static unsigned int pci_oxsemi_tornado_get_divisor(struct uart_port *port,
> > + unsigned int baud,
> > + unsigned int *frac)
> > +{
> > + static u8 p[][2] = {
> > + { 16, 14, }, { 16, 13, }, { 16, 12, }, { 16, 11, },
> > + { 16, 10, }, { 16, 9, }, { 16, 8, }, { 15, 17, },
> > + { 15, 16, }, { 15, 15, }, { 15, 14, }, { 15, 13, },
> > + { 15, 12, }, { 15, 11, }, { 15, 10, }, { 15, 9, },
> > + { 15, 8, }, { 14, 18, }, { 14, 17, }, { 14, 14, },
> > + { 14, 13, }, { 14, 12, }, { 14, 11, }, { 14, 10, },
> > + { 14, 9, }, { 14, 8, }, { 13, 19, }, { 13, 18, },
> > + { 13, 17, }, { 13, 13, }, { 13, 12, }, { 13, 11, },
> > + { 13, 10, }, { 13, 9, }, { 13, 8, }, { 12, 19, },
> > + { 12, 18, }, { 12, 17, }, { 12, 11, }, { 12, 9, },
> > + { 12, 8, }, { 11, 23, }, { 11, 22, }, { 11, 21, },
> > + { 11, 20, }, { 11, 19, }, { 11, 18, }, { 11, 17, },
> > + { 11, 11, }, { 11, 10, }, { 11, 9, }, { 11, 8, },
> > + { 10, 25, }, { 10, 23, }, { 10, 20, }, { 10, 19, },
> > + { 10, 17, }, { 10, 10, }, { 10, 9, }, { 10, 8, },
> > + { 9, 27, }, { 9, 23, }, { 9, 21, }, { 9, 19, },
> > + { 9, 18, }, { 9, 17, }, { 9, 9, }, { 9, 8, },
> > + { 8, 31, }, { 8, 29, }, { 8, 23, }, { 8, 19, },
> > + { 8, 17, }, { 8, 8, }, { 7, 35, }, { 7, 31, },
> > + { 7, 29, }, { 7, 25, }, { 7, 23, }, { 7, 21, },
> > + { 7, 19, }, { 7, 17, }, { 7, 15, }, { 7, 14, },
> > + { 7, 13, }, { 7, 12, }, { 7, 11, }, { 7, 10, },
> > + { 7, 9, }, { 7, 8, }, { 6, 41, }, { 6, 37, },
> > + { 6, 31, }, { 6, 29, }, { 6, 23, }, { 6, 19, },
> > + { 6, 17, }, { 6, 13, }, { 6, 11, }, { 6, 10, },
> > + { 6, 9, }, { 6, 8, }, { 5, 67, }, { 5, 47, },
> > + { 5, 43, }, { 5, 41, }, { 5, 37, }, { 5, 31, },
> > + { 5, 29, }, { 5, 25, }, { 5, 23, }, { 5, 19, },
> > + { 5, 17, }, { 5, 15, }, { 5, 13, }, { 5, 11, },
> > + { 5, 10, }, { 5, 9, }, { 5, 8, }, { 4, 61, },
> > + { 4, 59, }, { 4, 53, }, { 4, 47, }, { 4, 43, },
> > + { 4, 41, }, { 4, 37, }, { 4, 31, }, { 4, 29, },
> > + { 4, 23, }, { 4, 19, }, { 4, 17, }, { 4, 13, },
> > + { 4, 9, }, { 4, 8, },
> > + };
>
> Oh là là! Please, use rational best approximation algorithm instead
> (check CONFIG_RATIONAL).

Thanks for the pointer, I didn't know we had this piece.

However how is it supposed to apply here? The denominator is always 8,
so we can rule it out (by multiplying the dividend by 8, which this piece
does, so that the divisor is a whole number), but the numerator has to be
a product of three integers, from a different range each ([4,16], [8,511],
[1, 65535]) as noted above.

Essentially we need to find such three integers (with extra constraints)
the product of which is closest to (500000000 / baud_rate) -- which IMHO
amounts to factorisation, an NP-complete problem as you have been surely
aware (and the whole world relies on), and I have decided that this simple
table-driven approximation is good enough to handle the usual baud rates,
especially the higher ones. For several baud rates it gives more accurate
results (lower deviation) than the factors proposed in the manufacturer's
datasheets.

I just fail to see how your proposed algorithm could be factored in here,
but I'll be happy to be proved wrong, so I'll appreciate guidance.

In any case thank you for your review, always appreciated!

Maciej

2021-07-13 07:21:39

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

On Tue, Jul 13, 2021 at 4:52 AM Maciej W. Rozycki <[email protected]> wrote:
> Something wrong with your "From:" header; I've fixed it up based on a
> best guess basis.

Ah, yes, I have to fix it locally. Thanks!

> On Mon, 12 Jul 2021, [email protected] wrote:
>
> > > Also handle the historic spd_cust feature so as to allow one to set
> > > all the three parameters manually to arbitrary values, by keeping the
> > > low 16 bits for the divisor and then putting TCR in bits 19:16 and
> > > CPR/CPR2 in bits 28:20, sanitising the bit pattern supplied such as
> > > to clamp CPR/CPR2 values between 0.000 and 0.875 inclusive to 1.000.
> > > This preserves compatibility with any existing setups, that is where
> > > requesting a custom divisor that only has any bits set among the low
> > > 16 the oversampling rate of 16 and the clock prescaler of 1 will be
> > > used.
> >
> > Please no. We really would like to get rid of that ugly hack. The BOTHER exists
> > for ages.
>
> I have actually carefully considered it before submission and:
>
> 1. it remains a supported user API with a tool included with contemporary
> distributions, and

What supported API?

> 2. with this device you can't set all the possible whole-number baud
> rates let alone UART clock frequencies with the BOTHER API, and

How does SPD_CUST make it different?

> 3. it doesn't hurt.

It hurts development a lot.

> If you'd like to get rid of SPD_CUST, then just do so, but until then I
> fail to see a point to have it supported with some devices but not other
> ones.

It _is_ the current state of affairs. Most of the contemporary drivers
do not support this "feature" at all.

> NB if you do get to it, then please consider adding an equally flexible
> API too, e.g. for fractional baud rates (134.5bps haha); I won't mind if
> it's less hackish though.

Why do you need fractional baud rates for the small speeds? I do not
believe we have any good use case for that. And 1/2 from 134 is less
than 0.5% which is tolerable by UART by definition.

So, please no, drop it.

> > > References:
> > >
> > > [1] "OXPCIe200 PCI Express Multi-Port Bridge", Oxford Semiconductor,
> > > Inc., DS-0045, 10 Nov 2008, Section "950 Mode", pp. 64-65
> > >
> > > [2] "OXPCIe952 PCI Express Bridge to Dual Serial & Parallel Port",
> > > Oxford Semiconductor, Inc., DS-0046, Mar 06 08, Section "950 Mode",
> > > p. 20
> > >
> > > [3] "OXPCIe954 PCI Express Bridge to Quad Serial Port", Oxford
> > > Semiconductor, Inc., DS-0047, Feb 08, Section "950 Mode", p. 20
> > >
> > > [4] "OXPCIe958 PCI Express Bridge to Octal Serial Port", Oxford
> > > Semiconductor, Inc., DS-0048, Feb 08, Section "950 Mode", p. 20
> >
> > Is it possible to reduce a commit message by shifting some stuff to the
> > dedicated documentation?
>
> The relevant stuff has been included as comments along with actual code
> already, and the rest is the usual submission-time rationale. This will
> be the initial source of information when someone studies the history of
> this code (`git log').

I do not object to this, but perhaps in the form of documentation it
would serve a better job (no-one will need to go deep into the Git
history for this, especially non-developer people who just got a
tarball, for example).

> I don't consider it cast in stone however, so if there's any particular
> piece you'd like to see elsewhere, then please point out to me what to
> move and where. Or give any guidance other than just: "Rewrite it!"

At least that table with divisors and deviation with accompanying
text. But I dare to say 90-95% of the commit message and leave
something like "Here is a new driver. documentation is there." To
where? Documentation/admin-guide seems most suitable right now
(looking at the presence of auxdisplay folder), however I think that
maybe dedicated folder like Documentation/hardware-notes maybe better.

+Cc: Mauro. What do you think about this? We need a folder where we
rather describe hardware features and maybe some driver implementation
details.

> (Yes I often have troubles figuring out the real intent of some changes
> made say 15 years ago that have turned out broken after all those years
> and whose change description is simply too terse now that the lore has
> been lost.)
>
> > > drivers/tty/serial/8250/8250_pci.c | 331 ++++++++++++++++++++++++++++--------
> >
> > Can we, please, split the quirk driver first as it's done in a lot of examples
> > (_exar, _mid, _lpss, _...) and then modify it?
>
> I have found it unclear where the line is drawn between having support
> code included with 8250_pci.c proper and having it split off to a separate
> file. All the device-specific files seem to provide complex handling,
> well beyond just calculating the clock.

Lines of code in the current 8250_pci in conjunction with expansion.
To me 331 (okay, it's something like 280?) LOC + sounds like a very
good justification to split.

> I'll be happy to split it off however (with a suitable preparatory
> change) if there is a consensus in favour to doing so.

If you have a consensus with yourself :-) Maintaining 8250_pci is a burden.
You may look into the history of 8250_pci (and you will often see my
name there) how it was shrinking in time.

> > > +/*
> > > + * Determine the oversampling rate, the clock prescaler, and the clock
> > > + * divisor for the requested baud rate. The clock rate is 62.5 MHz,
> > > + * which is four times the baud base, and the prescaler increments in
> > > + * steps of 1/8. Therefore to make calculations on integers we need
> > > + * to use a scaled clock rate, which is the baud base multiplied by 32
> > > + * (or our assumed UART clock rate multiplied by 2).
> > > + *
> > > + * The allowed oversampling rates are from 4 up to 16 inclusive (values
> > > + * from 0 to 3 inclusive map to 16). Likewise the clock prescaler allows
> > > + * values between 1.000 and 63.875 inclusive (operation for values from
> > > + * 0.000 to 0.875 has not been specified). The clock divisor is the usual
> > > + * unsigned 16-bit integer.
> > > + *
> > > + * For the most accurate baud rate we use a table of predetermined
> > > + * oversampling rates and clock prescalers that records all possible
> > > + * products of the two parameters in the range from 4 up to 255 inclusive,
> > > + * and additionally 335 for the 1500000bps rate, with the prescaler scaled
> > > + * by 8. The table is sorted by the decreasing value of the oversampling
> > > + * rate and ties are resolved by sorting by the decreasing value of the
> > > + * product. This way preference is given to higher oversampling rates.
> > > + *
> > > + * We iterate over the table and choose the product of an oversampling
> > > + * rate and a clock prescaler that gives the lowest integer division
> > > + * result deviation, or if an exact integer divider is found we stop
> > > + * looking for right away. We do some fixup if the resulting clock

for it right

> > > + * divisor required would be out of its unsigned 16-bit integer range.
> > > + *
> > > + * Finally we abuse the supposed fractional part returned to encode the
> > > + * 4-bit value of the oversampling rate and the 9-bit value of the clock
> > > + * prescaler which will end up in the TCR and CPR/CPR2 registers.
> > > + */
> > > +static unsigned int pci_oxsemi_tornado_get_divisor(struct uart_port *port,
> > > + unsigned int baud,
> > > + unsigned int *frac)
> > > +{
> > > + static u8 p[][2] = {
> > > + { 16, 14, }, { 16, 13, }, { 16, 12, }, { 16, 11, },
> > > + { 16, 10, }, { 16, 9, }, { 16, 8, }, { 15, 17, },
> > > + { 15, 16, }, { 15, 15, }, { 15, 14, }, { 15, 13, },
> > > + { 15, 12, }, { 15, 11, }, { 15, 10, }, { 15, 9, },
> > > + { 15, 8, }, { 14, 18, }, { 14, 17, }, { 14, 14, },
> > > + { 14, 13, }, { 14, 12, }, { 14, 11, }, { 14, 10, },
> > > + { 14, 9, }, { 14, 8, }, { 13, 19, }, { 13, 18, },
> > > + { 13, 17, }, { 13, 13, }, { 13, 12, }, { 13, 11, },
> > > + { 13, 10, }, { 13, 9, }, { 13, 8, }, { 12, 19, },
> > > + { 12, 18, }, { 12, 17, }, { 12, 11, }, { 12, 9, },
> > > + { 12, 8, }, { 11, 23, }, { 11, 22, }, { 11, 21, },
> > > + { 11, 20, }, { 11, 19, }, { 11, 18, }, { 11, 17, },
> > > + { 11, 11, }, { 11, 10, }, { 11, 9, }, { 11, 8, },
> > > + { 10, 25, }, { 10, 23, }, { 10, 20, }, { 10, 19, },
> > > + { 10, 17, }, { 10, 10, }, { 10, 9, }, { 10, 8, },
> > > + { 9, 27, }, { 9, 23, }, { 9, 21, }, { 9, 19, },
> > > + { 9, 18, }, { 9, 17, }, { 9, 9, }, { 9, 8, },
> > > + { 8, 31, }, { 8, 29, }, { 8, 23, }, { 8, 19, },
> > > + { 8, 17, }, { 8, 8, }, { 7, 35, }, { 7, 31, },
> > > + { 7, 29, }, { 7, 25, }, { 7, 23, }, { 7, 21, },
> > > + { 7, 19, }, { 7, 17, }, { 7, 15, }, { 7, 14, },
> > > + { 7, 13, }, { 7, 12, }, { 7, 11, }, { 7, 10, },
> > > + { 7, 9, }, { 7, 8, }, { 6, 41, }, { 6, 37, },
> > > + { 6, 31, }, { 6, 29, }, { 6, 23, }, { 6, 19, },
> > > + { 6, 17, }, { 6, 13, }, { 6, 11, }, { 6, 10, },
> > > + { 6, 9, }, { 6, 8, }, { 5, 67, }, { 5, 47, },
> > > + { 5, 43, }, { 5, 41, }, { 5, 37, }, { 5, 31, },
> > > + { 5, 29, }, { 5, 25, }, { 5, 23, }, { 5, 19, },
> > > + { 5, 17, }, { 5, 15, }, { 5, 13, }, { 5, 11, },
> > > + { 5, 10, }, { 5, 9, }, { 5, 8, }, { 4, 61, },
> > > + { 4, 59, }, { 4, 53, }, { 4, 47, }, { 4, 43, },
> > > + { 4, 41, }, { 4, 37, }, { 4, 31, }, { 4, 29, },
> > > + { 4, 23, }, { 4, 19, }, { 4, 17, }, { 4, 13, },
> > > + { 4, 9, }, { 4, 8, },
> > > + };
> >
> > Oh là là! Please, use rational best approximation algorithm instead
> > (check CONFIG_RATIONAL).
>
> Thanks for the pointer, I didn't know we had this piece.
>
> However how is it supposed to apply here? The denominator is always 8,
> so we can rule it out (by multiplying the dividend by 8, which this piece
> does, so that the divisor is a whole number), but the numerator has to be
> a product of three integers, from a different range each ([4,16], [8,511],
> [1, 65535]) as noted above.
>
> Essentially we need to find such three integers (with extra constraints)
> the product of which is closest to (500000000 / baud_rate) -- which IMHO
> amounts to factorisation, an NP-complete problem as you have been surely
> aware (and the whole world relies on), and I have decided that this simple
> table-driven approximation is good enough to handle the usual baud rates,
> especially the higher ones. For several baud rates it gives more accurate
> results (lower deviation) than the factors proposed in the manufacturer's
> datasheets.

And my point is to calculate is always based on the asked baud rate.
Yes. I understand what you wrote above and sometimes only brute force
can be used, but in the kernel we have integer arithmetics which helps
a lot besides the fact of bits twiddlings.

> I just fail to see how your proposed algorithm could be factored in here,
> but I'll be happy to be proved wrong, so I'll appreciate guidance.

It's possible that it doesn't fit in the current form or for all three
integers. Just give some time and think about it. Maybe you can come
up with a better idea. I usually point to one case I have solved [1]
to show that ugly tables can be dropped (in some cases it makes sense
to leave them, though).

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/spi/spi-pxa2xx.c?id=9df461eca18f5395ee84670cdba6755dddec1898

> In any case thank you for your review, always appreciated!

You/re welcome!

--
With Best Regards,
Andy Shevchenko

2021-07-15 20:03:50

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

On Tue, 13 Jul 2021, Andy Shevchenko wrote:

> > > Please no. We really would like to get rid of that ugly hack. The BOTHER exists
> > > for ages.
> >
> > I have actually carefully considered it before submission and:
> >
> > 1. it remains a supported user API with a tool included with contemporary
> > distributions, and
>
> What supported API?

The TIOCGSERIAL/TIOCSSERIAL ioctls. It's not clear to me why you're
asking; as a serial driver expert you must have been surely aware of their
existence.

> > 2. with this device you can't set all the possible whole-number baud
> > rates let alone UART clock frequencies with the BOTHER API, and
>
> How does SPD_CUST make it different?

You can program the divider registers directly with any bit pattern
supported by hardware. You don't know what use for it has been out there
in the field for the last ~30 years.

> > 3. it doesn't hurt.
>
> It hurts development a lot.

It is there and the presence or absence of the feature for OxSemi devices
does not affect anything but OxSemi support code.

> > If you'd like to get rid of SPD_CUST, then just do so, but until then I
> > fail to see a point to have it supported with some devices but not other
> > ones.
>
> It _is_ the current state of affairs. Most of the contemporary drivers
> do not support this "feature" at all.

It is currently a supported feature for OxSemi devices (and most if not
all 8250 derivatives), and I don't see a reason to selectively remove it
with this specifice submission. There may be installations out there that
rely on it -- there have been shortcomings in the implementation, which
are the motivation for this patch series, but mind that we've had support
for OxSemi Tornado devices since 2008. That's a lot of time.

Driver writers for other UART implementations may choose to have it or
not to with their new code written from scratch. As a matter of interest
I've checked zs.c, one of the serial hw drivers I had considerable input
to and it uses its own ZS_BPS_TO_BRG macro rather than `uart_get_divisor',
so it does not support this feature (and dz.c only has a set of 15 fixed
baud rates, which is where the original termio B<rate> macro bit patterns,
as well as the EXTA and EXTB macros for the externally supplied clock
chosen by the 16th bit pattern, come from).

And as I say: if you want to remove it, then do it globally and tell
people that as from Linux x.y.z this feature is no longer supported, so
that is clear and unambiguos and some poor IT soul out there does not get
hit by a random obscure driver feature removal with a kernel upgrade.

NB in the course of this effort I've learnt the hard way that `setserial'
is even invoked automatically nowadays in startup/shutdown scripts for
port state saving and restoration, so a random unannounced feature removal
here can wreak havoc with people's installations they have configured
years ago.

> > NB if you do get to it, then please consider adding an equally flexible
> > API too, e.g. for fractional baud rates (134.5bps haha); I won't mind if
> > it's less hackish though.
>
> Why do you need fractional baud rates for the small speeds? I do not
> believe we have any good use case for that. And 1/2 from 134 is less
> than 0.5% which is tolerable by UART by definition.

Just pointing out the incompleteness of the implementation should
SPD_CUST be removed.

> So, please no, drop it.

Based on my consideration given above, no, I maintain keeping the feature
is the right approach, at least for this change concerned. After all its
purpose is to correct baud rate setting and not to remove vaguely related
features.

> > The relevant stuff has been included as comments along with actual code
> > already, and the rest is the usual submission-time rationale. This will
> > be the initial source of information when someone studies the history of
> > this code (`git log').
>
> I do not object to this, but perhaps in the form of documentation it
> would serve a better job (no-one will need to go deep into the Git
> history for this, especially non-developer people who just got a
> tarball, for example).

Why would a non-developer need to dive into it while also hesitating to
go through the git log? Moreover, why would a developer hesitate digging
through the git log while trying to understand code?

Maybe I'm old-fashioned, but coming from long before we had any sensible
repository (umm, the MIPS port and I believe m68k had CVS in the old days,
but try to find anything with CVS!) let alone git I can't overrate my
appreciation for its features, and our requirement for sensible change
descriptions is not there for nothing. So looking through `git log' is
the first thing I do when I want to understand why a piece of code unknown
to me looks like it does.

> > I don't consider it cast in stone however, so if there's any particular
> > piece you'd like to see elsewhere, then please point out to me what to
> > move and where. Or give any guidance other than just: "Rewrite it!"
>
> At least that table with divisors and deviation with accompanying
> text. But I dare to say 90-95% of the commit message and leave
> something like "Here is a new driver. documentation is there." To
> where? Documentation/admin-guide seems most suitable right now
> (looking at the presence of auxdisplay folder), however I think that
> maybe dedicated folder like Documentation/hardware-notes maybe better.

Not a new driver really, but a bug fix for the broken calculation we
currently have given how inaccurately the particular input clock rate
chosen by the designer of the ASIC gets divided at least for some of our
standard baud rates. You may have seen with my previous fix that the
original submitter for the OxSemi handlers didn't even get the base baud
rate right (and that stuff has been included as vendor-supplied drivers
with various OxSemi-based serial port card manufacturers!), so I can't
really tell what happened there.

It looks to me like nice if not ultimate UART hardware ruined by broken
software, sigh. Which may have contributed to the withdrawal of the ASICs
from manufacturing as with the extra quality stripped by missing software
support obviously they may not have been able to compete solely by price
with cheap UARTs made elsewhere that provide a basic feature set only (and
no documentation).

NB it does DMA as one would expect especially at the higher baud rates,
and earlier OxSemi PCI or discrete UARTs have similar features (including
the clock prescaler), none of which we handle. If we started, only then I
guess we could call it a new driver. I have the datasheets, but I can't
dedicate more time I'm afraid beyond what is absolutely necessary to get
the broken PCIe stuff right.

OK though, you've convinced me.

> +Cc: Mauro. What do you think about this? We need a folder where we
> rather describe hardware features and maybe some driver implementation
> details.

Not for serial hardware drivers, but a while ago I committed what now
lives at Documentation/networking/device_drivers/fddi/defza.rst along with
several other networking driver notes, so I imagine we can have a similar
collection for 8250 stuff and the like.

> > I have found it unclear where the line is drawn between having support
> > code included with 8250_pci.c proper and having it split off to a separate
> > file. All the device-specific files seem to provide complex handling,
> > well beyond just calculating the clock.
>
> Lines of code in the current 8250_pci in conjunction with expansion.
> To me 331 (okay, it's something like 280?) LOC + sounds like a very
> good justification to split.

OK, the size argument alone makes some sense to me, though OTOH splitting
sources into many files prevents the compiler from doing interprocedural
optimisations, which can only be partially compensated by LTO (if enabled
in the first place).

I'll see what I can do here anyway. Mind that non-PCIe OxSemi stuff
remains incomplete, as I noted above, so it'll be a partial driver anyway.

> > > > + * We iterate over the table and choose the product of an oversampling
> > > > + * rate and a clock prescaler that gives the lowest integer division
> > > > + * result deviation, or if an exact integer divider is found we stop
> > > > + * looking for right away. We do some fixup if the resulting clock
>
> for it right

Fixed, thanks!

> > Essentially we need to find such three integers (with extra constraints)
> > the product of which is closest to (500000000 / baud_rate) -- which IMHO
> > amounts to factorisation, an NP-complete problem as you have been surely
> > aware (and the whole world relies on), and I have decided that this simple
> > table-driven approximation is good enough to handle the usual baud rates,
> > especially the higher ones. For several baud rates it gives more accurate
> > results (lower deviation) than the factors proposed in the manufacturer's
> > datasheets.
>
> And my point is to calculate is always based on the asked baud rate.
> Yes. I understand what you wrote above and sometimes only brute force
> can be used, but in the kernel we have integer arithmetics which helps
> a lot besides the fact of bits twiddlings.

Well, the algorithm is for finding the closest rational number expressed
by a numerator and a denominator to the required one. We have no problem
with that, because the divisor is integer (which is of course a rational
number too, but finding the numerator where the denominator if constant 1
is trivial).

> > I just fail to see how your proposed algorithm could be factored in here,
> > but I'll be happy to be proved wrong, so I'll appreciate guidance.
>
> It's possible that it doesn't fit in the current form or for all three
> integers. Just give some time and think about it. Maybe you can come
> up with a better idea. I usually point to one case I have solved [1]
> to show that ugly tables can be dropped (in some cases it makes sense
> to leave them, though).
>
> [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/spi/spi-pxa2xx.c?id=9df461eca18f5395ee84670cdba6755dddec1898

Well, I considered a naïve algorithm, but it would be quadratic, it would
make a lot of redundant calculations, and division is not exactly a fast
operation; some targets we support have no hardware division instruction
even. Needless to say it would still have to encode the ranges supported.
And any code size gain from the naïve algorithm might still not match the
268 bytes the table consumes, especially with RISC targets.

I think my proposal is a reasonable compromise that addresses the problem
at hand, and if you or anyone else finds a better solution sometime, then
you are free to improve it. Perfect is the enemy of good enough, so the
potential existence of an ultimate solution that hasn't been discovered
yet shouldn't block progress in a hope that the solution gets found soon
enough.

I'll repost the outstanding pieces of the series with the adjustments I
have agreed to make.

Maciej

2021-07-19 14:57:14

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

On Thu, Jul 15, 2021 at 10:49 PM Maciej W. Rozycki <[email protected]> wrote:
> On Tue, 13 Jul 2021, Andy Shevchenko wrote:
>
> > > > Please no. We really would like to get rid of that ugly hack. The BOTHER exists
> > > > for ages.
> > >
> > > I have actually carefully considered it before submission and:
> > >
> > > 1. it remains a supported user API with a tool included with contemporary
> > > distributions, and
> >
> > What supported API?
>
> The TIOCGSERIAL/TIOCSSERIAL ioctls. It's not clear to me why you're
> asking; as a serial driver expert you must have been surely aware of their
> existence.

Yes, I know how it's used. It's the ugliest hack in the serial support in Linux.
Telling that baud rate is 38400 and supplying something completely different,
non-standard stuff.

> > > 2. with this device you can't set all the possible whole-number baud
> > > rates let alone UART clock frequencies with the BOTHER API, and
> >
> > How does SPD_CUST make it different?
>
> You can program the divider registers directly with any bit pattern
> supported by hardware.

And again, how does it differ from BOTHER paths (except being a hack)?
You supply baud rate with 1 baud granularity and program hardware
registers accordingly. (Yes, I remember you pointed out the sub-HZ
frequencies, but I don't buy it as a very good argument because the
only significant error can be achieved at baud rates under the 100).

> You don't know what use for it has been out there
> in the field for the last ~30 years.

Is it a question or statement?

> > > 3. it doesn't hurt.
> >
> > It hurts development a lot.
>
> It is there and the presence or absence of the feature for OxSemi devices
> does not affect anything but OxSemi support code.

It's a pity. But what support code needs the SPD_CUST nowadays?

> > > If you'd like to get rid of SPD_CUST, then just do so, but until then I
> > > fail to see a point to have it supported with some devices but not other
> > > ones.
> >
> > It _is_ the current state of affairs. Most of the contemporary drivers
> > do not support this "feature" at all.
>
> It is currently a supported feature for OxSemi devices (and most if not
> all 8250 derivatives), and I don't see a reason to selectively remove it
> with this specifice submission. There may be installations out there that
> rely on it -- there have been shortcomings in the implementation, which
> are the motivation for this patch series, but mind that we've had support
> for OxSemi Tornado devices since 2008. That's a lot of time.
>
> Driver writers for other UART implementations may choose to have it or
> not to with their new code written from scratch. As a matter of interest
> I've checked zs.c, one of the serial hw drivers I had considerable input
> to and it uses its own ZS_BPS_TO_BRG macro rather than `uart_get_divisor',
> so it does not support this feature (and dz.c only has a set of 15 fixed
> baud rates, which is where the original termio B<rate> macro bit patterns,
> as well as the EXTA and EXTB macros for the externally supplied clock
> chosen by the 16th bit pattern, come from).
>
> And as I say: if you want to remove it, then do it globally and tell
> people that as from Linux x.y.z this feature is no longer supported, so
> that is clear and unambiguos and some poor IT soul out there does not get
> hit by a random obscure driver feature removal with a kernel upgrade.

I had had this in my plans [2] but suddenly I had no time to accomplish it. :-(

> NB in the course of this effort I've learnt the hard way that `setserial'
> is even invoked automatically nowadays in startup/shutdown scripts for
> port state saving and restoration, so a random unannounced feature removal
> here can wreak havoc with people's installations they have configured
> years ago.

I believe that for these years spd_cust shouldn't be used at all. OK,
it seems the first thing we may do is to patch the user space to give
a fat warning that spd_cust is deprecated and should be avoided.
And... it seems already there [3].

> > > NB if you do get to it, then please consider adding an equally flexible
> > > API too, e.g. for fractional baud rates (134.5bps haha); I won't mind if
> > > it's less hackish though.
> >
> > Why do you need fractional baud rates for the small speeds? I do not
> > believe we have any good use case for that. And 1/2 from 134 is less
> > than 0.5% which is tolerable by UART by definition.
>
> Just pointing out the incompleteness of the implementation should
> SPD_CUST be removed.
>
> > So, please no, drop it.
>
> Based on my consideration given above, no, I maintain keeping the feature
> is the right approach, at least for this change concerned. After all its
> purpose is to correct baud rate setting and not to remove vaguely related
> features.

I see your point. My question here, does this series extend SPD_CUST
to additional (newly supported?) baud rates? If so, I would be against
that extension. Supporting deprecated stuff is okay for a while.

To the end of the discussion may be good to read also these [4] and [5]


> > > The relevant stuff has been included as comments along with actual code
> > > already, and the rest is the usual submission-time rationale. This will
> > > be the initial source of information when someone studies the history of
> > > this code (`git log').
> >
> > I do not object to this, but perhaps in the form of documentation it
> > would serve a better job (no-one will need to go deep into the Git
> > history for this, especially non-developer people who just got a
> > tarball, for example).
>
> Why would a non-developer need to dive into it while also hesitating to
> go through the git log? Moreover, why would a developer hesitate digging
> through the git log while trying to understand code?
>
> Maybe I'm old-fashioned, but coming from long before we had any sensible
> repository (umm, the MIPS port and I believe m68k had CVS in the old days,
> but try to find anything with CVS!) let alone git I can't overrate my
> appreciation for its features, and our requirement for sensible change
> descriptions is not there for nothing. So looking through `git log' is
> the first thing I do when I want to understand why a piece of code unknown
> to me looks like it does.

Tell me how to look into git logs without cloning a repository (I
think it's possible through some borken web UI somehow, but never
heard about it). In any case, either in the commit message or in the
code it will be in the repo, so not a big deal to me after all.

> > > I don't consider it cast in stone however, so if there's any particular
> > > piece you'd like to see elsewhere, then please point out to me what to
> > > move and where. Or give any guidance other than just: "Rewrite it!"
> >
> > At least that table with divisors and deviation with accompanying
> > text. But I dare to say 90-95% of the commit message and leave
> > something like "Here is a new driver. documentation is there." To
> > where? Documentation/admin-guide seems most suitable right now
> > (looking at the presence of auxdisplay folder), however I think that
> > maybe dedicated folder like Documentation/hardware-notes maybe better.
>
> Not a new driver really, but a bug fix for the broken calculation we
> currently have given how inaccurately the particular input clock rate
> chosen by the designer of the ASIC gets divided at least for some of our
> standard baud rates. You may have seen with my previous fix that the
> original submitter for the OxSemi handlers didn't even get the base baud
> rate right (and that stuff has been included as vendor-supplied drivers
> with various OxSemi-based serial port card manufacturers!), so I can't
> really tell what happened there.

Yeah, it seems it has never worked before.

> It looks to me like nice if not ultimate UART hardware ruined by broken
> software, sigh. Which may have contributed to the withdrawal of the ASICs
> from manufacturing as with the extra quality stripped by missing software
> support obviously they may not have been able to compete solely by price
> with cheap UARTs made elsewhere that provide a basic feature set only (and
> no documentation).
>
> NB it does DMA as one would expect especially at the higher baud rates,
> and earlier OxSemi PCI or discrete UARTs have similar features (including
> the clock prescaler), none of which we handle. If we started, only then I
> guess we could call it a new driver. I have the datasheets, but I can't
> dedicate more time I'm afraid beyond what is absolutely necessary to get
> the broken PCIe stuff right.
>
> OK though, you've convinced me.
>
> > +Cc: Mauro. What do you think about this? We need a folder where we
> > rather describe hardware features and maybe some driver implementation
> > details.
>
> Not for serial hardware drivers, but a while ago I committed what now
> lives at Documentation/networking/device_drivers/fddi/defza.rst along with
> several other networking driver notes, so I imagine we can have a similar
> collection for 8250 stuff and the like.

To me any folder is more or less okayish as long as it's there, under
the Documentation :-)

> > > I have found it unclear where the line is drawn between having support
> > > code included with 8250_pci.c proper and having it split off to a separate
> > > file. All the device-specific files seem to provide complex handling,
> > > well beyond just calculating the clock.
> >
> > Lines of code in the current 8250_pci in conjunction with expansion.
> > To me 331 (okay, it's something like 280?) LOC + sounds like a very
> > good justification to split.
>
> OK, the size argument alone makes some sense to me, though OTOH splitting
> sources into many files prevents the compiler from doing interprocedural
> optimisations, which can only be partially compensated by LTO (if enabled
> in the first place).
>
> I'll see what I can do here anyway. Mind that non-PCIe OxSemi stuff
> remains incomplete, as I noted above, so it'll be a partial driver anyway.

Thanks for doing that anyway!

> > > > > + * We iterate over the table and choose the product of an oversampling
> > > > > + * rate and a clock prescaler that gives the lowest integer division
> > > > > + * result deviation, or if an exact integer divider is found we stop
> > > > > + * looking for right away. We do some fixup if the resulting clock
> >
> > for it right
>
> Fixed, thanks!
>
> > > Essentially we need to find such three integers (with extra constraints)
> > > the product of which is closest to (500000000 / baud_rate) -- which IMHO
> > > amounts to factorisation, an NP-complete problem as you have been surely
> > > aware (and the whole world relies on), and I have decided that this simple
> > > table-driven approximation is good enough to handle the usual baud rates,
> > > especially the higher ones. For several baud rates it gives more accurate
> > > results (lower deviation) than the factors proposed in the manufacturer's
> > > datasheets.
> >
> > And my point is to calculate is always based on the asked baud rate.
> > Yes. I understand what you wrote above and sometimes only brute force
> > can be used, but in the kernel we have integer arithmetics which helps
> > a lot besides the fact of bits twiddlings.
>
> Well, the algorithm is for finding the closest rational number expressed
> by a numerator and a denominator to the required one. We have no problem
> with that, because the divisor is integer (which is of course a rational
> number too, but finding the numerator where the denominator if constant 1
> is trivial).
>
> > > I just fail to see how your proposed algorithm could be factored in here,
> > > but I'll be happy to be proved wrong, so I'll appreciate guidance.
> >
> > It's possible that it doesn't fit in the current form or for all three
> > integers. Just give some time and think about it. Maybe you can come
> > up with a better idea. I usually point to one case I have solved [1]
> > to show that ugly tables can be dropped (in some cases it makes sense
> > to leave them, though).
> >
> > [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/spi/spi-pxa2xx.c?id=9df461eca18f5395ee84670cdba6755dddec1898
>
> Well, I considered a naïve algorithm, but it would be quadratic, it would
> make a lot of redundant calculations, and division is not exactly a fast
> operation; some targets we support have no hardware division instruction
> even. Needless to say it would still have to encode the ranges supported.
> And any code size gain from the naïve algorithm might still not match the
> 268 bytes the table consumes, especially with RISC targets.
>
> I think my proposal is a reasonable compromise that addresses the problem
> at hand, and if you or anyone else finds a better solution sometime, then
> you are free to improve it. Perfect is the enemy of good enough, so the
> potential existence of an ultimate solution that hasn't been discovered
> yet shouldn't block progress in a hope that the solution gets found soon
> enough.

Fair enough.

> I'll repost the outstanding pieces of the series with the adjustments I
> have agreed to make.

Thanks!

[2]: Hmm... It's funny how some things may easily disappear from the
internet. I'll look again tomorrow for the links. Okay, I have the
discussion locally available, I may forward you the entire thread.
[3]: https://elixir.bootlin.com/linux/latest/source/drivers/tty/serial/serial_core.c#L986
[4]: https://lore.kernel.org/linux-serial/CAHhAz+hrmLiRJ77cM=CYj+iyH8aUJ64R6FtAwGqqB2pOS0n0aQ@mail.gmail.com/T/#u
[5]: https://github.com/npat-efault/picocom/blob/master/termios2.txt


--
With Best Regards,
Andy Shevchenko

2022-02-14 07:04:40

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

On Mon, 19 Jul 2021, Andy Shevchenko wrote:

> > The TIOCGSERIAL/TIOCSSERIAL ioctls. It's not clear to me why you're
> > asking; as a serial driver expert you must have been surely aware of their
> > existence.
>
> Yes, I know how it's used. It's the ugliest hack in the serial support in Linux.
> Telling that baud rate is 38400 and supplying something completely different,
> non-standard stuff.

Well, the interface reflects how hardware used to work actually.

My DECstation 5000/200 box has an ASIC providing a quad UART, or a serial
line multiplexer really (handled by drivers/tty/serial/dz.c BTW), which is
a design dating back to 1970 and a discrete DZ11 interface for Unibus[1],
and the device has a 4-bit baud rate selector for the internal generator
with individual settings from 0x0 to 0xe selecting the standard baud rates
between 50 and 9600 baud, and then 0xf selects an external baud generator
input, which in this particular machine lets you switch between 19200 and
38400 through a board CSR. See how this arrangement exactly matches the
EXTA/EXTB macros aka B19200/B38400 (now you have found out where these
have come from!).

The original DZ11 interface did not expect such high serial communication
speeds of course and hence the shortage of control bits for the internal
baud rate generator.

And then the SPD_CUST API has been similarly invented for an analogous
shortcoming of the old version of the terminal I/O interface (who could
have thought speeds beyond 38400 would ever be required!) and we continue
carrying this historical baggage, as the cost of backwards compatibility.

Note that the DECstation 5000/200 is a 1990 design, so not much older
than Linux and we just adapted to the world at the time. I wouldn't mind
a saner API to set the raw clock divider, but nobody has come up with a
more up-to-date solution.

> > You can program the divider registers directly with any bit pattern
> > supported by hardware.
>
> And again, how does it differ from BOTHER paths (except being a hack)?
> You supply baud rate with 1 baud granularity and program hardware
> registers accordingly. (Yes, I remember you pointed out the sub-HZ
> frequencies, but I don't buy it as a very good argument because the
> only significant error can be achieved at baud rates under the 100).

With this device owing to the internal 16-bit divisor range limitation
you can't even set the baud rates under 100 (or under 239 actually) with
BOTHER. This has nothing to do with HZ; it's just that 15625000 / 65535
works out at 238.422.

Plus all the world is not modems and teletypes nowadays anymore. People
wire all kinds of weird equipment to serial ports and the more flexibility
we give them in driving it the better.

> > You don't know what use for it has been out there
> > in the field for the last ~30 years.
>
> Is it a question or statement?

It's a statement.

> > > > 3. it doesn't hurt.
> > >
> > > It hurts development a lot.
> >
> > It is there and the presence or absence of the feature for OxSemi devices
> > does not affect anything but OxSemi support code.
>
> It's a pity. But what support code needs the SPD_CUST nowadays?

I don't know. But there is no alternative raw interface and I think it
is important to give people good tools for their work.

> > It is currently a supported feature for OxSemi devices (and most if not
> > all 8250 derivatives), and I don't see a reason to selectively remove it
> > with this specifice submission. There may be installations out there that
> > rely on it -- there have been shortcomings in the implementation, which
> > are the motivation for this patch series, but mind that we've had support
> > for OxSemi Tornado devices since 2008. That's a lot of time.
> >
> > Driver writers for other UART implementations may choose to have it or
> > not to with their new code written from scratch. As a matter of interest
> > I've checked zs.c, one of the serial hw drivers I had considerable input
> > to and it uses its own ZS_BPS_TO_BRG macro rather than `uart_get_divisor',
> > so it does not support this feature (and dz.c only has a set of 15 fixed
> > baud rates, which is where the original termio B<rate> macro bit patterns,
> > as well as the EXTA and EXTB macros for the externally supplied clock
> > chosen by the 16th bit pattern, come from).
> >
> > And as I say: if you want to remove it, then do it globally and tell
> > people that as from Linux x.y.z this feature is no longer supported, so
> > that is clear and unambiguos and some poor IT soul out there does not get
> > hit by a random obscure driver feature removal with a kernel upgrade.
>
> I had had this in my plans [2] but suddenly I had no time to accomplish
> it. :-(

So I do hope you can realise why perfect is the enemy of good enough.
There was nothing wrong in particular with my proposed bug fix for the
clock calculation and yet half a year on and we got nowhere only because
my proposal was not perfect enough and then I was preempted with other
stuff.

> > NB in the course of this effort I've learnt the hard way that `setserial'
> > is even invoked automatically nowadays in startup/shutdown scripts for
> > port state saving and restoration, so a random unannounced feature removal
> > here can wreak havoc with people's installations they have configured
> > years ago.
>
> I believe that for these years spd_cust shouldn't be used at all. OK,
> it seems the first thing we may do is to patch the user space to give
> a fat warning that spd_cust is deprecated and should be avoided.
> And... it seems already there [3].

Yet no equally functional alternative. Call it BRAW or something and
pass a 32-bit or 64-bit cookie for the driver to interpret. And then you
can remove the spd_cust/38400bps hack.

> > Just pointing out the incompleteness of the implementation should
> > SPD_CUST be removed.
> >
> > > So, please no, drop it.
> >
> > Based on my consideration given above, no, I maintain keeping the feature
> > is the right approach, at least for this change concerned. After all its
> > purpose is to correct baud rate setting and not to remove vaguely related
> > features.
>
> I see your point. My question here, does this series extend SPD_CUST
> to additional (newly supported?) baud rates? If so, I would be against
> that extension. Supporting deprecated stuff is okay for a while.

It does allow one to program the full clock divider range of the OxSemi
devices. I find it appropriate according to my engineer's code of good
practices. And it doesn't cause any burden for non-OxSemi code.

> To the end of the discussion may be good to read also these [4] and [5]

Thank you for the pointers, however there's nothing new to me there,
sorry.

> > Not for serial hardware drivers, but a while ago I committed what now
> > lives at Documentation/networking/device_drivers/fddi/defza.rst along with
> > several other networking driver notes, so I imagine we can have a similar
> > collection for 8250 stuff and the like.
>
> To me any folder is more or less okayish as long as it's there, under
> the Documentation :-)

I've chosen Documentation/tty/device_drivers/oxsemi-tornado.rst then,
following the pattern.

> > OK, the size argument alone makes some sense to me, though OTOH splitting
> > sources into many files prevents the compiler from doing interprocedural
> > optimisations, which can only be partially compensated by LTO (if enabled
> > in the first place).
> >
> > I'll see what I can do here anyway. Mind that non-PCIe OxSemi stuff
> > remains incomplete, as I noted above, so it'll be a partial driver anyway.
>
> Thanks for doing that anyway!

So I have had a look at how it has been done for other drivers and I have
now convinced myself against such a split. The primary reason for this
conclusion is that there is no basic infrastructure for such a split and
the ultimate result is code duplication with no clear benefit to justify
it.

Take for example the most recently separated dedicated 8250_pericom.c
driver: its `pericom8250_probe' function duplicates pieces extracted from
`pciserial_init_one' (so if a change has to be applied to either function,
then all its siblings have to be manually verified as to whether the same
change has to be applied there as well; I bet that it won't happen as
required and pieces of code will slowly diverge and suffer from bitrot)
and then PCI error handling and suspend/remove support have been removed
for no reason.

I suppose common code could be factored out from `pciserial_init_one' and
the other routines private to 8250_pci.c either moved to a library archive
to be linked into individual drivers or to a subdriver along the lines of
drivers/net/ethernet/amd/7990.c or drivers/scsi/esp_scsi.c. But it is not
the case at the moment, so in my opinion the split has been premature.

Originally I thought that those separated drivers are merely a source
code split across multiple files to make maintenance easier, all to be
built and linked together under the SERIAL_8250_PCI option, rather than
standalone drivers. It is clearly not the case though.

> > I'll repost the outstanding pieces of the series with the adjustments I
> > have agreed to make.
>
> Thanks!

I shall be posting v3 right away along with fixes for serveral issues I
have come across in the course of this development. I will be happy to
address any bugs, functional issues or coding style problems that may have
escaped my attention, however I consider this version otherwise final and
I am not going to make this code a separate driver or remove the custom
baud rate divisor support code.

The rationale behind this is that AFAIK it is not the Linux kernel policy
for any of its subsystems to require any other bugs or missing features to
be addressed as a prerequisite for a bug fix or even a functional
improvement to be accepted.

If this remains unacceptable, then so be it. I have already gone above
and beyond with this submission and I have put enough time into it, and I
am not going to invest anymore, as this is likely to exceed what long-term
maintenance as a local patch is going to take on one hand, and the change
has been published so people who might need it can chase it and install
locally themselves. This is not my pet project, but rather an ad hoc
effort to fix issues I have come across by chance.

References:

[1] "DZ11 asynchronous serial line interface",
<https://gunkies.org/wiki/DZ11_asynchronous_serial_line_interface>

Maciej

2022-03-21 23:05:17

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

On Sat, Feb 12, 2022 at 08:41:19AM +0000, Maciej W. Rozycki wrote:
> On Mon, 19 Jul 2021, Andy Shevchenko wrote:
>
> > > The TIOCGSERIAL/TIOCSSERIAL ioctls. It's not clear to me why you're
> > > asking; as a serial driver expert you must have been surely aware of their
> > > existence.
> >
> > Yes, I know how it's used. It's the ugliest hack in the serial support in Linux.
> > Telling that baud rate is 38400 and supplying something completely different,
> > non-standard stuff.
>
> Well, the interface reflects how hardware used to work actually.
>
> My DECstation 5000/200 box has an ASIC providing a quad UART, or a serial
> line multiplexer really (handled by drivers/tty/serial/dz.c BTW), which is
> a design dating back to 1970 and a discrete DZ11 interface for Unibus[1],
> and the device has a 4-bit baud rate selector for the internal generator
> with individual settings from 0x0 to 0xe selecting the standard baud rates
> between 50 and 9600 baud, and then 0xf selects an external baud generator
> input, which in this particular machine lets you switch between 19200 and
> 38400 through a board CSR. See how this arrangement exactly matches the
> EXTA/EXTB macros aka B19200/B38400 (now you have found out where these
> have come from!).
>
> The original DZ11 interface did not expect such high serial communication
> speeds of course and hence the shortage of control bits for the internal
> baud rate generator.
>
> And then the SPD_CUST API has been similarly invented for an analogous
> shortcoming of the old version of the terminal I/O interface (who could
> have thought speeds beyond 38400 would ever be required!) and we continue
> carrying this historical baggage, as the cost of backwards compatibility.
>
> Note that the DECstation 5000/200 is a 1990 design, so not much older
> than Linux and we just adapted to the world at the time. I wouldn't mind
> a saner API to set the raw clock divider, but nobody has come up with a
> more up-to-date solution.

What do you mean by the last sentence? The BOTHER is exactly what is new
ABI for the baud rate setting without using any ugly hacks in the kernel.

> > > You can program the divider registers directly with any bit pattern
> > > supported by hardware.
> >
> > And again, how does it differ from BOTHER paths (except being a hack)?
> > You supply baud rate with 1 baud granularity and program hardware
> > registers accordingly. (Yes, I remember you pointed out the sub-HZ
> > frequencies, but I don't buy it as a very good argument because the
> > only significant error can be achieved at baud rates under the 100).
>
> With this device owing to the internal 16-bit divisor range limitation
> you can't even set the baud rates under 100 (or under 239 actually) with
> BOTHER. This has nothing to do with HZ; it's just that 15625000 / 65535
> works out at 238.422.

I'm lost here. BOTHER requests whatever user wants to have with 1 baud
granularity. How can't it work? The only limitations here are the hw
limitations, but it doesn't affect the kernel <--> user space interfaces.

> Plus all the world is not modems and teletypes nowadays anymore. People
> wire all kinds of weird equipment to serial ports and the more flexibility
> we give them in driving it the better.
>
> > > You don't know what use for it has been out there
> > > in the field for the last ~30 years.
> >
> > Is it a question or statement?
>
> It's a statement.

Use of SPD_CUST? Yeah, this weirdness may have spread a lot, but some of the
commits in the Linux kernel suggests that SPD_CUST is discouraged to be used.

> > > > > 3. it doesn't hurt.
> > > >
> > > > It hurts development a lot.
> > >
> > > It is there and the presence or absence of the feature for OxSemi devices
> > > does not affect anything but OxSemi support code.
> >
> > It's a pity. But what support code needs the SPD_CUST nowadays?
>
> I don't know. But there is no alternative raw interface and I think it
> is important to give people good tools for their work.

BOTHER _is_ the one. It seems to me you never tried it.

> > > It is currently a supported feature for OxSemi devices (and most if not
> > > all 8250 derivatives), and I don't see a reason to selectively remove it
> > > with this specifice submission. There may be installations out there that
> > > rely on it -- there have been shortcomings in the implementation, which
> > > are the motivation for this patch series, but mind that we've had support
> > > for OxSemi Tornado devices since 2008. That's a lot of time.
> > >
> > > Driver writers for other UART implementations may choose to have it or
> > > not to with their new code written from scratch. As a matter of interest
> > > I've checked zs.c, one of the serial hw drivers I had considerable input
> > > to and it uses its own ZS_BPS_TO_BRG macro rather than `uart_get_divisor',
> > > so it does not support this feature (and dz.c only has a set of 15 fixed
> > > baud rates, which is where the original termio B<rate> macro bit patterns,
> > > as well as the EXTA and EXTB macros for the externally supplied clock
> > > chosen by the 16th bit pattern, come from).
> > >
> > > And as I say: if you want to remove it, then do it globally and tell
> > > people that as from Linux x.y.z this feature is no longer supported, so
> > > that is clear and unambiguos and some poor IT soul out there does not get
> > > hit by a random obscure driver feature removal with a kernel upgrade.
> >
> > I had had this in my plans [2] but suddenly I had no time to accomplish
> > it. :-(
>
> So I do hope you can realise why perfect is the enemy of good enough.

I hope you can realise that as well. Because BOTHER is already supported and
what you are doing is uglification of the code and resurrecting an evil.

> There was nothing wrong in particular with my proposed bug fix for the
> clock calculation and yet half a year on and we got nowhere only because
> my proposal was not perfect enough and then I was preempted with other
> stuff.

> > > NB in the course of this effort I've learnt the hard way that `setserial'
> > > is even invoked automatically nowadays in startup/shutdown scripts for
> > > port state saving and restoration, so a random unannounced feature removal
> > > here can wreak havoc with people's installations they have configured
> > > years ago.
> >
> > I believe that for these years spd_cust shouldn't be used at all. OK,
> > it seems the first thing we may do is to patch the user space to give
> > a fat warning that spd_cust is deprecated and should be avoided.
> > And... it seems already there [3].
>
> Yet no equally functional alternative. Call it BRAW or something and
> pass a 32-bit or 64-bit cookie for the driver to interpret. And then you
> can remove the spd_cust/38400bps hack.

Have you tried BOTHER? It's already there for a long time. Don't tell me that
it's not an equivalent. It's better than that.

> > > Just pointing out the incompleteness of the implementation should
> > > SPD_CUST be removed.
> > >
> > > > So, please no, drop it.
> > >
> > > Based on my consideration given above, no, I maintain keeping the feature
> > > is the right approach, at least for this change concerned. After all its
> > > purpose is to correct baud rate setting and not to remove vaguely related
> > > features.
> >
> > I see your point. My question here, does this series extend SPD_CUST
> > to additional (newly supported?) baud rates? If so, I would be against
> > that extension. Supporting deprecated stuff is okay for a while.
>
> It does allow one to program the full clock divider range of the OxSemi
> devices. I find it appropriate according to my engineer's code of good
> practices. And it doesn't cause any burden for non-OxSemi code.

How BOTHER does prevent you doing the same?

> > To the end of the discussion may be good to read also these [4] and [5]
>
> Thank you for the pointers, however there's nothing new to me there,
> sorry.
>
> > > Not for serial hardware drivers, but a while ago I committed what now
> > > lives at Documentation/networking/device_drivers/fddi/defza.rst along with
> > > several other networking driver notes, so I imagine we can have a similar
> > > collection for 8250 stuff and the like.
> >
> > To me any folder is more or less okayish as long as it's there, under
> > the Documentation :-)
>
> I've chosen Documentation/tty/device_drivers/oxsemi-tornado.rst then,
> following the pattern.
>
> > > OK, the size argument alone makes some sense to me, though OTOH splitting
> > > sources into many files prevents the compiler from doing interprocedural
> > > optimisations, which can only be partially compensated by LTO (if enabled
> > > in the first place).
> > >
> > > I'll see what I can do here anyway. Mind that non-PCIe OxSemi stuff
> > > remains incomplete, as I noted above, so it'll be a partial driver anyway.
> >
> > Thanks for doing that anyway!
>
> So I have had a look at how it has been done for other drivers and I have
> now convinced myself against such a split. The primary reason for this
> conclusion is that there is no basic infrastructure for such a split and
> the ultimate result is code duplication with no clear benefit to justify
> it.

Justification for split is to keep certain quirks out of the scope of the
generic driver. I'm not sure what duplication you are talking about if the
LOC statistics shows otherwise.

> Take for example the most recently separated dedicated 8250_pericom.c
> driver: its `pericom8250_probe' function duplicates pieces extracted from
> `pciserial_init_one' (so if a change has to be applied to either function,
> then all its siblings have to be manually verified as to whether the same
> change has to be applied there as well; I bet that it won't happen as
> required and pieces of code will slowly diverge and suffer from bitrot)
> and then PCI error handling and suspend/remove support have been removed
> for no reason.

You may not want to get the idea, it's fine. The rationale is simple:
isolate quirks for certain platform(s) in one place. Each platform
in a separate module.

> I suppose common code could be factored out from `pciserial_init_one' and
> the other routines private to 8250_pci.c either moved to a library archive
> to be linked into individual drivers or to a subdriver along the lines of
> drivers/net/ethernet/amd/7990.c or drivers/scsi/esp_scsi.c. But it is not
> the case at the moment, so in my opinion the split has been premature.
>
> Originally I thought that those separated drivers are merely a source
> code split across multiple files to make maintenance easier, all to be
> built and linked together under the SERIAL_8250_PCI option, rather than
> standalone drivers. It is clearly not the case though.
>
> > > I'll repost the outstanding pieces of the series with the adjustments I
> > > have agreed to make.
> >
> > Thanks!
>
> I shall be posting v3 right away along with fixes for serveral issues I
> have come across in the course of this development. I will be happy to
> address any bugs, functional issues or coding style problems that may have
> escaped my attention, however I consider this version otherwise final and
> I am not going to make this code a separate driver or remove the custom
> baud rate divisor support code.
>
> The rationale behind this is that AFAIK it is not the Linux kernel policy
> for any of its subsystems to require any other bugs or missing features to
> be addressed as a prerequisite for a bug fix or even a functional
> improvement to be accepted.
>
> If this remains unacceptable, then so be it. I have already gone above
> and beyond with this submission and I have put enough time into it, and I
> am not going to invest anymore, as this is likely to exceed what long-term
> maintenance as a local patch is going to take on one hand, and the change
> has been published so people who might need it can chase it and install
> locally themselves. This is not my pet project, but rather an ad hoc
> effort to fix issues I have come across by chance.
>
> References:
>
> [1] "DZ11 asynchronous serial line interface",
> <https://gunkies.org/wiki/DZ11_asynchronous_serial_line_interface>
>
> Maciej

--
With Best Regards,
Andy Shevchenko


2022-03-24 07:29:36

by Maciej W. Rozycki

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

On Fri, 18 Mar 2022, Andy Shevchenko wrote:

> > It does allow one to program the full clock divider range of the OxSemi
> > devices. I find it appropriate according to my engineer's code of good
> > practices. And it doesn't cause any burden for non-OxSemi code.
>
> How BOTHER does prevent you doing the same?

It does not allow you to set arbitrary serial port clock rates. You can
only set integer baud rates, and then only those that do not exceed the
[1;65535] clock divisor range.

> > So I have had a look at how it has been done for other drivers and I have
> > now convinced myself against such a split. The primary reason for this
> > conclusion is that there is no basic infrastructure for such a split and
> > the ultimate result is code duplication with no clear benefit to justify
> > it.
>
> Justification for split is to keep certain quirks out of the scope of the
> generic driver. I'm not sure what duplication you are talking about if the
> LOC statistics shows otherwise.

All the init/remove code is almost the same across all the devices. And
suspend/resume and PCI error handling code has been removed from the split
off devices, and for the functional regression to be fixed:

1. this code would have to be replicated, or

2. handlers from the generic 8250_pci.c driver exported and referred to,
or

3. some kind of a helper library (or a core module) created providing this
stuff to 8250_*.c drivers as required.

I guess the latter is the minimum that could convince me this driver
framework is usable for implementing device-specific drivers (as I find
the other variants rather miserable hacks).

Plus there would have to be clear information provided to the users as
otherwise people will be rather confused as to why 3 out of their 4 16x50
PCI/e serial cards work with 8250_pci.c while the remaining one does not
(probably broken, or is it?).

> You may not want to get the idea, it's fine. The rationale is simple:
> isolate quirks for certain platform(s) in one place. Each platform
> in a separate module.

What is a platform in your terminology? A PCI/e option card you can
install in about any modern computer? I usually think of platforms as
specific families of computers rather than option cards. Variants of
otherwise the same device are usually handled with a single driver in
Linux.

Maciej

2022-03-25 18:54:19

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] serial: 8250: Add proper clock handling for OxSemi PCIe devices

On Wed, Mar 23, 2022 at 09:59:28PM +0000, Maciej W. Rozycki wrote:
> On Fri, 18 Mar 2022, Andy Shevchenko wrote:
>
> > > It does allow one to program the full clock divider range of the OxSemi
> > > devices. I find it appropriate according to my engineer's code of good
> > > practices. And it doesn't cause any burden for non-OxSemi code.
> >
> > How BOTHER does prevent you doing the same?
>
> It does not allow you to set arbitrary serial port clock rates. You can
> only set integer baud rates,

Why do you need fractional baud rates? What is the practical use of that, please?

> and then only those that do not exceed the [1;65535] clock divisor
> range.

Can you be more specific as I can't see how it's possible in practice? In
several 8250 drivers we are able to set whatever we want to the limits of
the hardware.

> > > So I have had a look at how it has been done for other drivers and I have
> > > now convinced myself against such a split. The primary reason for this
> > > conclusion is that there is no basic infrastructure for such a split and
> > > the ultimate result is code duplication with no clear benefit to justify
> > > it.
> >
> > Justification for split is to keep certain quirks out of the scope of the
> > generic driver. I'm not sure what duplication you are talking about if the
> > LOC statistics shows otherwise.
>
> All the init/remove code is almost the same across all the devices.

Each of the platform has its own constraints and what you see as a repetition
is just a similarity. If you have an idea of the common probe function, please
share.

Also, don't forget the memory footprint case at run time. In embedded world
we do not need 8250 code that is not supported by the platform in question.

The split allows to disable / remove the code that is not needed.

> And suspend/resume and PCI error handling code has been removed from the
> split off devices,

This is managed by PCI core. Any specifics, please?

> and for the functional regression to be fixed:
>
> 1. this code would have to be replicated, or
>
> 2. handlers from the generic 8250_pci.c driver exported and referred to,
> or
>
> 3. some kind of a helper library (or a core module) created providing this
> stuff to 8250_*.c drivers as required.

Which functional regression? You mean if it will be found then it needs to
be fixed in several places?

> I guess the latter is the minimum that could convince me this driver
> framework is usable for implementing device-specific drivers (as I find
> the other variants rather miserable hacks).
>
> Plus there would have to be clear information provided to the users as
> otherwise people will be rather confused as to why 3 out of their 4 16x50
> PCI/e serial cards work with 8250_pci.c while the remaining one does not
> (probably broken, or is it?).

The default configuration after the split assumes that the driver is enabled
by the very same kernel configuration. Otherwise distributions will choose
what they consider better for their users and customers.

> > You may not want to get the idea, it's fine. The rationale is simple:
> > isolate quirks for certain platform(s) in one place. Each platform
> > in a separate module.
>
> What is a platform in your terminology? A PCI/e option card you can
> install in about any modern computer? I usually think of platforms as
> specific families of computers rather than option cards. Variants of
> otherwise the same device are usually handled with a single driver in
> Linux.

It might be PCIe card, it might be soldered on the motherboard, it can
be part of the SoC.

By platform I assume the certain SoC + certain discrete components wired
in a certain way (PCB level). In your case it's a motherboard with PCIe
serial port card.

--
With Best Regards,
Andy Shevchenko