On 2018-01-31 09:52, Corey Minyard wrote:
> On 01/30/2018 07:37 PM, Wang, Haiyue wrote:
>>
>>
>> On 2018-01-31 09:25, Corey Minyard wrote:
>>> On 01/30/2018 07:02 PM, Wang, Haiyue wrote:
>>>>
>>>>
>>>> On 2018-01-31 08:52, Corey Minyard wrote:
>>>>> On 01/30/2018 06:02 PM, Wang, Haiyue wrote:
>>>>>>
>>>>>>
>>>>>> On 2018-01-30 21:49, Corey Minyard wrote:
>>>>>>> On 01/29/2018 07:57 AM, Wang, Haiyue wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2018-01-26 22:48, Corey Minyard wrote:
>>>>>>>>> On 01/26/2018 12:08 AM, Wang, Haiyue wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2018-01-25 01:48, Corey Minyard wrote:
>>>>>>>>>>> On 01/24/2018 10:06 AM, Haiyue Wang wrote:
>>>>>>>>>>>> The KCS (Keyboard Controller Style) interface is used to
>>>>>>>>>>>> perform in-band
>>>>>>>>>>>> IPMI communication between a server host and its BMC
>>>>>>>>>>>> (BaseBoard Management
>>>>>>>>>>>> Controllers).
>>>>>>>>>>>>
>>>>>>>>>>>> This driver exposes the KCS interface on ASpeed SOCs
>>>>>>>>>>>> (AST2400 and AST2500)
>>>>>>>>>>>> as a character device. Such SOCs are commonly used as BMCs
>>>>>>>>>>>> and this driver
>>>>>>>>>>>> implements the BMC side of the KCS interface.
>>>>>>>>>>>>
>>>>>>>>>>>> Signed-off-by: Haiyue Wang <[email protected]>
>>>>>>>>>>>>
>>>>>>>>>>>> ---
>>>>>>>>>>>> v1->v2
>>>>>>>>>>>>
>>>>>>>>>>>> - Divide the driver into two parts, one handles the BMC KCS
>>>>>>>>>>>> IPMI 2.0 state;
>>>>>>>>>>>>    the other handles the BMC KCS controller such as AST2500
>>>>>>>>>>>> IO accessing.
>>>>>>>>>>>> - Use the spin lock APIs to handle the device file
>>>>>>>>>>>> operations and BMC chip
>>>>>>>>>>>>    IRQ interface for accessing the same KCS BMC data
>>>>>>>>>>>> structure.
>>>>>>>>>>>> - Enhanced the phases handling of the KCS BMC.
>>>>>>>>>>>> - Unified the IOCTL definition for IPMI BMC, it will be
>>>>>>>>>>>> used by KCS and BT.
>>>>>>>>>>>>
>>>>>>>>>>>> ---
>>>>>>>>>>>>
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void kcs_bmc_handle_data(struct kcs_bmc *kcs_bmc)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    u8 data;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    switch (kcs_bmc->phase) {
>>>>>>>>>>>> +    case KCS_PHASE_WRITE:
>>>>>>>>>>>> +        set_state(kcs_bmc, WRITE_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        /* set OBF before reading data */
>>>>>>>>>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (kcs_bmc->data_in_idx < KCS_MSG_BUFSIZ)
>>>>>>>>>>>> +            kcs_bmc->data_in[kcs_bmc->data_in_idx++] =
>>>>>>>>>>>> +                        read_data(kcs_bmc);
>>>>>>>>>
>>>>>>>>> I missed this earlier, you need to issue a length error if the
>>>>>>>>> data is too large.
>>>>>>>>>
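To check my understanding of the length error request, here is a minimal userspace sketch of the write phase. It is a model only, not driver code: struct kcs_model, kcs_model_write_byte() and the PHASE_* names are made up for illustration, the buffer is tiny on purpose, and 0x06 is the value I understand the KCS status code table assigns to a length error.

```c
#include <stddef.h>
#include <stdint.h>

#define KCS_MSG_BUFSIZ   8      /* tiny on purpose, so overflow is easy to hit */
#define KCS_NO_ERROR     0x00
#define KCS_LENGTH_ERROR 0x06   /* assumed value of the KCS "length error" code */

/* Hypothetical phases, reduced to what this sketch needs. */
enum kcs_phase { PHASE_IDLE, PHASE_WRITE, PHASE_ERROR };

/* Hypothetical stand-in for the relevant fields of struct kcs_bmc. */
struct kcs_model {
    enum kcs_phase phase;
    uint8_t error;
    uint8_t data_in[KCS_MSG_BUFSIZ];
    size_t data_in_idx;
};

/* Handle one data byte from the host while in the write phase. */
static void kcs_model_write_byte(struct kcs_model *kcs, uint8_t byte)
{
    if (kcs->phase != PHASE_WRITE)
        return;

    if (kcs->data_in_idx < KCS_MSG_BUFSIZ) {
        kcs->data_in[kcs->data_in_idx++] = byte;
        return;
    }

    /* Buffer full: flag a length error instead of silently dropping the byte. */
    kcs->error = KCS_LENGTH_ERROR;
    kcs->phase = PHASE_ERROR;
}
```

In the driver, the error branch would presumably also call set_state(kcs_bmc, ERROR_STATE) so the host sees the failure in the status register.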
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_PHASE_WRITE_END:
>>>>>>>>>>>> +        set_state(kcs_bmc, READ_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (kcs_bmc->data_in_idx < KCS_MSG_BUFSIZ)
>>>>>>>>>>>> +            kcs_bmc->data_in[kcs_bmc->data_in_idx++] =
>>>>>>>>>>>> +                        read_data(kcs_bmc);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_WAIT_READ;
>>>>>>>>>>>> +        if (kcs_bmc->running) {
>>>>>>>>>>>
>>>>>>>>>>> Why do you only do this when running is set? It won't hurt
>>>>>>>>>>> anything if it's not
>>>>>>>>>>> set. As it is, you have a race if something opens the
>>>>>>>>>>> device while this code
>>>>>>>>>>> runs.
>>>>>>>>>>>
>>>>>>>>>>> Also, don't set the state to wait read until the "write" has
>>>>>>>>>>> finished (userland has
>>>>>>>>>>> read the data out of the buffer). More on that later.
>>>>>>>>>>>
>>>>>>>>>> Understood.
>>>>>>>>>>>> +            kcs_bmc->data_in_avail = true;
>>>>>>>>>>>> +            wake_up_interruptible(&kcs_bmc->queue);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_PHASE_READ:
>>>>>>>>>>>> +        if (kcs_bmc->data_out_idx == kcs_bmc->data_out_len)
>>>>>>>>>>>> +            set_state(kcs_bmc, IDLE_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        data = read_data(kcs_bmc);
>>>>>>>>>>>> +        if (data != KCS_CMD_READ_BYTE) {
>>>>>>>>>>>> +            set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>>> +            write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +            break;
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (kcs_bmc->data_out_idx == kcs_bmc->data_out_len) {
>>>>>>>>>>>> +            write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +            kcs_bmc->phase = KCS_PHASE_IDLE;
>>>>>>>>>>>> +            break;
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        write_data(kcs_bmc,
>>>>>>>>>>>> +               kcs_bmc->data_out[kcs_bmc->data_out_idx++]);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_PHASE_ABORT_ERROR1:
>>>>>>>>>>>> +        set_state(kcs_bmc, READ_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        /* Read the Dummy byte */
>>>>>>>>>>>> +        read_data(kcs_bmc);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        write_data(kcs_bmc, kcs_bmc->error);
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_ABORT_ERROR2;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_PHASE_ABORT_ERROR2:
>>>>>>>>>>>> +        set_state(kcs_bmc, IDLE_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        /* Read the Dummy byte */
>>>>>>>>>>>> +        read_data(kcs_bmc);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_IDLE;
>>>>>>>>>>>> +
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    default:
>>>>>>>>>>>> +        set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        /* Read the Dummy byte */
>>>>>>>>>>>> +        read_data(kcs_bmc);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void kcs_bmc_handle_command(struct kcs_bmc *kcs_bmc)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    u8 cmd;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    set_state(kcs_bmc, WRITE_STATE);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    /* Dummy data to generate OBF */
>>>>>>>>>>>> +    write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    cmd = read_data(kcs_bmc);
>>>>>>>>>>>
>>>>>>>>>>> Shouldn't you check the phase in all the cases below and do
>>>>>>>>>>> error
>>>>>>>>>>> handling if the phase isn't correct?
>>>>>>>>>>>
>>>>>>>>>>> Similar thing if the device here isn't open. You need to handle
>>>>>>>>>>> that gracefully.
>>>>>>>>>>>
>>>>>>>>>>> Also, you should remove data_in_avail and data_in_idx
>>>>>>>>>>> setting from
>>>>>>>>>>> here, for reasons I will explain later.
>>>>>>>>>>>
>>>>>>>>>> If host software sends the data twice, such as a retry before
>>>>>>>>>> the BMC's IPMI service starts,
>>>>>>>>>> then the two IPMI requests will be merged into one if
>>>>>>>>>> data_in_idx is not cleared after receiving
>>>>>>>>>> KCS_CMD_WRITE_START. Most of the states are driven by host
>>>>>>>>>> software (SMS). :(
>>>>>>>>>
>>>>>>>>> True, but what if the host issues WRITE_START or a WRITE_END
>>>>>>>>> while this driver is in read
>>>>>>>>> state? The spec is unclear on this, but it really only makes
>>>>>>>>> sense for the host to issue
>>>>>>>>> WRITE_START in idle state and WRITE_END in write state. IMHO it
>>>>>>>>> should go to error
>>>>>>>>> state. You might make the case that a WRITE_START anywhere
>>>>>>>>> restarts the transaction,
>>>>>>>>> but the feel of the error state machine kind of goes against
>>>>>>>>> that. WRITE_END is definitely
>>>>>>>>> wrong anywhere but write state.
>>>>>>>>>
>>>>>>>>> I just found the following in the spec (section 9.12):
>>>>>>>>>
>>>>>>>>>    Thus, since the interface will allow a command transfer to be
>>>>>>>>>    started or restarted
>>>>>>>>>    at any time when the input buffer is empty, software could
>>>>>>>>> elect to
>>>>>>>>>    simply retry
>>>>>>>>>    the command upon detecting an error condition, or issue a
>>>>>>>>> ‘known good’
>>>>>>>>>    command in order to clear ERROR_STATE
>>>>>>>>>
>>>>>>>>> So a WRITE_START anywhere is ok. A WRITE_END in the wrong
>>>>>>>>> state should probably
>>>>>>>>> still go to error state. This means the user needs to be able
>>>>>>>>> to handle a write error at
>>>>>>>>> any time. It also means it's very important to make sure the
>>>>>>>>> user does a read before
>>>>>>>>> doing a write. If the host re-issues a WRITE_START and writes
>>>>>>>>> a new command
>>>>>>>>> between the time the user reads the data and writes the
>>>>>>>>> response, the response would
>>>>>>>>> be for the wrong command.
>>>>>>>>>
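A minimal userspace sketch of the rule as I read it: WRITE_START restarts the transfer from any phase, while WRITE_END is only legal in the write phase. This is a model, not driver code. struct kcs_model, kcs_model_command() and the PHASE_* names are hypothetical, and using the "unspecified error" code for a misplaced WRITE_END is my assumption, since the spec does not name a code for that case.

```c
#include <stdint.h>

/* KCS control codes written by the host to the command register. */
#define KCS_CMD_ABORT        0x60
#define KCS_CMD_WRITE_START  0x61
#define KCS_CMD_WRITE_END    0x62

/* KCS error status codes. */
#define KCS_NO_ERROR             0x00
#define KCS_ABORTED_BY_COMMAND   0x01
#define KCS_ILLEGAL_CONTROL_CODE 0x02
#define KCS_UNSPECIFIED_ERROR    0xFF

/* Hypothetical phases for this sketch. */
enum kcs_phase {
    PHASE_IDLE, PHASE_WRITE, PHASE_WRITE_END,
    PHASE_READ, PHASE_ABORT, PHASE_ERROR
};

/* Hypothetical stand-in for the relevant fields of struct kcs_bmc. */
struct kcs_model {
    enum kcs_phase phase;
    uint8_t error;
    unsigned int data_in_idx;
};

static void kcs_model_command(struct kcs_model *kcs, uint8_t cmd)
{
    switch (cmd) {
    case KCS_CMD_WRITE_START:
        /* Allowed in any phase: it (re)starts the transfer. */
        kcs->data_in_idx = 0;
        kcs->error = KCS_NO_ERROR;
        kcs->phase = PHASE_WRITE;
        break;

    case KCS_CMD_WRITE_END:
        if (kcs->phase != PHASE_WRITE) {
            /* Assumption: report a misplaced WRITE_END as unspecified. */
            kcs->error = KCS_UNSPECIFIED_ERROR;
            kcs->phase = PHASE_ERROR;
            break;
        }
        kcs->phase = PHASE_WRITE_END;
        break;

    case KCS_CMD_ABORT:
        if (kcs->error == KCS_NO_ERROR)
            kcs->error = KCS_ABORTED_BY_COMMAND;
        kcs->phase = PHASE_ABORT;
        break;

    default:
        kcs->error = KCS_ILLEGAL_CONTROL_CODE;
        kcs->phase = PHASE_ERROR;
        break;
    }
}
```

Because WRITE_START resets data_in_idx in every phase, a retry from the host can no longer be merged into the previous request.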
>>>>>>>>>>>> +    switch (cmd) {
>>>>>>>>>>>> +    case KCS_CMD_WRITE_START:
>>>>>>>>>>>> +        kcs_bmc->data_in_avail = false;
>>>>>>>>>>>> +        kcs_bmc->data_in_idx   = 0;
>>>>>>>>>>>> +        kcs_bmc->phase         = KCS_PHASE_WRITE;
>>>>>>>>>>>> +        kcs_bmc->error         = KCS_NO_ERROR;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_CMD_WRITE_END:
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_WRITE_END;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_CMD_ABORT:
>>>>>>>>>>>> +        if (kcs_bmc->error == KCS_NO_ERROR)
>>>>>>>>>>>> +            kcs_bmc->error = KCS_ABORTED_BY_COMMAND;
>>>>>>>>>>>> +
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_ABORT_ERROR1;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    default:
>>>>>>>>>>>> +        kcs_bmc->error = KCS_ILLEGAL_CONTROL_CODE;
>>>>>>>>>>>> +        set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>>> +        write_data(kcs_bmc, kcs_bmc->error);
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_ERROR;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +int kcs_bmc_handle_event(struct kcs_bmc *kcs_bmc)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    unsigned long flags;
>>>>>>>>>>>> +    int ret = 0;
>>>>>>>>>>>> +    u8 status;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irqsave(&kcs_bmc->lock, flags);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    status = read_status(kcs_bmc) & (KCS_STATUS_IBF | KCS_STATUS_CMD_DAT);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    switch (status) {
>>>>>>>>>>>> +    case KCS_STATUS_IBF | KCS_STATUS_CMD_DAT:
>>>>>>>>>>>> +        kcs_bmc_handle_command(kcs_bmc);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case KCS_STATUS_IBF:
>>>>>>>>>>>> +        kcs_bmc_handle_data(kcs_bmc);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    default:
>>>>>>>>>>>> +        ret = -1;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irqrestore(&kcs_bmc->lock, flags);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +EXPORT_SYMBOL(kcs_bmc_handle_event);
>>>>>>>>>>>> +
>>>>>>>>>>>> +static inline struct kcs_bmc *file_kcs_bmc(struct file *filp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    return container_of(filp->private_data, struct kcs_bmc, miscdev);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int kcs_bmc_open(struct inode *inode, struct file
>>>>>>>>>>>> *filp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    int ret = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (!kcs_bmc->running) {
>>>>>>>>>>>> +        kcs_bmc->running       = 1;
>>>>>>>>>>>> +        kcs_bmc->phase         = KCS_PHASE_IDLE;
>>>>>>>>>>>> +        kcs_bmc->data_in_avail = false;
>>>>>>>>>>>
>>>>>>>>>>> If you do everything right, setting the phase and
>>>>>>>>>>> data_in_avail should not
>>>>>>>>>>> be necessary here.
>>>>>>>>>>>
>>>>>>>>>>>> +    } else {
>>>>>>>>>>>> +        ret = -EBUSY;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static unsigned int kcs_bmc_poll(struct file *filp,
>>>>>>>>>>>> poll_table *wait)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    unsigned int mask = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    poll_wait(filp, &kcs_bmc->queue, wait);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (kcs_bmc->data_in_avail)
>>>>>>>>>>>> +        mask |= POLLIN;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return mask;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static ssize_t kcs_bmc_read(struct file *filp, char *buf,
>>>>>>>>>>>> +                size_t count, loff_t *offset)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    ssize_t ret = -EAGAIN;
>>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>> This function still has some issues.
>>>>>>>>>>>
>>>>>>>>>>> You can't call copy_to_user() with a spinlock held or
>>>>>>>>>>> interrupts disabled.
>>>>>>>>>>> To handle readers, you probably need a separate mutex.
>>>>>>>>>>>
>>>>>>>>>>> Also, this function can return -EAGAIN even if O_NONBLOCK is
>>>>>>>>>>> not set if
>>>>>>>>>>> kcs_bmc->data_in_avail changes between when you wait on the
>>>>>>>>>>> event
>>>>>>>>>>> and when you check it under the lock.
>>>>>>>>>>>
>>>>>>>>>>> You also clear data_in_avail even if the copy_to_user()
>>>>>>>>>>> fails, which is
>>>>>>>>>>> wrong.
>>>>>>>>>>>
>>>>>>>>>>> I believe the best way to handle this would be to have the
>>>>>>>>>>> spinlock
>>>>>>>>>>> protect the inner workings of the state machine and a mutex
>>>>>>>>>>> handle
>>>>>>>>>>> copying data out, setting/clearing the running flag (thus a
>>>>>>>>>>> mutex
>>>>>>>>>>> instead of spinlock in open and release) and the ioctl
>>>>>>>>>>> settings (except
>>>>>>>>>>> for abort where you will need to grab the spinlock).
>>>>>>>>>>>
>>>>>>>>>>> After the wait event below, grab the mutex. If data is not
>>>>>>>>>>> available
>>>>>>>>>>> and O_NONBLOCK is not set, drop the mutex and retry. Otherwise
>>>>>>>>>>> this is the only place (besides release) that sets
>>>>>>>>>>> data_in_avail to false.
>>>>>>>>>>> Do the copy_to_user(), grab the spinlock, clear
>>>>>>>>>>> data_in_avail and
>>>>>>>>>>> data_in_idx, then release the lock and mutex. If you are really
>>>>>>>>>>> adventurous you can do this without grabbing the lock using
>>>>>>>>>>> barriers, but it's probably not necessary here.
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> With the state machine being able to be restarted at any time,
>>>>>>>>> you need
>>>>>>>>> something a little different here. You still need the mutex
>>>>>>>>> to handle
>>>>>>>>> multiple readers and the copy. I think the function should be
>>>>>>>>> something
>>>>>>>>> like:
>>>>>>>>>
>>>>>>>> Since KCS is not a multi-reader protocol from the BMC's view, you
>>>>>>>> make things complex. :-)
>>>>>>>
>>>>>>> No, I don't think you understand. The primary purpose of the
>>>>>>> complexity
>>>>>>> here is to protect the driver from the host system (on the other
>>>>>>> side of
>>>>>>> the KCS interface). Without this protection, it is possible for
>>>>>>> the host
>>>>>>> system to start a new write while the user on the BMC side is
>>>>>>> reading
>>>>>>> data out, resulting in corrupt data being read.
>>>>>>>
>>>>>>> I haven't thought too much about this. There may be a simpler way,
>>>>>>> but the protection needs to be there.
>>>>>>>
>>>>>>> And you may not think you need to protect the driver against a
>>>>>>> malicious BMC side user code, but you would be wrong. You can
>>>>>>> only have one opener, but with threads or a fork you can have
>>>>>>> multiple readers. And you don't know if a malicious piece of
>>>>>>> code has taken over userland. You always need to protect the
>>>>>>> kernel.
>>>>>>>
>>>>>> Sure, read/write already protect the critical data area by
>>>>>> disabling IRQs, and I believe these
>>>>>> functions should also be thread safe.
>>>>>>
>>>>>> spin_lock_irq(&kcs_bmc->lock);
>>>>>> ...
>>>>>> spin_unlock_irq(&kcs_bmc->lock);
>>>>>>
>>>>>
>>>>> But remember, you can't call copy_to_user() when IRQs are off or
>>>>> when you are holding
>>>>> a spinlock. That is an absolute no. It can crash the kernel.
>>>>>
>>>>> So you need a design that takes this into account, but will not
>>>>> result in the possibility
>>>>> of bad data being read.
>>>>>
>>>> Yes, sure, as I said before: access_ok(VERIFY_WRITE, to, n), then
>>>> memcpy in spin_lock.
>>>
>>> Where did you get the idea that this was ok? It's not. access_ok()
>>> is not actually very
>>> useful, since the permissions on memory can change at any time
>>> unless you are holding
>>> the mm lock, which is also not an ok thing to do. It is entirely
>>> possible for access_ok()
>>> to pass and copy_to_user() to fail.
>>>
>> I thought memcpy will not fail. :(
>
> Oh, memcpy won't fail as long as the source and destination are kernel
> memory.
> I was a little confused by the access_ok() thing, it's common for
> people to
> assume that if they do access_ok(), that copy_to_user() won't fail.
>
Yes, this is commonly misunderstood; I didn't fully understand the hidden
things that the kernel does for memory
management.
>>> I'm not exactly sure what you are saying, though. In any event, a
>>> well-designed read()/write()
>>> operation should leave the system unchanged if it gets an error.
>>>
>> I saw that BT uses a local buffer. If I change the '#define
>> KCS_MSG_BUFSIZ    1024' to ".. 512", would it be OK to do the same
>> as BT?
>>
>> static ssize_t bt_bmc_read(struct file *file, char __user *buf,
>>                size_t count, loff_t *ppos)
>> {
>>     struct bt_bmc *bt_bmc = file_bt_bmc(file);
>>     u8 len;
>>     int len_byte = 1;
>>     u8 kbuffer[BT_BMC_BUFFER_SIZE]; --> #define BT_BMC_BUFFER_SIZE 256
>
> It's good practice to keep larger things off the stack, which is why I
> dynamically
> allocated it. But if you have a mutex, you can put that buffer in
> struct bt_bmc
> since it would only be accessed when holding the mutex.
>
Got it, looks like this is the best idea. I will rewrite the driver
again, hope I can catch all of your code review
comments. :-)
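
Before I rewrite it, here is a minimal userspace sketch of the read path as I understand it: snapshot the message into a private buffer (under the spinlock in the real driver), copy it out with no spinlock held, then commit only if the state machine was not restarted in between. It is a model only. struct kcs_model and the kcs_model_*() helpers are hypothetical, memcpy() into the caller's buffer stands in for copy_to_user(), and the real locks appear only as comments.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define KCS_MSG_BUFSIZ 16   /* small value for the sketch only */

/* Hypothetical phases, reduced to what this sketch needs. */
enum kcs_phase { PHASE_IDLE, PHASE_WRITE, PHASE_WRITE_END_DONE, PHASE_WAIT_READ };

/* Hypothetical stand-in for the relevant fields of struct kcs_bmc. */
struct kcs_model {
    enum kcs_phase phase;
    int data_in_avail;
    uint8_t data_in[KCS_MSG_BUFSIZ];
    size_t data_in_idx;
    uint8_t kbuffer[KCS_MSG_BUFSIZ];  /* only touched with the read mutex held */
};

/* Step 1 (spinlock held in a real driver): snapshot the pending message. */
static size_t kcs_model_snapshot(struct kcs_model *kcs)
{
    size_t len;

    if (!kcs->data_in_avail)
        return 0;

    len = kcs->data_in_idx;
    memcpy(kcs->kbuffer, kcs->data_in, len);
    return len;
}

/* Step 3 (spinlock held again): commit only if nothing restarted the transfer. */
static long kcs_model_commit(struct kcs_model *kcs, size_t len)
{
    if (kcs->phase != PHASE_WRITE_END_DONE)
        return -EIO;

    kcs->phase = PHASE_WAIT_READ;
    kcs->data_in_avail = 0;
    kcs->data_in_idx = 0;
    return (long)len;
}

/* Whole read path; the read mutex would be held from start to end. */
static long kcs_model_read(struct kcs_model *kcs, uint8_t *buf, size_t count)
{
    size_t len = kcs_model_snapshot(kcs);

    if (len == 0)
        return -EAGAIN;
    if (count < len)
        return -EOVERFLOW;

    /* Step 2: stands in for copy_to_user(); no spinlock is held here. */
    memcpy(buf, kcs->kbuffer, len);

    return kcs_model_commit(kcs, len);
}
```

On any error path the model leaves data_in_avail and data_in_idx untouched, so a failed read() does not lose the request.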
>>
>>> -corey
>>>
>>>>>>>>>    static ssize_t kcs_bmc_read(struct file *filp, char *buf,
>>>>>>>>>                     size_t count, loff_t *offset)
>>>>>>>>>    {
>>>>>>>>>         struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>         ssize_t ret;
>>>>>>>>>         bool avail;
>>>>>>>>>         size_t data_size;
>>>>>>>>>         u8 *data;
>>>>>>>>>
>>>>>>>>>         data = kmalloc(KCS_MSG_BUFSIZ, GFP_KERNEL);
>>>>>>>>>         if (!data)
>>>>>>>>>             return -ENOMEM;
>>>>>>>>>
>>>>>>>>>    retry:
>>>>>>>>>         ret = -EAGAIN;
>>>>>>>>>         if (!(filp->f_flags & O_NONBLOCK))
>>>>>>>>>             wait_event_interruptible(kcs_bmc->queue,
>>>>>>>>>                                      kcs_bmc->data_in_avail);
>>>>>>>>>
>>>>>>>>>         mutex_lock(&kcs_bmc->read_mutex);
>>>>>>>>>
>>>>>>>>>         spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>         avail = kcs_bmc->data_in_avail;
>>>>>>>>>         if (avail) {
>>>>>>>>>             memcpy(data, kcs_bmc->data_in, kcs_bmc->data_in_idx);
>>>>>>>>>             data_size = kcs_bmc->data_in_idx;
>>>>>>>>>         }
>>>>>>>>>         spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>
>>>>>>>>>         if (!avail) {
>>>>>>>>>             if (filp->f_flags & O_NONBLOCK)
>>>>>>>>>                 goto out_mutex_unlock;
>>>>>>>>>             mutex_unlock(&kcs_bmc->read_mutex);
>>>>>>>>>             goto retry;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         if (count < data_size) {
>>>>>>>>>             /* I'm not sure about the error, but userspace needs to know. */
>>>>>>>>>             ret = -EOVERFLOW;
>>>>>>>>>             goto out_mutex_unlock;
>>>>>>>
>>>>>>> Maybe a length error to the host side here?
>>>>>
>>>>> You didn't comment on this or the other length error. That needs
>>>>> to be
>>>>> handled.
>>>>>
>>>> Yes, will send a length error by following KCS spec.
>>>>>>>
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         if (copy_to_user(buf, data, data_size)) {
>>>>>>>>>             ret = -EFAULT;
>>>>>>>>>             goto out_mutex_unlock;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         ret = data_size;
>>>>>>>>>
>>>>>>>>>         spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>
>>>>>>>>>         if (kcs_bmc->phase != KCS_PHASE_WRITE_END_DONE) {
>>>>>>>>>             /*
>>>>>>>>>              * Something aborted or restarted the state machine.
>>>>>>>>>              * Maybe restart if O_NONBLOCK is not set and return
>>>>>>>>>              * -EAGAIN if it is?
>>>>>>>>>              */
>>>>>>>>>             ret = -EIO;
>>>>>>>>>         } else {
>>>>>>>>>             kcs_bmc->phase = KCS_PHASE_WAIT_READ;
>>>>>>>>>             kcs_bmc->data_in_avail = false;
>>>>>>>>>             kcs_bmc->data_in_idx = 0;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>
>>>>>>>>>    out_mutex_unlock:
>>>>>>>>>         mutex_unlock(&kcs_bmc->read_mutex);
>>>>>>>>>
>>>>>>>>>         kfree(data);
>>>>>>>>>
>>>>>>>>>         return ret;
>>>>>>>>>    }
>>>>>>>>> Note that I added a state, KCS_PHASE_WRITE_END_DONE, which
>>>>>>>>> would be
>>>>>>>>> set after the final byte from the host is received. You want
>>>>>>>>> the read here
>>>>>>>>> done before you can do the write below to avoid the race I
>>>>>>>>> talked about.
>>>>>>>>>
>>>>>>>>> There is a local copy made of the data. What you *never* want
>>>>>>>>> to happen
>>>>>>>>> here is for the state machine to start processing a new write
>>>>>>>>> command
>>>>>>>>> while the data is being copied. It could result in corrupt
>>>>>>>>> data being read
>>>>>>>>> and some random operation being done by the BMC.
>>>>>>>>>
>>>>>>>>> If you want to avoid the local copy, it could be done, but
>>>>>>>>> it's more complex.
>>>>>>>>>
>>>>>>>>>>>> +    if (!(filp->f_flags & O_NONBLOCK))
>>>>>>>>>>>> +        wait_event_interruptible(kcs_bmc->queue,
>>>>>>>>>>>> +                     kcs_bmc->data_in_avail);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (kcs_bmc->data_in_avail) {
>>>>>>>>>>>> +        kcs_bmc->data_in_avail = false;
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (count > kcs_bmc->data_in_idx)
>>>>>>>>>>>> +            count = kcs_bmc->data_in_idx;
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (!copy_to_user(buf, kcs_bmc->data_in, count))
>>>>>>>>>>>> +            ret = count;
>>>>>>>>>>>> +        else
>>>>>>>>>>>> +            ret = -EFAULT;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static ssize_t kcs_bmc_write(struct file *filp, const char
>>>>>>>>>>>> *buf,
>>>>>>>>>>>> +                 size_t count, loff_t *offset)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    ssize_t ret = count;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (count < 1 || count > KCS_MSG_BUFSIZ)
>>>>>>>>>>>> +        return -EINVAL;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (kcs_bmc->phase == KCS_PHASE_WAIT_READ) {
>>>>>>>>>>>> +        if (copy_from_user(kcs_bmc->data_out, buf, count)) {
>>>>>>>>>>>> +            spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +            return -EFAULT;
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_READ;
>>>>>>>>>>>> +        kcs_bmc->data_out_idx = 1;
>>>>>>>>>>>> +        kcs_bmc->data_out_len = count;
>>>>>>>>>>>> +        write_data(kcs_bmc, kcs_bmc->data_out[0]);
>>>>>>>>>>>> +    } else if (kcs_bmc->phase == KCS_PHASE_READ) {
>>>>>>>>>>>> +        ret = -EBUSY;
>>>>>>>>>>>> +    } else {
>>>>>>>>>>>> +        ret = -EINVAL;
>>>>>>>>>>>
>>>>>>>>>>> Is there a reason you return -EINVAL here? Why not just
>>>>>>>>>>> -EBUSY in all
>>>>>>>>>>> cases? Is there something that userland will need to do
>>>>>>>>>>> differently?
>>>>>>>>>>>
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static long kcs_bmc_ioctl(struct file *filp, unsigned int
>>>>>>>>>>>> cmd,
>>>>>>>>>>>> +              unsigned long arg)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +    long ret = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    switch (cmd) {
>>>>>>>>>>>> +    case IPMI_BMC_IOCTL_SET_SMS_ATN:
>>>>>>>>>>>> +        update_status_bits(kcs_bmc, KCS_STATUS_SMS_ATN,
>>>>>>>>>>>> +                        KCS_STATUS_SMS_ATN);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case IPMI_BMC_IOCTL_CLEAR_SMS_ATN:
>>>>>>>>>>>> +        update_status_bits(kcs_bmc, KCS_STATUS_SMS_ATN,
>>>>>>>>>>>> +                        0);
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    case IPMI_BMC_IOCTL_FORCE_ABORT:
>>>>>>>>>>>> +        set_state(kcs_bmc, ERROR_STATE);
>>>>>>>>>>>> +        read_data(kcs_bmc);
>>>>>>>>>>>> +        write_data(kcs_bmc, KCS_ZERO_DATA);
>>>>>>>>>>>> +
>>>>>>>>>>>> +        kcs_bmc->phase = KCS_PHASE_ERROR;
>>>>>>>>>>>> +        kcs_bmc->data_in_avail = false;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    default:
>>>>>>>>>>>> +        ret = -EINVAL;
>>>>>>>>>>>> +        break;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return ret;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static int kcs_bmc_release(struct inode *inode, struct
>>>>>>>>>>>> file *filp)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct kcs_bmc *kcs_bmc = file_kcs_bmc(filp);
>>>>>>>>>>>> +
>>>>>>>>>>>
>>>>>>>>>>> What happens if the device gets closed in the middle of a
>>>>>>>>>>> transaction? That's
>>>>>>>>>>> an important case to handle. If something is in process,
>>>>>>>>>>> you need to abort it.
>>>>>>>>>>>
>>>>>>>>>> The device just provides the read & write data, the
>>>>>>>>>> transaction is handled in the KCS
>>>>>>>>>> controller's IRQ handler.
>>>>>>>>>
>>>>>>>>> From the spec, section 9.14:
>>>>>>>>>
>>>>>>>>>    The BMC must change the status to ERROR_STATE on any
>>>>>>>>> condition where it
>>>>>>>>>    aborts a command transfer in progress.
>>>>>>>>>
>>>>>>>>> So you need to do something here.
>>>>>>>>>
>>>>>>>> In practice, we do this as spec said in ipmid, NOT in driver,
>>>>>>>> driver can't handle anything, let's
>>>>>>>> make it simple, thanks!
>>>>>>>
>>>>>>> If ipmid crashes or is killed, how does it accomplish this?
>>>>>>>
>>>>>> Every time ipmid (or kcsd) crashes or is killed, it needs to
>>>>>> call FORCE_ABORT first when it starts again, to sync with
>>>>>> host side software.
>>>>>>>>
>>>>>>>> Whenever the BMC is reset (from power-on or a hard reset), the
>>>>>>>> State Bits are initialized to “11 - Error State”. Doing so
>>>>>>>> allows SMS to detect that the BMC has been reset and that any
>>>>>>>> message in process has been terminated by the BMC.
>>>>>>>
>>>>>>> Right, that's fine, like it should be. But we are not talking
>>>>>>> about a reset.
>>>>>>>
>>>>>> I think the final error handling solution is that kcsd (user
>>>>>> land) runs; otherwise, the host software side still gets stuck. We
>>>>>> have met
>>>>>> this kind of issue, so in general, we just don't handle some
>>>>>> minor errors in the driver; then kcsd, when it can provide the real
>>>>>> IPMI service, will reset the channel first to sync with host
>>>>>> side software.
>>>>>
>>>>> "Userland will do the right thing" is not very convincing to a
>>>>> kernel developer.
>>>>>
>>>>> Plus if the above is true, I would think that you would just want
>>>>> to hold the device
>>>>> in an error state when it wasn't opened.
>>>>>
>>>> I understand your concern; of course, the driver needs to handle things
>>>> well. But in fact, if a user app is truly a bad boy, it can still hang
>>>> the host side: set SMS_ATN, but return no message when the host
>>>> software requests one, then the host open-ipmi driver will hang. We
>>>> have met this kind of error hanging a customer's host. :) In my
>>>> understanding, kcs-bmc should do the right thing about read and write;
>>>> the real transaction should be handled correctly by the kcsd.
>>>>
>>>> And if no kcsd starts, then this kind of BMC can't be sold. :)
>>>
>>> True. I'm not as concerned about this sort of thing. It's nicer to
>>> the host side if
>>> it can detect problems quickly, but it will eventually time out.
>>>
>>> From what I can tell from the current design, if the BMC userland is
>>> not running,
>>> the driver will step through the state machine until it hits read
>>> state, then it
>>> will sit there until the host times out and aborts the operation.
>>>
>>> IMHO, it would be better for the host side if the driver just stayed
>>> in error state
>>> if nothing had it open. I would think the spec says that in the
>>> quote I referenced
>>> above, but that quote, like many things in the IPMI spec, is fairly
>>> vague and could
>>> be interpreted many ways.
>>>
>> Well, I will try to fix these errors as far as possible.
>>> -corey
>>>
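For the "stay in error state while nothing has the device open" idea, a minimal userspace sketch of what the event handler could do. It is a model only. struct kcs_model and kcs_model_handle_event() are hypothetical, and the WRITE_STATE assignment is just a placeholder for the real command and data handling.

```c
#include <stdint.h>

#define KCS_STATUS_IBF     0x02  /* input buffer full: the host wrote a byte */
#define KCS_STATUS_CMD_DAT 0x08  /* set when the byte went to the command register */

enum kcs_state { IDLE_STATE, READ_STATE, WRITE_STATE, ERROR_STATE };

/* Hypothetical stand-in for the relevant fields of struct kcs_bmc. */
struct kcs_model {
    int running;            /* nonzero while userland has the device open */
    enum kcs_state state;   /* state bits reported to the host */
    unsigned int handled;   /* events passed on to the normal state machine */
};

static int kcs_model_handle_event(struct kcs_model *kcs, uint8_t status)
{
    if (!(status & KCS_STATUS_IBF))
        return -1;              /* nothing from the host, not our event */

    if (!kcs->running) {
        /* Nobody can answer: fail fast instead of stalling in read state. */
        kcs->state = ERROR_STATE;
        return 0;
    }

    /* Placeholder for the real command/data handling. */
    kcs->state = WRITE_STATE;
    kcs->handled++;
    return 0;
}
```

With this, the host sees ERROR_STATE right away when no daemon is running, rather than waiting for its own timeout.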
>>>
>>>>> -corey
>>>>>
>>>>>>> -corey
>>>>>>>
>>>>>>>>>>>> +    spin_lock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    kcs_bmc->running = 0;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    spin_unlock_irq(&kcs_bmc->lock);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    return 0;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>