2021-04-06 15:34:36

by Daejun Park

[permalink] [raw]
Subject: RE: Re: [PATCH 1/2] scsi: ufs: Introduce hba performance monitor sysfs nodes

Hi Can Guo,
>
>Hi Daejun,
>
>On 2021-04-06 12:11, Daejun Park wrote:
>> Hi Can Guo,
>>
>>> +static ssize_t monitor_enable_store(struct device *dev,
>>> + struct device_attribute *attr,
>>> + const char *buf, size_t count)
>>> +{
>>> + struct ufs_hba *hba = dev_get_drvdata(dev);
>>> + unsigned long value, flags;
>>> +
>>> + if (kstrtoul(buf, 0, &value))
>>> + return -EINVAL;
>>> +
>>> + value = !!value;
>>> + spin_lock_irqsave(hba->host->host_lock, flags);
>>> + if (value == hba->monitor.enabled)
>>> + goto out_unlock;
>>> +
>>> + if (!value) {
>>> + memset(&hba->monitor, 0, sizeof(hba->monitor));
>>> + } else {
>>> + hba->monitor.enabled = true;
>>> + hba->monitor.enabled_ts = ktime_get();
>>
>> How about setting lat_max to and lat_min to KTIME_MAX and 0?
>
>lat_min is already 0. What is the benefit of setting lat_max to
>KTIME_MAX?
>
>> I think lat_sum should be 0 at this point.
>
>lat_sum is already 0 at this point, what is the problem?

Sorry. I misunderstood about resetting monitor values.

>
>>
>>> + }
>>> +
>>> +out_unlock:
>>> + spin_unlock_irqrestore(hba->host->host_lock, flags);
>>> + return count;
>>> +}
>>
>>
>>> +static void ufshcd_update_monitor(struct ufs_hba *hba, struct
>>> ufshcd_lrb *lrbp)
>>> +{
>>> + int dir = ufshcd_monitor_opcode2dir(*lrbp->cmd->cmnd);
>>> +
>>> + if (dir >= 0 && hba->monitor.nr_queued[dir] > 0) {
>>> + struct request *req = lrbp->cmd->request;
>>> + struct ufs_hba_monitor *m = &hba->monitor;
>>> + ktime_t now, inc, lat;
>>> +
>>> + now = ktime_get();
>>
>> How about using lrbp->compl_time_stamp instead of getting new value?
>
>I am expecting "now" keeps increasing and use it to update
>m->busy_start_s,
>but if I use lrbp->compl_time_stamp to do that, below line ktime_sub()
>may
>give me an unexpected value as lrbp->compl_time_stamp may be smaller
>than
>m->busy_start_ts, because the actual requests are not completed by the
>device
>in the exact same ordering as the bits set in hba->outstanding_tasks,
>but driver
>is completing them from bit 0 to bit 31 in ascending order.

lrbp->compl_time_stamp is set just before calling ufshcd_update_monitor().
And I don't think it can be negative value, because ufshcd_send_command()
and __ufshcd_transfer_req_compl() are protected by host lock.

>
>>
>>> + inc = ktime_sub(now, m->busy_start_ts[dir]);
>>> + m->total_busy[dir] = ktime_add(m->total_busy[dir],
>>> inc);
>>> + m->nr_sec_rw[dir] += blk_rq_sectors(req);
>>> +
>>> + /* Update latencies */
>>> + m->nr_req[dir]++;
>>> + lat = ktime_sub(now, lrbp->issue_time_stamp);
>>> + m->lat_sum[dir] += lat;
>>> + if (m->lat_max[dir] < lat || !m->lat_max[dir])
>>> + m->lat_max[dir] = lat;
>>> + if (m->lat_min[dir] > lat || !m->lat_min[dir])
>>> + m->lat_min[dir] = lat;
>>
>> This if statement can be shorted, by setting lat_max / lat_min as
>> default value.
>
>I don't quite get it, can you show me the code sample?

I think " || !m->lat_max[dir]" can be removed.

if (m->lat_max[dir] < lat)
m->lat_max[dir] = lat;
if (m->lat_min[dir] > lat)
m->lat_min[dir] = lat;

Thanks,
Daejun

>
>Thanks,
>Can Guo
>
>>
>>> +
>>> + m->nr_queued[dir]--;
>>> + /* Push forward the busy start of monitor */
>>> + m->busy_start_ts[dir] = now;
>>> + }
>>> +}
>>
>> Thanks,
>> Daejun


2021-04-06 15:51:19

by Can Guo

[permalink] [raw]
Subject: Re: [PATCH 1/2] scsi: ufs: Introduce hba performance monitor sysfs nodes

On 2021-04-06 13:58, Daejun Park wrote:
> Hi Can Guo,
>>
>> Hi Daejun,
>>
>> On 2021-04-06 12:11, Daejun Park wrote:
>>> Hi Can Guo,
>>>
>>>> +static ssize_t monitor_enable_store(struct device *dev,
>>>> + struct device_attribute *attr,
>>>> + const char *buf, size_t count)
>>>> +{
>>>> + struct ufs_hba *hba = dev_get_drvdata(dev);
>>>> + unsigned long value, flags;
>>>> +
>>>> + if (kstrtoul(buf, 0, &value))
>>>> + return -EINVAL;
>>>> +
>>>> + value = !!value;
>>>> + spin_lock_irqsave(hba->host->host_lock, flags);
>>>> + if (value == hba->monitor.enabled)
>>>> + goto out_unlock;
>>>> +
>>>> + if (!value) {
>>>> + memset(&hba->monitor, 0, sizeof(hba->monitor));
>>>> + } else {
>>>> + hba->monitor.enabled = true;
>>>> + hba->monitor.enabled_ts = ktime_get();
>>>
>>> How about setting lat_max to and lat_min to KTIME_MAX and 0?
>>
>> lat_min is already 0. What is the benefit of setting lat_max to
>> KTIME_MAX?
>>
>>> I think lat_sum should be 0 at this point.
>>
>> lat_sum is already 0 at this point, what is the problem?
>
> Sorry. I misunderstood about resetting monitor values.
>
>>
>>>
>>>> + }
>>>> +
>>>> +out_unlock:
>>>> + spin_unlock_irqrestore(hba->host->host_lock, flags);
>>>> + return count;
>>>> +}
>>>
>>>
>>>> +static void ufshcd_update_monitor(struct ufs_hba *hba, struct
>>>> ufshcd_lrb *lrbp)
>>>> +{
>>>> + int dir = ufshcd_monitor_opcode2dir(*lrbp->cmd->cmnd);
>>>> +
>>>> + if (dir >= 0 && hba->monitor.nr_queued[dir] > 0) {
>>>> + struct request *req = lrbp->cmd->request;
>>>> + struct ufs_hba_monitor *m = &hba->monitor;
>>>> + ktime_t now, inc, lat;
>>>> +
>>>> + now = ktime_get();
>>>
>>> How about using lrbp->compl_time_stamp instead of getting new value?
>>
>> I am expecting "now" keeps increasing and use it to update
>> m->busy_start_s,
>> but if I use lrbp->compl_time_stamp to do that, below line ktime_sub()
>> may
>> give me an unexpected value as lrbp->compl_time_stamp may be smaller
>> than
>> m->busy_start_ts, because the actual requests are not completed by the
>> device
>> in the exact same ordering as the bits set in hba->outstanding_tasks,
>> but driver
>> is completing them from bit 0 to bit 31 in ascending order.
>
> lrbp->compl_time_stamp is set just before calling
> ufshcd_update_monitor().
> And I don't think it can be negative value, because
> ufshcd_send_command()
> and __ufshcd_transfer_req_compl() are protected by host lock.
>

Yes, I replied u in another mail... I will use the compl_time_stamp in
next
version. And later I will add alloc_time_stamp and release_time_stamp to
lrbp
so that we can monitor the overall send/compl path, including hpb_prep()
and
hpb_rsp().

>>
>>>
>>>> + inc = ktime_sub(now, m->busy_start_ts[dir]);
>>>> + m->total_busy[dir] = ktime_add(m->total_busy[dir],
>>>> inc);
>>>> + m->nr_sec_rw[dir] += blk_rq_sectors(req);
>>>> +
>>>> + /* Update latencies */
>>>> + m->nr_req[dir]++;
>>>> + lat = ktime_sub(now, lrbp->issue_time_stamp);
>>>> + m->lat_sum[dir] += lat;
>>>> + if (m->lat_max[dir] < lat || !m->lat_max[dir])
>>>> + m->lat_max[dir] = lat;
>>>> + if (m->lat_min[dir] > lat || !m->lat_min[dir])
>>>> + m->lat_min[dir] = lat;
>>>
>>> This if statement can be shorted, by setting lat_max / lat_min as
>>> default value.
>>
>> I don't quite get it, can you show me the code sample?
>
> I think " || !m->lat_max[dir]" can be removed.
>
> if (m->lat_max[dir] < lat)
> m->lat_max[dir] = lat;
> if (m->lat_min[dir] > lat)
> m->lat_min[dir] = lat;
>

From the beginning, lat_min is 0, without "!m->lat_min[dir]", m->lat_min
will never be updated. Same for lat_max. Meanwhile, !m->lat_min/max will
be hit only once in each round, which does not hurt.

Thanks,
Can Guo.

> Thanks,
> Daejun
>
>>
>> Thanks,
>> Can Guo
>>
>>>
>>>> +
>>>> + m->nr_queued[dir]--;
>>>> + /* Push forward the busy start of monitor */
>>>> + m->busy_start_ts[dir] = now;
>>>> + }
>>>> +}
>>>
>>> Thanks,
>>> Daejun