LinuxLists.cc - [PATCH 2.6.13 2/14] sas-class: README

2005-09-09 19:39:25

Subject: [PATCH 2.6.13 2/14] sas-class: README

Signed-off-by: Luben Tuikov <[email protected]>

diff -X linux-2.6.13/Documentation/dontdiff -Naur linux-2.6.13-orig/drivers/scsi/sas-class/README linux-2.6.13/drivers/scsi/sas-class/README
--- linux-2.6.13-orig/drivers/scsi/sas-class/README 1969-12-31 19:00:00.000000000 -0500
+++ linux-2.6.13/drivers/scsi/sas-class/README 2005-09-09 12:22:32.000000000 -0400
@@ -0,0 +1,440 @@
+SAS Layer
+---------
+
+The SAS Layer is a management infrastructure which manages
+SAS LLDDs. It sits between SCSI Core and SAS LLDDs. The
+layout is as follows: while SCSI Core is concerned with
+SAM/SPC issues, and a SAS LLDD+sequencer is concerned with
+phy/OOB/link management, the SAS layer is concerned with:
+
+ * SAS Phy/Port/HA event management (LLDD generates,
+ SAS Layer processes),
+ * SAS Port management (creation/destruction),
+ * SAS Domain discovery and revalidation,
+ * SAS Domain device management,
+ * SCSI Host registration/unregistration,
+ * Device registration with SCSI Core (SAS) or libata
+ (SATA), and
+ * Expander management and exporting expander control
+ to user space.
+
+A SAS LLDD is a PCI device driver. It is concerned with
+phy/OOB management, and vendor specific tasks and generates
+events to the SAS layer.
+
+The SAS Layer does most SAS tasks as outlined in the SAS 1.1
+spec.
+
+The sas_ha_struct describes the SAS LLDD to the SAS layer.
+Most of it is used by the SAS Layer but a few fields need to
+be initialized by the LLDDs.
+
+After initializing your hardware, from the probe() function
+you call sas_register_ha(). It will register your LLDD with
+the SCSI subsystem, creating a SCSI host and it will
+register your SAS driver with the sysfs SAS tree it creates.
+It will then return. Then you enable your phys to actually
+start OOB (at which point your driver will start calling the
+notify_* event callbacks).
+
+Structure descriptions:
+
+struct sas_phy --------------------
+Normally this is statically embedded in your driver's
+phy structure:
+ struct my_phy {
+ blah;
+ struct sas_phy sas_phy;
+ bleh;
+ };
+And then all the phys are an array of my_phy in your HA
+struct (shown below).
+
+Then as you go along and initialize your phys you also
+initialize the sas_phy struct, along with your own
+phy structure.
+
+In general, the phys are managed by the LLDD and the ports
+are managed by the SAS layer. So the phys are initialized
+and updated by the LLDD and the ports are initialized and
+updated by the SAS layer.
+
+Some fields are read-write for the LLDD and read-only for
+the SAS layer, and vice versa. The idea is to avoid
+unnecessary locking.
+
+enabled -- must be set (0/1)
+id -- must be set [0,MAX_PHYS)
+class, proto, type, role, oob_mode, linkrate -- must be set
+oob_mode -- you set this when OOB has finished and then notify
+the SAS Layer.
+
+sas_addr -- this normally points to an array holding the sas
+address of the phy, possibly somewhere in your my_phy
+struct.
+
+attached_sas_addr -- set this when you (LLDD) receive an
+IDENTIFY frame or a FIS frame, _before_ notifying the SAS
+layer. The idea is that sometimes the LLDD may want to fake
+or provide a different SAS address on that phy/port and this
+allows it to do this. At best you should copy the sas
+address from the IDENTIFY frame or maybe generate a SAS
+address for SATA directly attached devices. The Discover
+process may later change this.
+
+frame_rcvd -- this is where you copy the IDENTIFY/FIS frame
+when you get it; you lock, copy, set frame_rcvd_size and
+unlock the lock, and then call the event. It is a pointer
+since there's no way to know your hw frame size _exactly_,
+so you define the actual array in your phy struct and let
+this pointer point to it. You copy the frame from your
+DMAable memory to that area while holding the lock.
+
+sas_prim -- this is where primitives go when they're
+received. See sas.h. Grab the lock, set the primitive,
+release the lock, notify.
+
+port -- points to the sas_port this phy is part of, if the
+phy belongs to a port. Set by the SAS Layer; the LLDD only
+reads this.
+
+ha -- may be set; the SAS layer sets it anyway.
+
+lldd_phy -- you should set this to point to your phy so you
+can find your way around faster when the SAS layer calls one
+of your callbacks and passes you a phy. If the sas_phy is
+embedded you can also use container_of -- whatever you
+prefer.
+
+
+struct sas_port --------------------
+The LLDD doesn't set any fields of this struct -- it only
+reads them. They should be self explanatory.
+
+phy_mask is 32 bits wide, which should be enough for now, as
+I haven't heard of a HA having more than 8 phys.
+
+lldd_port -- I haven't found a use for that -- maybe other
+LLDDs which wish to have an internal port representation
+can make use of this.
+
+
+struct sas_ha_struct --------------------
+It normally is statically declared in your own LLDD
+structure describing your adapter:
+struct my_sas_ha {
+ blah;
+ struct sas_ha_struct sas_ha;
+ struct my_phy phys[MAX_PHYS];
+ struct sas_port sas_ports[MAX_PHYS]; /* (1) */
+ bleh;
+};
+
+(1) If your LLDD doesn't have its own port representation.
+
+What needs to be initialized (sample function given below).
+
+pcidev
+sas_addr -- since the SAS layer doesn't want to mess with
+ memory allocation, etc, this points to a statically
+ allocated array somewhere (say in your host adapter
+ structure) and holds the SAS address of the host
+ adapter as given by you or the manufacturer, etc.
+sas_port
+sas_phy -- an array of pointers to structures. (see
+ note above on sas_addr).
+ These must be set. See more notes below.
+num_phys -- the number of phys present in the sas_phy array,
+ and the number of ports present in the sas_port
+ array. There can be a maximum of num_phys ports (one
+ per phy) so we drop num_ports, and only use
+ num_phys.
+
+The event interface:
+
+ /* LLDD calls these to notify the class of an event. */
+ void (*notify_ha_event)(struct sas_ha_struct *, enum ha_event);
+ void (*notify_port_event)(struct sas_phy *, enum port_event);
+ void (*notify_phy_event)(struct sas_phy *, enum phy_event);
+
+When sas_register_ha() returns, those are set and can be
+called by the LLDD to notify the SAS layer of such
+events.
+
+The port notification:
+
+ /* The class calls these to notify the LLDD of an event. */
+ void (*lldd_port_formed)(struct sas_phy *);
+ void (*lldd_port_deformed)(struct sas_phy *);
+
+If the LLDD wants notification when a port has been formed
+or deformed it sets those to a function satisfying the type.
+
+A SAS LLDD should also implement at least one of the Task
+Management Functions (TMFs) described in SAM:
+
+ /* Task Management Functions. Must be called from process context. */
+ int (*lldd_abort_task)(struct sas_task *);
+ int (*lldd_abort_task_set)(struct domain_device *, u8 *lun);
+ int (*lldd_clear_aca)(struct domain_device *, u8 *lun);
+ int (*lldd_clear_task_set)(struct domain_device *, u8 *lun);
+ int (*lldd_I_T_nexus_reset)(struct domain_device *);
+ int (*lldd_lu_reset)(struct domain_device *, u8 *lun);
+ int (*lldd_query_task)(struct sas_task *);
+
+For more information please read SAM from T10.org.
+
+Port and Adapter management:
+
+ /* Port and Adapter management */
+ int (*lldd_clear_nexus_port)(struct sas_port *);
+ int (*lldd_clear_nexus_ha)(struct sas_ha_struct *);
+
+A SAS LLDD should implement at least one of those.
+
+Phy management:
+
+ /* Phy management */
+ int (*lldd_control_phy)(struct sas_phy *, enum phy_func);
+
+lldd_ha -- set this to point to your HA struct. You can also
+use container_of if you embedded it as shown above.
+
+A sample initialization and registration function
+can look like this (called last thing from probe())
+*but* before you enable the phys to do OOB:
+
+static int register_sas_ha(struct my_sas_ha *my_ha)
+{
+ int i;
+ static struct sas_phy *sas_phys[MAX_PHYS];
+ static struct sas_port *sas_ports[MAX_PHYS];
+
+ my_ha->sas_ha.sas_addr = &my_ha->sas_addr[0];
+
+ for (i = 0; i < MAX_PHYS; i++) {
+ sas_phys[i] = &my_ha->phys[i].sas_phy;
+ sas_ports[i] = &my_ha->sas_ports[i];
+ }
+
+ my_ha->sas_ha.sas_phy = sas_phys;
+ my_ha->sas_ha.sas_port = sas_ports;
+ my_ha->sas_ha.num_phys = MAX_PHYS;
+
+ my_ha->sas_ha.lldd_port_formed = my_port_formed;
+
+ return sas_register_ha(&my_ha->sas_ha);
+}
+
+
+Events
+------
+
+Events are _the only way_ a SAS LLDD notifies the SAS layer
+of anything. There is no other method or way for a LLDD to
+tell the SAS layer of anything happening internally or on
+the SAS domain.
+
+Phy events:
+ PHYE_LOSS_OF_SIGNAL, (C)
+ PHYE_OOB_DONE,
+ PHYE_OOB_ERROR, (C)
+ PHYE_SPINUP_HOLD.
+
+Port events, passed on a _phy_:
+ PORTE_BYTES_DMAED, (M)
+ PORTE_BROADCAST_RCVD, (E)
+ PORTE_LINK_RESET_ERR, (C)
+ PORTE_TIMER_EVENT, (C)
+ PORTE_HARD_RESET.
+
+Host Adapter event:
+ HAE_RESET
+
+A SAS LLDD should be able to generate
+ - at least one event from group C (choice),
+ - events marked M (mandatory) are mandatory (only one),
+ - events marked E (expander) if it wants the SAS layer
+ to handle domain revalidation (only one such).
+ - Unmarked events are optional.
+
+Meaning:
+
+HAE_RESET -- when your HA got an internal error and was reset.
+
+PORTE_BYTES_DMAED -- on receiving an IDENTIFY/FIS frame
+PORTE_BROADCAST_RCVD -- on receiving a primitive
+PORTE_LINK_RESET_ERR -- timer expired, loss of signal, loss
+of DWS, etc. (*)
+PORTE_TIMER_EVENT -- DWS reset timeout timer expired (*)
+PORTE_HARD_RESET -- Hard Reset primitive received.
+
+PHYE_LOSS_OF_SIGNAL -- the device is gone (*)
+PHYE_OOB_DONE -- OOB went fine and oob_mode is valid
+PHYE_OOB_ERROR -- Error while doing OOB, the device probably
+got disconnected. (*)
+PHYE_SPINUP_HOLD -- SATA is present, COMWAKE not sent.
+
+(*) should set/clear the appropriate fields in the phy,
+ or alternatively call the inlined sas_phy_disconnected()
+ which is just a helper, from their tasklet.
+
+The Execute Command SCSI RPC:
+
+ int (*lldd_execute_task)(struct sas_task *, int num,
+ unsigned long gfp_flags);
+
+Used to queue a task to the SAS LLDD. @task is the task(s)
+to be executed. @num should be the number of tasks being
+queued at this function call (they are linked via
+task::list), @gfp_flags should be the GFP flags defining the
+context of the caller.
+
+This function should implement the Execute Command SCSI RPC,
+or if you're sending a SCSI Task as linked commands, you
+should also use this function.
+
+That is, when lldd_execute_task() is called, the command(s)
+go out on the transport *immediately*. There is *no*
+queuing of any sort and at any level in a SAS LLDD.
+
+The use of task::list is two-fold, one for linked commands,
+the other discussed below.
+
+It is possible to queue up more than one task at a time, by
+initializing the list element of struct sas_task, and
+passing the number of tasks enlisted in this manner in num.
+
+Returns: -SAS_QUEUE_FULL, -ENOMEM, nothing was queued;
+ 0, the task(s) were queued.
+
+If you want to pass num > 1, then either
+A) you're the only caller of this function and keep track
+ of what you've queued to the LLDD, or
+B) you know what you're doing and have a strategy of
+ retrying.
+
+As opposed to queuing one task at a time (function call),
+batch queuing of tasks, by having num > 1, greatly
+simplifies LLDD code, sequencer code, and _hardware design_,
+and has some performance advantages in certain situations
+(DBMS).
+
+The LLDD advertises if it can take more than one command at
+a time at lldd_execute_task(), by setting the
+lldd_max_execute_num parameter (controlled by "collector"
+module parameter in aic94xx SAS LLDD).
+
+You should leave this to the default 1, unless you know what
+you're doing.
+
+This is a function of the LLDD, which the SAS layer can
+cater to.
+
+int lldd_queue_size
+ The host adapter's queue size. This is the maximum
+number of commands the LLDD can have pending to domain
+devices on behalf of all upper layers submitting through
+lldd_execute_task().
+
+You really want to set this to something (much) larger than
+1.
+
+This _really_ has absolutely nothing to do with queuing.
+There is no queuing in SAS LLDDs.
+
+struct sas_task {
+ dev -- the device this task is destined to
+ list -- must be initialized (INIT_LIST_HEAD)
+ task_proto -- _one_ of enum sas_proto
+ scatter -- pointer to scatter gather list array
+ num_scatter -- number of elements in scatter
+ total_xfer_len -- total number of bytes expected to be transferred
+ data_dir -- PCI_DMA_...
+ task_done -- callback when the task has finished execution
+};
+
+When an external entity, i.e. an entity other than the LLDD or the
+SAS Layer, wants to work with a struct domain_device, it
+_must_ call kobject_get() when getting a handle on the
+device and kobject_put() when it is done with the device.
+
+This does two things:
+ A) implements proper kfree() for the device;
+ B) increments/decrements the kref for all players:
+ domain_device
+ all domain_device's ... (if past an expander)
+ port
+ host adapter
+ pci device
+ and up the ladder, etc.
+
+DISCOVERY
+---------
+
+The sysfs tree has the following purposes:
+ a) It shows you the physical layout of the SAS domain at
+ the current time, i.e. how the domain looks in the
+ physical world right now.
+ b) Shows some device parameters _at_discovery_time_.
+
+This is a link to the tree(1) program, very useful in
+viewing the SAS domain:
+ftp://mama.indstate.edu/linux/tree/
+I expect user space applications to actually create a
+graphical interface of this.
+
+That is, the sysfs domain tree doesn't show or keep state if
+you e.g., change the meaning of the READY LED MEANING
+setting, but it does show you the current connection status
+of the domain device.
+
+Keeping internal device state changes is responsibility of
+upper layers (Command set drivers) and user space.
+
+When a device or devices are unplugged from the domain, this
+is reflected in the sysfs tree immediately, and the device(s)
+removed from the system.
+
+The structure domain_device describes any device in the SAS
+domain. It is completely managed by the SAS layer. A task
+points to a domain device; this is how the SAS LLDD knows
+where to send the task(s) to. A SAS LLDD only reads the
+contents of the domain_device structure, but it never creates
+or destroys one.
+
+Expander management from User Space
+-----------------------------------
+
+In each expander directory in sysfs, there is a file called
+"smp_portal". It is a binary sysfs attribute file, which
+implements an SMP portal (Note: this is *NOT* an SMP port),
+to which user space applications can send SMP requests and
+receive SMP responses.
+
+Functionality is deceptively simple:
+
+1. Build the SMP frame you want to send. The format and layout
+ is described in the SAS spec. Leave the CRC field equal to 0.
+open(2)
+2. Open the expander's SMP portal sysfs file in RW mode.
+write(2)
+3. Write the frame you built in 1.
+read(2)
+4. Read the amount of data you expect to receive for the frame you built.
+ If you receive a different amount of data than you expected,
+ then there was some kind of error.
+close(2)
+This whole process is shown in detail in the function do_smp_func()
+and its callers, in the file "expander_conf.c".
+
+The kernel functionality is implemented in the file
+"sas_expander.c".
+
+The program "expander_conf.c" implements this. It takes one
+argument, the sysfs file name of the SMP portal to the
+expander, and gives expander information, including routing
+tables.
+
+The SMP portal gives you complete control of the expander,
+so please be careful.
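
(Not part of the patch.) The four user-space steps in the README above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the helper name smp_portal_xfer() is made up, the caller is assumed to have built the SMP request already (CRC field left as 0), and the caller is assumed to know from the SAS spec how many response bytes to expect. It is not the code from expander_conf.c.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: send one SMP request through an expander's
 * "smp_portal" sysfs file and read back the response.
 *
 * Returns 0 if the whole request was written and exactly resp_len
 * bytes were read back, -1 on any error or short transfer.
 */
static int smp_portal_xfer(const char *portal,
                           const unsigned char *req, size_t req_len,
                           unsigned char *resp, size_t resp_len)
{
    ssize_t n;
    int ret = -1;
    int fd;

    /* Step 2: open the portal in RW mode. */
    fd = open(portal, O_RDWR);
    if (fd < 0)
        return -1;

    /* Step 3: write the frame built in step 1. */
    n = write(fd, req, req_len);
    if (n < 0 || (size_t)n != req_len)
        goto out;

    /* Step 4: read the amount of data expected for this request. */
    n = read(fd, resp, resp_len);
    if (n < 0 || (size_t)n != resp_len)
        goto out;   /* a different amount means some kind of error */

    ret = 0;
out:
    close(fd);
    return ret;
}
```

A caller would pass the path of the smp_portal file inside the expander's sysfs directory, plus request and response buffers sized according to the SMP function being issued. Note that a plain -1 hides whether the failure came from the driver, the link or the expander itself.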

2005-09-11 01:44:35

by Douglas Gilbert

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

Luben Tuikov wrote:
> Signed-off-by: Luben Tuikov <[email protected]>

<snip>

An interesting document. I have a small quibble here (and a larger
one about the SMP user space access that I will elaborate on
in a day or so).

> +Port events, passed on a _phy_:
> + PORTE_BYTES_DMAED, (M)
> + PORTE_BROADCAST_RCVD, (E)
> + PORTE_LINK_RESET_ERR, (C)
> + PORTE_TIMER_EVENT, (C)
> + PORTE_HARD_RESET.

Link layer broadcasts don't only come from expanders
(i.e. BROADCAST(CHANGE) ); SAS 1.1 (sas1r09e.pdf) defines
BROADCAST(SES) coming from a target port associated with
an enclosure device (SES peripheral type). It is not
clear to me how the associated primitive is conveyed back
with the broadcast.

If it is not conveyed back then perhaps that broadcast define
could be expanded to:
PORTE_BROADCAST_CHANGE (E)
PORTE_BROADCAST_SES (Target)

and a note inserted that BROADCAST(RESERVED CHANGE 0) and
BROADCAST(RESERVED CHANGE 1) be mapped to PORTE_BROADCAST_CHANGE
by the LLDD as per table 79 of sas1r09e.pdf .

BTW table 70 indicates an initiator can originate a BROADCAST(CHANGE),
not just an expander.

Doug Gilbert

2005-09-12 09:00:29

by Douglas Gilbert

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

Luben Tuikov wrote:

<snip>

> +
> +DISCOVERY
> +---------
> +
> +The sysfs tree has the following purposes:
> + a) It shows you the physical layout of the SAS domain at
> + the current time, i.e. how the domain looks in the
> + physical world right now.
> + b) Shows some device parameters _at_discovery_time_.
> +
> +This is a link to the tree(1) program, very useful in
> +viewing the SAS domain:
> +ftp://mama.indstate.edu/linux/tree/
> +I expect user space applications to actually create a
> +graphical interface of this.
> +
> +That is, the sysfs domain tree doesn't show or keep state if
> +you e.g., change the meaning of the READY LED MEANING
> +setting, but it does show you the current connection status
> +of the domain device.

So in that case, user applications should ignore READY
LED MEANING in sysfs and ask the device directly.
For example:
sdparm --get RLM --transport sas /dev/sda

> +Keeping internal device state changes is responsibility of
> +upper layers (Command set drivers) and user space.

... and what about multiple initiators sitting on different
machines? Should they be responsible for:
1) finding out about one another
2) and keeping the sysfs tree in the other machine
in sync when one changes READY LED MEANING
(or anything else)?

Putting distributed state information in sysfs and then
passing off the responsibility for maintaining its state
(because it is a difficult problem) brings into question
the wisdom of the strategy.

Doug Gilbert

2005-09-12 16:56:27

by Luben Tuikov

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

On 09/10/05 21:44, Douglas Gilbert wrote:
> Luben Tuikov wrote:
>
>>Signed-off-by: Luben Tuikov <[email protected]>
>
>
> <snip>
>
> An interesting document.

Hi Doug, how are you?

If there is something wrong with the document in general,
please point it out.

Your "An interesting document" sounds like James's
"Aside from all other problems".

> I have a small quibble here (and a larger
> one about the SMP user space access that I will elaborate on
> in a day or so).

Ok.

(Thinking to myself: I wonder, if he's got a problem with
SMP user space access, why not post it now? Why in a day
or so?)

>>+Port events, passed on a _phy_:
>>+ PORTE_BYTES_DMAED, (M)
>>+ PORTE_BROADCAST_RCVD, (E)
>>+ PORTE_LINK_RESET_ERR, (C)
>>+ PORTE_TIMER_EVENT, (C)
>>+ PORTE_HARD_RESET.
>
>
> Link layer broadcasts don't only come from expanders
> (i.e. BROADCAST(CHANGE) ); SAS 1.1 (sas1r09e.pdf) defines

I'm aware of what the spec says.

> BROADCAST(SES) coming from a target port associated with

I'm aware of this primitive.

> an enclosure device (SES peripheral type). It is not
> clear to me how the associated primitive is conveyed back
> with the broadcast.

As you can see the message is "PORTE_BROADCAST_RCVD".
The _type_ of BROADCAST is encoded in phy->sas_prim.

See sas_port.c::void sas_porte_broadcast_rcvd(struct sas_phy *phy).

This function decides what to do depending on the type of
BROADCAST received.

Currently there have been no requests for handling of
BROADCAST(SES).

When there's such a request, we handle it there, telling
the Discover code of a SES event.

Currently only BROADCAST(CHANGE) is handled _by default_.

I.e. we search the domain for the expander and phy which
generated it, and act on it. If we find no expander and
phy which generated it, in case we processed a different
BROADCAST or an expander has broken firmware, we return.
This is all in the code.

BROADCAST filtering is also present in the LLDD, i.e.
notify on such and such BROADCAST or primitive in general.
This is done by the hardware itself (no firmware) so it
is very fast.
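
For readers without the source at hand, the dispatch described above can be pictured roughly like this. It is a self-contained sketch with made-up names (my_sas_prim, my_phy_view, my_on_broadcast), not the actual sas_port.c code; the real primitive values live in sas.h, and treating the two RESERVED CHANGE primitives like CHANGE here is an assumption based on the mapping discussed in this thread.

```c
/* Hypothetical primitive codes; the real ones are defined in sas.h. */
enum my_sas_prim {
    MY_PRIM_BROADCAST_CHANGE,
    MY_PRIM_BROADCAST_RESERVED_CHANGE_0,
    MY_PRIM_BROADCAST_RESERVED_CHANGE_1,
    MY_PRIM_BROADCAST_SES
};

/* What the handler decides to do with a received BROADCAST. */
enum my_action {
    MY_ACT_IGNORE,      /* nothing handles this primitive yet */
    MY_ACT_REVALIDATE   /* kick off domain revalidation */
};

/* Stand-in for the part of struct sas_phy this sketch needs. */
struct my_phy_view {
    enum my_sas_prim sas_prim;  /* set by the LLDD before notifying */
};

/* One event (PORTE_BROADCAST_RCVD); the type is read from the phy. */
static enum my_action my_on_broadcast(const struct my_phy_view *phy)
{
    switch (phy->sas_prim) {
    case MY_PRIM_BROADCAST_CHANGE:
    case MY_PRIM_BROADCAST_RESERVED_CHANGE_0:
    case MY_PRIM_BROADCAST_RESERVED_CHANGE_1:
        return MY_ACT_REVALIDATE;
    case MY_PRIM_BROADCAST_SES:
    default:
        return MY_ACT_IGNORE;
    }
}
```

The point of the design is that adding BROADCAST(SES) handling later means adding a case here, not a new event type.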

> If it is not conveyed back then perhaps that broadcast define
> could be expanded to:
> PORTE_BROADCAST_CHANGE (E)
> PORTE_BROADCAST_SES (Target)

I can add this event if you want.
The question is _what_ to do on this event. This is a complex
answer and I'd rather have a _SES_ layer (or at least a logical
module/library) to handle those as storage vendors want this,
_right now_.

In fact, I've some patches to submit regarding SES devices
on the domain, but I wanted to _trim *down*_ the politics,
_not_ escalate them.

Also this patch was "SAS support", not "SES device support".

> and a note inserted that BROADCAST(RESERVED CHANGE 0) and
> BROADCAST(RESERVED CHANGE 1) be mapped to PORTE_BROADCAST_CHANGE
> by the LLDD as per table 79 of sas1r09e.pdf .

They already are Doug.

> BTW table 70 indicates an initiator can originate a BROADCAST(CHANGE),
> not just an expander.

I hope this isn't going to be a thread about
* pointing out the obvious, or
* some kind of competition of who's read the spec the most.

All in all, you could've just asked "How about BROADCAST(SES)?".

Luben

2005-09-12 18:38:16

by Luben Tuikov

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

On 09/12/05 05:00, Douglas Gilbert wrote:
> Luben Tuikov wrote:
>
> <snip>
>
>>+
>>+DISCOVERY
>>+---------
>>+
>>+The sysfs tree has the following purposes:
>>+ a) It shows you the physical layout of the SAS domain at
>>+ the current time, i.e. how the domain looks in the
>>+ physical world right now.
>>+ b) Shows some device parameters _at_discovery_time_.
>>+
>>+This is a link to the tree(1) program, very useful in
>>+viewing the SAS domain:
>>+ftp://mama.indstate.edu/linux/tree/
>>+I expect user space applications to actually create a
>>+graphical interface of this.
>>+
>>+That is, the sysfs domain tree doesn't show or keep state if
>>+you e.g., change the meaning of the READY LED MEANING
>>+setting, but it does show you the current connection status
>>+of the domain device.
>
>
> So in that case, user applications should ignore READY
> LED MEANING in sysfs and ask the device directly.
> For example:
> sdparm --get RLM --transport sas /dev/sda

Yes, correct. You and I have discussed this already.

Excerpt from that same "README" file (just above the code
you cut and pasted):

The sysfs tree has the following purposes:
a) It shows you the physical layout of the SAS domain at
the current time, i.e. how the domain looks in the
physical world right now.
b) Shows some device parameters _at_discovery_time_.

> ... and what about multiple initiators sitting on different
> machines? Should they be responsible for:
> 1) finding out about one another
> 2) and keeping the sysfs tree in the other machine
> in sync when one changes READY LED MEANING
> (or anything else)?

And how is this a problem of the Discover code?

There is a way to handle more than one initiator on the domain
doing discovery, but I didn't code it since _at_this_time_
I doubt people would start doing this: they don't even
have SAS hardware, much less two initiators on the same
domain.

> Putting distributed state information in sysfs and then
> passing off the responsibility for maintaining its state
> (because it is a difficult problem) brings into question
> the wisdom of the strategy.

No Doug, not true at all.

Sorry that "sg" isn't going to be used to communicate with
expanders, but this is just part of evolution.

If I removed "ready_led_meaning" and a couple of other
such entries: you would have NO argument.

If I left only the directory entries representing the
object and the "smp_portal" your argument falls apart.

Instead, you should've been saying how easy and
_elegant_ it is communicating with expanders on the
domain:
* open(2) the "smp_portal" in the expander
directory you want to talk to,
* write(2) the SMP frame you want to send,
* read(2) the amount of information you
expect to get,
* close(2).

Done! No addressing to figure out, no memory problems, etc.
The easiest interface: read(2)/write(2), _and_ the simplest
format to use: pure SMP frames, easy and straightforward.

GUI programs would be able to use all this to give
a GUI of the whole domain and then you can just point and
click on an expander and get information using this simple
and elegant interface.

Sorry again "sg" wouldn't fit in here. It's just evolution.

Luben
P.S. I _did_ mention to you in private email that this is going
to happen and you didn't reply at all.

2005-09-13 09:22:36

by Douglas Gilbert

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

Luben Tuikov wrote:
> On 09/10/05 21:44, Douglas Gilbert wrote:
>
>>Luben Tuikov wrote:
>>
>>
>>>Signed-off-by: Luben Tuikov <[email protected]>
>>
>>
>><snip>
>>
>>An interesting document.
>
>
> Hi Doug, how are you?

Fine, but losing a fair amount of time reading emails :-)

> If there is something wrong with the document in general,
> please point it out.
>
> Your "An interesting document" sounds like James's
> "Aside from all other problems".
>
>
>>I have a small quibble here (and a larger
>>one about the SMP user space access that I will elaborate on
>>in a day or so).
>
>
> Ok.
>
> (Thinking to myself: I wonder, if he's got a problem with
> SMP user space access, why not post it now? Why in a day
> or so?)
>
>
>>>+Port events, passed on a _phy_:
>>>+ PORTE_BYTES_DMAED, (M)
>>>+ PORTE_BROADCAST_RCVD, (E)
>>>+ PORTE_LINK_RESET_ERR, (C)
>>>+ PORTE_TIMER_EVENT, (C)
>>>+ PORTE_HARD_RESET.
>>
>>
>>Link layer broadcasts don't only come from expanders
>>(i.e. BROADCAST(CHANGE) ); SAS 1.1 (sas1r09e.pdf) defines
>
>
> I'm aware of what the spec says.
>
>
>>BROADCAST(SES) coming from a target port associated with
>
>
> I'm aware of this primitive.
>
>
>>an enclosure device (SES peripheral type). It is not
>>clear to me how the associated primitive is conveyed back
>>with the broadcast.
>
>
> As you can see the message is "PORTE_BROADCAST_RCVD".
> The _type_ of BROADCAST is encoded in phy->sas_prim.

Good.

> See sas_port.c::void sas_porte_broadcast_rcvd(struct sas_phy *phy).
>
> This function decides what to do depending on the type of
> BROADCAST received.
>
> Currently there have been no requests for handling of
> BROADCAST(SES).
>
> When there's such a request, we handle it there, telling
> the Discover code of a SES event.
>
> Currently only BROADCAST(CHANGE) is handled _by default_.
>
> I.e. we search the domain for the expander and phy which
> generated it, and act on it. If we find no expander and
> phy which generated it, in case we processed a different
> BROADCAST or an expander has broken firmware, we return.
> This is all in the code.
>
> BROADCAST filtering is also present in the LLDD, i.e.
> notify on such and such BROADCAST or primitive in general.
> This is done by the hardware itself (no firmware) so it
> is very fast.
>
>
>>If it is not conveyed back then perhaps that broadcast define
>>could be expanded to:
>> PORTE_BROADCAST_CHANGE (E)
>> PORTE_BROADCAST_SES (Target)
>
>
> I can add this event if you want.

No need, phy->sas_prim is fine.

> The question is _what_ to do on this event. This is a complex
> answer and I'd rather have a _SES_ layer (or at least a logical
> module/library) to handle those as storage vendors want this,
> _right now_.

Simple answer: generate a hotplug event and let a
user application that cares worry about it. No
need for a SES layer in the kernel.

> In fact, I've some patches to submit regarding SES devices
> on the domain, but I wanted to _trim *down*_ the politics,
> _not_ escalate them.

Oh no, not a sysfs representation of SES abstractions :-)

> Also this patch was "SAS support", not "SES device support".

True.

>>and a note inserted that BROADCAST(RESERVED CHANGE 0) and
>>BROADCAST(RESERVED CHANGE 1) be mapped to PORTE_BROADCAST_CHANGE
>>by the LLDD as per table 79 of sas1r09e.pdf .
>
>
> They already are Doug.
>
>
>>BTW table 70 indicates an initiator can originate a BROADCAST(CHANGE),
>>not just an expander.
>
>
> I hope this isn't going to be a thread about
> * pointing out the obvious, or
> * some kind of competition of who's read the spec the most.
>
> All in all, you could've just asked "How about BROADCAST(SES)?".

Indeed.

It may be an idea to encourage people who look at
your submission.

Doug Gilbert

2005-09-13 10:12:36

by Douglas Gilbert

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

Luben Tuikov wrote:
> On 09/12/05 05:00, Douglas Gilbert wrote:

<snip>

>>Putting distributed state information in sysfs and then
>>passing off the responsibility for maintaining its state
>>(because it is a difficult problem) brings into question
>>the wisdom of the strategy.
>
>
> No Doug, not true at all.
>
> Sorry that "sg" isn't going to be used to communicate with
> expanders, but this is just part of evolution.

That is a relief, retired at last.

> If I removed "ready_led_meaning" and a couple of other
> such entries: you would have NO argument.

less ...

> If I left only the directory entries representing the
> object and the "smp_portal" your argument falls apart.
>
> Instead, you should've been saying how easy and
> _elegant_ it is communicating with expanders on the
> domain:
> * open(2) the "smp_portal" in the expander
> directory you want to talk to,
> * write(2) the SMP frame you want to send,
> * read(2) the amount of information you
> expect to get,
> * close(2).
>
> Done! No addressing to figure out, no memory problems, etc.
> The easiest interface: read(2)/write(2), _and_ the simplest
> format to use: pure SMP frames, easy and straightforward.

It is impressive how elegant a passthrough can be when
one dispenses with a bit of metadata such as per command
timeouts and 3 levels of error messages (i.e. from the
driver, from the link layer and from the SMP target).
I always enjoy getting EIO in errno.

This all seems so frustrating; a LLD injects a
command/frame/whatever into an initiator and waits for
a response or something to happen. Networking code faces
the same scenario and presents it "as is" to the user
space (for any application that cares). Yes, there are
messy details. However in the SCSI subsystem we want to
hide this simple reality with all these weird and
wonderful abstractions.

Doug Gilbert

2005-09-13 10:17:31

by Christoph Hellwig

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

On Mon, Sep 12, 2005 at 07:00:54PM +1000, Douglas Gilbert wrote:
> > +This is a link to the tree(1) program, very useful in
> > +viewing the SAS domain:
> > +ftp://mama.indstate.edu/linux/tree/
> > +I expect user space applications to actually create a
> > +graphical interface of this.
> > +
> > +That is, the sysfs domain tree doesn't show or keep state if
> > +you e.g., change the meaning of the READY LED MEANING
> > +setting, but it does show you the current connection status
> > +of the domain device.
>
> So in that case, user applications should ignore READY
> LED MEANING in sysfs and ask the device directly.
> For example:
> sdparm --get RLM --transport sas /dev/sda
>
> > +Keeping internal device state changes is responsibility of
> > +upper layers (Command set drivers) and user space.
>
> ... and what about multiple initiators sitting on different
> machines? Should they be responsible for:
> 1) finding out about one another
> 2) and keeping the sysfs tree in the other machine
> in sync when one changes READY LED MEANING
> (or anything else)?
>
> Putting distributed state information in sysfs and then
> passing off the responsibility for maintaining its state
> (because it is a difficult problem) brings into question
> the wisdom of the strategy.

If you look at what the other transport classes do, they put
information at discovery time into sysfs, but try to refresh it on
every access. IMHO that makes a lot of sense, and should be done
that way in the final SAS transport class.
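
A minimal user-space sketch of the "refresh on every access" idea
described above. This is not code from any transport class: the
struct, the refresh callback and the fake device state are all
hypothetical names chosen for illustration. The show routine re-reads
the value from the device before formatting it, so the value stored
at discovery time is only a fallback and is never reported stale.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical attribute: a cached value plus a callback that
 * re-reads the current value from the device. */
struct cached_attr {
	int value;                   /* captured at discovery time */
	int (*refresh)(int *value);  /* 0 on success, -1 on failure */
};

/* Stand-in for the device; a real driver would query hardware. */
static int fake_device_state;

static int fake_refresh(int *value)
{
	*value = fake_device_state;
	return 0;
}

/* Refresh first, then format. Returns the number of characters
 * written, or -1 if the device could not be queried. */
static int attr_show(struct cached_attr *a, char *buf, size_t len)
{
	assert(a && buf);

	if (a->refresh && a->refresh(&a->value) != 0)
		return -1;
	return snprintf(buf, len, "%d\n", a->value);
}
```

With this shape, a change made by another initiator shows up on the
next read of the attribute, without anyone having to push the update
into the tree.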

2005-09-13 13:20:15

by Luben Tuikov

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

On 09/13/05 05:23, Douglas Gilbert wrote:
> Luben Tuikov wrote:
>>The question is _what_ to do on this event. This is a complex
>>answer and I'd rather have a _SES_ layer (or at least a logical
>>module/library) to handle those as storage vendors want this,
>>_right now_.
>
>
> Simple answer: generate a hotplug event and let a
> user application that cares worry about it. No
> need for a SES layer in the kernel.

Well, that sounds ok, but it may be the case that the SES
device wants to say something about the SAS devices
on the same level. So even if userspace gets it, it would
have nothing to do with it, because of the _type_ of SES
device/event/etc.
(User space can be notified anyway, which is perfectly fine).

>>In fact, I've some patches to submit regarding SES devices
>>on the domain, but I wanted to _trim *down*_ the politics,
>>_not_ escalate them.
>
>
> Oh no, not a sysfs representation of SES abstractions :-)

No, not that.

Luben

2005-09-13 13:31:07

by Luben Tuikov

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

On 09/13/05 06:13, Douglas Gilbert wrote:
>
> That is a relief, retired at last.

C'mon, you and I both know that sg has its place,
and is perfect doing what it is doing now.

But for protocol _interjection_, the mechanism
posted is easiest and sanest.

> It is impressive how elegant a passthrough can be when

I cannot call it a "passthrough" since the SMP frame isn't
"passing through" (bypassing) anything. When userspace
does a read(2) to get the data they expect, the SMP
frame they wrote(2) is sent to the SDS immediately.
In effect there is no "passing through".

It is a _protocol_ interjection.

That is, an SMP frame (submission) _instantiates_
at that layer/level, not lower, not higher.

> one dispenses with a bit of metadata such as per command
> timeouts and 3 levels of error messages (i.e. from the
> driver, from the link layer and from the SMP target).
> I always enjoy getting EIO in errno.
>
> This all seems so frustrating; a LLD injects a
> command/frame/whatever into an initiator and waits for
> a response or something to happen. Networking code faces
> the same scenario and presents it "as is" to the user
> space (for any application that cares). Yes, there are
> messy details. However in the SCSI subsystem we want to
> hide this simple reality with all these weird and
> wonderful abstractions.

Doug, SMP is what it is, and this is where it _instantiates_.
You have to agree that interjecting an SMP frame at any other
level would be _more_ messy.

Plus, I always like presenting things "as is" to userspace
or higher layer, to keep the current layer cleaner and concerned
only with things belonging to it.

Luben

2005-09-13 14:29:40

by Luben Tuikov

Subject: Re: [PATCH 2.6.13 2/14] sas-class: README

On 09/13/05 09:30, Luben Tuikov wrote:
> I cannot call it a "passthrough" since the SMP frame isn't
> "passing through" (bypassing) anything. When userspace
> does a read(2) to get the data they expect, the SMP
> frame they wrote(2) is sent to the SDS immediately.
> In effect there is no "passing through".
>
> It is a _protocol_ interjection.
>
> That is an SMP frame (submission) _instantiates_
> at that layer/level, not lower, not higher.
>
>
>>one dispenses with a bit of metadata such as per command
>>timeouts and 3 levels of error messages (i.e. from the

I forgot to mention -- SMP transport has a hardware timer
as well as software one. read(2) will never hang.

If there's no one on the other end, we get an error,
and read(2) returns less information than we requested (or none).

Luben
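
For reference, the open/write(2)/read(2)/close(2) sequence discussed
in this thread can be sketched in user space roughly as follows. This
is a minimal sketch, not the code from the patch: the function name
and the portal path argument are hypothetical, and it assumes only
what is stated above, namely that the SMP frame is written whole and
that read(2) may return fewer bytes than requested (or none) when
nothing answers.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send one SMP request frame and read back the response.
 * portal_path is a hypothetical name for the per-expander file.
 * Returns the number of response bytes read (possibly fewer than
 * resp_len, or 0), or -1 on error with errno set.
 */
static ssize_t smp_exchange(const char *portal_path,
			    const void *req, size_t req_len,
			    void *resp, size_t resp_len)
{
	ssize_t n;
	int fd, saved;

	assert(portal_path && req && resp);

	fd = open(portal_path, O_RDWR);
	if (fd < 0)
		return -1;

	/* The request frame must go out in a single write. */
	n = write(fd, req, req_len);
	if (n < 0 || (size_t)n != req_len) {
		saved = (n < 0) ? errno : EIO;
		close(fd);
		errno = saved;
		return -1;
	}

	/* A short or empty read is passed back as is; the caller
	 * decides whether the response is usable. */
	n = read(fd, resp, resp_len);
	saved = errno;
	close(fd);
	errno = saved;
	return n;
}
```

A caller would compare the return value against the response length
it expects for the SMP function it sent and treat anything shorter as
a failed exchange. As noted earlier in the thread, errno alone does
not say whether the failure came from the driver, the link layer or
the SMP target.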