Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S964983AbbDXOol (ORCPT ); Fri, 24 Apr 2015 10:44:41 -0400 Received: from mail-db3on0067.outbound.protection.outlook.com ([157.55.234.67]:49440 "EHLO emea01-db3-obe.outbound.protection.outlook.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1754034AbbDXOod (ORCPT ); Fri, 24 Apr 2015 10:44:33 -0400 From: Liran Liss To: Michael Wang , Roland Dreier , Sean Hefty , Hal Rosenstock , "linux-rdma@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "hal@dev.mellanox.co.il" CC: Tom Tucker , Steve Wise , Hoang-Nam Nguyen , "raisch@de.ibm.com" , Mike Marciniszyn , Eli Cohen , Faisal Latif , Jack Morgenstein , "Or Gerlitz" , Haggai Eran , "Ira Weiny" , Tom Talpey , Jason Gunthorpe , Doug Ledford Subject: RE: [PATCH v5 00/27] IB/Verbs: IB Management Helpers Thread-Topic: [PATCH v5 00/27] IB/Verbs: IB Management Helpers Thread-Index: AQHQe0QWgYjx3AmeeEu2Ae1BasUTY51YB1mAgACvMwCAACpLcIABWB0AgAIHe7A= Date: Fri, 24 Apr 2015 14:44:29 +0000 Message-ID: References: <5534B8C9.506@profitbricks.com> <55375C10.8070901@profitbricks.com> <5538A034.4030904@profitbricks.com> In-Reply-To: <5538A034.4030904@profitbricks.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: x-originating-ip: [85.250.13.154] authentication-results: profitbricks.com; dkim=none (message not signed) header.d=none; x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:(42134001)(42139001);SRVR:DB4PR05MB448; x-forefront-antispam-report: BMV:1;SFV:NSPM;SFS:(10009020)(6009001)(51914003)(51704005)(501624003)(106116001)(40100003)(77156002)(62966003)(87936001)(54356999)(86362001)(2201001)(122556002)(76176999)(102836002)(2656002)(46102003)(2900100001)(50986999)(33656002)(93886004)(2501003)(5001770100001)(19580405001)(2950100001)(66066001)(15975445007)(19580395003)(74316001)(92566002)(76576001)(4001450100001)(422495003);DIR:OUT;SFP:1101;SCL:1;SRVR:DB4PR05MB448;H:DB4PR05MB0863.eurprd05.prod.outlook.com;FPR:;SPF:None;MLV:sfv;LANG:en; x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:; x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(601004)(5002010)(5005006)(3002001);SRVR:DB4PR05MB448;BCL:0;PCL:0;RULEID:;SRVR:DB4PR05MB448; x-forefront-prvs: 05568D1FF7 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 X-OriginatorOrg: Mellanox.com X-MS-Exchange-CrossTenant-originalarrivaltime: 24 Apr 2015 14:44:29.7001 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: a652971c-7d2e-4d9b-a6a4-d149256f461b X-MS-Exchange-Transport-CrossTenantHeadersStamped: DB4PR05MB448 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from base64 to 8bit by nfs id t3OEiopH001346 Content-Length: 17940 Lines: 470 > From: Michael Wang [mailto:yun.wang@profitbricks.com] > [snip] > > > > Depends on who is "we". > > For ULPs, you are probably right. > > > > However, core services (e.g., mad management, CM, SA) do care about > various details. > > In some cases, where it doesn't matter, this code will use management > helpers. > > In other cases, this code will inspect link, transport, and node attributes of > rdma devices. > > > > For example, the CM code has specific code paths for IB, RoCE, and iWARP. > > There is no other CM code; there is no reason to abstract 'CM'. This > > code will have code paths that depend on various specific details. > > That's exactly what we want to stop, we have classified the CM to IB and > IWARP now :-) > We don't want to stop code branches that are not abstractions but rather depend on the specific technology! There is no generic "iWARP CM" - only one. There is no generic "ROCE CM" - only one. There is no generic "IB CM" - only one. At the CM high-level (i.e., whether an ib_dev port registers an IB client), you could consider an rdma_has_cm() call, but this the only place in the code that this check will be called! Hence, no need for a generic check. You want to stop abstract code that uses IB core infrastructure. > > > >> This new transport is only understand by core-layer currently, for > >> user-layer we still reserve the old transport for them, next step is > >> to use bitmask instead of transport, at that time we can erase the > >> new transport and make the whole stuff used by user-layer only :-) > >> > > > > I am not sure that we need a bit mask at all. > > Your helpers already provide all the useful abstractions, which both core > and ULPs call directly. > > All the information is inferred directly from tuples. > > > > Some of the user-space tools need *exactly* the same reasoning. > > For example, management tools manage specific technologies and > protocols, not some abstraction. > > > > So, For user-space, we can think about exposing exactly the same > > helper framework, while providing backward compatibility for the existing > interfaces. > > I'd really like to put the topic on bitmask and user app reform into different > thread... > > bitmask should be next topic, there are many discussion already, but I could > imaging far more discussion there, the user reform should be the last step, > after every thing in kernel settled down :-) > OK > > > >>> > >>> > >>> Detailed remarks > >>> ============== > >>> > >>> 1) The introduction of cap_*_*() stuff should have been introduced > >>> directly > >> in patch 02/27. > >>> This back-and-forth between rdma_ib_or_iboe() and cap_* is confusing > >>> and > >> increases the number of patches in the patch-set. > >>> Do this and remove patches 16-24. > >> > >> We have some discussion about compress the patch set, merge the > >> reform and introducing patch will mix the concept (like the earlier > >> version), IMHO it will increase the difficulty of review... > >> > >> And now since many review already been done, it's not wise to change > >> the whole structure of patch set IMHO... > >> > > > > I think it is because you are conditioning code on one thing, and then > > conditioning the same code on another thing. > > > > This is confusing. > > > > Once we get our abstractions correct (i.e., the right helper > > functions), you replace the existing logic with the suitable helper up-front. > > We need to classify and integrate the concept into mgmt helper, that would > be very helpful for further reform, reform followed by integration sounds not > that bad, correct? > The problem is that it is hard to follow the reasoning for the first use consumer code with the in-complete helper frame-work. > > > >>> > >>> 2)The name rdma_tech_* is lame. > >>> rdma_transport_*(), adhering to the above (*) remark, is much better. > >>> For example, both IB and ROCE *do* use the same transport. > >> > >> We have some discussion on that too, use transport means going back... > >> > > > > No. > > The existing notion of transport was correct. It was the node type that > wasn't. > > And in any case the new helpers didn't use it. > > > > We need the original meaning of transport - see my response to Ira. > > I propose replacing rdma_node_get_transport() with the following helpers: > > - rdma_get_transport() > > - rdma_is_ib_transport() > > - rdma_is_iwarp_transport() > > We can change the name at anytime, tech/transport/protocol/standard, just > one patch later can easily change it and start the topic of naming, any of > these name will unsatisfied someone AFAIK, I'd like to suggest we consider > this as a mark temporarily and focus on the logical issue. Sure. The logical issue is: 1. We need the existing notion of transport, meaning "a bunch of L4+headers + semantics presented to apps". 2. We might need an *additional* notion of "rdma_protocol", which designates a complete wire-format: L2-L4+ including. This could be later a bitmask, a management helper, whatever. Currently, I don't see anything in the existing code that would call such helpers. > > > - ... > > > >>> > >>> 3) The name cap_* as it is used above is not accurate. > >>> You use it to describe technology characteristics rather than > >>> extendable > >> capabilities. > >>> I would suggest having a single convention for all helpers, such as > >> rdma_has_*() and rdma_is_*(). > >>> For example: cap_ib_smi() ==> rdma_has_smi(). > >> > >> That means going back too... > > > > See response to Ira (https://lkml.org/lkml/2015/4/21/951). > > > > > >> > >>> > >>> 4) Remove all capabilities that do not introduce any distinction in > >>> the > >> current code. > >>> We can add them as needed later. > >>> This means remove patches: > >>> - [PATCH v5 22/27] IB/Verbs: Use management helper cap_ipoib() – all > >>> IB devices support ipoib > >>> - [PATCH v5 24/27] IB/Verbs: Use management helper cap_af_ib() – all > >>> IB > >> devices support AF_IB. > >>> > >>> On the other hand: > >>> - rdma_has_multicast() makes sense, since iWARP doesn’t support it. > >>> - cap_ib_sa() might make sense to cut code even further in the CMA, > >>> since > >> RoCE has a GSI but no SA. > >> > >> We have discussion on define these helpers previously, again, name is > >> not really a problem, I would rather to see such changes in the > >> following series after this one working stably :-) > >> > > > > The names are not critical. This comment is about introducing helpers > > that are do not introduce any new semantic notion in the current patch-set. > > > > cap_ipoib(), for example, is brain-dead because only a single > > technology (as of now) enables it: Infiniband. > > This will be dropped in next version :-) > > > > >>> > >>> 5) Do no modify phys_state_show() in [PATCH v5 09/27] IB/Verbs: > >>> Reform IB-core verbs/uverbs_cmd/sysfs It *is* the link layer! > >> > >> Actually nothing changed after the modify, the prev purpose it to > >> eliminate the link layer helpers. > >> > >> But now we are not going to remove the helper any more, so let's drop > >> this modification in next version :-) > >> > > > > You don't add modifications just to drop them later. > > Don't add them in the first place! > > > > This patch-set will remain forever in the kernel commit log - we want > > it to be as self-explaining and coherent as possible. > > > > Remove this. > > What i mean is this will be removed in v6... > > > > >>> > >>> 6) Remove cap_read_multi_sge > >>> It is not device/port feature, but a transport capability. > >>> Use rdma_is_iwarp_transport() instead, or introduce a new transport > >>> flag in > >> 'enum ib_device_cap_flags'. > >>> > >>> 7) Remove [PATCH v5 25/27] IB/Verbs: Use management helper > >> cap_eth_ah(). > >>> Address handles that refer to Ethernet links always have Ethernet > >> addressing. > >>> > >>> In the CMA code, using rdma_tech_iboe() is just fine. This is how > >>> you define > >> cap_eth_ah() anyway. > >>> Currently, this patch just adds clutter. > >> > >> There are also some discussion on these helpers, drop them means > >> going back.. > >> > > > > Back to where? Management helpers are a new concept. Let's get them > right. > > Back to one point during v1~v5. > > > > >> The tech helper is not enough to explain the management purpose, and > >> this can be the wrapper for bitmask stuff too. > >> > > > > As I said, I am not sure that we will need any bitmasks. > > Also see response to Ira (https://lkml.org/lkml/2015/4/21/951). > > Better discussed in another thread. > > > > >>> > >>> 8) Remove patch [PATCH v5 26/27] IB/Verbs: Clean up rdma_ib_or_iboe(). > >>> We do need a transport qualifier, as exemplified in comment 5) > >>> above, and > >> for a complete clean model. > >>> This is after renaming the function to rdma_is_ib_transport()... > >> > >> This means going back again... rdma_is_ib_transport() has been used > >> previously. > >> > >> This helper is just to make the review more easier, we won't need it > >> internally, not to mention after bitmask was introduced :-) > >> > > > > The same... > > > >>> > >>> > >>> Putting it all together > >>> ================== > >>> > >>> We are left with the following helpers: > >>> - rdma_is_ib_transport() > >>> - rdma_is_iwarp_transport() > >>> - rdma_is_usnic_transport() > >>> - rdma_is_iboe() > >>> - rdma_has_mad() > >>> - rdma_has_smi() > >>> - rdma_has_gsi() - complements smi; can be used by the mad code for > >>> clarity > >>> - rdma_has_sa() > >>> - rdma_has_cm() > >>> - rdma_has_mcast() > >> > >> I think we can put the discussion on name and new helpers in future, > >> currently let's focus on these basic reform and make them working > >> stably ;-) > > > > It's not just the names, it's their semantics. > > Any problems with the names proposed above? > > These were once used in old version, again, name can't satisfied anyone at > this moment and I'd like to discuss this after the logical was right, I really > don't want folks to focus on this issue since it won't broken anything and can > be easily changed once we have the agreement. > > Regards, > Michael Wang > OK > > > >> > >> Regards, > >> Michael Wang > >> > >>> > >>> > >>>> Subject: [PATCH v5 00/27] IB/Verbs: IB Management Helpers > >>>> > >>>> > >>>> Since v4: > >>>> * Thanks for the comments from Hal, Sean, Tom, Or Gerlitz, Jason, > >>>> Roland, Ira and Steve :-) Please remind me if anything missed :-P > >>>> * Fix logical issue inside 3#, 14# > >>>> * Refine 3#, 4#, 5# with label 'free' > >>>> * Rework 10# to stop using port 1 when port already assigned > >>>> > >>>> There are plenty of lengthy code to check the transport type of IB > >>>> device, or the link layer type of it's port, but actually we are > >>>> just speculating whether a particular management/feature is > >>>> supported by the > >> device/port. > >>>> > >>>> Thus instead of inferring, we should have our own mechanism for IB > >>>> management capability/protocol/feature checking, several proposals > >> below. > >>>> > >>>> This patch set will reform the method of getting transport type, we > >>>> will now using query_transport() instead of inferring from > >>>> transport and link layer respectively, also we defined the new > >>>> transport type to make the concept more reasonable. > >>>> > >>>> Mapping List: > >>>> node-type link-layer old-transport new-transport > >>>> nes RNIC ETH IWARP > IWARP > >>>> amso1100 RNIC ETH IWARP IWARP > >>>> cxgb3 RNIC ETH IWARP IWARP > >>>> cxgb4 RNIC ETH IWARP IWARP > >>>> usnic USNIC_UDP ETH USNIC_UDP USNIC_UDP > >>>> ocrdma IB_CA ETH IB IBOE > >>>> mlx4 IB_CA IB/ETH IB IB/IBOE > >>>> mlx5 IB_CA IB IB IB > >>>> ehca IB_CA IB IB IB > >>>> ipath IB_CA IB IB IB > >>>> mthca IB_CA IB IB IB > >>>> qib IB_CA IB IB IB > >>>> > >>>> For example: > >>>> if (transport == IB) && (link-layer == ETH) will now become: > >>>> if (query_transport() == IBOE) > >>>> > >>>> Thus we will be able to get rid of the respective transport and > >>>> link-layer checking, and it will help us to add new > >>>> protocol/Technology (like OPA) more easier, also with the > >>>> introduced management helpers, IB management logical will be more > >>>> clear and easier > >> for extending. > >>>> > >>>> Highlights: > >>>> The patch set covered a wide range of IB stuff, thus for those who are > >>>> familiar with the particular part, your suggestion would be > >>>> invaluable ;-) > >>>> > >>>> Patch 1#~15# included all the logical reform, 16#~25# introduced the > >>>> management helpers, 26#~27# do clean up. > >>>> > >>>> Patches haven't been tested yet, we appreciate if any one who > >>>> have > >> these > >>>> HW willing to provide his Tested-by :-) > >>>> > >>>> Doug suggested the bitmask mechanism: > >>>> https://www.mail-archive.com/linux- > >>>> rdma@vger.kernel.org/msg23765.html > >>>> which could be the plan for future reforming, we prefer that to > >>>> be > >> another > >>>> series which focus on semantic and performance. > >>>> > >>>> This patch-set is somewhat 'bloated' now and it may be a good > >>>> timing > >> for > >>>> staging, I'd like to suggest we focus on improving existed > >>>> helpers and > >> push > >>>> all the further reforms into next series ;-) > >>>> > >>>> Proposals: > >>>> Sean: > >>>> https://www.mail-archive.com/linux- > >>>> rdma@vger.kernel.org/msg23339.html > >>>> Doug: > >>>> https://www.mail-archive.com/linux- > >>>> rdma@vger.kernel.org/msg23418.html > >>>> https://www.mail-archive.com/linux- > >>>> rdma@vger.kernel.org/msg23765.html > >>>> Jason: > >>>> https://www.mail-archive.com/linux- > >>>> rdma@vger.kernel.org/msg23425.html > >>>> > >>>> Michael Wang (27): > >>>> IB/Verbs: Implement new callback query_transport() > >>>> IB/Verbs: Implement raw management helpers > >>>> IB/Verbs: Reform IB-core mad/agent/user_mad > >>>> IB/Verbs: Reform IB-core cm > >>>> IB/Verbs: Reform IB-core sa_query > >>>> IB/Verbs: Reform IB-core multicast > >>>> IB/Verbs: Reform IB-ulp ipoib > >>>> IB/Verbs: Reform IB-ulp xprtrdma > >>>> IB/Verbs: Reform IB-core verbs/uverbs_cmd/sysfs > >>>> IB/Verbs: Reform cm related part in IB-core cma/ucm > >>>> IB/Verbs: Reform route related part in IB-core cma > >>>> IB/Verbs: Reform mcast related part in IB-core cma > >>>> IB/Verbs: Reserve legacy transport type in 'dev_addr' > >>>> IB/Verbs: Reform cma_acquire_dev() > >>>> IB/Verbs: Reform rest part in IB-core cma > >>>> IB/Verbs: Use management helper cap_ib_mad() > >>>> IB/Verbs: Use management helper cap_ib_smi() > >>>> IB/Verbs: Use management helper cap_ib_cm() > >>>> IB/Verbs: Use management helper cap_iw_cm() > >>>> IB/Verbs: Use management helper cap_ib_sa() > >>>> IB/Verbs: Use management helper cap_ib_mcast() > >>>> IB/Verbs: Use management helper cap_ipoib() > >>>> IB/Verbs: Use management helper cap_read_multi_sge() > >>>> IB/Verbs: Use management helper cap_af_ib() > >>>> IB/Verbs: Use management helper cap_eth_ah() > >>>> IB/Verbs: Clean up rdma_ib_or_iboe() > >>>> IB/Verbs: Cleanup rdma_node_get_transport() > >>>> > >>>> --- > >>>> drivers/infiniband/core/agent.c | 4 > >>>> drivers/infiniband/core/cm.c | 26 +- > >>>> drivers/infiniband/core/cma.c | 328 ++++++++++++--------------- > >>>> drivers/infiniband/core/device.c | 1 > >>>> drivers/infiniband/core/mad.c | 51 ++-- > >>>> drivers/infiniband/core/multicast.c | 18 - > >>>> drivers/infiniband/core/sa_query.c | 41 +-- > >>>> drivers/infiniband/core/sysfs.c | 8 > >>>> drivers/infiniband/core/ucm.c | 5 > >>>> drivers/infiniband/core/ucma.c | 27 -- > >>>> drivers/infiniband/core/user_mad.c | 32 +- > >>>> drivers/infiniband/core/uverbs_cmd.c | 6 > >>>> drivers/infiniband/core/verbs.c | 33 -- > >>>> drivers/infiniband/hw/amso1100/c2_provider.c | 7 > >>>> drivers/infiniband/hw/cxgb3/iwch_provider.c | 7 > >>>> drivers/infiniband/hw/cxgb4/provider.c | 7 > >>>> drivers/infiniband/hw/ehca/ehca_hca.c | 6 > >>>> drivers/infiniband/hw/ehca/ehca_iverbs.h | 3 > >>>> drivers/infiniband/hw/ehca/ehca_main.c | 1 > >>>> drivers/infiniband/hw/ipath/ipath_verbs.c | 7 > >>>> drivers/infiniband/hw/mlx4/main.c | 10 > >>>> drivers/infiniband/hw/mlx5/main.c | 7 > >>>> drivers/infiniband/hw/mthca/mthca_provider.c | 7 > >>>> drivers/infiniband/hw/nes/nes_verbs.c | 6 > >>>> drivers/infiniband/hw/ocrdma/ocrdma_main.c | 1 > >>>> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 6 > >>>> drivers/infiniband/hw/ocrdma/ocrdma_verbs.h | 3 > >>>> drivers/infiniband/hw/qib/qib_verbs.c | 7 > >>>> drivers/infiniband/hw/usnic/usnic_ib_main.c | 1 > >>>> drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 6 > >>>> drivers/infiniband/hw/usnic/usnic_ib_verbs.h | 2 > >>>> drivers/infiniband/ulp/ipoib/ipoib_main.c | 17 - > >>>> include/rdma/ib_verbs.h | 204 +++++++++++++++- > >>>> net/sunrpc/xprtrdma/svc_rdma_recvfrom.c | 6 > >>>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 51 +--- > >>>> 35 files changed, 584 insertions(+), 368 deletions(-) > >>>> -- > >>>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" > >>>> in the body of a message to majordomo@vger.kernel.org More > >>>> majordomo info at http://vger.kernel.org/majordomo-info.html ????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m???? ????????I?