Usr Include Infiniband Verbs Homework

Description | Amount | Explanation

User memory By default Open MPI will register as much user memory as necessary (upon demand). However, if is greater than zero, it is the upper limit (in bytes) of user memory that will be registered. User memory is registered for ongoing MPI communications (e.g., long message sends and receives) and via the MPI_ALLOC_MEM function.

Note that this MCA parameter was introduced in v1.2.1.

Internal eager fragment buffers 2 x x ( + overhead) A "free list" of buffers used in the BTL for "eager" fragments (e.g., the first fragment of a long message). Two free lists are created; one for sends and one for receives.

By default, is -1, and the list size is unbounded, meaning that Open MPI will try to allocate as many registered buffers as it needs. If is greater than 0, the list will be limited to this size. Each entry in the list is approximately bytes -- some additional overhead space is required for alignment and internal accounting. is the maximum size of an eager fragment.

Internal send/receive buffers 2 x x ( + overhead) A "free list" of buffers used for send/receive communication in the BTL. Two free lists are created; one for sends and one for receives.

By default, is -1, and the list size is unbounded, meaning that Open MPI will allocate as many registered buffers as it needs. If is greater than 0, the list will be limited to this size. Each entry in the list is approximately bytes -- some additional overhead space is required for alignment and internal accounting. is the maximum size of a send/receive fragment.

Internal "eager" RDMA buffers x x ( + overhead) If is true, RDMA buffers are used for eager fragments (because RDMA semantics can be faster than send/receive semantics in some cases), and an additional set of registered buffers is created (as needed).

Each MPI process will use RDMA buffers for eager fragments up to MPI peers. Upon receiving the 'th message from an MPI peer process, if both sides have not yet set up sets of eager RDMA buffers, a new set will be created. The set will contain buffers; each buffer will be bytes (i.e., the maximum size of an eager fragment).
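The "Amount" formulas in the table can be turned into a rough upper-bound estimate. The sketch below is only an illustration of that arithmetic: the names list_max, frag_bytes, overhead_bytes, max_peers and bufs_per_peer are hypothetical stand-ins for the MCA parameters the table refers to, and the overhead figure is whatever the caller assumes, since the text only says "some additional overhead".

```c
#include <stddef.h>
#include <stdint.h>

/* Upper bound for one pair of free lists (send + receive):
 *   2 x list_max x (frag_bytes + overhead_bytes)
 * Returns 0 when list_max is not positive: the default of -1 means the
 * list is unbounded, so no fixed limit exists. (Treating 0 the same way
 * is an assumption of this sketch.) */
uint64_t free_list_bound(long list_max, size_t frag_bytes, size_t overhead_bytes)
{
    if (list_max <= 0)
        return 0;
    return 2u * (uint64_t)list_max * ((uint64_t)frag_bytes + overhead_bytes);
}

/* Upper bound for eager RDMA buffers:
 *   max_peers x bufs_per_peer x (frag_bytes + overhead_bytes) */
uint64_t eager_rdma_bound(unsigned max_peers, unsigned bufs_per_peer,
                          size_t frag_bytes, size_t overhead_bytes)
{
    return (uint64_t)max_peers * bufs_per_peer *
           ((uint64_t)frag_bytes + overhead_bytes);
}
```

For example, with an assumed 12288-byte fragment and 64 bytes of overhead, a list limited to 100 entries is bounded by 2 x 100 x 12352 = 2470400 bytes.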

General

libibverbs is an implementation of the RDMA verbs for both InfiniBand (according to the InfiniBand specifications) and iWARP (according to the iWARP verbs specifications). It handles the control path of creating, modifying, querying and destroying resources such as Protection Domains (PD), Completion Queues (CQ), Queue Pairs (QP), Shared Receive Queues (SRQ), Address Handles (AH) and Memory Regions (MR). It also handles sending and receiving data posted to QPs and SRQs, and getting completions from CQs using polling and completion events.

The control path is implemented through system calls to the uverbs kernel module, which in turn calls the low-level HW driver. The data path is implemented through calls to the low-level HW library which, in most cases, interacts directly with the HW and provides kernel and network stack bypass (saving context/mode switches) along with zero copy and an asynchronous I/O model.

Typically, under network and RDMA programming, there are operations which involve interaction with remote peers (such as address resolution and connection establishment) and remote entities (such as route resolution and joining a multicast group under IB), where a resource managed through IB verbs, such as a QP or an AH, is eventually created or affected by this interaction. In such cases, applications whose addressing semantics are based on IP can use librdmacm, which works in conjunction with libibverbs.

Thread safe

This library is thread safe, and verbs can be called from any thread in the process. The same resource can even be handled from different threads (the atomicity of the operations is guaranteed). However, it is up to the user to stop working with a resource after it has been destroyed (by the same thread or by any other thread); failing to do so may result in a segmentation fault.

Fork safe

As a general rule of thumb, fork() should be avoided when using libibverbs, whether it is called explicitly or implicitly (through library functions that call it, such as system() or popen()).

However, if one must use fork() please read the documentation of ibv_fork_init().

Library API

The functions in the library shall be declared as functions and some of them may be declared as macros.

In order to use libibverbs, the following line must be included in the source code:

#include <infiniband/verbs.h>


Library functions

int ibv_fork_init(void);


 

Device functions

struct ibv_device **ibv_get_device_list(int *num_devices);
void ibv_free_device_list(struct ibv_device **list);
const char *ibv_get_device_name(struct ibv_device *device);
uint64_t ibv_get_device_guid(struct ibv_device *device);

 

Context functions

struct ibv_context *ibv_open_device(struct ibv_device *device);
int ibv_close_device(struct ibv_context *context);

 

Queries

int ibv_query_device(struct ibv_context *context, struct ibv_device_attr *device_attr);
int ibv_query_port(struct ibv_context *context, uint8_t port_num, struct ibv_port_attr *port_attr);
int ibv_query_pkey(struct ibv_context *context, uint8_t port_num, int index, uint16_t *pkey);
int ibv_query_gid(struct ibv_context *context, uint8_t port_num, int index, union ibv_gid *gid);

 

Asynchronous events

int ibv_get_async_event(struct ibv_context *context, struct ibv_async_event *event);
void ibv_ack_async_event(struct ibv_async_event *event);

 

Protection Domains

struct ibv_pd *ibv_alloc_pd(struct ibv_context *context);
int ibv_dealloc_pd(struct ibv_pd *pd);

 

Memory Regions

struct ibv_mr *ibv_reg_mr(struct ibv_pd *pd, void *addr, size_t length, enum ibv_access_flags access);
int ibv_dereg_mr(struct ibv_mr *mr);

 

Address Handles

struct ibv_ah *ibv_create_ah(struct ibv_pd *pd, struct ibv_ah_attr *attr);
int ibv_init_ah_from_wc(struct ibv_context *context, uint8_t port_num, struct ibv_wc *wc, struct ibv_grh *grh, struct ibv_ah_attr *ah_attr);
struct ibv_ah *ibv_create_ah_from_wc(struct ibv_pd *pd, struct ibv_wc *wc, struct ibv_grh *grh, uint8_t port_num);
int ibv_destroy_ah(struct ibv_ah *ah);

 

Completion event channels

struct ibv_comp_channel *ibv_create_comp_channel(struct ibv_context *context);
int ibv_destroy_comp_channel(struct ibv_comp_channel *channel);

 

Completion Queues control

struct ibv_cq *ibv_create_cq(struct ibv_context *context, int cqe, void *cq_context, struct ibv_comp_channel *channel, int comp_vector);
int ibv_destroy_cq(struct ibv_cq *cq);
int ibv_resize_cq(struct ibv_cq *cq, int cqe);

 

Shared Receive Queue control

struct ibv_srq *ibv_create_srq(struct ibv_pd *pd, struct ibv_srq_init_attr *srq_init_attr);
int ibv_destroy_srq(struct ibv_srq *srq);
int ibv_modify_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr, enum ibv_srq_attr_mask srq_attr_mask);
int ibv_query_srq(struct ibv_srq *srq, struct ibv_srq_attr *srq_attr);

 

Queue Pair control

struct ibv_qp *ibv_create_qp(struct ibv_pd *pd, struct ibv_qp_init_attr *qp_init_attr);
int ibv_destroy_qp(struct ibv_qp *qp);
int ibv_modify_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask);
int ibv_query_qp(struct ibv_qp *qp, struct ibv_qp_attr *attr, enum ibv_qp_attr_mask attr_mask, struct ibv_qp_init_attr *init_attr);

 

Posting Work Requests to QPs/SRQs

int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr, struct ibv_send_wr **bad_wr);
int ibv_post_recv(struct ibv_qp *qp, struct ibv_recv_wr *wr, struct ibv_recv_wr **bad_wr);
int ibv_post_srq_recv(struct ibv_srq *srq, struct ibv_recv_wr *recv_wr, struct ibv_recv_wr **bad_recv_wr);

 

Reading Completions from CQ

int ibv_poll_cq(struct ibv_cq *cq, int num_entries, struct ibv_wc *wc);

 

Requesting / Managing CQ events

int ibv_req_notify_cq(struct ibv_cq *cq, int solicited_only);
int ibv_get_cq_event(struct ibv_comp_channel *channel, struct ibv_cq **cq, void **cq_context);
void ibv_ack_cq_events(struct ibv_cq *cq, unsigned int nevents);

 

Multicast group

int ibv_attach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);
int ibv_detach_mcast(struct ibv_qp *qp, union ibv_gid *gid, uint16_t lid);

 

General functions

int ibv_rate_to_mult(enum ibv_rate rate);
enum ibv_rate mult_to_ibv_rate(int mult);
const char *ibv_node_type_str(enum ibv_node_type node_type);
const char *ibv_port_state_str(enum ibv_port_state port_state);
const char *ibv_event_type_str(enum ibv_event_type event);
const char *ibv_wc_status_str(enum ibv_wc_status status);

Resource creation dependency
