
Syllabus

Day 1: High Performance Networking Principles
  • Trends in computer architecture and network-based computing
  • CPUs, GPUs, Accelerators, FPGAs
  • Networking technologies and trends
  • Ethernet and RoCE
  • InfiniBand
  • Proprietary networks
  • OSI model and typical SW/HW implementations of various layers of the model
  • Typical BW and latency of TCP vs. offloaded transports (RDMA)
  • Number of CPU cycles per packet at 100 Gb/s with 64-byte packets
  • Breakdown of CPU utilization
  • Cost of copy
  • Cost of kernel vs. user
  • OS bypass fundamentals
  • HW Context per app: Isolation, scalability
  • Transport offload
  • Memory translation
  • Synchronous POSIX copy semantics vs. asynchronous zero-copy
  • Implications for memory management
  • Implications for memory overcommit
  • The Verbs channel provider model
  • QPs and WQEs
  • CQs and CQEs
  • Memory registration
  • Shared receive queues
  • Arming and signaling
  • Communication semantics
  • Channel Interface
  • Send / Receive
  • RDMA
  • Atomics
  • EXERCISE 1: COMPARATIVE ANALYSIS BETWEEN INFINIBAND AND IP
  • Using TCP socket interface to write a p2p benchmark application
  • Using IB verbs to write a p2p benchmark application
  • Measure performance (ops/sec), compare different aspects
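Exercise 1 asks for the same point-to-point benchmark over both transports; the TCP-socket half can be sketched as a loopback ping-pong loop that reports ops/sec. Message size, iteration count, and the echo protocol here are illustrative choices, not part of the exercise spec:

```python
import socket
import threading
import time

MSG_SIZE = 64      # small message, echoing the 64-byte packet discussion above
ITERATIONS = 1000  # arbitrary; real runs would use far more

def recv_exact(sock, n):
    """Read exactly n bytes from a stream socket (TCP may split messages)."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def server(listener):
    """Accept one client and echo every message back."""
    conn, _ = listener.accept()
    with conn:
        for _ in range(ITERATIONS):
            conn.sendall(recv_exact(conn, MSG_SIZE))

def benchmark():
    """One ping-pong round trip per op; returns measured ops/sec."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    t = threading.Thread(target=server, args=(listener,))
    t.start()
    client = socket.socket()
    client.connect(listener.getsockname())
    msg = b"x" * MSG_SIZE
    start = time.perf_counter()
    for _ in range(ITERATIONS):
        client.sendall(msg)
        recv_exact(client, MSG_SIZE)
    elapsed = time.perf_counter() - start
    client.close()
    t.join()
    listener.close()
    return ITERATIONS / elapsed

print(f"{benchmark():.0f} ops/sec")
```

The verbs version replaces the socket calls with posted work requests and completion polling; comparing the two numbers is the point of the exercise.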
Day 2: High Performance Networking Software Design, Applications & Scalability
  • Design patterns
  • QPs and WQEs
  • Eager vs. rendezvous
  • Read-mostly transactions
  • Initiator-target Execution semantics
  • Reactor vs. proactor
  • Thread-less libraries and progress routines
  • Task-based scheduling
  • Registration techniques
  • Buffer pool
  • Registration cache
  • On Demand Paging
  • Polling vs. interrupts
  • Optimizations and heuristics
  • Offloading technologies (Part I)
  • CORE-Direct (offloading to the NIC)
  • Peer-direct / GPU-direct
  • Congestion & Flow Control
  • Deadline-aware TCP
  • Multipath
  • Credit-based Flow Control
  • App scaling
  • “Active message” paradigm
  • Connection multiplexing
  • Load balancing
  • Optimizations and heuristics
  • EXERCISE 2: KEY-VALUE STORE OVER RDMA VERBS
  • Using RDMA verbs to write a client-server application
  • Server keeps an in-memory key-value table, clients read/write key-value pairs
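Exercise 2 is specified over RDMA verbs; the transport-independent core (the server's in-memory table and request dispatch) can be sketched as below. The `SET`/`GET` wire format is an assumption for illustration only; over verbs, small values would typically travel eagerly inside a SEND, while large values would be registered and exposed for RDMA READ/WRITE (the eager vs. rendezvous pattern from the design section).

```python
# Server-side state: the in-memory key-value table.
table = {}

def handle_request(request: bytes) -> bytes:
    """Dispatch one request of the form b'SET key value' or b'GET key'.

    Returns the reply payload the server would send back to the client.
    """
    parts = request.split(b" ", 2)
    op = parts[0]
    if op == b"SET" and len(parts) == 3:
        table[parts[1]] = parts[2]
        return b"OK"
    if op == b"GET" and len(parts) == 2:
        return table.get(parts[1], b"NOT_FOUND")
    return b"ERROR"
```

In the verbs version, `handle_request` would run in the server's completion-processing loop, invoked once per received work completion.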
Day 3: Unified Communication X (UCX)
  • UCX overview – past, present and future
  • UCX APIs for HPC
  • UCX APIs for non-HPC applications and use cases
  • UCX architecture and design
  • EXERCISE 3: UCX-BASED NETWORK FILE SYSTEM
  • Using UCX, write a server app and a client library for a remote filesystem
  • Client library exports open/read/write/seek/close, and connect/disconnect to server
  • Each request, based on size and caching, selects a protocol and accesses the remote file
  • Measure performance (ops/sec) for different access patterns
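The per-request protocol selection in Exercise 3 might look like the following sketch. The eager threshold and the protocol names are illustrative assumptions, not part of the exercise spec; real thresholds are tuned per fabric and message-rate profile.

```python
EAGER_LIMIT = 8192  # assumed cutover point, in bytes

def choose_protocol(size: int, cached: bool) -> str:
    """Pick a transfer strategy for one file request.

    - cached data is answered locally, with no network transfer;
    - small payloads fit an eager send (copied through bounce buffers);
    - large payloads justify a rendezvous: exchange a remote key,
      then move the data zero-copy with RDMA.
    """
    if cached:
        return "local-cache"
    if size <= EAGER_LIMIT:
        return "eager"
    return "rendezvous-rdma"
```

Measuring ops/sec across access patterns then shows where each branch pays off: cache hits dominate re-reads, eager wins on small random I/O, rendezvous on large sequential transfers.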
Day 4: Collective Communication
  • Collective Communication Overview
  • An introduction to collective communication
  • MPI collective vs. AI collective vs. PGAS collective
  • Algorithms and optimizations in collective communication
  • All reduce – ring vs. tree
  • Collective communication – network offloads
  • CORE-Direct
  • Persistent Communication Offload
  • SHARP
  • State-of-the-art existing libraries
  • NCCL, Gloo, MPI
  • Introduction to Unified Collective Communication Library (UCC)
  • Working Group status report
  • Architecture – APIs and design goals
  • EXERCISE 4: COLLECTIVE COMMUNICATION OFFLOAD IN HPC AND AI
  • MPI collective operation in HPC application
  • AI collective operation in AI application
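The ring all-reduce from the algorithms section can be simulated in a few lines. Each rank sends one chunk (1/N of the data) per step, for total per-rank traffic of 2·(N−1)/N of the data regardless of N, which is why ring wins over tree for large messages while tree has lower latency for small ones. A minimal sequential simulation:

```python
def ring_allreduce(vectors):
    """Simulate an all-reduce (sum) over N ranks with the ring algorithm.

    Assumes vector length is divisible by the number of ranks.
    """
    n = len(vectors)
    chunk = len(vectors[0]) // n
    data = [list(v) for v in vectors]
    # Phase 1: reduce-scatter. In step s, rank r sends chunk (r - s) mod n
    # to rank r+1, which adds it; after n-1 steps each rank holds one
    # fully reduced chunk.
    for s in range(n - 1):
        for r in range(n):
            dst, c = (r + 1) % n, (r - s) % n
            for i in range(c * chunk, (c + 1) * chunk):
                data[dst][i] += data[r][i]
    # Phase 2: allgather. In step s, rank r sends chunk (r + 1 - s) mod n
    # to rank r+1, which copies it; after n-1 steps every rank holds the
    # complete reduced vector.
    for s in range(n - 1):
        for r in range(n):
            dst, c = (r + 1) % n, (r + 1 - s) % n
            for i in range(c * chunk, (c + 1) * chunk):
                data[dst][i] = data[r][i]
    return data
```

NCCL, Gloo, and MPI implementations all use variants of this schedule for large messages; the offloads above (CORE-Direct, SHARP) move the same reduction out of the CPU and into the NIC or switch.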
Day 5: RDMA Applications in HPC, Storage & AI
  • Ohio State University (OSU)
  • RDMA in HPC and AI applications
  • Tsinghua University
  • RDMA practice sharing in HPC and AI competition
  • USTC (University of Science and Technology of China)
  • RDMA practice in storage applications
  • Course summary
  • RDMA Programming Hackathon Q&A
Day 6: RDMA Programming Hackathon
Day 7: RDMA Programming Hackathon & Interview
  • RDMA programming hackathon
  • Interview (5 min. presentation and 5 min. Q&A)
  • Final results will be announced at China SC2020, Beijing
  • Workshop closing