LLM Pre-training

Transformer Architecture

  • Self-Attention Mechanism
  • Multi-Head Attention
  • Feed Forward Network (FFN)
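The core of the attention blocks listed above can be sketched in a few lines of NumPy. This is a minimal single-head illustration (not the implementation used by any particular library): `Wq`, `Wk`, `Wv` are assumed projection matrices, and multi-head attention would simply run several such heads on split dimensions and concatenate the results.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # X: (seq_len, d_model). Project into queries, keys, values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise similarity, scaled by sqrt(d_k)
    weights = softmax(scores, axis=-1)   # each row is a distribution over positions
    return weights @ V                   # weighted mixture of value vectors

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one output vector per input position
```

The per-position FFN from the same list is just two dense layers with a nonlinearity, applied independently at every position.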

Positional Encoding

  • Absolute Positional Encoding (Sinusoidal)
  • Rotary Positional Embedding (RoPE)
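The absolute (sinusoidal) scheme above can be sketched directly from its definition: even dimensions carry sines, odd dimensions carry cosines, with geometrically spaced wavelengths. A minimal NumPy version, for illustration only:

```python
import numpy as np

def sinusoidal_pe(seq_len, d_model):
    # pos: (seq_len, 1), i indexes the d_model // 2 frequency pairs.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dims: sine
    pe[:, 1::2] = np.cos(angles)  # odd dims: cosine
    return pe

pe = sinusoidal_pe(16, 8)
print(pe.shape)  # (16, 8)
```

RoPE, by contrast, rotates each query/key dimension pair by a position-dependent angle inside the attention computation instead of adding a vector to the embeddings.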

Training Objectives

  • Masked Language Modeling (MLM)
  • Causal Language Modeling (CLM)
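The two objectives differ mainly in what the model is allowed to see. A small sketch of the CLM side, under the usual conventions (a lower-triangular attention mask, and targets formed by shifting the input one token left); MLM would instead replace a random subset of tokens with a mask token and predict only those:

```python
import numpy as np

def causal_mask(seq_len):
    # Position i may attend only to positions j <= i.
    return np.tril(np.ones((seq_len, seq_len), dtype=bool))

def clm_shift(tokens):
    # CLM: predict token t+1 from the prefix up to token t.
    return tokens[:-1], tokens[1:]

inputs, targets = clm_shift(np.array([1, 5, 3, 2, 9]))
print(inputs, targets)  # [1 5 3 2] [5 3 2 9]
```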

AI-HPC Organization