Interacting With Assembly In Rust

Take low-level control of the code Rust generates while keeping its safety guarantees, from inspecting generated assembly to writing inline assembly and compiling for WebAssembly.

Why Assembly Matters in Rust

Rust's commitment to performance without sacrificing safety makes it well suited to systems programming. When you need fine control over generated machine code, whether optimizing critical paths or debugging compiler output, Rust provides tools for working with assembly directly.

This guide explores how to extract, analyze, and write assembly code alongside your Rust programs, balancing high-level ergonomics with low-level control. Understanding how Rust translates to assembly is essential for performance-critical applications that demand both safety and speed.

Tools for Viewing Assembly Output

Understanding what your Rust code compiles to provides valuable insights for performance tuning and validation. Rust offers multiple pathways to inspect generated machine code.

cargo-asm

cargo-asm is a Cargo subcommand that prints the assembly for a single function, optionally interleaved with the Rust source lines it was generated from.

Installing and using cargo-asm
```shell
# Install cargo-asm
cargo install cargo-asm

# View assembly for a specific function
cargo asm package_name::module_name::function_name

# Show output with source annotations
cargo asm --rust package_name::module_name::function_name
```
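To have something concrete to inspect, a small function like the following works well. This is only a sketch: the name sum_squares is made up for illustration, and #[inline(never)] is there so the function survives as its own symbol in an optimized build instead of being inlined into its callers.

```rust
/// Hypothetical example function; the name is illustrative only.
/// `#[inline(never)]` keeps it as a standalone symbol so an assembly
/// viewer can find it in an optimized build.
#[inline(never)]
pub fn sum_squares(values: &[u32]) -> u64 {
    // Widen before multiplying so the square cannot overflow u32.
    values.iter().map(|&v| u64::from(v) * u64::from(v)).sum()
}
```

Assuming this sits at the root of a library crate called my_crate (also a hypothetical name), the path passed to cargo asm would be my_crate::sum_squares.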

Compiler Flags

The compiler can also write assembly to disk during a normal build when it is passed --emit asm. This is useful when you want the output for a whole crate rather than for one function.

Compiler flags for assembly inspection
```shell
# Emit assembly (.s files) alongside the normal build artifacts
RUSTFLAGS="--emit asm -C debuginfo=2" cargo build --release

# View generated assembly in target directory
ls target/release/deps/*.s

# Use cargo-expand to see expanded macros (Rust source, not assembly)
cargo install cargo-expand
cargo expand fn_name
```

Inline Assembly with asm! Macro

Rust's asm! macro, stabilized in Rust 1.59, enables writing assembly directly within Rust functions. This provides control over instruction selection and register usage, which matters for cryptographic operations, system calls, or hardware manipulation that the compiler cannot generate efficiently on its own.

The macro uses operand specifiers to define how Rust variables map to assembly operands. The in keyword passes an input value, out captures an output, and inout uses one register for both. The late variants (lateout, inlateout) tell the compiler that the output is written only after all inputs have been read, so the output may share a register with an input. The options field controls behavior with flags like pure (no side effects; the outputs depend only on the inputs), nomem (no memory reads or writes), and nostack (nothing is pushed to the stack). Every asm! block must sit inside unsafe: the compiler trusts the operand and option declarations and cannot check them against the instructions, so a missing clobber or a wrong option is undefined behavior. For high-performance computing applications, this level of control can be critical for achieving optimal throughput.

Inline assembly examples in Rust
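```rust
use std::arch::asm;

// x86_64 only: `mul` multiplies rax by its operand and writes the
// 128-bit product to rdx:rax, so both registers must be declared.
#[cfg(target_arch = "x86_64")]
#[inline(never)]
pub fn fast_mul(a: u64, b: u64) -> u64 {
    let result: u64;
    unsafe {
        asm!(
            "mul {0}",
            in(reg) b,
            inlateout("rax") a => result, // low 64 bits of the product
            lateout("rdx") _,             // high 64 bits, discarded
            options(pure, nomem, nostack)
        );
    }
    result
}

// Example: CPUID instruction for processor info
#[cfg(target_arch = "x86_64")]
#[inline(never)]
pub fn get_cpu_vendor() -> [u8; 12] {
    let ebx: u32;
    let ecx: u32;
    let edx: u32;
    unsafe {
        asm!(
            // rbx is reserved by LLVM and cannot be named as an operand,
            // so swap it through a scratch register around `cpuid`.
            "mov {tmp:r}, rbx",
            "cpuid",
            "xchg {tmp:r}, rbx",
            tmp = out(reg) ebx,
            inout("eax") 0u32 => _, // leaf 0 returns the vendor string
            inout("ecx") 0u32 => ecx,
            out("edx") edx,
            options(nostack, preserves_flags)
        );
    }
    // The vendor string is laid out in EBX, EDX, ECX order.
    let mut vendor = [0u8; 12];
    vendor[0..4].copy_from_slice(&ebx.to_le_bytes());
    vendor[4..8].copy_from_slice(&edx.to_le_bytes());
    vendor[8..12].copy_from_slice(&ecx.to_le_bytes());
    vendor
}
```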
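The operand kinds are easier to tell apart in a smaller example. The sketch below assumes an x86_64 target and falls back to plain Rust elsewhere; the function names add_five and double are invented for illustration. The first function uses a separate in and out operand, the second a single inout operand that is read and then overwritten.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::asm;

/// Adds 5 to `input` using one input operand and one output operand.
#[cfg(target_arch = "x86_64")]
pub fn add_five(input: u64) -> u64 {
    let output: u64;
    // SAFETY: the template only touches the two declared registers.
    unsafe {
        asm!(
            "mov {dst}, {src}",
            "add {dst}, 5",
            src = in(reg) input,
            dst = out(reg) output,
            options(pure, nomem, nostack),
        );
    }
    output
}

/// Doubles `value` through a single read-write (`inout`) operand.
#[cfg(target_arch = "x86_64")]
pub fn double(mut value: u64) -> u64 {
    // SAFETY: the template only touches the one declared register.
    unsafe {
        asm!(
            "add {v}, {v}",
            v = inout(reg) value,
            options(pure, nomem, nostack),
        );
    }
    value
}

// Portable fallbacks so the same API exists on every target.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_five(input: u64) -> u64 {
    input.wrapping_add(5)
}

#[cfg(not(target_arch = "x86_64"))]
pub fn double(value: u64) -> u64 {
    value.wrapping_add(value)
}
```

Because out(reg) is not a late output here, the compiler keeps dst and src in different registers, which is what the two-instruction template relies on.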

WebAssembly Integration

WebAssembly provides a portable compilation target executing at near-native speed across platforms. Rust's first-class WebAssembly support enables compiling Rust code to WASM modules for browsers, Node.js, or standalone runtimes.

The Rust WebAssembly ecosystem centers on wasm-pack, which handles compilation, packaging, and publishing of Rust-generated WebAssembly modules. The wasm-bindgen crate generates JavaScript glue code for calling between Rust and JavaScript, supporting strings, objects, arrays, and complex data structures. Optimization techniques include enabling LTO (link-time optimization), using opt-level 'z' for size reduction, removing dead code, and running wasm-opt on the output to minimize binary size while maintaining performance. This approach is particularly valuable for modern web applications that require high performance in browser environments.

Cargo.toml for WebAssembly library
```toml
[lib]
crate-type = ["cdylib", "rlib"]

[dependencies]
wasm-bindgen = "0.2"

[dependencies.web-sys]
version = "0.3"
features = [
    "console",
    "Window",
    "Document",
    "Element",
    "HtmlElement",
]

[profile.release]
lto = true
opt-level = "z"
```

Rust code with wasm-bindgen annotations
```rust
use wasm_bindgen::prelude::*;

#[wasm_bindgen]
pub fn process_data(data: &[u8]) -> usize {
    data.iter().filter(|&&b| b > 0).count()
}

// Expose to JavaScript with #[wasm_bindgen]
#[wasm_bindgen]
pub struct Calculator {
    value: i32,
}

#[wasm_bindgen]
impl Calculator {
    #[wasm_bindgen(constructor)]
    pub fn new() -> Calculator {
        Calculator { value: 0 }
    }

    pub fn add(&mut self, n: i32) {
        self.value += n;
    }

    pub fn get_value(&self) -> i32 {
        self.value
    }
}
```

SIMD and Vectorization

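Single Instruction Multiple Data (SIMD) instructions process multiple data elements simultaneously, providing substantial speedups for data-parallel algorithms. Rust exposes architecture-specific SIMD intrinsics through the std::arch module, so portable code picks an implementation per target with #[cfg] and keeps a scalar fallback. (A portable std::simd API exists but is nightly-only at the time of writing.) WebAssembly's 128-bit SIMD, enabled through the simd128 target feature, is now supported in all major browsers, enabling significant performance gains for data-parallel operations in web applications.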

Portable SIMD implementation
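```rust
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
use std::arch::wasm32::*;

pub fn vector_add(a: &[f32], b: &[f32], result: &mut [f32]) {
    assert!(a.len() == b.len() && b.len() == result.len());

    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
    {
        // Process four f32 lanes at a time, then finish the tail.
        let vectorized = a.len() / 4 * 4;
        for i in (0..vectorized).step_by(4) {
            // SAFETY: i + 4 <= vectorized <= len of all three slices;
            // v128_load and v128_store accept unaligned pointers.
            unsafe {
                let va = v128_load(a.as_ptr().add(i) as *const v128);
                let vb = v128_load(b.as_ptr().add(i) as *const v128);
                v128_store(result.as_mut_ptr().add(i) as *mut v128, f32x4_add(va, vb));
            }
        }
        for i in vectorized..a.len() {
            result[i] = a[i] + b[i];
        }
    }

    // Scalar fallback for every target without wasm simd128.
    #[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
    {
        for (i, (a_val, b_val)) in a.iter().zip(b.iter()).enumerate() {
            result[i] = a_val + b_val;
        }
    }
}
```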
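For comparison, here is a sketch of the same element-wise addition using x86_64 intrinsics from std::arch. It assumes nothing beyond SSE, which is part of the x86_64 baseline, so no runtime detection is needed; on other targets the helper does nothing and the scalar loop handles every element. The names vector_add_sse and add_full_vectors are illustrative.

```rust
/// Adds as many full 4-lane chunks as possible and returns how many
/// elements were written. Only compiled on x86_64.
#[cfg(target_arch = "x86_64")]
fn add_full_vectors(a: &[f32], b: &[f32], out: &mut [f32]) -> usize {
    use std::arch::x86_64::{_mm_add_ps, _mm_loadu_ps, _mm_storeu_ps};

    let vectorized = a.len() / 4 * 4;
    for i in (0..vectorized).step_by(4) {
        // SAFETY: i + 4 <= vectorized <= len of all three slices, and
        // the loadu/storeu variants accept unaligned pointers.
        unsafe {
            let va = _mm_loadu_ps(a.as_ptr().add(i));
            let vb = _mm_loadu_ps(b.as_ptr().add(i));
            _mm_storeu_ps(out.as_mut_ptr().add(i), _mm_add_ps(va, vb));
        }
    }
    vectorized
}

/// On other targets no elements are handled by the vector path.
#[cfg(not(target_arch = "x86_64"))]
fn add_full_vectors(_a: &[f32], _b: &[f32], _out: &mut [f32]) -> usize {
    0
}

/// Element-wise `out[i] = a[i] + b[i]`.
pub fn vector_add_sse(a: &[f32], b: &[f32], out: &mut [f32]) {
    assert!(a.len() == b.len() && b.len() == out.len());
    let done = add_full_vectors(a, b, out);
    // Scalar loop: the tail on x86_64, everything elsewhere.
    for i in done..a.len() {
        out[i] = a[i] + b[i];
    }
}
```

Splitting the vector path into its own cfg-gated helper keeps the public function free of conditional compilation and gives the tail loop a single, target-independent form.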

Best Practices for Production

Testing Assembly Code

Inline assembly bypasses Rust's type system, making thorough testing essential:

  • Unit tests verifying function contracts
  • Property-based testing for edge cases
  • Cross-platform CI testing
  • Fuzzing for security-critical operations
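One way to combine the first two items is to check the assembly version against a plain-Rust reference over edge cases and a stream of pseudo-random inputs. The sketch below assumes an x86_64 target for the assembly path and uses a tiny xorshift generator to stay dependency-free; in a real project a property-testing crate such as proptest would normally take its place. The names mul_lo and matches_reference are hypothetical.

```rust
/// Low 64 bits of `a * b`, computed with the x86_64 `mul` instruction.
#[cfg(target_arch = "x86_64")]
pub fn mul_lo(a: u64, b: u64) -> u64 {
    let lo: u64;
    // SAFETY: `mul` reads rax and the operand, then writes rdx:rax;
    // both registers are declared below.
    unsafe {
        std::arch::asm!(
            "mul {rhs}",
            rhs = in(reg) b,
            inlateout("rax") a => lo,
            lateout("rdx") _,
            options(pure, nomem, nostack),
        );
    }
    lo
}

/// Portable fallback so the check also compiles on other targets.
#[cfg(not(target_arch = "x86_64"))]
pub fn mul_lo(a: u64, b: u64) -> u64 {
    a.wrapping_mul(b)
}

/// Returns true if `mul_lo` agrees with `u64::wrapping_mul` on a set of
/// edge cases and on `samples` pseudo-random input pairs.
pub fn matches_reference(samples: u32) -> bool {
    let edges = [0u64, 1, 2, u64::MAX, u64::MAX - 1, 1 << 63, 1 << 32];
    for &a in &edges {
        for &b in &edges {
            if mul_lo(a, b) != a.wrapping_mul(b) {
                return false;
            }
        }
    }

    // xorshift64: deterministic, so any failure is reproducible.
    let mut state = 0x9E37_79B9_7F4A_7C15u64;
    let mut next = || {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        state
    };
    for _ in 0..samples {
        let (a, b) = (next(), next());
        if mul_lo(a, b) != a.wrapping_mul(b) {
            return false;
        }
    }
    true
}
```

Inside a test module this would typically be wrapped as a #[test] function that asserts matches_reference with some sample count.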

Portability Strategies

Maintain portable fallback implementations for platforms without specialized instructions. Use compile-time feature detection with #[cfg] attributes to choose between optimized and portable implementations. For runtime dispatch, use std::is_x86_feature_detected! to select the best implementation available on the current CPU; the macro exists only on x86 and x86_64 targets, so the call itself needs a #[cfg] guard. This approach ensures your code runs correctly everywhere while achieving maximum performance where supported.

Feature detection and fallback strategies
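```rust
// Compile-time selection: only one of these two definitions is compiled.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
fn optimized_path(data: &[u8]) {
    // AVX2 implementation
}

#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
fn optimized_path(data: &[u8]) {
    // Portable fallback implementation
}

// Runtime feature detection. optimized_avx2, optimized_sse41, and
// portable_implementation are placeholders for functions defined elsewhere.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn get_fast_implementation() -> fn(&[u8]) {
    if std::is_x86_feature_detected!("avx2") {
        optimized_avx2
    } else if std::is_x86_feature_detected!("sse4.1") {
        optimized_sse41
    } else {
        portable_implementation
    }
}

// The detection macro does not exist on other architectures.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn get_fast_implementation() -> fn(&[u8]) {
    portable_implementation
}
```
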
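A self-contained version of the runtime-dispatch idea might look like the sketch below. The function names are invented for illustration, and the AVX2 variant deliberately reuses the scalar loop: marking it with #[target_feature(enable = "avx2")] lets the compiler use AVX2 instructions when it vectorizes that function, without hand-written intrinsics.

```rust
/// Portable implementation, compiled for the baseline target.
fn sum_portable(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

/// Same loop, compiled with AVX2 enabled for this function only.
/// It is `unsafe` because calling it on a CPU without AVX2 is
/// undefined behavior.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[u8]) -> u64 {
    data.iter().map(|&b| u64::from(b)).sum()
}

/// Picks the best available implementation at runtime.
pub fn sum_bytes(data: &[u8]) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { sum_avx2(data) };
        }
    }
    sum_portable(data)
}
```

Callers only ever see sum_bytes, so the unsafe call stays confined to one place where the feature check is visibly adjacent to it.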
Essential tools for Rust assembly and WebAssembly development

| Tool | Purpose | Installation |
| --- | --- | --- |
| cargo-asm | View annotated assembly output | cargo install cargo-asm |
| wasm-pack | Build and package WASM modules | cargo install wasm-pack |
| wasm-bindgen | Generate JS-Rust interop code | Add to Cargo.toml |
| wasm-opt | Optimize WASM binary size | Via wasm-pack |
| cargo-expand | View expanded macros | cargo install cargo-expand |

When Assembly-Level Control Matters

Assembly-level control adds value in specific scenarios:

Cryptographic Operations

Cryptography demands constant-time implementations to prevent timing attacks and maximum performance for high throughput. Inline assembly helps achieve both requirements.
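To make the constant-time requirement concrete, here is a minimal sketch of a comparison that avoids an early exit on the first mismatching byte. It is written in plain Rust and is not a guarantee: the optimizer is free to reintroduce data-dependent branches, which is one reason such routines are sometimes written in assembly or taken from a vetted crate such as subtle. The name ct_eq is illustrative.

```rust
/// Compares two byte slices without returning early on a mismatch.
/// Lengths are treated as public, so a length mismatch returns at once.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    // Accumulate all differences; any non-matching byte sets some bit.
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b) {
        diff |= x ^ y;
    }
    // black_box is a best-effort hint that discourages the optimizer
    // from reasoning about `diff`; it is not a hard guarantee.
    std::hint::black_box(diff) == 0
}
```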

System Programming

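Operating systems, device drivers, and embedded systems often need instructions that have no Rust equivalent, such as privileged register access, interrupt control, or context switching. Assembly is the only way to express these.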

Real-Time Graphics

Game development and graphics processing require predictable performance that manual optimization can help achieve consistently.

Frequently Asked Questions