About Me

I Architect the Engines Behind Intelligence.

I build high-performance systems for AI — compilers, kernels, inference engines, and the low-level infrastructure that lets models scale. My work lives where math meets metal: CUDA kernels, IR optimizations, quantization pipelines, and high-efficiency distributed runtimes. I focus on squeezing every last drop of performance out of hardware, reducing memory traffic, reshaping computation graphs, and designing the machinery that turns abstract models into real, blazing-fast systems. I’m obsessed with the boundary between algorithms and architecture — the place where precision, throughput, and engineering discipline collide.

Skills & Systems

(Systems-first • production-ready)

I own the vertical slice that makes modern AI actually run — from hardware-aware kernels and compiler passes to production inference runtimes and observability.

GPU & Low-Level Compute

  • CUDA programming & custom kernel development
  • Tensor Core utilization & warp-level tuning
  • Memory-traffic reduction & profiler-driven optimization
CUDA • Nsight • Tensor Cores • cuBLAS

Compiler & IR Optimization

  • IR rewriting & graph-level transforms
  • Operator fusion (MatMul+ReLU, etc.)
  • MLIR / LLVM-based optimization pipelines
MLIR • LLVM • Triton • TorchInductor

AI Inference & Serving

  • Async batching, dynamic worker pools
  • High-throughput inference runtimes
  • Quantization strategies (INT8 / FP16)
FastAPI • Redis • Docker • gRPC

Backend & Infra

  • Production APIs & containerized services
  • Caching, queuing and fault-tolerant design
  • CI/CD for performance-sensitive systems
FastAPI • Docker • Kubernetes • Redis

Monitoring & Observability

  • Instrumentation & metrics-driven debugging
  • Performance dashboards & alerting
  • SLO/SLI mindset for latency-sensitive systems
Prometheus • Grafana • Jaeger

GPU & Compiler Layer: CUDA • LLVM/MLIR • Kernels • Quantization
Inference & Runtime: Inference Engines • Async Batching • Worker Pools
Platform & Services: APIs • Caching • Monitoring

Stack visualization showing the vertical slice I own — from metal (GPU & compiler) to serving and platform.

Featured Projects:

Community

GitHub • Kaggle

Social Media

Twitter • LinkedIn