Mid-level Software Engineer (Generative AI Cloud Infrastructure)
Company: Arc.dev
Location: Location not specified (Remote)
Type: Full-time
Level: mid
Remote: Yes
Posted: 2026-02-25
About this role
*[Open to candidates based in the US, UK, or Western Europe]*
Mid-Level Software Engineer – AI Cloud & LLM Infrastructure
Full-Time · Remote or Hybrid · Founding Team Opportunity
About Us
We are building a Gen AI Acceleration Cloud: an end-to-end platform for the full generative AI lifecycle. Our focus is to deliver blazing-fast LLM inference, scalable fine-tuning, and modern AI cloud infrastructure built on GPUs, SmartNICs/DPUs, and ultra-fast networking fabrics.
Our platform powers mission-critical workloads with:
● On-demand & managed Kubernetes clusters
● Slurm-based training clusters
● High-performance inference services
● Distributed fine-tuning and eval pipelines
● Global data centers & heterogeneous GPU fleets
We are looking for a junior-to-mid-level Software Engineer to help design, build, and scale the core systems behind our AI cloud.
What You’ll Work On
AI Cloud Infrastructure
- Develop and maintain reliable backend services running across cloud data centers.
- Assist in building automation for GPU management, VM provisioning, and high-throughput storage systems.
- Contribute to distributed systems and pipelines that support AI workloads.
LLM & GPU Virtualization Platform
- Help build the software layer for GPU clusters with modern accelerators (H100, GB200, GB300).
- Work on GPU virtualization and management (PCIe passthrough, MIG, SR-IOV) under guidance.
- Support scaling and optimization of storage and data systems for AI training datasets.
Observability, Reliability & Automation
- Contribute to monitoring and observability stacks (Prometheus, Grafana, OpenTelemetry).
- Help implement automated node lifecycle management for distributed training and inference.
- Assist in building testing frameworks for resiliency and fault tolerance.
Core Platform Engineering
- Contribute to internal and open-source platform components.
- Build developer tooling, SDKs, and documentation for platform services.
- Support research and implementati...