Cactus Compute

startup

On-device AI with cloud fallback

Cactus Compute is an AI infrastructure company focused on running models close to users. Its Cactus engine targets low-latency inference on smartphones, laptops, wearables, and edge devices while keeping cloud fallback available for harder requests. The company positions Cactus as a single toolkit for speech, vision, and language models. Its public materials emphasize automatic routing between on-device and cloud execution, lower latency, privacy for sensitive workloads, and lower inference cost when simple requests can stay local. Cactus also publishes open-source work, including the Cactus runtime and Needle, a tiny tool-calling model. The company website lists Y Combinator backing and a team with backgrounds from Oxford, Salesforce, Google, AWS, MIT, and other organizations.

San Francisco, United States | Founded 2023 | 1-10 employees


1 Model

Infrastructure, AI Lab