Opinion: Where will AI inference happen?

Today, AI inference is done in two places: either on a mobile device or in a centralized data center. Not much in between. But that is going to change.

I had a stimulating conversation with John Smee of Qualcomm at the recent 6G@UT event. John and I agreed that while phones and other mobile devices can run big AI models and do amazing things, the models will grow in sophistication faster than the phones can grow in capability. In other words, the batteries and chips will never be able to keep up with the memory and compute requirements of some upcoming AI workloads, and various networks will come into play.

A diagram from "The impact of AI on Mobile Telecom" white paper. (Source: Mobile Experts)

In the case of GenAI, many tasks (inference, training, data storage) will happen in a centralized hyperscale data center, because these applications don’t need low latency. But physical AI is a different story. Robots, cars, drones and other moving objects need low latency to control physical actions, so the compute power has to be available and nearby.

What is the value of running AI inferencing on-device?

The value of running AI inference on-device is the certainty that you’ll get an answer within microseconds. That’s why self-driving cars run safety algorithms completely on board the car. This will remain the primary approach for most devices, and most physical AI inferences will be simple enough to run on a CPU onboard the device itself.

Industrial users have more sophisticated models with input from many different sensors. In the oil and gas market, edge computing and AI have been used extensively to optimize operations, with servers located on-site. Manufacturing companies have done the same thing…and now they are migrating from wired to wireless connectivity for their edge AI workloads.

Over the next 10 years, we expect sophisticated AI models to be applied to simple machines like lawn mowers, food delivery wagons, drones and others. Many of these products will be too small for batteries supporting teraflops of compute power. Some will be consumer products that simply can’t cost thousands of dollars. So, we anticipate an end-game where AI compute power is distributed, including the device, a local data center and a central data center.

When will distributed AI infrastructure emerge?

Predicting exactly when this distributed AI infrastructure will emerge is the tricky part. I don’t see any revenue for AI edge computing in the network today…but there is a lot of action taking place in industrial automation on-premises. I believe that this will spill over from the factory to the wider world, and 10 years from now we will see a thriving market for intelligent and cooperative devices that use distributed AI architectures.

Mobile Experts released a more thorough explanation of these four AI compute locations in a free white paper this week. Take a look at that document for a simple explanation of the role played by each computing location. (I am aware that I will get some hate mail for excluding Orbital Data Centers from the white paper, but I didn’t forget about them, and I may add them to my simple diagram someday. The reason for not including them is that they are not available yet and the economics don’t look favorable for anybody who isn’t running a global social media enterprise. See my March 2026 LinkedIn post on this topic.)

Joe Madden is principal analyst at Mobile Experts, a network of market and technology experts that analyzes wireless markets. Disclaimer: Nokia is a client of Mobile Experts.


Opinions from industry experts, analysts or our editorial staff do not represent the opinions of Fierce Network.