What is serverless GPU?

Serverless means that the developer does not need to provision or configure servers. The path from code to deployment should be as short as possible.

Important properties for serverless are the following:

Pay-per-use

Server usage should be billed per use: the billable unit is each second, or even each millisecond, that the server is actually in use. Ideally, the application costs nothing while it is not used, and the cost grows with the usage of the application. For applications whose revenue grows with usage, this cost should be acceptable at any scale.
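
As a rough illustration of this billing model, here is a minimal sketch in Python. The function name and the price per second are hypothetical and chosen only for the example; they do not reflect any real provider's rates.

```python
def pay_per_use_cost(busy_seconds: float, price_per_second: float) -> float:
    """Return the bill when only the seconds actually used are charged."""
    if busy_seconds < 0 or price_per_second < 0:
        raise ValueError("busy_seconds and price_per_second must be non-negative")
    return busy_seconds * price_per_second


# Hypothetical rate, assumed for illustration only.
PRICE_PER_SECOND = 0.0005

idle_cost = pay_per_use_cost(0, PRICE_PER_SECOND)         # no usage at all
light_cost = pay_per_use_cost(1_000, PRICE_PER_SECOND)    # 1,000 busy seconds
heavy_cost = pay_per_use_cost(100_000, PRICE_PER_SECOND)  # 100x the usage

print(f"idle:  ${idle_cost:.2f}")
print(f"light: ${light_cost:.2f}")
print(f"heavy: ${heavy_cost:.2f}")
```

An idle application costs nothing, and one hundred times the usage gives one hundred times the bill, which is the linear relation between cost and usage described above.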

Scale to zero

Serverless should scale to zero cost when it is not used. In addition to not wasting money on services that are not used, this allows for flexible developer environments. It is possible to spin up a complete replica of the application as a development, test or staging instance. These instances can then be shut down or left idle when not in use, and all instances are independent of one another.

Scalability

The scaling of serverless services should be handled by the cloud provider. Ideally, it should work the same way whether zero or 1000 processes are running at the same time. When the developer does not need to worry about scaling or configuring servers, the developer can focus on coding and solving business-specific problems rather than general configuration problems.

Serverless in AI and GPUs

Common serverless platforms such as AWS Lambda or Cloudflare Workers work quite well for services that run on a CPU and fit within small code-size limits. With AI, however, we often need GPUs, and the code is normally much larger than the small limits set by these cloud providers.

Several new cloud providers have appeared to solve this problem by offering GPUs that scale to zero and are billed by the second.
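
To get a feel for when a GPU that scales to zero and is billed by the second pays off, the following Python sketch compares it with a GPU that is rented around the clock. All prices and function names are hypothetical assumptions made for the example, not figures from any provider.

```python
SECONDS_PER_HOUR = 3600


def break_even_utilization(always_on_price_per_hour: float,
                           serverless_price_per_second: float) -> float:
    """Fraction of time the GPU must be busy before an always-on
    GPU becomes cheaper than a per-second billed one (capped at 1.0)."""
    if always_on_price_per_hour < 0 or serverless_price_per_second <= 0:
        raise ValueError("prices must be positive")
    always_on_price_per_second = always_on_price_per_hour / SECONDS_PER_HOUR
    return min(1.0, always_on_price_per_second / serverless_price_per_second)


# Hypothetical prices, assumed for illustration only.
ALWAYS_ON_PRICE_PER_HOUR = 2.00        # billed whether the GPU is used or not
SERVERLESS_PRICE_PER_SECOND = 0.001    # billed only while the GPU is busy

utilization = break_even_utilization(ALWAYS_ON_PRICE_PER_HOUR,
                                     SERVERLESS_PRICE_PER_SECOND)
print(f"Per-second billing is cheaper below {utilization:.0%} utilization")
```

Under these assumed prices, the per-second rate is higher than the always-on rate, but the serverless GPU is still cheaper as long as it sits idle for a large enough share of the time.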