AWS re:Invent 2025: The Blueprint for the Agentic AI Era
A recap of the key AWS re:Invent 2025 announcements powering the transition to industrial-scale and practical AI, from Nova models to Trn3 infrastructure.
December 4, 2025

AWS re:Invent 2025 has once again set the pace for enterprise and infrastructure innovation, charting a clear course from AI models to autonomous agents and the industrial-scale infrastructure needed to run them. As AWS CEO Matt Garman revealed, Amazon Bedrock, Amazon Web Services’ generative AI development platform, has served over 100,000 customers globally, and more than 50 of those enterprises each process over one trillion tokens through Bedrock. The agent development tool Amazon AgentCore SDK has been downloaded over 2 million times within just a few months of launch. Meanwhile, AWS hosts significantly more unicorn startups than any other cloud platform, underscoring its dominant market position.
During his keynote, Garman declared that "(the advent of AI agents) is turning from a technical wonder into something that delivers us real value," describing a future of autonomous digital workers. To power this, AWS revealed a vertically integrated, full-stack AI architecture: from its custom silicon to frontier models and agent runtimes. This ambitious strategy is built upon the core cloud infrastructure that makes it all possible.
In this blog post, we will focus on those critical underlying layers, recapping key announcements in artificial intelligence, the new vector and table capabilities for Amazon S3, innovations in databases, and powerful new compute offerings designed to handle the next generation of workloads.
The Model Portfolio: Expanding Possibilities
During the event, a flood of new AI products and tools was announced: the speech-to-speech conversational Nova 2 Sonic, the all-in-one multimodal Nova 2 Omni, and the highly customizable Nova Forge for deep domain-expertise training, to name just a few. Among these significant advancements, Amazon Nova 2 Lite stands out as a compelling release aimed squarely at the challenge of balancing quality and affordability.
A key trend of this year’s conference is the shift from experimental LLMs (large language models) to practical, production-ready intelligence. While powerful, many frontier models remain prohibitively expensive for widespread, high-volume business applications. Amazon Nova 2 Lite directly addresses this, positioning itself as a cost-effective reasoning engine optimized for speed, intelligence, and price. It is designed not as just another LLM but as a purpose-built tool for high-volume tasks that must be handled quickly and efficiently, such as customer service automation, document processing, and business intelligence, bringing advanced reasoning within reach of more organizations.
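As a quick illustration of how such a model slots into an automated workflow, here is a minimal sketch that calls a Bedrock model through the Converse API from the AWS CLI. The model ID below is an assumption for illustration only; check the Bedrock console for the exact Nova 2 Lite identifier available in your Region.
echo "Asking the model to triage a support ticket..."
# Note: "amazon.nova-2-lite-v1:0" is a placeholder model ID.
aws bedrock-runtime converse \
--model-id "amazon.nova-2-lite-v1:0" \
--messages '[{"role": "user", "content": [{"text": "Summarize this support ticket and suggest a priority: customer reports intermittent 504 errors since the last deploy."}]}]' \
--inference-config '{"maxTokens": 512, "temperature": 0.2}'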

Beyond its own models, AWS further expanded the choice available to customers through Amazon Bedrock. The service now includes the general availability of an additional 18 fully managed open-weight models from leaders like Google, MiniMax AI, Mistral AI, Moonshot AI, NVIDIA, OpenAI, and Qwen. With this launch, Amazon Bedrock offers nearly 100 serverless models, a broad and deep range of options. This variety ensures that developers can choose from both exciting new entrants and well-established industry standards to find exactly what they need for their specific projects.
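To see which serverless models are available in your own Region, a quick sketch using standard CLI options is to list Bedrock’s foundation models and filter for on-demand inference:
echo "Listing serverless (on-demand) foundation models..."
aws bedrock list-foundation-models \
--by-inference-type ON_DEMAND \
--query 'modelSummaries[].{Provider:providerName, Model:modelId}' \
--output table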
S3 Vectors: Extremely Scalable Context
Vector search provides the context that powerful RAG (retrieval-augmented generation) applications require. However, scaling this capability with specialized databases can be costly. Amazon S3 Vectors, which is now generally available, changes the picture by offering the first cloud object storage with native support for vector data, slashing the barrier to entry. The core advantage is economic: AWS states it can reduce the total cost of storing and querying vectors by up to 90% compared to some dedicated vector databases.
echo "Creating the S3 vector bucket..."
aws s3vectors create-vector-bucket --vector-bucket-name "my-vector-bucket"
echo "Creating the vector index..."
aws s3vectors create-index \
--vector-bucket-name "my-vector-bucket" \
--index-name "my-vector-index" \
--data-type "float32" \
--dimension "1536" \
--distance-metric "cosine" \
--metadata-configuration "nonFilterableMetadataKeys=AMAZON_BEDROCK_TEXT,AMAZON_BEDROCK_METADATA"

Simply creating a vector bucket and a vector index within S3, as shown above, achieves this, providing serverless operation and immense scale. A single index now supports 2 billion vectors, a 40x increase from the preview, and queries return results in approximately 100 milliseconds. Moreover, S3 Vectors integrates directly with Amazon Bedrock Knowledge Bases and Amazon OpenSearch Service, streamlining the development of RAG pipelines and semantic search.
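Once the bucket and index exist, loading and querying vectors follows the same CLI pattern. The sketch below assumes the flag names from the launch documentation, and the embedding arrays are shortened placeholders; a real call must supply vectors matching the index dimension of 1536.
echo "Inserting a vector with metadata..."
# The three-element arrays below are placeholders; use 1536-dimensional embeddings in practice.
aws s3vectors put-vectors \
--vector-bucket-name "my-vector-bucket" \
--index-name "my-vector-index" \
--vectors '[{"key": "doc-001", "data": {"float32": [0.12, 0.05, 0.33]}, "metadata": {"source": "faq.md"}}]'
echo "Running a similarity query..."
aws s3vectors query-vectors \
--vector-bucket-name "my-vector-bucket" \
--index-name "my-vector-index" \
--query-vector '{"float32": [0.11, 0.07, 0.29]}' \
--top-k 5 \
--return-metadata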
In my view, Amazon S3 Vectors is designed to complement high-performance dedicated vector databases rather than replace them. Its <100ms warm query latency (<1s for cold queries) is ideal for cost-sensitive workloads. For user-facing applications requiring real-time responses, however, purpose-built vector databases (or general-purpose databases with vector capabilities) remain compelling choices. S3 Vectors instead creates a new, cost-effective tier in the AI data architecture.
OpenSearch: Autonomous and Low-Latency Intelligence
After discussing cost-effective AI models and storage, the next pillar is performance. For generative AI applications that demand real-time, low-latency vector search, such as interactive chatbots or live recommendation engines, purpose-built search engines remain essential. At re:Invent 2025, Amazon OpenSearch Service introduced two key features to meet this need: serverless GPU acceleration and auto-optimization for vector indexes.
The new GPU acceleration lets you build very large vector indexes in under an hour and at up to 75% lower indexing cost, while auto-optimization saves you weeks of manual tuning by analyzing your data and recommending the best index configuration. Notably, both features are autonomous. For instance, you don’t need to provision or manage GPU instances: OpenSearch activates them dynamically, and you pay only for the acceleration used during indexing operations.

Together, these capabilities form a seamless workflow. You first use auto-optimization to define your latency and search quality requirements and receive an optimized configuration. Then, you enable GPU acceleration to build that index up to 10x faster and at a quarter of the cost, transforming a process that once required tedious manual settings or code adjustments into one that works autonomously.
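To make those "tedious manual settings" concrete, here is an illustrative hand-tuned k-NN index definition sent to an OpenSearch domain; the endpoint and credentials are placeholders, and parameters such as ef_construction and m are exactly the knobs auto-optimization can now recommend for you.
echo "Creating a hand-tuned k-NN index (placeholder endpoint and credentials)..."
curl -XPUT "https://my-domain.us-east-1.es.amazonaws.com/vector-index" \
-u 'admin:placeholder-password' \
-H 'Content-Type: application/json' \
-d '{
  "settings": { "index": { "knn": true } },
  "mappings": {
    "properties": {
      "embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "space_type": "l2",
          "parameters": { "ef_construction": 128, "m": 16 }
        }
      }
    }
  }
}'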
Beyond the OpenSearch features, AWS also announced broader database improvements. These include Database Savings Plans, a new pricing model offering up to 35% savings across a wide range of RDS engines in exchange for a usage commitment, and new capabilities for Amazon RDS for SQL Server and Oracle, such as SQL Server Developer Edition support and expanded storage of up to 256 TiB.
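As a small, hedged example of using that higher storage ceiling, the standard modify-db-instance call below scales an existing instance’s allocated storage; the instance identifier is a placeholder, and 262,144 GiB corresponds to the new 256 TiB maximum.
echo "Scaling allocated storage toward the new 256 TiB ceiling..."
aws rds modify-db-instance \
--db-instance-identifier "my-sqlserver-instance" \
--allocated-storage 262144 \
--apply-immediately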
Lambda Managed Instances: Simplicity + Flexibility
As the foundation for AI, storage, and databases grows more powerful, the compute layer must evolve to run these demanding workloads with both agility and control. This year, AWS also introduced AWS Lambda Managed Instances, a brand-new compute option that merges the serverless simplicity of Lambda with the raw power and flexibility of Amazon EC2. This hybrid model eliminates a classic trade-off, allowing developers to run workloads that require specialized hardware or that benefit from the cost optimizations of Amazon EC2 purchasing commitments.

Traditionally, developers chose between the ease of serverless functions and the deep control of EC2 instances. Lambda Managed Instances closes this gap. You configure your application’s requirements, such as vCPUs, memory, and which EC2 instance types to include or exclude. AWS then fully provisions, manages, scales, and monitors these "managed instances" on your behalf, applying the core serverless principle of removing operational overhead. This model is ideal for workloads that require consistent high performance or need access to specialized hardware like high-performance storage.
AI Infrastructure: The Foundation of Intelligence
While advancements in models, storage, databases, and compute are thrilling, they ultimately depend on powerful hardware to run. The infrastructure announcements from AWS re:Invent 2025 deliver exactly that power. This new generation of infrastructure empowers organizations to build, train, and run cutting-edge AI models wherever their business or compliance needs require, delivering unprecedented capabilities directly to your data or even your data centers.
Amazon EC2 Trn3 UltraServers
To support the next wave of AI, AWS introduced the general availability of Amazon EC2 Trn3 UltraServers, powered by its fourth-generation, purpose-built Trainium3 AI chip.
These servers are engineered for massive scale and efficiency. A single Trn3 UltraServer can pack up to 144 Trainium3 chips, delivering an astonishing 362 FP8 petaflops of compute, along with massive memory bandwidth to handle complex, long-context tasks. This release is a dramatic leap forward, with AWS reporting up to 4.4x higher compute performance and 4x greater energy efficiency compared to the previous Trn2 generation. For customers, the increase translates into drastically reduced training times and lower operational costs, with some organizations seeing training and inference costs drop by up to 50%. Notably, this infrastructure is already in use, with Amazon Bedrock serving production workloads and AWS customers leveraging it for demanding applications like real-time generative video.
AWS AI Factories
For many enterprises and governments, data sovereignty, regulatory compliance, or latency requirements make moving data to the public cloud challenging. AWS AI Factories address this by delivering a revolutionary new deployment model: a fully managed, high-performance AWS AI infrastructure installed directly in your own data center.
This service effectively creates a private AWS Region for AI within your walls. You provide the data center space and power, and AWS deploys and manages the complete stack (latest Trainium chips, NVIDIA GPUs, petabit-scale networking, and high-performance storage). The strategic impact is profound. AWS AI Factories can accelerate AI buildouts by months or even years compared to organizations attempting to construct such complex infrastructure independently.
Together, the Trn3 UltraServers and AI Factories represent a complete rethinking of AI infrastructure. They offer the strong computing power needed for advanced AI models and the unique ability to use that power wherever it’s most needed, making it easier for businesses to adopt AI on a large scale.
The Blueprint of the Agentic AI Era
The innovations at AWS re:Invent 2025 form a complete, integrated industrial stack. From cost-effective reasoning models and scalable context storage to the high-performance infrastructure of Trn3 UltraServers and AI Factories, every layer is designed to make powerful agentic AI a more practical reality. This comprehensive approach is not merely a batch of new products and services but a strategic vision.
Before concluding, allow me to quote an especially insightful observation from Marcus Schuler of Implicator.ai:
The cloud era abstracted infrastructure. The agent era, if Garman’s vision holds, abstracts work itself. AWS is building the factory floor for that transition. Whether the agents perform as advertised matters less than whether enterprises believe they might.
This perspective highlights something critical: even as the "abstraction of work" accelerates, abstraction doesn’t eliminate foundations. In fact, abstraction layers only succeed when the underlying layers are stronger, not weaker. Modern data infrastructure (both hardware and software) remains foundational. Accessible, well-governed, high-throughput data systems are and will continue to be the backbone that allows these new agentic AI applications to function reliably.


