Generative AI on Kubernetes: Operationalizing Large Language Models - Paperback

Name: Generative AI on Kubernetes: Operationalizing Large Language Models
Brand: O'Reilly Media
SKU: 9781098171926
Price: 59.99 USD
Availability: In Stock

by Roland Huß, Daniele Zonca

Availability: In Stock
Contributor: Roland Huß, Daniele Zonca
Publish date: 4/7/2026
Pages: 404

Language: English
Publisher: O'Reilly Media
ISBN-13: 9781098171926
ISBN-10: 1098171926
UPC: 9781098171926
Book Category: Computers
Book Subcategory: Artificial Intelligence, Data Science
Book Topic: Generative AI, Natural Language Processing, Neural Networks
Size: 9.19 x 7.00 x 0.83 inches
Weight: 1.422
Product ID: SCXS8AJZWK

Generative AI is revolutionizing industries, and Kubernetes has fast become the backbone for deploying and managing these resource-intensive workloads. This book serves as a practical, hands-on guide for MLOps engineers, software developers, Kubernetes administrators, and AI professionals ready to combine AI innovation with the power of cloud native infrastructure. Authors Roland Huß and Daniele Zonca provide a clear road map for training, fine-tuning, deploying, and scaling GenAI models on Kubernetes, addressing challenges like resource optimization, automation, and security along the way.

With actionable insights and real-world examples, readers will learn to tackle the opportunities and complexities of managing GenAI applications in production environments. Whether you're experimenting with large-scale language models or facing the nuances of AI deployment at scale, you'll uncover the expertise you need to operationalize this exciting technology effectively.

Learn how to deploy LLMs more efficiently with optimized inference runtimes
Get hands-on with GPU scheduling, including hardware detection and multinode scaling
Monitor and understand LLM-specific metrics like Time to First Token and token throughput
Know when to fine-tune a model or when retrieval augmentation is the better choice
Discover how to evaluate models with standardized benchmarks before committing GPU resources
Learn to run agentic applications with secure tool integration, identity management, and persistent state

Huß, Roland: Dr. Roland Huss is a seasoned software engineer with over 25 years of experience in the field. Currently working at Red Hat, he is the architect of OpenShift Serverless and a former member of the Knative TOC. Roland is a passionate Java and Golang coder and a sought-after speaker at tech conferences. An advocate of open source, he is an active contributor and enjoys growing chili peppers in his free time.

Zonca, Daniele: Daniele Zonca is a Senior Principal Software Engineer and Architect for model serving of Red Hat OpenShift AI, Red Hat's flagship AI product combining multiple stacks.

Free shipping on orders over $75. Standard shipping takes 3-7 business days. Returns accepted within 30 days of purchase.
