Lesson 1: Introduction An overview of the Multimodal AI Applications course and what you will learn.
Lesson 2: Multimodal AI Fundamentals Discover multimodal AI fundamentals and technologies, including models and use cases that process and generate text, images, audio, and video for richer, real-world applications.
Lesson 3: Using Multimodal AI Technologies Explore practical applications of multimodal AI by using APIs and open-source models for image captioning and audio transcription, with hands-on exercises and secure credential handling.
Lesson 4: Transformers & Multimodal Processing Explore how transformers unify text, images, audio, and video through attention, embeddings, and fusion strategies, powering state-of-the-art multimodal understanding and generation.
Lesson 5: Multimodal AI Tooling Explore practical tools for building multimodal AI apps, compare commercial and open-source options, and use Pydantic AI to create reliable, structured, vendor-agnostic workflows.
Lesson 6: Introduction to Enterprise Visual Content Processing Explore enterprise visual content processing: core computer vision tasks, digital image representation, and real-world applications for efficiency, safety, and automation.
Lesson 7: Vision Pre-processing Pipelines with HuggingFace Explore vision data pipelines using HuggingFace, from dataset loading to resizing and normalisation, with demos and hands-on exercises for effective image pre-processing.
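The resizing-and-normalisation step described above can be sketched in a few lines. This is a minimal illustration, not the course's own code: the mean/std values are the widely used ImageNet statistics, which in practice would come from the model's processor configuration (e.g. a HuggingFace image processor).

```python
import numpy as np

# Per-channel statistics commonly used for ImageNet-pretrained vision models
# (an illustrative assumption; a real pipeline reads these from the model's
# processor config).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalise(image: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels to [0, 1], then standardise per channel.

    `image` is an (H, W, 3) uint8 array.
    """
    scaled = image.astype(np.float32) / 255.0
    return (scaled - IMAGENET_MEAN) / IMAGENET_STD

# A dummy 2x2 RGB "image" with all pixels at mid-grey.
img = np.full((2, 2, 3), 128, dtype=np.uint8)
out = normalise(img)
print(out.shape)  # (2, 2, 3)
```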
Lesson 8: Understanding Embeddings in Computer Vision Learn how embeddings convert images into compact vectors for efficient search, enable cross-modal tasks with models like CLIP, and power large-scale, robust computer vision systems.
Lesson 9: Image Search Using CLIP Embeddings Explore how to build text-to-image and image-to-image search using CLIP embeddings, combining theory, real-world demos, hands-on practice, and solution walkthroughs.
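The retrieval step behind CLIP-based image search reduces to cosine similarity between a query embedding and an index of image embeddings. The sketch below assumes the embeddings have already been produced by a CLIP model (e.g. via HuggingFace's CLIPModel); random vectors stand in for them here.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for five pre-computed CLIP image embeddings (dim 512).
image_embeddings = rng.normal(size=(5, 512))

def search(query_embedding, index, top_k=3):
    """Rank indexed embeddings by cosine similarity to the query."""
    index_norm = index / np.linalg.norm(index, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    scores = index_norm @ q
    ranked = np.argsort(scores)[::-1][:top_k]
    return ranked, scores[ranked]

# Querying with the embedding of image 2 itself should rank it first
# with a similarity of 1.0.
ids, scores = search(image_embeddings[2], image_embeddings)
print(ids[0])  # 2
```

Text-to-image search works the same way: the query embedding simply comes from CLIP's text encoder instead of its image encoder, since both map into the same space.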
Lesson 10: Using Multimodal Model APIs for Vision Explore multimodal vision APIs: prompt design, parameter tuning, structured outputs, cost control, integration, and best practices for robust, efficient image analysis.
Lesson 11: Gemini Vision API Basics Explore Gemini Vision API basics by practicing image moderation, learning to analyse images and implement moderation workflows using real-world examples and guided hands-on exercises.
Lesson 12: Vision Transformer Models & Architectures Explore Vision Transformer models: core architecture, image tokenisation, self- and cross-attention, and top models for segmentation, detection, and enterprise use.
Lesson 13: Using Vision Transformers Explore vision transformers with hands-on demos to extract image embeddings and perform object detection and segmentation using state-of-the-art models.
Lesson 14: Vision-Language Models Learn how vision-language models align images and text for tasks like search, captioning, and visual question answering, with a focus on enterprise deployment considerations.
Lesson 15: Multimodal Vision Applications with CLIP Explore zero-shot image classification and auto-labelling for driving scenes using CLIP, enabling efficient, scalable multimodal vision applications.
Lesson 16: Diffusion Models & Image Generation Explore how diffusion models generate images by reversing noise through iterative denoising, a key technique behind modern generative image models.
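The noising process that generation reverses can be shown in a toy form: an image is mixed with Gaussian noise according to a variance schedule, and in closed form any timestep can be sampled directly. The linear beta schedule below is an illustrative assumption, not any specific model's.

```python
import numpy as np

# Linear variance schedule over T steps (illustrative values).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0): signal shrinks, noise grows with t."""
    noise = rng.normal(size=x0.shape)
    return (np.sqrt(alphas_cumprod[t]) * x0
            + np.sqrt(1 - alphas_cumprod[t]) * noise)

rng = np.random.default_rng(0)
x0 = np.ones((8, 8))             # a trivial "image"
x_early = q_sample(x0, 10, rng)  # still close to the original
x_late = q_sample(x0, 999, rng)  # almost pure noise
```

Generation runs this in reverse: starting from pure noise, a trained network predicts and removes a little noise at each step until an image emerges.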
Lesson 17: Introduction to Enterprise Audio Processing Discover enterprise audio processing, including core speech tasks, use cases, and integration strategies for modern business environments.
Lesson 18: Audio Data Representation Explore how audio is digitised for AI, including sample rate, bit depth, channels, formats, and best practices for preprocessing and analysis.
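The three parameters named above (sample rate, bit depth, channels) determine the size of raw PCM audio directly, which is a useful back-of-the-envelope check when planning storage or API costs:

```python
# Raw (uncompressed PCM) audio data rate from its digitisation parameters.
def pcm_bytes_per_second(sample_rate: int, bit_depth: int, channels: int) -> int:
    return sample_rate * (bit_depth // 8) * channels

# CD-quality stereo: 44.1 kHz, 16-bit, 2 channels -> ~10 MB per minute.
print(pcm_bytes_per_second(44_100, 16, 2))  # 176400

# Typical speech-model input: 16 kHz, 16-bit, mono.
print(pcm_bytes_per_second(16_000, 16, 1))  # 32000
```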
Lesson 19: Audio Processing with librosa Explore audio processing with librosa to load, resample, convert, analyse and visualise audio data through hands-on exercises.
Lesson 20: Sound Retrieval and Classification Explore audio embeddings for efficient sound classification and retrieval using models like CLAP to enable semantic audio analysis at scale.
Lesson 21: Sound Retrieval and Classification with CLAP Apply CLAP for sound retrieval, similarity search, and zero-shot classification to detect fan on/off states in real audio data.
Lesson 22: Speech Processing Discover automatic speech recognition with Whisper, a robust multilingual model for transcription, translation, and real-world speech processing.
Lesson 23: Implementing Speech Processing with Whisper & Gemini Explore real-world speech transcription and translation using Whisper and Gemini, including multilingual support and alignment techniques.
Lesson 24: Audio Intelligence Explore advances in audio intelligence, including multimodal systems, speech recognition, text-to-speech, ethics, and enterprise controls.
Lesson 25: Audio Sentiment Analysis with Gemini Explore audio sentiment and command analysis using Pydantic AI and Gemini to extract emotions and recognise spoken commands.
Lesson 26: Audio Classification and Moderation Explore voice content moderation including compliance, privacy, layered detection and operational excellence.
Lesson 27: Building a Basic Voice Moderation System with Gemini Build a voice moderation system using Gemini to transcribe audio, detect personal data disclosures, and flag policy violations.
Lesson 28: Introduction to Enterprise Video Processing Discover how enterprise video AI addresses temporal complexity using efficient frame selection for understanding and moderation.
Lesson 29: AI Models for Video Understanding Explore AI models for real-time detection, motion tracking and temporal understanding to enable scalable video analytics.
Lesson 30: Implementing Object Recognition & Tracking Learn how to detect and track objects in videos, apply multi-object tracking, and count items in practical scenarios.
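Multi-object tracking typically associates detections with existing tracks frame-to-frame using intersection-over-union (IoU) between bounding boxes. A minimal, illustrative helper (boxes as `(x1, y1, x2, y2)` in pixels):

```python
# IoU: overlap area divided by union area of two axis-aligned boxes.
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.1429
```

A tracker matches each new detection to the track whose last box gives the highest IoU above some threshold; unmatched detections start new tracks, which also makes counting straightforward.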
Lesson 31: Video Understanding & Search Explore methods for analysing and searching video using foundation models, balancing accuracy, cost and performance.
Lesson 32: Video Understanding & Search with Gemini & CLIP4Clip Explore automated video description, key moment detection, and natural language video search using AI models and structured outputs.
Lesson 33: Video Classification & Moderation Learn to classify and moderate video by modelling temporal patterns and combining automation with human oversight.
Lesson 34: Video Classification & Moderation with Gemini Build automated systems for video classification and moderation using Gemini and Pydantic AI in real-world scenarios.
Lesson 35: Video Generation Explore generative video AI tools and workflows that turn text, images or footage into dynamic video content.
Lesson 36: Video Generation with Veo 3 Generate marketing videos using Veo 3 with text-to-video and image-to-video workflows, understanding strengths and limitations.
Lesson 37: Multimodal AI Deployment Explore deployment strategies for multimodal AI systems via unified APIs and orchestration approaches.
Lesson 38: Implementation Tools and Serving Strategies Explore tools and strategies for implementing, serving and monitoring AI solutions from prototyping to production.
Lesson 39: Using Gradio and Pydantic AI Build multimodal chatbots and analysis apps using Gradio and Pydantic AI, covering async programming and interface customisation.
Lesson 40: Multimodal AI Performance Monitoring and Logging Learn to monitor and log multimodal AI systems, tracking performance, costs and failures across modalities.
Lesson 41: Logging and Performance Monitoring with Gradio and Arize Phoenix Implement logging and performance monitoring for multimodal AI chatbots to enable robust analytics and debugging.
Course Project: Evaluating Multimodal Applications Learn how to evaluate multimodal AI applications using user feedback, automated metrics and continuous monitoring.
Lesson 43: Testing Multimodal Apps with Pydantic AI Evals Build robust testing frameworks for multimodal AI apps using structured outputs and semantic evaluation techniques.
Lesson 44: Scaling Multimodal AI Architecture Learn strategies to scale multimodal AI systems, focusing on performance, reliability, cost and architectural trade-offs.