Large Language Models (LLMs) will be at the core of many groundbreaking AI solutions for enterprise organizations. Here are just a few examples of the benefits of using LLMs in the enterprise for both internal and external use cases:
Optimize Costs. LLMs deployed as customer-facing chatbots can respond to frequently asked questions and simple queries, freeing customer service representatives to focus their time and attention on higher-value interactions and leading to a more cost-efficient service model.
Save Time. LLMs deployed as internal, enterprise-specific agents can help employees find documentation, data, and other company information, making it easy to extract and summarize important internal content.
Increase Productivity. LLMs deployed as code assistants boost developer efficiency within an organization while ensuring that code meets standards and coding best practices.
Several LLMs are publicly available through APIs from OpenAI, Anthropic, AWS, and others, which give developers instant access to industry-leading models that are capable of performing most generalized tasks. However, these LLM endpoints often can't be used by enterprises as-is, for reasons such as the need to meet organization-specific output standards and the models' lack of knowledge of enterprise-specific content.
Fine tuning solves these issues. Fine tuning is an additional round of training on a specific base model that guides the LLM's output to meet the specific standards of an organization. Given some example data, LLMs can quickly learn new content that wasn't available during the initial training of the base model. The benefits of using fine-tuned models in an organization are numerous.
Although the benefits of fine tuning are substantial, the process of preparing, training, evaluating, and deploying fine-tuned LLMs is a lengthy LLMOps workflow that organizations handle differently. The result is incompatible, ad hoc workflows with no consistent way to organize data and models.
To help remedy these issues, Cloudera introduces Fine Tuning Studio, a one-stop-shop studio application that covers the entire workflow and lifecycle of fine tuning, evaluating, and deploying fine-tuned LLMs in Cloudera’s AI Workbench. Now, developers, data scientists, solution engineers, and all AI practitioners working within Cloudera’s AI ecosystem can easily organize data, models, training jobs, and evaluations related to fine tuning LLMs.
Once Fine Tuning Studio is deployed to any enterprise's Cloudera AI Workbench, users gain instant access to powerful tools within Fine Tuning Studio to help organize data, test prompts, train adapters for LLMs, and evaluate the performance of these fine-tuning jobs.
To show how easy it is for GenAI builders to build and deploy a production-ready application, let's take a look at an end-to-end example: fine tuning an event ticketing customer support agent. The goal is to fine tune a small, cost-effective model that, based on customer input, can extract an appropriate "action" (think API call) that the downstream system should take for the customer. Given the cost constraints of hosting and infrastructure, the goal is to fine tune a model that is small enough to host on a consumer GPU and can provide the same accuracy as a larger model.
Data Preparation. For this example, we will use the bitext/Bitext-events-ticketing-llm-chatbot-training-dataset dataset available on HuggingFace, which contains pairs of customer inputs and desired intent/action outputs. We can import this dataset on the Import Datasets page.
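Conceptually, each record in such a dataset pairs a free-form customer utterance with the action label the downstream system should take. A minimal sketch of that record shape (field names instruction and intent are taken from the prompt template later in this walkthrough; the second sample label is hypothetical):

```python
# Sample records mirroring the expected schema: a customer utterance
# ("instruction") paired with the desired action label ("intent").
# The "check_schedule" record is a hypothetical illustration.
sample_records = [
    {"instruction": "i have to get a refund i need assistance", "intent": "get_refund"},
    {"instruction": "where can i see the event schedule", "intent": "check_schedule"},
]

def validate_record(record):
    """Sanity-check that a record has non-empty instruction and intent fields."""
    return bool(record.get("instruction")) and bool(record.get("intent"))

print(all(validate_record(r) for r in sample_records))  # True
```

A quick validation pass like this is a cheap way to catch empty or malformed rows before they reach a training job.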
Model Selection. To keep our inference footprint small, we will use the bigscience/bloom-1b1 model as our base model, which is also available on HuggingFace. We can import this model directly from the Import Base Models page. The goal is to train an adapter for this base model that gives it better predictive capabilities for our specific dataset.
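Adapter training typically relies on a parameter-efficient technique such as LoRA, where a frozen base weight matrix W is augmented with a trainable low-rank product B·A, so only a small fraction of parameters are updated. The source doesn't specify Fine Tuning Studio's exact method; a minimal NumPy sketch of the idea, with illustrative shapes:

```python
import numpy as np

# Illustrative shapes: one frozen base weight matrix and a rank-8 adapter.
d_out, d_in, rank = 1024, 1024, 8
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))        # frozen base weights
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection (zero-init)

def forward(x):
    # Effective weight is W + B @ A; only A and B receive gradient updates.
    return (W + B @ A) @ x

# The adapter adds far fewer trainable parameters than the base matrix:
base_params = W.size                  # 1,048,576
adapter_params = A.size + B.size      # 16,384
print(adapter_params / base_params)   # 0.015625 for rank 8 on a 1024x1024 layer
```

Because B starts at zero, the adapted model initially reproduces the base model exactly, and training gradually shifts its behavior toward the fine-tuning data.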
Creating a Training Prompt. Next, we'll create a prompt for both training and inference. We can use this prompt to give the model more context on possible selections. Let's name our prompt better-ticketing and use our bitext dataset as the base dataset for the prompt. The Create Prompts page enables us to create a prompt "template" based on the features available in the dataset. We can then test the prompt against the dataset to make sure everything is working properly. Once everything looks good, we hit Create Prompt, which makes the prompt available throughout the tool. Here's our prompt template, which uses the instruction and intent fields from our dataset:
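The template itself isn't reproduced here. As a hedged sketch, a template of this kind is just a format string over the dataset's instruction and intent fields, rendered once per record; the exact wording below is an assumption, not Fine Tuning Studio's actual template:

```python
# Hypothetical training prompt template over the dataset's "instruction"
# (customer input) and "intent" (desired action) fields. The wording is an
# assumption; the actual template in the tool may differ.
PROMPT_TEMPLATE = (
    "<Instruction>: {instruction}\n"
    "<Intent>: {intent}"
)

def render_prompt(record):
    """Fill the template with one dataset record."""
    return PROMPT_TEMPLATE.format(**record)

example = {"instruction": "i have to get a refund i need assistance",
           "intent": "get_refund"}
print(render_prompt(example))
```

At inference time the same template would be rendered with the intent portion left empty, so the model's job is to complete it.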
Train a New Adapter. With a dataset, model, and prompt selected, let's train a new adapter for our bloom-1b1 model that can more accurately handle customer requests. On the Train a New Adapter page, we can fill out all relevant fields, including the name of our new adapter, the dataset to train on, and the training prompt to use. For this example, we had two L40S GPUs available for training, so we chose the Multi Node training type. We trained for 2 epochs on 90% of the dataset, leaving 10% available for evaluation and testing.
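The 90/10 split described above amounts to a generic shuffle-and-slice over the records; a minimal sketch (not Fine Tuning Studio's internal implementation):

```python
import random

def train_eval_split(records, train_fraction=0.9, seed=42):
    """Shuffle records and split them into train/eval partitions."""
    shuffled = records[:]                 # copy so the input isn't mutated
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

records = [{"instruction": f"utterance {i}", "intent": "example"} for i in range(100)]
train, evaluation = train_eval_split(records)
print(len(train), len(evaluation))  # 90 10
```

Fixing the random seed keeps the split reproducible, which matters later when comparing adapter evaluations against the same held-out 10%.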
Monitor the Training Job. On the Monitor Training Jobs page, we can track the status of our training job and follow the deep link to the Cloudera Machine Learning Job directly to view log outputs. With two L40S GPUs and 2 epochs of our bitext dataset, training completed in only 10 minutes.
Check Adapter Performance. Once the training job completes, it's helpful to "spot check" the performance of the adapter to make sure that it was trained successfully. Fine Tuning Studio offers a Local Adapter Comparison page to quickly compare the performance of a prompt between a base model and a trained adapter. Let's try a simple customer input, pulled directly from the bitext dataset: "i have to get a refund i need assistance", where the corresponding desired output action is get_refund.
Looking at the output of the base model compared to the trained adapter, it’s clear that training had a positive impact on our adapter!
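One practical detail when spot checking an intent-extraction model: raw generations may echo the prompt or trail off with extra text, so it helps to normalize the output down to a known action label before comparing. A hedged sketch (the action list and parsing rule are illustrative assumptions, not Fine Tuning Studio behavior):

```python
# Action labels the downstream system accepts (illustrative subset).
KNOWN_ACTIONS = {"get_refund", "buy_ticket", "check_schedule", "cancel_order"}

def extract_action(generated_text):
    """Return the first known action label found in model output, else None."""
    for token in generated_text.replace(":", " ").split():
        if token.strip().lower() in KNOWN_ACTIONS:
            return token.strip().lower()
    return None

print(extract_action("Intent: get_refund"))  # get_refund
```

Normalizing both the base model's and the adapter's outputs this way makes the comparison fair even when one of them is more verbose.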
Evaluate the Adapter. Now that we've performed a spot check to make sure training completed successfully, let's take a deeper look into the performance of the adapter. We can evaluate the performance against the "test" portion of the dataset from the Run MLFlow Evaluation page. This provides a more in-depth evaluation of any selected models and adapters. For this example, we will compare the performance of 1) just the bigscience/bloom-1b1 base model, 2) the same base model with our newly trained better-ticketing adapter activated, and finally 3) a larger mistral-7b-instruct model.
As we can see, the ROUGE-L metric (which scores overlap via the longest common subsequence with the reference output, a looser criterion than an exact match) for the 1B model with the adapter is significantly higher than the same metric for an untrained 7B model. As simple as that, we trained an adapter for a small, cost-effective model that outperforms a significantly larger model. Even though the larger 7B model may perform better on generalized tasks, the non-fine-tuned 7B model has not been trained on the available "actions" that the model can take given a specific customer input, and therefore would not perform as well as our fine-tuned 1B model in a production environment.
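For intuition, ROUGE-L computes precision and recall from the longest common subsequence (LCS) between candidate and reference token sequences and combines them into an F-measure. A simplified sketch over whitespace tokens (MLflow's actual implementation may differ in tokenization and details):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def rouge_l(candidate, reference):
    """ROUGE-L F-measure over whitespace tokens (simplified)."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)

print(rouge_l("get_refund", "get_refund"))  # 1.0
```

On single-label outputs like our action names, ROUGE-L behaves close to an exact-match score, which is why it's a reasonable headline metric for this task.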
As we saw, Fine Tuning Studio enables anyone of any skill level to train a model for any enterprise-specific use case. Now, customers can incorporate cost-effective, high-performance, fine-tuned LLMs into their production-ready AI workflows more easily than ever, and expose models to customers while ensuring safety and compliance. After training a model, users can use the Export Model feature to export trained adapters as a Cloudera Machine Learning model endpoint, which is a production-ready model hosting service available to Cloudera AI (formerly known as Cloudera Machine Learning) customers. Fine Tuning Studio ships with a powerful example application showing how easy it is to incorporate a model that was trained within Fine Tuning Studio into a full-fledged production AI application.
Cloudera’s Fine Tuning Studio is available to Cloudera AI customers as an Accelerator for Machine Learning Projects (AMP), right from Cloudera’s AMP catalog. Install and try Fine Tuning Studio following the instructions for deploying this AMP right from the workspace.
Want to see what's under the hood? For advanced users and contributors who want to view or modify Fine Tuning Studio, the project is hosted on Cloudera's GitHub.
Cloudera is excited to be working at the forefront of training, evaluating, and deploying LLMs to customers in production-ready environments. Fine Tuning Studio is under continuous development, and the team is eager to keep providing customers with a streamlined approach to fine tune any model, on any data, for any enterprise application. Get started on your fine tuning needs today; the Cloudera AI team is ready to help make your enterprise's vision for AI-ready applications a reality.