AI Management
Configure AI providers, manage global prompts, run A/B tests on prompt variants, and monitor AI quality across your Testify platform.
Overview
Testify integrates with multiple AI providers to power question generation, explanations, tutoring, and classification. The AI management system gives administrators full control over which providers are used, what prompts guide the AI, and how AI quality is measured and optimized.
AI Provider Configuration
Testify supports the following AI providers:
| Provider | Models | Use Cases |
|---|---|---|
| Google Gemini | gemini-pro, gemini-1.5-pro | Question generation, explanations, classification |
| OpenAI | gpt-4, gpt-4-turbo, gpt-3.5-turbo | Question generation, tutoring, chat |
| DeepSeek | deepseek-chat, deepseek-coder | Cost-effective question generation |
| Anthropic Claude | claude-3, claude-3.5 | High-quality question generation, explanations |
| OpenRouter | Multiple models | Flexible model routing |
Configuring API Keys
- Navigate to "Admin" > "AI Settings"
- For each provider you want to enable, enter the API key:
- "Gemini API Key" -- from Google AI Studio
- "OpenAI API Key" -- from OpenAI platform
- "DeepSeek API Key" -- from DeepSeek dashboard
- "Claude API Key" -- from Anthropic console
- "OpenRouter API Key" -- from OpenRouter dashboard
- Click "Save Configuration"
Tip: You do not need to configure every provider; configuring at least one enables AI features. If the primary provider is unavailable, the system falls back to the other configured providers.
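The fallback behavior can be sketched as follows. This is a minimal illustration of the pattern, not Testify's actual internals; the `generate_with_fallback` function, the provider names, and the callable-per-provider shape are all assumptions made for the example.

```python
# Sketch of provider fallback: try the default provider first, then every
# other configured provider in order. Names and the call signature are
# illustrative assumptions, not Testify's implementation.

def generate_with_fallback(prompt, providers, default="gemini"):
    """providers: dict mapping provider name -> callable(prompt) -> str.

    Returns (provider_name, output) from the first provider that succeeds.
    Raises RuntimeError only if every configured provider fails.
    """
    ordered = [default] + [p for p in providers if p != default]
    errors = {}
    for name in ordered:
        call = providers.get(name)
        if call is None:  # provider not configured, skip it
            continue
        try:
            return name, call(prompt)
        except Exception as exc:
            errors[name] = exc  # record the failure and try the next provider
    raise RuntimeError(f"All providers failed: {errors}")
```

Because unconfigured providers are simply skipped, the same routine works whether you have one API key set up or all five.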
Setting the Default Provider
- Go to "Admin" > "AI Settings"
- Under "Default Provider", select the provider to use for AI operations
- Click "Save"
The default provider is used when no specific provider is requested by the user or feature.
Global Prompts Management
Global prompts are the system-level prompt templates that guide AI behavior across all features. Super admins can view, edit, and version these prompts.
Viewing All Prompts
- Navigate to "Admin" > "Global Prompts"
- The list shows all configured prompts with:
- Prompt Key -- the internal identifier (e.g., question_generation, explanation_builder)
- Prompt Name -- the human-readable name
- Category -- the feature area (e.g., Generation, Tutoring, Classification)
- Version -- the current version number
- Status -- active or inactive
- Last Updated -- when the prompt was last modified
Editing a Prompt
- Go to "Admin" > "Global Prompts"
- Click on the prompt you want to edit
- Modify the "Prompt Content" -- the actual text sent to the AI model
- Optionally update the "Description" to document your changes
- Click "Save"
The system automatically:
- Increments the version number
- Records who made the change
- Stores the previous version in the version history
- Clears the prompt cache so the new version takes effect immediately
Tip: Test prompt changes in a staging environment before applying them in production. Small wording changes can significantly affect AI output quality.
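The save-and-version behavior described above can be sketched like this. The `PromptRecord` fields, the `save_prompt` helper, and the dict-based cache are illustrative assumptions chosen for the example, not Testify's actual data model.

```python
from dataclasses import dataclass, field

# Sketch of what happens when a prompt is saved: archive the old version,
# bump the version number, record the editor, and invalidate the cache.
# All names here are assumptions for illustration.

@dataclass
class PromptRecord:
    key: str
    content: str
    version: int = 1
    history: list = field(default_factory=list)  # (version, content, editor)

def save_prompt(record, new_content, editor, cache):
    """Apply an edit: store the previous version in history, increment the
    version number, and clear the cached copy so the change takes effect
    immediately."""
    record.history.append((record.version, record.content, editor))
    record.version += 1
    record.content = new_content
    cache.pop(record.key, None)  # clear the prompt cache for this key
    return record
```

The cache invalidation step is what makes edits take effect immediately: the next AI request misses the cache and reloads the new version.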
Creating a New Prompt
- Go to "Admin" > "Global Prompts"
- Click "Create Prompt"
- Fill in:
- "Prompt Key" -- a unique identifier (lowercase, underscores, e.g., custom_feedback_prompt)
- "Prompt Name" -- a descriptive name
- "Category" -- select or type a category
- "Prompt Content" -- the full prompt text
- "Description" -- what this prompt is used for
- Click "Save"
Viewing Prompt Version History
- Click on a prompt to open its details
- Click the "History" tab
- View all previous versions with:
- Version number
- Content at that version
- Who made the change
- When the change was made
- Click "Restore" on any version to roll back to it
Activating and Deactivating Prompts
- Go to "Admin" > "Global Prompts"
- Toggle the "Active" switch on any prompt
- Inactive prompts are not used by AI services
Tip: Deactivate a prompt instead of deleting it if you want to temporarily disable an AI behavior while preserving the prompt content.
Fetching Active Prompts (API)
At runtime, AI services fetch active prompts automatically. The active prompts endpoint returns all enabled prompts as a key-value map for efficient lookup.
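Consuming that key-value map can be sketched as follows. The JSON payload, the prompt keys, and the `{placeholder}` template style are invented for illustration; only the flat key-to-text map shape is taken from the description above.

```python
import json

# Hypothetical response body from the active prompts endpoint: a flat
# key -> prompt-text map. Keys and template text are made up for the example.
payload = '''{
  "question_generation": "Generate {count} multiple-choice questions about {topic}.",
  "explanation_builder": "Explain why the answer to {question} is {answer}."
}'''

active_prompts = json.loads(payload)

def get_prompt(key):
    """Look up an active prompt by key; deactivated prompts are absent."""
    template = active_prompts.get(key)
    if template is None:
        raise KeyError(f"No active prompt for key: {key}")
    return template

# Render a prompt for a specific request.
prompt = get_prompt("question_generation").format(count=5, topic="photosynthesis")
```

Because inactive prompts never appear in the map, a `KeyError` here is the runtime symptom of a prompt that was deactivated in the admin UI.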
Prompt A/B Testing
Run controlled experiments to compare different prompt variants and measure which produces better AI output.
Creating an Experiment
- Navigate to "Admin" > "AI Settings" > "A/B Testing"
- Click "New Experiment"
- Configure the experiment:
- "Name" -- a descriptive name (e.g., "Question Generation v2 vs v3")
- "Description" -- what you are testing
- "Prompt Key" -- which prompt to experiment on
- "Variants" -- define at least 2 prompt variants:
- "Variant Name" (e.g., "Control" and "New Prompt")
- "Prompt Text" -- the full prompt for this variant
- "Traffic Percent" -- percentage of requests to route to this variant
- "Start Date" and "End Date" -- the experiment duration
- Verify that traffic percentages sum to 100%
- Click "Create"
The experiment starts in Draft status.
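The sum-to-100 check from the steps above can be expressed as a small validation. The `traffic_percent` field name and dict shape are assumptions for the sketch:

```python
def validate_traffic(variants):
    """variants: list of dicts with 'name' and 'traffic_percent'.

    Returns True when the experiment has at least 2 variants and their
    traffic percentages cover exactly 100% of requests.
    """
    if len(variants) < 2:
        raise ValueError("An experiment needs at least 2 variants")
    total = sum(v["traffic_percent"] for v in variants)
    if total != 100:
        raise ValueError(f"Traffic percentages sum to {total}, expected 100")
    return True
```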
Running an Experiment
- Go to "A/B Testing" and find your experiment
- Click "Activate" to start routing traffic
- The system randomly assigns each AI request to a variant based on traffic percentages
- Monitor results in real-time on the experiment dashboard
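Weighted random assignment of the kind described in step 3 can be sketched like this; the function and the `(name, percent)` pair format are illustrative assumptions, not the exact routing code:

```python
import random

def assign_variant(variants, rng=random):
    """Pick a variant with probability proportional to its traffic percent.

    variants: list of (name, traffic_percent) pairs summing to 100.
    """
    roll = rng.uniform(0, 100)
    cumulative = 0
    for name, percent in variants:
        cumulative += percent
        if roll < cumulative:
            return name
    return variants[-1][0]  # guard against a floating-point edge at 100
```

Over many requests the observed split converges to the configured percentages, which is why per-variant request counts on the dashboard should roughly match the traffic settings.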
Viewing Experiment Results
- Click on an active or completed experiment
- The results page shows per-variant metrics:
- Number of requests served
- Average quality score (if quality scoring is enabled)
- Average latency
- User feedback ratings
- Statistical significance indicators help determine when a winner can be declared
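One common way such an indicator is computed (shown here as an illustrative sketch, not necessarily Testify's exact method) is a two-proportion z-test on a per-variant success rate, such as the share of positive feedback ratings:

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z statistic for the difference between two success proportions.

    |z| > 1.96 roughly corresponds to significance at the 95% level.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se
```

Small sample sizes produce small |z| values even for real differences, which is why a winner usually cannot be declared until each variant has served enough requests.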
Concluding an Experiment
- Open the experiment
- Review results and select the winning variant
- Click "Apply Winner" to update the global prompt with the winning variant's text
- The experiment status changes to Completed
AI Quality Dashboard
Monitor the quality and performance of AI outputs across the platform.
Accessing the Dashboard
- Navigate to "Admin" > "AI Settings" > "Quality Dashboard"
- Set the time period (default: 30 days)
Key Metrics
The dashboard displays:
- Total AI Operations -- number of AI requests processed
- Average Confidence Score -- mean confidence across all AI outputs
- Average Latency -- mean response time per AI request
- Token Usage -- total tokens consumed by provider
- Error Rate -- percentage of failed AI requests
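How these metrics relate to the underlying AI operation logs can be sketched with a small aggregation; the record fields (`ok`, `confidence`, `latency_ms`, `tokens`) are assumptions for the example, and the averaging choices shown are illustrative:

```python
def summarize(ops):
    """ops: list of dicts with 'ok' (bool), 'confidence', 'latency_ms',
    and 'tokens'. Field names are illustrative assumptions.

    Confidence is averaged over successful operations only; latency and
    tokens are aggregated over all operations.
    """
    total = len(ops)
    succeeded = [o for o in ops if o["ok"]]
    return {
        "total_operations": total,
        "avg_confidence": (
            sum(o["confidence"] for o in succeeded) / len(succeeded)
            if succeeded else 0.0
        ),
        "avg_latency_ms": sum(o["latency_ms"] for o in ops) / total,
        "total_tokens": sum(o["tokens"] for o in ops),
        "error_rate": (total - len(succeeded)) / total,
    }
```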
Usage by Model
A breakdown of AI usage per provider/model showing:
- Request count
- Average tokens per request
- Average latency
- Average confidence score
Usage by Feature
See which features consume the most AI resources:
- Question generation
- Explanations
- Tutoring
- Classification
- Chat
Daily Trends
Line charts showing AI usage and quality metrics over time, helping identify patterns and anomalies.
Tip: If you notice a sudden drop in confidence scores or spike in latency, check the AI provider's status page for outages.
AI Explainability
Every AI operation in Testify is logged with explainability metadata for compliance and quality assurance.
What Is Logged
For each AI operation:
- Input Summary -- what was sent to the model
- Output Summary -- what the model returned
- Decision Rationale -- why the AI made its choices
- Confidence Score -- the model's self-assessed confidence
- Tokens Used -- input and output tokens consumed
- Latency -- total processing time
Accessing Explainability Records
- Go to "Admin" > "Audit Logs" > "AI Events"
- Click on any AI event to see its explainability details
- Use the "View Full Job" link to see the complete AI job timeline
See the Audit Logs documentation for more details on navigating AI audit trails.
Managing AI Credits
AI operations consume credits from your organization's credit balance.
- Go to "Admin" > "Billing" > "Credits"
- View current credit balance
- Purchase additional credits as needed
- Set credit alerts to be notified when balance is low
Tip: Different AI operations consume different amounts of credits. Question generation typically uses more credits than explanations due to higher token counts.
Best Practices
Prompt Engineering
- Be specific about output format (JSON, markdown, plain text)
- Include examples of desired output in the prompt
- Specify the target audience (e.g., "for grade 10 students")
- Set clear quality criteria (e.g., "include at least one distractor that tests common misconceptions")
Provider Selection
- Use Gemini or OpenAI for high-quality question generation
- Use DeepSeek for cost-effective bulk operations
- Configure multiple providers for redundancy
- Monitor per-provider quality scores to identify the best fit
Cost Optimization
- Monitor token usage in the AI Quality Dashboard
- Use shorter, more focused prompts to reduce token consumption
- Run A/B tests to find prompts that produce good results with fewer tokens
- Set per-user or per-organization AI usage limits if needed
Troubleshooting
AI Features Not Working
- Verify at least one AI provider API key is configured
- Check that the API key is valid and has sufficient quota
- Look for error messages in the audit logs under AI events
Poor AI Output Quality
- Review the global prompt for the feature in question
- Run an A/B test with improved prompt variants
- Try a different AI provider/model
- Check confidence scores in the AI Quality Dashboard
High Token Costs
- Review usage by model in the AI Quality Dashboard
- Consider switching to a more cost-effective provider for bulk operations
- Optimize prompts to be more concise while maintaining quality