AI Management
Configure AI providers, manage global prompts, run A/B tests on prompt variants, and monitor AI quality across your Testify platform.
Overview
Testify integrates with multiple AI providers to power question generation, explanations, tutoring, and classification. The AI management system gives administrators full control over which providers are used, what prompts guide the AI, and how AI quality is measured and optimized.
AI Provider Configuration
Testify supports the following AI providers:
| Provider | Models | Use Cases |
|---|---|---|
| Google Gemini | gemini-pro, gemini-1.5-pro | Question generation, explanations, classification |
| OpenAI | gpt-4, gpt-4-turbo, gpt-3.5-turbo | Question generation, tutoring, chat |
| DeepSeek | deepseek-chat, deepseek-coder | Cost-effective question generation |
| Anthropic Claude | claude-3, claude-3.5 | High-quality question generation, explanations |
| OpenRouter | Multiple models | Flexible model routing |
Configuring API Keys
- Navigate to "Admin" > "AI Settings"
- For each provider you want to enable, enter the API key:
- "Gemini API Key" -- from Google AI Studio
- "OpenAI API Key" -- from OpenAI platform
- "DeepSeek API Key" -- from DeepSeek dashboard
- "Claude API Key" -- from Anthropic console
- "OpenRouter API Key" -- from OpenRouter dashboard
- Click "Save Configuration"
Tip: You do not need to configure every provider; configuring at least one enables AI features. If the primary provider is unavailable, the system falls back to the other configured providers.
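The fallback behavior can be sketched as follows. This is a minimal illustration of the pattern, not Testify's actual internals; the `generate_with_fallback` function, the provider names, and the callable-per-provider shape are all assumptions made for the example.

```python
# Sketch of provider fallback: try the default provider first, then every
# other configured provider in order. Names and the call signature are
# illustrative assumptions, not Testify's implementation.

def generate_with_fallback(prompt, providers, default="gemini"):
    """providers: dict mapping provider name -> callable(prompt) -> str.

    Returns (provider_name, output) from the first provider that succeeds.
    Raises RuntimeError only if every configured provider fails.
    """
    ordered = [default] + [p for p in providers if p != default]
    errors = {}
    for name in ordered:
        call = providers.get(name)
        if call is None:  # provider not configured, skip it
            continue
        try:
            return name, call(prompt)
        except Exception as exc:
            errors[name] = exc  # record the failure and try the next provider
    raise RuntimeError(f"All providers failed: {errors}")
```

Because unconfigured providers are simply skipped, the same routine works whether you have one API key set up or all five.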
Setting the Default Provider
- Go to "Admin" > "AI Settings"
- Under "Default Provider", select the provider to use for AI operations
- Click "Save"
The default provider is used when no specific provider is requested by the user or feature.
Global Prompts Management
Global prompts are the system-level prompt templates that guide AI behavior across all features. Super admins can view, edit, and version these prompts.
Viewing All Prompts
- Navigate to "Admin" > "Global Prompts"
- The list shows all configured prompts with:
- Prompt Key -- the internal identifier (e.g., question_generation, explanation_builder)
- Prompt Name -- the human-readable name
- Category -- the feature area (e.g., Generation, Tutoring, Classification)
- Version -- the current version number
- Status -- active or inactive
- Last Updated -- when the prompt was last modified
Editing a Prompt
- Go to "Admin" > "Global Prompts"
- Click on the prompt you want to edit
- Modify the "Prompt Content" -- the actual text sent to the AI model
- Optionally update the "Description" to document your changes
- Click "Save"
The system automatically:
- Increments the version number
- Records who made the change
- Stores the previous version in the version history
- Clears the prompt cache so the new version takes effect immediately
Tip: Test prompt changes in a staging environment before applying them in production. Small wording changes can significantly affect AI output quality.
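The save-and-version behavior described above can be sketched like this. The `PromptRecord` fields, the `save_prompt` helper, and the dict-based cache are illustrative assumptions chosen for the example, not Testify's actual data model.

```python
from dataclasses import dataclass, field

# Sketch of what happens when a prompt is saved: archive the old version,
# bump the version number, record the editor, and invalidate the cache.
# All names here are assumptions for illustration.

@dataclass
class PromptRecord:
    key: str
    content: str
    version: int = 1
    history: list = field(default_factory=list)  # (version, content, editor)

def save_prompt(record, new_content, editor, cache):
    """Apply an edit: store the previous version in history, increment the
    version number, and clear the cached copy so the change takes effect
    immediately."""
    record.history.append((record.version, record.content, editor))
    record.version += 1
    record.content = new_content
    cache.pop(record.key, None)  # clear the prompt cache for this key
    return record
```

The cache invalidation step is what makes edits take effect immediately: the next AI request misses the cache and reloads the new version.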
Creating a New Prompt
- Go to "Admin" > "Global Prompts"
- Click "Create Prompt"
- Fill in:
- "Prompt Key" -- a unique identifier (lowercase, underscores, e.g., custom_feedback_prompt)
- "Prompt Name" -- a descriptive name
- "Category" -- select or type a category
- "Prompt Content" -- the full prompt text
- "Description" -- what this prompt is used for
- Click "Save"
Viewing Prompt Version History
- Click on a prompt to open its details
- Click the "History" tab
- View all previous versions with:
- Version number
- Content at that version
- Who made the change
- When the change was made
- Click "Restore" on any version to roll back to it
Activating and Deactivating Prompts
- Go to "Admin" > "Global Prompts"
- Toggle the "Active" switch on any prompt
- Inactive prompts are not used by AI services
Tip: Deactivate a prompt instead of deleting it if you want to temporarily disable an AI behavior while preserving the prompt content.
Fetching Active Prompts (API)
At runtime, AI services fetch active prompts automatically. The active prompts endpoint returns all enabled prompts as a key-value map for efficient lookup.
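Consuming that key-value map can be sketched as follows. The JSON payload, the prompt keys, and the `{placeholder}` template style are invented for illustration; only the flat key-to-text map shape is taken from the description above.

```python
import json

# Hypothetical response body from the active prompts endpoint: a flat
# key -> prompt-text map. Keys and template text are made up for the example.
payload = '''{
  "question_generation": "Generate {count} multiple-choice questions about {topic}.",
  "explanation_builder": "Explain why the answer to {question} is {answer}."
}'''

active_prompts = json.loads(payload)

def get_prompt(key):
    """Look up an active prompt by key; deactivated prompts are absent."""
    template = active_prompts.get(key)
    if template is None:
        raise KeyError(f"No active prompt for key: {key}")
    return template

# Render a prompt for a specific request.
prompt = get_prompt("question_generation").format(count=5, topic="photosynthesis")
```

Because inactive prompts never appear in the map, a `KeyError` here is the runtime symptom of a prompt that was deactivated in the admin UI.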
Prompt A/B Testing
Run controlled experiments to compare different prompt variants and measure which produces better AI output.
Creating an Experiment
- Navigate to "Admin" > "AI Settings" > "A/B Testing"
- Click "New Experiment"
- Configure the experiment:
- "Name" -- a descriptive name (e.g., "Question Generation v2 vs v3")
- "Description" -- what you are testing
- "Prompt Key" -- which prompt to experiment on
- "Variants" -- define at least 2 prompt variants:
- "Variant Name" (e.g., "Control" and "New Prompt")
- "Prompt Text" -- the full prompt for this variant
- "Traffic Percent" -- percentage of requests to route to this variant
- "Start Date" and "End Date" -- the experiment duration
- Verify that traffic percentages sum to 100%
- Click "Create"
The experiment starts in Draft status.
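The sum-to-100 check from the steps above can be expressed as a small validation. The `traffic_percent` field name and dict shape are assumptions for the sketch:

```python
def validate_traffic(variants):
    """variants: list of dicts with 'name' and 'traffic_percent'.

    Returns True when the experiment has at least 2 variants and their
    traffic percentages cover exactly 100% of requests.
    """
    if len(variants) < 2:
        raise ValueError("An experiment needs at least 2 variants")
    total = sum(v["traffic_percent"] for v in variants)
    if total != 100:
        raise ValueError(f"Traffic percentages sum to {total}, expected 100")
    return True
```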
Running an Experiment
- Go to "A/B Testing" and find your experiment
- Click "Activate" to start routing traffic
- The system randomly assigns each AI request to a variant based on traffic percentages
- Monitor results in real-time on the experiment dashboard
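Weighted random assignment of the kind described in step 3 can be sketched like this; the function and the `(name, percent)` pair format are illustrative assumptions, not the exact routing code:

```python
import random

def assign_variant(variants, rng=random):
    """Pick a variant with probability proportional to its traffic percent.

    variants: list of (name, traffic_percent) pairs summing to 100.
    """
    roll = rng.uniform(0, 100)
    cumulative = 0
    for name, percent in variants:
        cumulative += percent
        if roll < cumulative:
            return name
    return variants[-1][0]  # guard against a floating-point edge at 100
```

Over many requests the observed split converges to the configured percentages, which is why per-variant request counts on the dashboard should roughly match the traffic settings.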
Viewing Experiment Results
- Click on an active or completed experiment
- The results page shows per-variant metrics:
- Number of requests served
- Average quality score (if quality scoring is enabled)
- Average latency
- User feedback ratings
- Statistical significance indicators help determine when a winner can be declared
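One common way such an indicator is computed (shown here as an illustrative sketch, not necessarily Testify's exact method) is a two-proportion z-test on a per-variant success rate, such as the share of positive feedback ratings:

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """z statistic for the difference between two success proportions.

    |z| > 1.96 roughly corresponds to significance at the 95% level.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se
```

Small sample sizes produce small |z| values even for real differences, which is why a winner usually cannot be declared until each variant has served enough requests.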
Concluding an Experiment
- Open the experiment
- Review results and select the winning variant
- Click "Apply Winner" to update the global prompt with the winning variant's text
- The experiment status changes to Completed
AI Quality Dashboard
Monitor the quality and performance of AI outputs across the platform.
Accessing the Dashboard
- Navigate to "Admin" > "AI Settings" > "Quality Dashboard"
- Set the time period (default: 30 days)
Key Metrics
The dashboard displays:
- Total AI Operations -- number of AI requests processed
- Average Confidence Score -- mean confidence across all AI outputs
- Average Latency -- mean response time per AI request
- Token Usage -- total tokens consumed by provider
- Error Rate -- percentage of failed AI requests
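How these metrics relate to the underlying AI operation logs can be sketched with a small aggregation; the record fields (`ok`, `confidence`, `latency_ms`, `tokens`) are assumptions for the example, and the averaging choices shown are illustrative:

```python
def summarize(ops):
    """ops: list of dicts with 'ok' (bool), 'confidence', 'latency_ms',
    and 'tokens'. Field names are illustrative assumptions.

    Confidence is averaged over successful operations only; latency and
    tokens are aggregated over all operations.
    """
    total = len(ops)
    succeeded = [o for o in ops if o["ok"]]
    return {
        "total_operations": total,
        "avg_confidence": (
            sum(o["confidence"] for o in succeeded) / len(succeeded)
            if succeeded else 0.0
        ),
        "avg_latency_ms": sum(o["latency_ms"] for o in ops) / total,
        "total_tokens": sum(o["tokens"] for o in ops),
        "error_rate": (total - len(succeeded)) / total,
    }
```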
Usage by Model
A breakdown of AI usage per provider/model showing:
- Request count
- Average tokens per request
- Average latency
- Average confidence score
Usage by Feature
See which features consume the most AI resources:
- Question generation
- Explanations
- Tutoring
- Classification
- Chat
Daily Trends
Line charts showing AI usage and quality metrics over time, helping identify patterns and anomalies.
Tip: If you notice a sudden drop in confidence scores or spike in latency, check the AI provider's status page for outages.
AI Explainability
Every AI operation in Testify is logged with explainability metadata for compliance and quality assurance.
What Is Logged
For each AI operation:
- Input Summary -- what was sent to the model
- Output Summary -- what the model returned
- Decision Rationale -- why the AI made its choices
- Confidence Score -- the model's self-assessed confidence
- Tokens Used -- input and output tokens consumed
- Latency -- total processing time
Accessing Explainability Records
- Go to "Admin" > "Audit Logs" > "AI Events"
- Click on any AI event to see its explainability details
- Use the "View Full Job" link to see the complete AI job timeline
See the Audit Logs documentation for more details on navigating AI audit trails.
Managing AI Credits
AI operations consume credits from your organization's credit balance.
- Go to "Admin" > "Billing" > "Credits"
- View current credit balance
- Purchase additional credits as needed
- Set credit alerts to be notified when balance is low
Tip: Different AI operations consume different amounts of credits. Question generation typically uses more credits than explanations due to higher token counts.
Best Practices
Prompt Engineering
- Be specific about output format (JSON, markdown, plain text)
- Include examples of desired output in the prompt
- Specify the target audience (e.g., "for grade 10 students")
- Set clear quality criteria (e.g., "include at least one distractor that tests common misconceptions")
Provider Selection
- Use Gemini or OpenAI for high-quality question generation
- Use DeepSeek for cost-effective bulk operations
- Configure multiple providers for redundancy
- Monitor per-provider quality scores to identify the best fit
Cost Optimization
- Monitor token usage in the AI Quality Dashboard
- Use shorter, more focused prompts to reduce token consumption
- Run A/B tests to find prompts that produce good results with fewer tokens
- Set per-user or per-organization AI usage limits if needed
Troubleshooting
AI Features Not Working
- Verify at least one AI provider API key is configured
- Check that the API key is valid and has sufficient quota
- Look for error messages in the audit logs under AI events
Poor AI Output Quality
- Review the global prompt for the feature in question
- Run an A/B test with improved prompt variants
- Try a different AI provider/model
- Check confidence scores in the AI Quality Dashboard
High Token Costs
- Review usage by model in the AI Quality Dashboard
- Consider switching to a more cost-effective provider for bulk operations
- Optimize prompts to be more concise while maintaining quality