
AI Management

Configure AI providers, manage global prompts, run A/B tests on prompt variants, and monitor AI quality across your Testify platform.

Overview

Testify integrates with multiple AI providers to power question generation, explanations, tutoring, and classification. The AI management system gives administrators full control over which providers are used, what prompts guide the AI, and how AI quality is measured and optimized.

AI Management Dashboard

AI Provider Configuration

Testify supports the following AI providers:

  • Google Gemini (gemini-pro, gemini-1.5-pro) -- question generation, explanations, classification
  • OpenAI (gpt-4, gpt-4-turbo, gpt-3.5-turbo) -- question generation, tutoring, chat
  • DeepSeek (deepseek-chat, deepseek-coder) -- cost-effective question generation
  • Anthropic Claude (claude-3, claude-3.5) -- high-quality question generation, explanations
  • OpenRouter (multiple models) -- flexible model routing

Configuring API Keys

  1. Navigate to "Admin" > "AI Settings"
  2. For each provider you want to enable, enter the API key:
    • "Gemini API Key" -- from Google AI Studio
    • "OpenAI API Key" -- from OpenAI platform
    • "DeepSeek API Key" -- from DeepSeek dashboard
    • "Claude API Key" -- from Anthropic console
    • "OpenRouter API Key" -- from OpenRouter dashboard
  3. Click "Save Configuration"

Tip: You do not need to configure all providers. Configure at least one to enable AI features. The system falls back to another configured provider if the primary is unavailable.

Setting the Default Provider

  1. Go to "Admin" > "AI Settings"
  2. Under "Default Provider", select the provider to use for AI operations
  3. Click "Save"

The default provider is used when no specific provider is requested by the user or feature.

Global Prompts Management

Global prompts are the system-level prompt templates that guide AI behavior across all features. Super admins can view, edit, and version these prompts.

Viewing All Prompts

  1. Navigate to "Admin" > "Global Prompts"
  2. The list shows all configured prompts with:
    • Prompt Key -- the internal identifier (e.g., question_generation, explanation_builder)
    • Prompt Name -- the human-readable name
    • Category -- the feature area (e.g., Generation, Tutoring, Classification)
    • Version -- the current version number
    • Status -- active or inactive
    • Last Updated -- when the prompt was last modified

Global Prompts List

Editing a Prompt

  1. Go to "Admin" > "Global Prompts"
  2. Click on the prompt you want to edit
  3. Modify the "Prompt Content" -- the actual text sent to the AI model
  4. Optionally update the "Description" to document your changes
  5. Click "Save"

The system automatically:

  • Increments the version number
  • Records who made the change
  • Stores the previous version in the version history
  • Clears the prompt cache so the new version takes effect immediately
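
The save flow above can be sketched in a few lines. The field names, `save_prompt` function, and in-memory cache are assumptions for illustration, not Testify's actual schema.

```python
import time

# Illustrative sketch of saving a prompt edit: archive the old version,
# bump the version number, record the editor, and invalidate the cache.

def save_prompt(prompt: dict, new_content: str, editor: str,
                history: list, cache: dict) -> dict:
    history.append({**prompt})            # store the previous version
    updated = {
        **prompt,
        "content": new_content,
        "version": prompt["version"] + 1,  # auto-increment
        "updated_by": editor,              # record who made the change
        "updated_at": time.time(),
    }
    cache.pop(prompt["key"], None)         # new version takes effect immediately
    return updated
```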

Tip: Test prompt changes in a staging environment before applying them in production. Small wording changes can significantly affect AI output quality.

Creating a New Prompt

  1. Go to "Admin" > "Global Prompts"
  2. Click "Create Prompt"
  3. Fill in:
    • "Prompt Key" -- a unique identifier (lowercase, underscores, e.g., custom_feedback_prompt)
    • "Prompt Name" -- a descriptive name
    • "Category" -- select or type a category
    • "Prompt Content" -- the full prompt text
    • "Description" -- what this prompt is used for
  4. Click "Save"

Viewing Prompt Version History

  1. Click on a prompt to open its details
  2. Click the "History" tab
  3. View all previous versions with:
    • Version number
    • Content at that version
    • Who made the change
    • When the change was made
  4. Click "Restore" on any version to roll back to it

Activating and Deactivating Prompts

  1. Go to "Admin" > "Global Prompts"
  2. Toggle the "Active" switch on any prompt
  3. Inactive prompts are not used by AI services

Tip: Deactivate a prompt instead of deleting it if you want to temporarily disable an AI behavior while preserving the prompt content.

Fetching Active Prompts (API)

At runtime, AI services fetch active prompts automatically. The active prompts endpoint returns all enabled prompts as a key-value map for efficient lookup.
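
The key-value shape described above can be built from a prompt list like this. The field names (`key`, `content`, `active`) are assumptions based on the list view earlier in this page, not a documented response schema.

```python
# Index active prompts by key so services can look them up in O(1)
# at request time; inactive prompts are excluded.

def to_active_prompt_map(prompts: list[dict]) -> dict[str, str]:
    return {p["key"]: p["content"] for p in prompts if p["active"]}
```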

Prompt A/B Testing

Run controlled experiments to compare different prompt variants and measure which produces better AI output.

Creating an Experiment

  1. Navigate to "Admin" > "AI Settings" > "A/B Testing"
  2. Click "New Experiment"
  3. Configure the experiment:
    • "Name" -- a descriptive name (e.g., "Question Generation v2 vs v3")
    • "Description" -- what you are testing
    • "Prompt Key" -- which prompt to experiment on
    • "Variants" -- define at least 2 prompt variants:
      • "Variant Name" (e.g., "Control" and "New Prompt")
      • "Prompt Text" -- the full prompt for this variant
      • "Traffic Percent" -- percentage of requests to route to this variant
    • "Start Date" and "End Date" -- the experiment duration
  4. Verify that traffic percentages sum to 100%
  5. Click "Create"

The experiment starts in Draft status.
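
The traffic check in step 4 amounts to the following validation; the variant field name is an assumption for illustration.

```python
# Reject an experiment whose variant traffic percentages do not sum to 100.

def validate_traffic(variants: list[dict]) -> None:
    total = sum(v["traffic_percent"] for v in variants)
    if total != 100:
        raise ValueError(f"traffic percentages sum to {total}, expected 100")
```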

A/B Testing

Running an Experiment

  1. Go to "A/B Testing" and find your experiment
  2. Click "Activate" to start routing traffic
  3. The system randomly assigns each AI request to a variant based on traffic percentages
  4. Monitor results in real-time on the experiment dashboard
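
The random assignment in step 3 can be sketched as a weighted draw. This is a minimal illustration; Testify's actual routing may differ (for example, sticky per-user assignment).

```python
import random

# Pick a variant by rolling a number in [0, 100) and walking the
# cumulative traffic percentages.

def assign_variant(variants: list[dict], rng: random.Random) -> str:
    roll = rng.uniform(0, 100)
    cumulative = 0.0
    for v in variants:
        cumulative += v["traffic_percent"]
        if roll < cumulative:
            return v["name"]
    return variants[-1]["name"]  # guard against float rounding
```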

Viewing Experiment Results

  1. Click on an active or completed experiment
  2. The results page shows per-variant metrics:
    • Number of requests served
    • Average quality score (if quality scoring is enabled)
    • Average latency
    • User feedback ratings
  3. Statistical significance indicators help determine when a winner can be declared
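
One common way to compute such a significance indicator is a two-proportion z-test on a per-variant success rate (e.g. positive feedback ratings); the dashboard's exact method is not specified here, so treat this as a sketch.

```python
import math

# Two-sided two-proportion z-test: small p-values suggest a real
# difference between variants rather than noise.

def two_proportion_p_value(wins_a: int, n_a: int, wins_b: int, n_b: int) -> float:
    p_a, p_b = wins_a / n_a, wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (p_a - p_b) / se
    # p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```

A typical threshold is p < 0.05 before declaring a winner.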

Concluding an Experiment

  1. Open the experiment
  2. Review results and select the winning variant
  3. Click "Apply Winner" to update the global prompt with the winning variant's text
  4. The experiment status changes to Completed

AI Quality Dashboard

Monitor the quality and performance of AI outputs across the platform.

Accessing the Dashboard

  1. Navigate to "Admin" > "AI Settings" > "Quality Dashboard"
  2. Set the time period (default: 30 days)

Key Metrics

The dashboard displays:

  • Total AI Operations -- number of AI requests processed
  • Average Confidence Score -- mean confidence across all AI outputs
  • Average Latency -- mean response time per AI request
  • Token Usage -- total tokens consumed by provider
  • Error Rate -- percentage of failed AI requests
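
The headline metrics above could be derived from raw AI event records roughly as follows; the record field names are assumptions based on this list, not Testify's actual log schema.

```python
# Aggregate per-event records into the dashboard's summary metrics.

def summarize(events: list[dict]) -> dict:
    n = len(events)
    return {
        "total_operations": n,
        "avg_confidence": sum(e["confidence"] for e in events) / n,
        "avg_latency_ms": sum(e["latency_ms"] for e in events) / n,
        "total_tokens": sum(e["tokens"] for e in events),
        "error_rate": sum(1 for e in events if e["error"]) / n,
    }
```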

Usage by Model

A breakdown of AI usage per provider/model showing:

  • Request count
  • Average tokens per request
  • Average latency
  • Average confidence score

Usage by Feature

See which features consume the most AI resources:

  • Question generation
  • Explanations
  • Tutoring
  • Classification
  • Chat

Line charts show AI usage and quality metrics over time, helping you identify patterns and anomalies.

Tip: If you notice a sudden drop in confidence scores or spike in latency, check the AI provider's status page for outages.

AI Explainability

Every AI operation in Testify is logged with explainability metadata for compliance and quality assurance.

What Is Logged

For each AI operation:

  • Input Summary -- what was sent to the model
  • Output Summary -- what the model returned
  • Decision Rationale -- why the AI made its choices
  • Confidence Score -- the model's self-assessed confidence
  • Tokens Used -- input and output tokens consumed
  • Latency -- total processing time
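
A record covering the fields above might look like the following; this is a possible shape for illustration, not Testify's actual schema.

```python
from dataclasses import dataclass

# One explainability record per AI operation.

@dataclass
class AIExplainabilityRecord:
    input_summary: str        # what was sent to the model
    output_summary: str       # what the model returned
    decision_rationale: str   # why the AI made its choices
    confidence_score: float   # model's self-assessed confidence
    tokens_in: int            # input tokens consumed
    tokens_out: int           # output tokens consumed
    latency_ms: int           # total processing time
```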

Accessing Explainability Records

  1. Go to "Admin" > "Audit Logs" > "AI Events"
  2. Click on any AI event to see its explainability details
  3. Use the "View Full Job" link to see the complete AI job timeline

See the Audit Logs documentation for more details on navigating AI audit trails.

Managing AI Credits

AI operations consume credits from your organization's credit balance.

  1. Go to "Admin" > "Billing" > "Credits"
  2. View current credit balance
  3. Purchase additional credits as needed
  4. Set credit alerts to be notified when balance is low

Tip: Different AI operations consume different amounts of credits. Question generation typically uses more credits than explanations due to higher token counts.
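
If credits scale with token usage as the tip suggests, a rough cost estimate could look like this. The rate values are made-up examples to show the relationship, not Testify's real pricing.

```python
# Hypothetical per-operation credit rates (credits per 1,000 tokens).
RATE_PER_1K_TOKENS = {
    "question_generation": 2.0,  # heavier prompts and outputs
    "explanation": 1.0,
}

def estimate_credits(operation: str, tokens: int) -> float:
    return RATE_PER_1K_TOKENS[operation] * tokens / 1000
```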

Best Practices

Prompt Engineering

  1. Be specific about output format (JSON, markdown, plain text)
  2. Include examples of desired output in the prompt
  3. Specify the target audience (e.g., "for grade 10 students")
  4. Set clear quality criteria (e.g., "include at least one distractor that tests common misconceptions")
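
A prompt template applying all four practices might look like this; the wording is illustrative, not one of Testify's shipped global prompts.

```python
# Template combining output format, an example, target audience,
# and an explicit quality criterion.
QUESTION_PROMPT = """\
Generate one multiple-choice question about {topic} for grade 10 students.

Output format: JSON with keys "question", "options" (list of 4), "answer".
Quality criteria: include at least one distractor that tests a common misconception.

Example output:
{{"question": "...", "options": ["...", "...", "...", "..."], "answer": "..."}}
"""

prompt = QUESTION_PROMPT.format(topic="photosynthesis")
```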

Provider Selection

  1. Use Gemini or OpenAI for high-quality question generation
  2. Use DeepSeek for cost-effective bulk operations
  3. Configure multiple providers for redundancy
  4. Monitor per-provider quality scores to identify the best fit

Cost Optimization

  1. Monitor token usage in the AI Quality Dashboard
  2. Use shorter, more focused prompts to reduce token consumption
  3. Run A/B tests to find prompts that produce good results with fewer tokens
  4. Set per-user or per-organization AI usage limits if needed

Troubleshooting

AI Features Not Working

  • Verify at least one AI provider API key is configured
  • Check that the API key is valid and has sufficient quota
  • Look for error messages in the audit logs under AI events

Poor AI Output Quality

  • Review the global prompt for the feature in question
  • Run an A/B test with improved prompt variants
  • Try a different AI provider/model
  • Check confidence scores in the AI Quality Dashboard

High Token Costs

  • Review usage by model in the AI Quality Dashboard
  • Consider switching to a more cost-effective provider for bulk operations
  • Optimize prompts to be more concise while maintaining quality