AI Task System Terminology and Workflow Variables
This document provides detailed information about the terminology and workflow variables used in the AI Task system.
Core Terminology
- Partitur: YAML-based orchestration file defining a sequence of processing steps (from the German musical term for a score or orchestration)
- Profile: A set of related data identified by a unique profile ID (e.g., profile_32101)
- Performance: A specific production output instance, identified by combining the production name and ID (e.g., nepi_32101)
- Production: A collection of profiles within a project context, with a name (e.g., “nepi”, “inqua”)
- Instruction: Template file containing prompts for AI models
- Genre: Category of documents within a profile or performance (e.g., audio, transcription, slide, report)
- Step: A single operation in a processing pipeline (transcribe, analyze, convert, etc.)
- Pipe: The complete sequence of steps defined in a partitur file
Workflow Variables
All workflows use standardized variables denoted by double curly braces for consistency and reusability:
- {{production}}: The name of the production (e.g., “nepi”, “inqua”)
- {{id}}: The numerical identifier of a profile/performance (e.g., “32101”)
- {{genre}}: The category of documents being processed (e.g., “audio”, “transcription”)
- {{no}}: The sequential number of a document within its genre (e.g., “01”)
These variables can be referenced anywhere in the system, for example in the source and result file paths of a partitur step, which keeps names consistent throughout the pipeline.
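Below is a minimal sketch of how {{name}} placeholders could be filled in. It assumes plain string substitution with Python's standard re module. Instruction templates use the .j2 (Jinja2) extension, so the actual system may well rely on Jinja2 instead. The render function is hypothetical and not part of AI Task.

```python
import re

# Matches {{name}}, tolerating optional spaces inside the braces.
_VAR = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict[str, str]) -> str:
    """Replace every {{name}} in template with variables[name] (hypothetical helper).

    A placeholder with no value raises KeyError instead of being left in place.
    """
    def substitute(match: re.Match) -> str:
        return variables[match.group(1)]

    return _VAR.sub(substitute, template)


variables = {"production": "nepi", "id": "32101", "genre": "audio", "no": "01"}
print(render("profile/profile_{{id}}/{{genre}}/document_{{id}}.m4a", variables))
# profile/profile_32101/audio/document_32101.m4a
```

Raising on a missing variable is a deliberate choice in this sketch. A misspelled variable in a partitur file then fails immediately rather than producing a wrong path.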
Example Variable Usage
```yaml
# Example variable usage in a partitur file
pipe:
  - name: transcribe_audio
    type: llm
    model: gemini-1.5-pro
    tmpl: "transcription_template.j2"
    source-file: "profile/profile_{{id}}/audio/document_{{id}}.m4a"
    result-file: "profile/profile_{{id}}/transcription/INQUA2_{{id}}_transcription_{{no}}.txt"
```

Flexible Naming Conventions
The system supports flexible naming conventions that can be adapted to each project’s needs:
Standard Naming Pattern
- Audio/Video files: <prefix>_<id>_audio.<ext> (e.g., document_32101.m4a)
- Transcription files: <prefix>_<id>_transcription_<version>.<ext> (e.g., INQUA2_32101_transcription_01.txt)
- Analysis files: <prefix>_<id>_<analysis_type>_<version>.<ext> (e.g., INQUA2_32101_sequence_01.txt)
Where:
- <prefix>: Project or dataset identifier (e.g., INQUA2)
- <id>: Profile or document identifier (e.g., 32101)
- <analysis_type>: Kind of analysis (e.g., sequence)
- <version>: Version number (typically 01, 02, etc.)
- <ext>: File extension (txt, docx, m4a, etc.)
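Versioned names that follow the standard pattern can be split back into their components with a regular expression. This is a sketch under two assumptions not stated above. First, the prefix contains no underscore. Second, only versioned names (transcription and analysis files) are handled. The parse_name helper is hypothetical.

```python
import re
from typing import Optional

# Versioned outputs: <prefix>_<id>_<type>_<version>.<ext>
# Assumption: the prefix contains no underscore; the type part may.
_NAME = re.compile(
    r"^(?P<prefix>[^_]+)_(?P<id>\d+)_(?P<kind>.+?)_(?P<version>\d+)\.(?P<ext>\w+)$"
)


def parse_name(filename: str) -> Optional[dict[str, str]]:
    """Split a versioned file name into its components, or return None."""
    match = _NAME.match(filename)
    return match.groupdict() if match else None


print(parse_name("INQUA2_32101_transcription_01.txt"))
# {'prefix': 'INQUA2', 'id': '32101', 'kind': 'transcription', 'version': '01', 'ext': 'txt'}
```

Unversioned names such as document_32101.m4a do not match and yield None.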
Customizable Patterns
Patterns can be customized for specific projects while maintaining the required components:
```yaml
# Example naming configuration
naming:
  pattern: "{production}_{id}_{genre}_{no}"
  separator: "_"
  production: "nepi"
  id_format: "{:03d}"  # 001, 002, etc.
  default_suffix: "txt"
```

Directory Structure
The AI Task system uses a standardized directory structure:
```text
project/
├── partitur/              # Orchestration files (.yml)
├── instruction/           # Instruction templates (.j2)
├── profile/               # Data organized by profile ID
│   └── profile_<id>/      # Individual profile directories
│       ├── audio/         # Audio input files
│       ├── transcription/ # Transcription outputs
│       └── analysis/      # Analysis outputs
├── production/            # Production configuration
│   ├── instructions/      # Production-specific instructions
│   └── <production_id>/   # Production-specific data
│       ├── video/         # Video files
│       ├── transcription/ # Transcription files
│       └── slides/        # Extracted slide images
├── setting/               # Configuration files
├── diagnostic/            # Logs and error tracking
└── script/                # Execution scripts
```
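The layout above can be mirrored in code with pathlib. Both helper names are hypothetical. The production directory is assumed to be named <production>_<id>, following the Profile and Production Relationship section rather than the <production_id> placeholder in the tree.

```python
from pathlib import Path

# Hypothetical helpers: directory names follow the tree above,
# function names are illustrative only.


def profile_dir(root: Path, profile_id: str, genre: str) -> Path:
    """Genre folder inside a profile, e.g. project/profile/profile_32101/transcription."""
    return root / "profile" / f"profile_{profile_id}" / genre


def production_dir(root: Path, production: str, profile_id: str, genre: str) -> Path:
    """Genre folder inside a production, e.g. project/production/nepi_32101/slides.

    Assumes the production directory is named '<production>_<id>'.
    """
    return root / "production" / f"{production}_{profile_id}" / genre


root = Path("project")
print(profile_dir(root, "32101", "transcription").as_posix())
# project/profile/profile_32101/transcription
print(production_dir(root, "nepi", "32101", "slides").as_posix())
# project/production/nepi_32101/slides
```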
Global directories also exist at ~/.ai/:
```text
~/.ai/
├── partitur/     # Global partitur files
└── instruction/  # Global instruction templates
```
Profile and Production Relationship
- Profile Structure:
  - Source data is organized in profile/profile_{{id}}/ directories
  - Each profile contains the original data (e.g., audio recordings)
  - Profile IDs uniquely identify each source data instance
- Production Structure:
  - Processing outputs can be organized in production/{{production}}_{{id}}/ directories
  - Each production directory contains the results of processing a specific profile
  - The production name designates the workflow itself
  - The same ID is used to link a production output to its source profile
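A production directory reuses the profile ID, so the source profile can be recovered from the directory name alone. This is a minimal sketch, assuming the <production>_<id> naming above and numeric IDs. The source_profile helper is hypothetical and not part of AI Task.

```python
from pathlib import Path


def source_profile(production_dir_name: str) -> tuple[str, Path]:
    """Map a production directory name such as 'nepi_32101' to its
    production name and source profile directory.

    Splits on the last underscore, so production names that contain
    underscores themselves still resolve correctly.
    """
    production, sep, profile_id = production_dir_name.rpartition("_")
    if not sep or not production or not profile_id.isdigit():
        raise ValueError(f"not a <production>_<id> name: {production_dir_name!r}")
    return production, Path("profile") / f"profile_{profile_id}"


name, profile = source_profile("nepi_32101")
print(name, profile.as_posix())  # nepi profile/profile_32101
```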
Function Registry
The AI Task system includes a function registry that allows both built-in and custom functions to be used in workflows:
Function Organization
Functions can be organized in multiple ways:
- Built-in Functions: Located in the AI Task package
- Project-level Functions: Located in a project’s function directory
- Profile-specific Functions: Located in profile-specific function directories
Function Loading Order
Functions are loaded in the following order (later definitions override earlier ones):
1. Built-in functions
2. Project-level functions
3. Profile-specific functions
This allows for flexible customization while maintaining a consistent interface.
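The override rule amounts to merging name-to-function mappings in load order. The sketch below is hypothetical. In-memory dictionaries stand in for the package, project and profile function directories, and the function names convert and count are made up for illustration.

```python
from typing import Callable

# Hypothetical: each layer maps a function name to a callable.
Registry = dict[str, Callable[..., object]]


def build_registry(*layers: Registry) -> Registry:
    """Merge layers in load order; a later layer overrides earlier ones."""
    registry: Registry = {}
    for layer in layers:
        registry.update(layer)
    return registry


builtin = {"convert": lambda text: text.upper(), "count": len}
project = {"convert": lambda text: text.lower()}     # overrides built-in convert
profile = {"count": lambda text: len(text.split())}  # overrides built-in count

registry = build_registry(builtin, project, profile)
print(registry["convert"]("AbC"))      # abc
print(registry["count"]("two words"))  # 2
```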