Empowering the European Media Market: Infalia's Intelligent Core for the MOSAIC Platform

The MOSAIC project has evolved from a centralized concept into a Federated Multi-Node Platform, designed to democratize access to media content across Europe. In this distributed ecosystem, where data sovereignty and global discovery must coexist, Infalia has architected the platform’s intelligent core.
By developing the Middleware Proxy Service, the Media Annotation Service, and the Universal Media Annotation & Asset Orchestration Suite, Infalia provides the critical infrastructure that turns isolated archives into a connected, intelligent European market.

1. Middleware Proxy Service: The Intelligent Federation Engine
The Middleware Proxy Service (Gateway) is the reliable orchestrator of a MOSAIC Node. Built with Node.js and TypeScript, it is an intelligent decision engine that manages the precise balance between local data governance and global accessibility. It implements a Controller-Service-Repository pattern to ensure modularity and testability.
Architectural Innovations
- Federated “Index-Once, Search-Anywhere”: The Gateway implements a Cross-Lingual Information Retrieval (CLIR) architecture. An asset uploaded in one node is processed, translated, and indexed in a shared OpenSearch cluster. This allows a user in Italy to search for “Tramonto” (Sunset) and instantly retrieve assets uploaded in Estonia tagged as “Päikeseloojang”, breaking down linguistic silos.
- Hybrid Data Sovereignty: It enforces a strict separation between two tiers:
  - Local Sovereignty: Sensitive capabilities (Visual Analysis, Asset Storage, Deepfake Detection) run locally within the Node to ensure data privacy and GDPR compliance.
  - Shared Intelligence: Language services (Translation, Transcription) are offloaded to high-performance SaaS clusters via secure APIs, reducing the local hardware footprint.
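The split between local and shared capabilities can be pictured as a small routing table. The following is a minimal sketch, not the Gateway's actual code: the capability identifiers, the `ROUTING_POLICY` table, and `resolveTarget` are hypothetical names chosen for illustration.

```typescript
// Hypothetical capability identifiers, based on the services named above.
type Capability =
  | "visual-analysis"
  | "asset-storage"
  | "deepfake-detection"
  | "translation"
  | "transcription";

type ExecutionTarget = "local-node" | "shared-saas";

// Assumed policy table: sensitive capabilities stay inside the Node,
// language services are sent to the shared cluster.
const ROUTING_POLICY: Record<Capability, ExecutionTarget> = {
  "visual-analysis": "local-node",
  "asset-storage": "local-node",
  "deepfake-detection": "local-node",
  "translation": "shared-saas",
  "transcription": "shared-saas",
};

// Resolve where a request for the given capability should run.
function resolveTarget(capability: Capability): ExecutionTarget {
  return ROUTING_POLICY[capability];
}
```

Keeping the policy in one typed table means that adding a capability without deciding where it runs becomes a compile-time error.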
Security & Identity Management
User authentication is decoupled from the application logic using a robust Keycloak integration.
- Stateless Security: The Gateway validates JSON Web Tokens (JWT) for every request.
- Tenancy Enforcement: It extracts custom attributes (Tenant ID, Node ID) from the token to enforce strict data segmentation, ensuring users only access assets they are authorized to see within the federated network.
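Tenancy enforcement can be illustrated with a small check that runs after the token has been verified. This is a sketch under stated assumptions: the claim names (`tenantId`, `nodeId`), the `federated` flag, and the access rule itself are hypothetical, and signature verification against the Keycloak keys is assumed to have happened earlier in the request pipeline.

```typescript
// Claims assumed to come from an already-verified JWT.
// Field names are illustrative; real deployments define their own token mappers.
interface GatewayClaims {
  sub: string;
  tenantId: string;
  nodeId: string;
}

// Hypothetical asset record as the Gateway might see it in the index.
interface AssetRecord {
  assetId: string;
  tenantId: string;
  nodeId: string;
  federated: boolean; // assumed flag: asset is shared across nodes
}

// Assumed rule: the tenant must always match; the node must match
// unless the asset has been explicitly shared with the federation.
function canAccess(claims: GatewayClaims, asset: AssetRecord): boolean {
  if (claims.tenantId !== asset.tenantId) {
    return false;
  }
  return asset.federated || claims.nodeId === asset.nodeId;
}

// Drop every asset the caller is not allowed to see.
function filterVisible(claims: GatewayClaims, assets: AssetRecord[]): AssetRecord[] {
  return assets.filter((asset) => canAccess(claims, asset));
}
```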
Linguistic Engineering: The OpenSearch Layer
To support the 8+ target languages, the OpenSearch configuration goes beyond standard text indexing. We implemented a hybrid analyzer strategy:
- Native Analyzers: Used for high-resource languages like English, Spanish, and French.
- Custom ICU Analyzers: For languages with complex morphology (Catalan, Slovenian, Estonian), we configured custom analyzers built on International Components for Unicode (ICU). These analyzer chains handle accent normalization (e.g., “montaña” == “montana”), stemming, and stop-word filtering, ensuring that search relevance is maintained across borders.
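To make the analyzer strategy concrete, here is a sketch of what index settings for one such language could look like, written as a TypeScript object. It is an assumption rather than the project's real configuration: the analyzer and filter names (`catalan_icu`, `catalan_stop`, `catalan_stemmer`) are invented, and the `icu_tokenizer` and `icu_folding` components require the OpenSearch ICU analysis plugin to be installed. The small `foldAccents` helper only imitates the folding step in plain TypeScript so the effect can be seen without a cluster.

```typescript
// Hypothetical index settings for a Catalan text field.
const catalanIndexSettings = {
  settings: {
    analysis: {
      filter: {
        catalan_stop: { type: "stop", stopwords: "_catalan_" },
        catalan_stemmer: { type: "stemmer", language: "catalan" },
      },
      analyzer: {
        catalan_icu: {
          type: "custom",
          tokenizer: "icu_tokenizer",
          // Folding runs last so that stop words and stemming see the original accents.
          filter: ["lowercase", "catalan_stop", "catalan_stemmer", "icu_folding"],
        },
      },
    },
  },
};

// Plain-TypeScript imitation of accent folding: decompose characters,
// strip combining diacritical marks, and lowercase the result.
function foldAccents(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}
```

With folding applied at both index time and query time, a query for “montana” and a document containing “montaña” reduce to the same token.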
Optimized Processing Patterns
To handle the diverse nature of media, the Gateway implements two distinct processing patterns:
“Proxy and Intercept” (Synchronous - Images/Video):
- The Logic: For visual assets, immediate feedback is critical. The Gateway proxies the upload to the storage layer and intercepts the successful response.
- The Benefit: It instantly indexes the metadata in OpenSearch while triggering asynchronous AI enrichment in the background. The user sees their file immediately, while the “Brain” analyzes it.
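The “Proxy and Intercept” flow can be sketched as three steps: forward the upload, index on success, then start enrichment without waiting for it. The code below is an illustrative sketch only. `UploadDeps`, `StoredAsset`, and `proxyAndIntercept` are hypothetical names, and the storage, index, and enrichment calls are injected as plain functions rather than real clients.

```typescript
interface StoredAsset {
  assetId: string;
  fileName: string;
}

// Hypothetical dependencies, injected so the flow can run without real services.
interface UploadDeps {
  store: (fileName: string, bytes: Uint8Array) => Promise<StoredAsset>;
  indexMetadata: (asset: StoredAsset) => Promise<void>;
  enrich: (asset: StoredAsset) => Promise<void>;
}

async function proxyAndIntercept(
  fileName: string,
  bytes: Uint8Array,
  deps: UploadDeps
): Promise<StoredAsset> {
  // 1. Proxy: forward the upload to the storage layer.
  const stored = await deps.store(fileName, bytes);

  // 2. Intercept: the upload succeeded, so index its metadata right away.
  await deps.indexMetadata(stored);

  // 3. Start AI enrichment in the background; the response does not wait for it.
  deps.enrich(stored).catch(() => {
    // A real service would log the failure and schedule a retry here.
  });

  return stored;
}
```

Because the function returns before enrichment completes, the caller can show the asset as soon as storage and indexing have succeeded.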
“Fire-and-Forget” (Asynchronous - Audio/Text):
- The Logic: For computationally heavy tasks like transcribing an hour-long interview, the Gateway accepts the file, returns a 201 Created instantly, and manages a background job queue.
- The Benefit: This decoupling prevents HTTP timeouts and ensures the UI remains responsive, even when the backend is crunching gigabytes of data.
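The “Fire-and-Forget” pattern boils down to answering the request as soon as a job is queued and doing the heavy work later. The sketch below uses an in-memory queue purely for illustration. `JobQueue`, `acceptUpload`, and the job ID format are hypothetical, and a production system would presumably use a durable queue instead.

```typescript
interface TranscriptionJob {
  jobId: string;
  fileName: string;
  state: "queued" | "done";
}

// In-memory stand-in for a background job queue.
class JobQueue {
  private jobs: TranscriptionJob[] = [];
  private counter = 0;

  enqueue(fileName: string): TranscriptionJob {
    this.counter += 1;
    const job: TranscriptionJob = { jobId: "job-" + this.counter, fileName: fileName, state: "queued" };
    this.jobs.push(job);
    return job;
  }

  pending(): number {
    return this.jobs.filter((job) => job.state === "queued").length;
  }

  // Process queued jobs one by one; called by a background worker, not by the request handler.
  async drain(worker: (job: TranscriptionJob) => Promise<void>): Promise<void> {
    for (const job of this.jobs) {
      if (job.state === "queued") {
        await worker(job);
        job.state = "done";
      }
    }
  }
}

// Request-side logic: queue the work and answer immediately.
function acceptUpload(queue: JobQueue, fileName: string): { statusCode: number; jobId: string } {
  const job = queue.enqueue(fileName);
  return { statusCode: 201, jobId: job.jobId };
}
```

The sketch returns 201 to match the description above; APIs that only acknowledge queued work often use 202 Accepted for the same purpose.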
2. The Media Annotation Service: Next-Generation Video Understanding
The true value of MOSAIC lies in extracting meaning from pixels. Infalia’s Media Annotation Service has migrated from generic object detection to a highly specialized, video-centric understanding engine.

Infrastructure: Built for Speed and Scale
- Microservices on gRPC: The system uses low-latency gRPC communication to stream heavy video data to AI models, reducing network overhead by 30-50% compared to REST JSON.
- NVIDIA Triton Inference Server: At its core, Triton manages dynamic model loading. This allows the system to swap models in and out of GPU memory (VRAM) on the fly—critical for running 26+ diverse models on limited hardware.
- AsyncIO Parallelism: The service employs an AsyncIO streaming pattern, allowing multiple models (e.g., Object Detection + Scene Classification) to run concurrently on the same video stream, significantly reducing “Time-to-Insight.”
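The concurrency idea behind the AsyncIO pattern can be shown in a few lines. The service itself is described as using AsyncIO; the sketch below expresses the same fan-out in TypeScript with `Promise.all` for consistency with the other examples in this article. `ModelRunner` and `annotateFrame` are hypothetical names, and the model calls are stand-ins for real inference requests.

```typescript
// A model is represented as an async function from frame bytes to tags.
interface ModelRunner {
  label: string;
  run: (frame: Uint8Array) => Promise<string[]>;
}

interface ModelOutput {
  label: string;
  tags: string[];
}

// Send the same frame to every model at once and wait for all results.
// Total latency is roughly that of the slowest model, not the sum of all models.
async function annotateFrame(frame: Uint8Array, models: ModelRunner[]): Promise<ModelOutput[]> {
  return Promise.all(
    models.map(async (model) => ({
      label: model.label,
      tags: await model.run(frame),
    }))
  );
}
```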
Next-Generation Model Suite (2025 Refinements)
We have moved beyond standard baselines to integrate state-of-the-art architectures:
- Native Video Understanding (LLaVA-OneVision): Unlike older systems that analyzed single frames, we deploy the LLaVA-OneVision (0.5B parameter) model. It processes video temporally, understanding the narrative context of actions rather than just spotting objects.
- Open-Vocabulary Tagging (RAM): We replaced fixed-list classifiers with the Recognize Anything Model (RAM). This allows the system to tag “unseen” concepts—essential for the unpredictable variety of news media.
- Domain-Specific Analysis (News Tagging): A custom internal module refines tags to apply news-media specific taxonomies, distinguishing between a “crowd” and a “political demonstration.”
- Deepfake Defense Ensemble: To combat misinformation, we deploy an ensemble of EfficientNet-B4 models. These analyze both spatial artifacts (in frames) and temporal inconsistencies (across frames) to flag manipulated content with high precision.
- Identity at Scale: Our Face Recognition pipeline, powered by AdaFace, is now benchmarked against a database of 16,000+ public figures and remains robust even in challenging lighting or crowded scenes.
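One common way to combine an ensemble of detectors is to average their scores and compare the result with a threshold. The sketch below shows that approach as an illustration; it is not a description of how the MOSAIC ensemble actually fuses its outputs. `DetectorScore`, `ensembleVerdict`, and the default threshold of 0.5 are assumptions.

```typescript
// Score from one detector in the ensemble (for example a spatial or a temporal model).
interface DetectorScore {
  detector: string;
  fakeProbability: number; // assumed range: 0 (authentic) to 1 (manipulated)
}

interface EnsembleVerdict {
  meanProbability: number;
  flagged: boolean;
}

// Average the detector scores and flag the asset when the mean reaches the threshold.
function ensembleVerdict(scores: DetectorScore[], threshold: number = 0.5): EnsembleVerdict {
  if (scores.length === 0) {
    throw new Error("at least one detector score is required");
  }
  const total = scores.reduce((sum, score) => sum + score.fakeProbability, 0);
  const meanProbability = total / scores.length;
  return { meanProbability: meanProbability, flagged: meanProbability >= threshold };
}
```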
3. Universal Media Annotation & Asset Orchestration Suite: Enterprise-Grade Persistence
The Universal Media Annotation & Asset Orchestration Suite serves as the persistence layer, tailored for the “Write-Once, Read-Many” demands of media archives. It acts as the “Source of Truth” for all physical files.

- Metadata-Binary Separation: To ensure performance, heavy binary files (BLOBs) are offloaded to MinIO (S3-compatible object storage), while metadata is managed in PostgreSQL. This ensures the database remains lean and fast.
- Forensic “Plumbing”: The Suite natively integrates the outputs of forensic tools (JPEG Ghost, Error Level Analysis), storing these distinct “verification artifacts” alongside the original media for instant recall by journalists.
- Secure Workspaces: Role-Based Access Control (RBAC) creates isolated environments within the platform, allowing different broadcasters to work securely on shared infrastructure.
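The separation of metadata from binaries, together with workspace-level RBAC, can be sketched with two small record types and one permission check. Everything here is hypothetical: the field names, the three roles, and the ranking between them are assumptions made for illustration, not the Suite's actual schema.

```typescript
// Metadata row as it might live in PostgreSQL: it points to the binary, it does not contain it.
interface AssetMetadata {
  assetId: string;
  workspaceId: string;
  bucket: string;     // object-storage bucket holding the file
  objectKey: string;  // key of the binary inside that bucket
  sizeBytes: number;
}

type Role = "viewer" | "editor" | "admin";

interface Membership {
  userId: string;
  workspaceId: string;
  role: Role;
}

// Assumed role hierarchy: a higher rank includes the permissions of lower ranks.
const ROLE_RANK: Record<Role, number> = { viewer: 0, editor: 1, admin: 2 };

// True when the user holds at least the required role in the asset's workspace.
function canUseAsset(
  memberships: Membership[],
  userId: string,
  asset: AssetMetadata,
  required: Role
): boolean {
  return memberships.some(
    (m) =>
      m.userId === userId &&
      m.workspaceId === asset.workspaceId &&
      ROLE_RANK[m.role] >= ROLE_RANK[required]
  );
}
```

Because the check is keyed on `workspaceId`, a user with a role in one broadcaster's workspace gains nothing in another's, even though both share the same storage and database.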
4. Impact & Future Outlook: A Symbiotic Leap
The MOSAIC project has been a defining proving ground, creating a symbiotic relationship where Infalia’s technical maturity accelerated the project’s success, while the project’s rigorous demands hardened our solutions for the enterprise market.
Infalia’s Core Contributions to MOSAIC
We delivered the architectural backbone that transformed MOSAIC from a concept into a functioning European ecosystem.
- From Storage to Intelligence: We shifted the platform’s focus from passive file storage to active video understanding. By integrating agents like LLaVA-OneVision (for content description) and Media Authenticity tools (detecting AI-generated manipulations and deepfakes), we ensured MOSAIC doesn’t just host content—it verifies and explains it.
- Solved the Federation Puzzle: Our Middleware Proxy Service solved the critical challenge of “Data Sovereignty vs. Global Discovery,” providing the secure glue that allows distinct European nodes to collaborate without compromising local ownership.
- Enterprise-Grade Security: We introduced rigorous standards—Keycloak authentication, Forensic-Ready storage, and RBAC workspaces—ensuring the platform is trusted enough for professional newsrooms.
What We Gained
Participating in MOSAIC validated our vision for the Universal Media Annotation & Asset Orchestration Suite in a high-stakes, real-world environment.
- Validation at Scale: We proved that our “Fire-and-Forget” and “Proxy and Intercept” patterns can handle the chaotic, burst-heavy nature of real-world media production, processing gigabytes of video without blocking the UI.
- AI Hardening: Testing against the diverse, multilingual, and often lower-quality footage of real archives helped us refine our AI models (specifically RAM and Face Recognition) to be robust against “in-the-wild” data, moving beyond academic benchmarks.
- A Market-Ready Product: The Orchestration Suite is no longer just a collection of services; it is now a battle-tested, deployable platform ready to solve the media management crisis for organizations across Europe.
🔗 Learn more about the project at mosaic-media.eu
