The Media Search Engine
The Media Search Engine was the biggest project I built during my first internship at MiraVideo. It was created for video editors, giving them a database of reusable videos, images, and clips, along with a search engine for querying it. These resources are useful for creating and editing short videos and video reels that promote multimedia channels or simply advertise products. In this blog, I am going to break down the components of the search engine and how I built them.
The first component of the search engine is, of course, the database backing it. We had to create our own internal database for the video editors to use, since we also had contracts with video streaming platforms like iQIYI.
Initially, the videos we got from the providers were just raw, hour-long MP4 files. I had to come up with a data ETL pipeline for breaking these videos down into documents and vectors. The documents are indexed into Elasticsearch, which serves as our NoSQL database. The vectors are saved to Milvus, which is our vector database. Let’s break down this pipeline in stages.
This concludes the ETL pipeline and now we have the processed data, ready for querying.
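To make the output of the pipeline concrete, here is a minimal sketch of how one processed clip could be split into a document for Elasticsearch and a vector row for Milvus, linked by a shared ID. Everything named here is an assumption for illustration only: the `ProcessedClip` class, the field names (`clip_id`, `source_video`, `start_sec`, `end_sec`, `labels`), and the tiny 4-dimensional embedding are hypothetical, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessedClip:
    """One clip cut from a longer source video (hypothetical schema)."""
    clip_id: str
    source_video: str
    start_sec: float
    end_sec: float
    labels: List[str] = field(default_factory=list)
    embedding: List[float] = field(default_factory=list)

    def to_document(self) -> dict:
        """Searchable metadata, the part that would be indexed as a document."""
        return {
            "clip_id": self.clip_id,
            "source_video": self.source_video,
            "start_sec": self.start_sec,
            "end_sec": self.end_sec,
            "duration_sec": self.end_sec - self.start_sec,
            "labels": self.labels,
        }

    def to_vector_row(self) -> dict:
        """Embedding row for the vector store, keyed by the same ID."""
        return {"clip_id": self.clip_id, "embedding": self.embedding}


# Illustrative values only.
clip = ProcessedClip(
    clip_id="ep01-0001",
    source_video="ep01.mp4",
    start_sec=12.0,
    end_sec=19.5,
    labels=["street", "night"],
    embedding=[0.1, 0.2, 0.3, 0.4],
)
document = clip.to_document()
vector_row = clip.to_vector_row()
```

Keeping the same `clip_id` on both sides is what would let a vector hit be joined back to its metadata at query time.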
<aside> 🛠️
Tools I used for this component
</aside>
Amazon S3 - Cloud Object Storage - AWS
Elasticsearch: The Official Distributed Search & Analytics Engine | Elastic
GStreamer: open source multimedia framework
PaddlePaddle-Parallel Distributed Deep Learning, efficient and extensible deep learning framework.
The High-Performance Vector Database Built for Scale | Milvus
After a few weeks of ingesting data, we had some processed data ready to be queried. I was testing with the CLI and Postman for querying videos and images, but our actual end users obviously needed a proper application. There were two main parts to building the search engine webapp.
Since I was writing some pretty complicated Elasticsearch DSL and Milvus queries to get the data, the first step was to design a simplified query language that all of the video editors could use easily.
After a few iterations of query language design, I gave up on the idea of letting the editors write queries. Using a “language” to get results seemed unintuitive to them, no matter how simple I made it. I ended up converting this into a list of selectable tags. The actual ES and Milvus queries are automatically constructed based on the combination of tags the user selects.
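The tag-to-query idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code that ran in production: the `build_es_query` helper and the field names (`media_type`, `labels`) are hypothetical. It groups the selected tags by field and emits an Elasticsearch `bool` query in which each field becomes one `terms` filter, so values within a field are ORed and different fields are ANDed.

```python
from collections import defaultdict


def build_es_query(selected_tags, size=20):
    """Turn (field, value) tag pairs into an Elasticsearch query body.

    Values for the same field go into one `terms` filter (match any),
    and separate fields are combined under `bool.filter` (match all).
    """
    values_by_field = defaultdict(list)
    for field_name, value in selected_tags:
        # Skip duplicates in case the same tag was selected twice.
        if value not in values_by_field[field_name]:
            values_by_field[field_name].append(value)

    filters = [
        {"terms": {field_name: values}}
        for field_name, values in values_by_field.items()
    ]
    if not filters:
        # No tags selected: fall back to matching everything.
        return {"size": size, "query": {"match_all": {}}}
    return {"size": size, "query": {"bool": {"filter": filters}}}


# Hypothetical selection: videos tagged "street" or "night".
query = build_es_query([
    ("media_type", "video"),
    ("labels", "street"),
    ("labels", "night"),
])
```

The resulting dictionary is the kind of body that would be sent with an Elasticsearch search request; a similar builder could produce the filter expression for the Milvus side.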
<aside> 🛠️
Tools I used for this component
</aside>
Welcome! - The Apache HTTP Server Project
Welcome to Flask — Flask Documentation (3.0.x)
Docker: Accelerated Container Application Development
The last step was to build the webapp and host it. Let’s break it down into steps.
search.mirav.com
That’s it! Now all the editors could go to search.mirav.com and start hunting for the clips and images they wanted to use in their videos. I learned a lot while building the search engine and had lots of fun. As a part of my internship, I also shared how I felt about my entire one-year experience in this blog.
Thanks for reading! If you are interested, check out more content like this here.