Arcee Orchestra: Build an Agentic Workflow to Augment and Localize YouTube Content
February 21, 2025
Arcee Orchestra (https://www.arcee.ai/product/orchestra) is a platform that turns AI into action. It lets you create workflows that automate tasks, streamline processes, and even power new products. Think of it as a tool for building smart automation that can handle the work for you. It’s as simple to use as any AI chat tool but far more capable.
This workflow demonstrates the use of the YouTube and Google Docs integrations. Here, we build a content assistant, retrieving English captions from a YouTube video, writing a technical blog post based on the video's content, translating the blog post to Chinese and Hindi, and saving all three posts in Google Docs.
⭐️⭐️⭐️ Don't forget to subscribe to be notified of future videos. You can also follow me on Medium at https://julsimon.medium.com or Substack at https://julsimon.substack.com. ⭐️⭐️⭐️
Learn more about Arcee Orchestra on the product page at https://www.arcee.ai/product/orchestra, or by booking a demo at https://www.arcee.ai/book-a-demo.
Transcript
Hi everybody, this is Julien from Arcee. In this video, I'm going to show you an Arcee Orchestra workflow that I use to augment and localize my YouTube content. We're going to start from a YouTube video and extract the English captions automatically generated by YouTube. Then, using a small language model, I'm going to use those captions to write a blog post and save it to Google Docs. Likewise, I'm going to translate the blog post into Hindi and Chinese. Let's get started.
In the start node, I have a single input variable, which is the video ID. The video ID is what you see here in the YouTube URL. With the video ID, I am able to retrieve the list of available caption tracks. For this, I use the YouTube integration with the list caption tracks API, and all I need to do is pass the video ID. The output of that API is a JSON document listing the different tracks that are available. One of them is the English caption track. It's very simple for me to prompt a model and say: based on the JSON document you're receiving here, find the English caption track and return the unique ID of this track. I could probably use a bit of code to do this as well, but the language model was perfectly capable of doing it.
Now that I have the ID of the English caption track, I'm using the YouTube integration with the load captions API to download the actual captions from that track. These will come with timestamps as you would expect, and I want to clean them up: I only want clean, plain-text captions. Using a small language model again, I'm prompting it to keep the original text and make just some minor fixes: remove the hesitations that I'm guilty of, and make sure the company name is spelled Arcee, not the mis-transcriptions YouTube loves to output. I make it clear that the model name is Virtuoso and not "virtual" or "virtuals"; sometimes the transcription is incorrect. Same for integrations, which are done with a service called Composio. And yes, my first name is Julien with an E, not with an A. I also want the model to add punctuation, which is generally missing from the captions. Just cleaning things up and making sure I get a nice, clean output to write the blog post from.
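As a rough sketch of the mechanical part of this cleanup, here is how the timestamp stripping and a few fixed spelling corrections could look in code. The substitutions are assumptions based on the ones mentioned above; the context-sensitive fixes (e.g. "virtual" vs. Virtuoso, adding punctuation) are exactly why a language model handles this step in the actual workflow:

```python
import re

# Assumed fixed substitutions; safe because these tokens are unambiguous
FIXES = {
    "RC": "Arcee",
    "virtuals": "Virtuoso",
    "Julian": "Julien",
}

# Matches SRT-style cue counters and "HH:MM:SS,mmm --> HH:MM:SS,mmm" timecodes
TIMESTAMP = re.compile(
    r"^\d+\s*$|^\d{2}:\d{2}:\d{2}[,.]\d{3} --> \d{2}:\d{2}:\d{2}[,.]\d{3}\s*$",
    re.MULTILINE,
)

def clean_captions(srt_text: str) -> str:
    text = TIMESTAMP.sub("", srt_text)  # drop counters and timecodes
    text = " ".join(text.split())       # collapse whitespace into one line
    for wrong, right in FIXES.items():
        text = re.sub(rf"\b{wrong}\b", right, text)
    return text

srt = """1
00:00:00,000 --> 00:00:02,000
hi this is Julian from RC"""
print(clean_captions(srt))  # -> hi this is Julien from Arcee
```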
Now, using the Virtuoso Large model because it's a better writer, I want to write a technical blog post based on the video's theme. I want the model to add some of its own knowledge and perspective to enrich the blog post. I want a proper introduction and conclusion, and I want a call to action with links. I want everything in Markdown formatting. The output of this will hopefully be a nice English-language blog post, which I can then store in Google Docs. I'm also going to build Chinese and Hindi translations of the blog post, and I'm using Virtuoso Small for that. Very simple prompts: just translate the English post, once for Chinese and once for Hindi, with almost the same prompt. Finally, I'm storing those three blog posts in Google documents using the Google Docs integration. The title of each doc will be "YouTube video" with the video ID and then the language, and the Markdown text will be the output from one of the previous steps. And that's all there is to it. In the end, I should get my three Google Docs in English, Chinese, and Hindi.
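The fan-out above (one writing step, two translation steps, three docs) can be sketched as a couple of small helpers. Both functions are hypothetical: the exact prompt wording and the doc title format are my paraphrase of what the video describes, not the workflow's actual strings:

```python
def doc_title(video_id: str, language: str) -> str:
    """Name the Google Doc after the video ID and target language."""
    return f"YouTube video {video_id} - {language}"

def translation_prompt(post_markdown: str, target_language: str) -> str:
    """Build the near-identical prompt used for each translation step."""
    return (
        f"Translate the following blog post to {target_language}. "
        f"Keep the Markdown formatting intact.\n\n{post_markdown}"
    )

for lang in ("English", "Chinese", "Hindi"):
    print(doc_title("dQw4w9WgXcQ", lang))
```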
As usual, we could run the workflow here and see the JSON output, but instead, let's run this in the chat. So in the chat, I asked Virtuoso Large to write a blog post for this video, and I just pasted the full URL of the video here. We can see the workflow executed, and apparently, Virtuoso did pick up the video ID: it extracted it from the URL and passed it to the workflow. So that's nice. And we have here the cleaned-up captions. Now let's take a look at Google Docs. Here's the English blog post. We can see this is a nicely structured, step-by-step explanation of each box. The model names look good, my first name looks good, and my call to action looks good. So this is nice. I can add this to my posts on Medium and Substack. This is a good way for the viewer to get a quick overview of the video.
And then, of course, I've got my Hindi and Chinese translations. Let's ask Gemini to translate back to English and see what that looks like. Here is the English translation, and I have to say it looks very good. Service names are accurate, and it stays close to the English version. I did the same for the Chinese post, and it looks good too. Of course, native speakers would probably find some small inconsistencies, but for me, this is a really good way to localize my content. From now on, I'll post those Hindi and Chinese versions to my Substack. Hopefully, this gets me more viewers and helps folks out there read some cool AI content in their own native language.
All right, that's what I wanted to show you today. A nice little demo of Arcee Orchestra with a combination of YouTube APIs and small language models to augment and localize your YouTube content. That's it for today. Hope you like it. Until next time, my friends, keep rocking.
Julien Simon is the Chief Evangelist at Arcee AI, specializing in Small Language Models and enterprise AI solutions. Recognized as the #1 AI Evangelist globally by AI Magazine in 2021, he brings over 30 years of technology leadership experience to his role.
With 650+ speaking engagements worldwide and 350+ technical blog posts, Julien is a leading voice in practical AI implementation, cost-effective AI solutions, and the democratization of artificial intelligence. His expertise spans open-source AI, Small Language Models, enterprise AI strategy, and edge computing optimization.
Previously serving as Principal Evangelist at Amazon Web Services and Chief Evangelist at Hugging Face, Julien has helped thousands of organizations implement AI solutions that deliver real business value. He is the author of "Learn Amazon SageMaker," the first book ever published on AWS's flagship machine learning service.
Julien's mission is to make AI accessible, understandable, and controllable for enterprises through transparent, open-weights models that organizations can deploy, customize, and trust.