It’s already been a busy summer. CSU Social just made an appearance at the 2025 Digital Collegium conference. We presented on how artificial intelligence (AI) can accelerate, but not replace, the work of creatives.
I’ll share takeaways from my section of our presentation, which examined how AI tools like ChatGPT and Google Gemini can assist with time-intensive tasks in post-production. Most importantly, I’ll emphasize why the human is, and always will be, the creative lead and visionary.
AI use and visuals at a glance: Three architectures behind the tools
Before delving into examples, it’s crucial to understand that various AI tools rely on distinct underlying systems.
Diffusion Models (used in Photoshop Generative Fill): These generate new images by starting with random noise and gradually “denoising” it into something recognizable, based on training data (see the toy sketch after this list).
Deep Learning (used in Premiere Pro’s Enhance Audio): These systems are trained on enormous amounts of sound data to learn to distinguish human speech from background noise.
Large Language Models (LLMs) (like ChatGPT and Gemini): A specialized application of deep learning, these systems are trained on massive amounts of text data to handle language-based tasks. This includes transcription, summarization, and quote mapping, all by analyzing patterns across billions of words.
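For the curious, here’s a toy Python sketch of that “denoise the noise” loop. It’s purely illustrative and nothing like Photoshop’s real model: an actual diffusion model learns to predict the noise to remove at each step, while this toy simply blends toward a fixed target to show the shape of the process.

```python
# Toy illustration of the diffusion idea (NOT Adobe's implementation):
# start from pure noise, then "denoise" a little at a time.
import numpy as np

rng = np.random.default_rng(seed=42)
target = np.ones((8, 8)) * 0.5   # stand-in for "what the model learned"
image = rng.normal(size=(8, 8))  # step 0: pure random noise

for step in range(50):
    # A real model predicts the noise to subtract here; we cheat and
    # blend toward the known target just to show the loop.
    image = 0.9 * image + 0.1 * target

print(np.abs(image - target).max())  # tiny: the noise is nearly gone
```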
These architectures are impressive. However, they all fall short at emotional and abstract thinking.
Where AI shines: Script structuring with LLMs
After I conduct interviews for a project, one of the most time-consuming steps is building the story from audio transcripts. This is where LLMs can help get you started. LLMs are the core technology behind systems like OpenAI’s ChatGPT and Google Gemini.
Take a look at this diagram. Here, we outline how ChatGPT or Gemini can sift through transcriptions to begin building the foundation of a story through quotes and dialogue. Let’s unpack each step.
Auto-transcription: Use tools like Premiere Pro or Descript (speech-to-text powered by deep learning, such as OpenAI’s Whisper). Many video editing platforms can transcribe your dialogue for you to export or copy and paste into a Word or Google Docs document.
Theme and topic identification: LLMs like ChatGPT or Gemini are great at picking out recurring subjects and grouping similar ideas. Once I have my transcripts in documents, I ask the AI to identify key themes and recurring topics, each with timecodes or timeframes for my reference.
Refining themes: This is where I step in. AI can cluster similar ideas, but I use lived experience, intuition, and qualitative context to decide which threads matter and how they integrate.
Narrative building: This step again calls for the human experience. I determine how those themes and threads can be structured to create a story arc or circle. Arcs and circles exist not only in the video as a whole; they’re present in every scene we watch. Emotional peaks and resolution are simply things AI cannot intuitively sense or feel.
Quote + timecode mapping: This brings AI back into the workflow. Once I have my themes and topics threaded in a particular order, I ask the AI to identify quotes under each theme or topic in the narrative order I’ve laid out.
Script refinement: This step returns to the human domain, where I fill the gaps (AI will miss a lot of quotes) and remove or add quotes to fine-tune meaning. I then do a final pass for “hallucinations,” where AI likes to make up quotes. It’s often helpful to prompt the AI to avoid editing or changing any quotes when you ask it to quote map (a sketch of this whole flow follows these steps).
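For the technically curious, here’s a minimal Python sketch of the transcript-to-quote-map flow. It assumes the open-source openai-whisper package and OpenAI’s official Python client; the model names, file name, and prompt wording are illustrative stand-ins, not my exact production setup.

```python
# Sketch: auto-transcribe an interview, then ask an LLM for themes
# and verbatim quotes with timecodes.
import whisper
from openai import OpenAI

# 1. Auto-transcription with segment-level timecodes.
model = whisper.load_model("base")
result = model.transcribe("interview.mp3")  # hypothetical file
transcript = "\n".join(
    f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text'].strip()}"
    for seg in result["segments"]
)

# 2. Theme identification and quote mapping, with a guardrail against
#    paraphrased or invented quotes (a known failure mode).
client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",  # example model name
    messages=[{
        "role": "user",
        "content": (
            "Identify the recurring themes in this interview transcript. "
            "Under each theme, list supporting quotes with their timecodes. "
            "Quote verbatim; do not edit, shorten, or invent quotes.\n\n"
            + transcript
        ),
    }],
)
print(response.choices[0].message.content)
```

The last line of the prompt is the anti-hallucination instruction from the script-refinement step; in my experience, the output still needs a human pass afterward.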
LLMs excel at script and text tasks because they’re trained to process and organize enormous amounts of data and patterns. However, they are far from magic. They work by analyzing statistical relationships between words, not meaning. Meaning is built through years of developing your own identity and experiences. LLMs can’t even begin to know what twists your gut or heart, or what alleviates those tensions. They help me see the forest, but I still have to walk through the trees.
What AI Misses: Symbolism, lived experience, and emotional intelligence
Remember, emotional context, subtext, and weight are things only the human soul can sense. Let’s examine two of my recent projects, one from my freelance work and one from my work at CSU.
Case study 1: The Symbolism of Mooncakes
Currently, I’m working on a documentary called There’s a Lane for Us Here, which explores the legacy, history, and collective memory of the Little Saigon Business District in Denver, Colorado. In this film, one of the bakeries we are featuring, Vinh Xuong, is known for its mooncakes. These mooncakes are a powerful visual metaphor that reveals themes of collective nostalgia, tradition, and hard work.
When I used AI to help script this scene using the previously mentioned methodologies, it failed to identify the most impactful soundbites to illustrate these themes. Instead of selecting quotes that resonate with us as a community, it focused on the frequency of related phrases that were often devoid of emotional depth.

For example, it seemed to latch onto statistical patterns between “tradition” and “Mid-Autumn Festival.” Even though the Mid-Autumn Festival is directly tied to eating mooncakes, the model often missed soundbites that connect “mooncake” to more abstract and complex concepts like nostalgia or homesickness.
Why did it struggle here? Because nostalgia is built on lived experience, not statistical frequency. AI breaks text down into chunks known as “tokens,” then finds relationships between those tokens through statistical processes. AI doesn’t know what it feels like to be ripped from your homeland or to savor something from your childhood for the first time in years. As I said in the presentation, emotional context supplants statistical context in visual storytelling.
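You can see tokens for yourself with OpenAI’s open-source tiktoken library. A quick sketch (the encoding name below is just one common choice, used here for illustration):

```python
# Peek at how text becomes tokens before any "understanding" happens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("Mooncakes taste like home.")
print(ids)                             # a list of integer token IDs
print([enc.decode([i]) for i in ids])  # the text chunk behind each ID
```

To the model, “home” is just an ID with statistical neighbors, not a feeling.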
Case study 2: Communal Healing at the SAFE Center
Another example we presented was a video project about an art installation at CSU’s Survivor Advocacy and Feminist Education (SAFE) Center for Sexual Assault Awareness Month. Students created clay petals with healing and supportive messages to place on an installation of branches. This piece contained a multitude of communal messages conveyed through imagery of roots and trees. A message this multi-layered and symbolically dense is too complex for current AI capabilities.
When I tried to use AI to help script this piece, the script became chaotic despite clear prompts. It was simply too much, just like with the mooncakes. So, I simplified my instructions down to easily recognizable patterns. For example, I asked it to pull quotes that mention “leaves,” “healing,” and “roots,” in an order I chose.
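A task that literal barely needs an LLM at all. Here’s a minimal Python sketch of the same keyword pull, assuming transcript segments shaped like Whisper’s output; the segment data below is made up for illustration.

```python
# Deterministic fallback: filter transcript segments by keyword instead
# of asking a model to interpret symbolism.
KEYWORDS = ("leaves", "healing", "roots")

def pull_quotes(segments, keywords=KEYWORDS):
    """Return (start_time, quote) pairs whose text mentions any keyword."""
    return [
        (seg["start"], seg["text"].strip())
        for seg in segments
        if any(word in seg["text"].lower() for word in keywords)
    ]

segments = [  # hypothetical example data
    {"start": 12.4, "text": "Each clay petal holds a healing message."},
    {"start": 47.9, "text": "The branches reminded me of my own roots."},
]
print(pull_quotes(segments))
```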
Final takeaway: AI is the assistant, not the artist
We’ve outlined the value of AI for getting started: organizing massive amounts of text data and handling simple extraction tasks. When it comes to abstract interpretation, you still must look within, like in a kung fu movie.
Remember, right now, AI doesn’t understand meaning, lived experience, emotional stakes, or arcs. It doesn’t know why a pause in dialogue matters or why a cracked voice can carry more weight than the words themselves. Visual and auditory cues are often non-literal, and it takes a human to unpack them.
AI is built on data patterns and not the human spirit. It can begin the scaffolding, but you still have to build the house. Lean into the power of AI to save time. It’s the assistant, and you’re the director.