Chapter 113: Adding new functions
Left alone in the room, Jeff decided it was the perfect time to improve his current technology.
His focus was on RAZi. He planned to enhance it by adding image generation, voice synthesis, and video creation.
The graphical interface already looked impressive: high-tech, smooth, and clean.
With everything ready, he got to work. He opened RAZi's plugin folder and created a new file named razi_plugins/imagegen.py.
Within the file, he defined a function called generate_image(prompt).
This function processed a text prompt and used the Stable Diffusion engine to generate an image, returning the path to the created file.
To connect it with the user interface, he updated interface.py with a new route that accepted POST requests carrying a text prompt and returned the generated image.
This let prompts from the front end reach the generate_image() function, so users could type a prompt directly and receive a visual output from the system.
In the frontend, he created an HTML section called 'Image Generator,' including a form with an input field for users to type their prompts and a submit button to send these prompts to RAZi.
He also added an image preview area to display the generated image once it arrived from the backend.
With this setup, RAZi was now capable of transforming textual descriptions into visual representations, from anime characters to intricate fantasy worlds, all with just a single line of text.
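A minimal sketch of how the backend half of this flow might have looked, under stated assumptions: the helper handle_image_request, the OUTPUT_DIR folder, and the placeholder body of generate_image are illustrative guesses, not Jeff's actual code. The real Stable Diffusion call and the Flask route wiring are left out, so the sketch only shows how a submitted prompt could be validated and handed to generate_image().

```python
import hashlib
from pathlib import Path

# Hypothetical output folder; the chapter does not say where images are saved.
OUTPUT_DIR = Path("generated_images")


def generate_image(prompt: str) -> str:
    """Stand-in for razi_plugins/imagegen.py: returns the path of the image file.

    A real version would call the Stable Diffusion engine here. This sketch
    only derives a stable file name from the prompt so the flow can be followed.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]
    return str(OUTPUT_DIR / f"{digest}.png")


def handle_image_request(form: dict) -> tuple:
    """What a POST route in interface.py might do with the submitted form.

    Returns a (JSON-like body, HTTP status code) pair.
    """
    prompt = (form.get("prompt") or "").strip()
    if not prompt:
        # Reject empty prompts before they reach the generator.
        return {"error": "prompt is required"}, 400
    return {"image_path": generate_image(prompt)}, 200
```

In this sketch the route would return the image path to the front end, which could then load it into the preview area.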
With that done, he moved on to the next feature. He opened the plugin directory and created a new file: razi_plugins/voicegen.py
Inside it, he defined a function that converted text into speech using a locally installed text-to-speech engine, with a voice parameter for selecting different voice profiles.
He configured the engine to use a soft, calming female voice, adjusting pitch, rate, and accent settings until it sounded natural and elegant, like a digital assistant with warmth.
Then, in interface.py, he added a new route. It handled POST requests, taking the AI's reply and passing it to the speak_text function to create an audio file.
In the frontend, just below each response, he embedded an audio player so users could hear RAZi's spoken responses.
He paired it with a [Play Voice] button that triggered instant audio playback when clicked.
When he ran the test and clicked the play button, RAZi responded in a soft female voice.
[Hello owner. What would you like to create today?]
Jeff paused, listening to the flawless delivery, then nodded with satisfaction.
"That's more like it. You finally have a voice now."
This made him think of a film from the Marvel universe. Compared to those high-end fictional AIs, his creation still felt like it had a long way to go.
So he decided to push further. Now that the voice generation was functioning perfectly, it was time to enhance it a bit.
He opened a new file and named it razi_plugins/voiceassistant.py. This would serve as the foundation for turning RAZi into a true voice-driven assistant.
Inside it, he created the base function to process inputs and provide responses dynamically, making it a key part of the system's interactive capabilities.
This new feature activated the microphone, listened to Jeff's voice, and converted his speech into text using speech recognition.
It then passed the text to RAZi's brain for processing and spoke the reply back in a female voice.
He integrated a voice recognition library that captured input in real time and connected it directly to the existing text-to-speech module, creating a smooth and responsive voice interaction system.
In the Flask interface, he added a toggle button labeled:
[Voice Mode ON/OFF]
This let RAZi operate in passive listening mode. Under the hood, when Voice Mode was active, the listener waited for the wake phrase.
Once triggered, it recorded his question, converted it to text, and routed it through RAZi.reply().
The response was then immediately synthesized and played back. With that, he started to test it.
[Jeff: Hey RAZi, what's the weather like in Tokyo?]
[RAZi: According to the latest data, Tokyo is currently 27 degrees with light rain]
For a moment she stopped and then spoke again.
[RAZi: Would you like me to set a reminder for an umbrella tomorrow?]
[Jeff: No thank you]
[RAZi: Roger]
He could not help but smile as he watched RAZi respond. It now had the ability to think and speak for itself.
"Sooner or later, you will become Jarvis," he whispered with a hint of pride in his voice.
With that, he turned off the voice mode. If he didn't, RAZi would keep listening continuously, and that was something he wanted to avoid for now.
It didn't record everything; it just kept a lightweight microphone thread running in the background, continuously scanning for his voice pattern.
As soon as it detected the right waveform, tone, or key phrase, it activated. So even if he was coding, eating, or lying in bed, RAZi would still be there, waiting for his reply.
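The Voice Mode flow can be sketched roughly as follows. This is only an illustration under assumptions: it works on text that has already been transcribed, so microphone capture and speech recognition are left out, and the names WAKE_PHRASE, extract_command, and voice_mode_step are hypothetical. The reply and speak arguments stand in for RAZi.reply() and the text-to-speech function.

```python
import re
from typing import Callable, Optional

# Hypothetical wake phrase; the chapter only shows Jeff saying "Hey RAZi".
WAKE_PHRASE = "hey razi"


def extract_command(transcript: str, wake_phrase: str = WAKE_PHRASE) -> Optional[str]:
    """Return the text after the wake phrase, or None if it was not spoken."""
    match = re.search(re.escape(wake_phrase), transcript, re.IGNORECASE)
    if match is None:
        return None
    # Drop the wake phrase itself plus any punctuation right after it.
    return transcript[match.end():].lstrip(" ,.!?")


def voice_mode_step(
    transcript: str,
    reply: Callable[[str], str],
    speak: Callable[[str], None],
    voice_mode_on: bool = True,
) -> Optional[str]:
    """Handle one transcribed utterance: wake check, reply, then speech."""
    if not voice_mode_on:
        return None  # Toggle is off, so nothing is processed.
    command = extract_command(transcript)
    if not command:
        return None  # No wake phrase, or nothing after it: keep listening.
    answer = reply(command)
    speak(answer)
    return answer
```

Passing reply and speak in as arguments keeps the wake-phrase logic separate from the language model and the audio engine, so each piece could be swapped or tested on its own.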
Once that was done, he opened a new file named videogen.py inside RAZi's plugin folder.
What he aimed to build was inspired by the famous video generator from his world, 'SORA'.
His goal was to train RAZi to create full videos from scratch, bringing scenes to life using nothing but pure language.
Just like SORA, he wanted RAZi to transform words into motion, turning imagination into vivid, animated reality.
This wasn't about presentation anymore; it was about giving RAZi the power of vision, motion, and story.
He began by defining a function called generate_video(prompt), which allowed the system to process descriptive text and break it down into visual scenes.
This function linked directly to his text-to-video engine, a diffusion model he had trained using short cinematic clips, transitions, and frame movement patterns.
It learned how to animate fog drifting, lights flickering, characters walking, or even a camera slowly panning through a cityscape.
By interpreting each prompt as a scene blueprint, RAZi could now generate dynamic motion instead of flat visuals.
To make it fully immersive, Jeff also embedded voice narration into the video using the voice generation module he had written earlier.
He added a setting for users to choose their preferred voice type, such as male, female, or robotic, and then layered the spoken audio directly over the animated sequence.
The result was no longer a clip with background music or subtitles, but a self-contained narrated short video, complete with timing, emotion, and camera presence.
He added this to the interface as a new tab called [Video Generator], complete with a prompt input and a preview panel.
Jeff did this not just for show but because he believed interaction with AI should feel alive and not static.
Whether for storytelling, educational explainers, visual simulations, or pure creativity, RAZi could now turn thoughts into moving pictures.
He imagined students using it to create school presentations, authors visualizing story scenes, and developers prototyping cinematic sequences, all with just a sentence.
With this, RAZi had gone beyond being a tool; it had become a director, a narrator, and a dreamer.
"Sheesh, now I feel bad for all the animators and artists who pour their talent into creating art. If this ever goes public, they will lose their jobs, the same as in my first world," he hissed under his breath.
Even though the core system was already functional, he returned to the very first finished function.
Now that the foundation was complete, all that remained was refinement: polishing each part until it reached perfection.
With that, he spent many hours without leaving his room, coding and enhancing RAZi.
...
...