src/engines/TTSEngine/BaseTTSEngine.py

import moviepy.editor as mp
import whisper_timestamped as wt

from typing import TypedDict
from torch.cuda import is_available
from abc import ABC, abstractmethod

from ..BaseEngine import BaseEngine


class Word(TypedDict):
    # whisper_timestamped reports word timestamps as floats, in seconds
    start: float
    end: float
    text: str


class BaseTTSEngine(BaseEngine):
    @abstractmethod
    def synthesize(self, text: str, path: str) -> None:
        pass

    def remove_punctuation(self, text: str) -> str:
        return text.translate(str.maketrans("", "", ".,!?;:"))

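    def fix_captions(self, script: str, captions: list[Word]) -> list[Word]:
        """
        Restores the script's original spelling and punctuation on the transcribed captions.

        Words are compared position by position, so the script and the captions
        are assumed to line up one-to-one.
        """
        script_words = script.split()
        new_captions = []
        # zip stops at the shorter list, so a length mismatch cannot raise IndexError
        for word, caption in zip(script_words, captions):
            original_word = self.remove_punctuation(word.lower())
            stt_word = self.remove_punctuation(caption["text"].lower())
            if stt_word in original_word:
                caption["text"] = word
                new_captions.append(caption)
            # TODO: handle the case where the transcription has more words than the script
        return new_captions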

    def time_with_whisper(self, path: str) -> list[Word]:
        """
        Transcribes the audio file at the given path with the Whisper "small" model
        (through whisper_timestamped) and returns the words with their timestamps.

        Args:
            path (str): The path to the audio file.

        Returns:
            list[Word]: One Word per transcribed word, with start and end times in seconds.
            Example:
            ```json
            [
                {
                    "start": 0.0,
                    "end": 0.5,
                    "text": "Hello"
                },
                {
                    "start": 0.5,
                    "end": 1.0,
                    "text": "world"
                }
            ]
            ```
        """
        device = "cuda" if is_available() else "cpu"
        audio = wt.load_audio(path)
        model = wt.load_model("small", device=device)

        result = wt.transcribe(model=model, audio=audio)
        # Flatten the per-segment word lists into a single list of words
        words = [word for chunk in result["segments"] for word in chunk["words"]]
        for word in words:
            # Confidence is not needed for the current use case
            word.pop("confidence", None)
        return words

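    def force_duration(self, duration: float, path: str) -> None:
        """
        Speeds up the audio clip at the given path so that it lasts at most the specified duration.

        A clip that is already short enough is left unchanged. Otherwise the
        file at `path` is overwritten with the sped-up audio.

        Args:
            duration (float): The maximum duration in seconds. Must be positive.
            path (str): The path to the audio clip file.

        Returns:
            None
        """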
        audio_clip = mp.AudioFileClip(path)

        if audio_clip.duration > duration:
            speed_factor = audio_clip.duration / duration

            new_audio = audio_clip.fx(
                mp.vfx.speedx, speed_factor, final_duration=duration
            )

            new_audio.write_audiofile(path, codec="libmp3lame")

        audio_clip.close()
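
The unfinished branch in `fix_captions` concerns transcriptions that contain more words than the script. One possible way to handle that case, offered as a sketch rather than as the project's method, is to align the two word sequences with `difflib.SequenceMatcher` from the standard library. The names `normalize` and `align_captions` are hypothetical, and timestamps are assumed to be floats in seconds.

```python
from difflib import SequenceMatcher
from typing import TypedDict


class Word(TypedDict):
    start: float
    end: float
    text: str


def normalize(text: str) -> str:
    """Lowercase and strip the same punctuation as remove_punctuation."""
    return text.lower().translate(str.maketrans("", "", ".,!?;:"))


def align_captions(script: str, captions: list[Word]) -> list[Word]:
    """Hypothetical helper: put the script's spelling on the STT timings."""
    script_words = script.split()
    script_keys = [normalize(w) for w in script_words]
    stt_keys = [normalize(c["text"]) for c in captions]

    matcher = SequenceMatcher(a=stt_keys, b=script_keys, autojunk=False)
    fixed: list[Word] = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        same_length = (i2 - i1) == (j2 - j1)
        if tag == "equal" or (tag == "replace" and same_length):
            # Matching (or misheard but one-to-one) words:
            # keep the STT timing, take the script's text.
            for cap, word in zip(captions[i1:i2], script_words[j1:j2]):
                fixed.append(
                    {"start": cap["start"], "end": cap["end"], "text": word}
                )
        else:
            # Extra STT words (or uneven replacements) are kept unchanged.
            # Script-only words have no timing, so they are skipped.
            for cap in captions[i1:i2]:
                fixed.append(
                    {"start": cap["start"], "end": cap["end"], "text": cap["text"]}
                )
    return fixed
```

With the script `"Hello, world! This is great."` and a transcription that hears `hello world um this is grate`, this sketch keeps the extra `um` with its own timing and restores `Hello,`, `world!`, `This`, `is` and `great.` on the remaining captions. Unlike the positional comparison above, it builds new dictionaries instead of mutating the input list.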