Sharing an (almost) hands-free live-mode audio conversation hack for Duck.ai
Hey everyone!
So, recently I came into contact with the GPT and Gemini live audio conversation modes. Some people around me have been using them. Privacy concerns aside, it was pretty neat. Not really my style, but it got me thinking.
Duck.ai has a voice chat option, but it is not available in Firefox-based browsers, so Abrowser and Tor Browser are left without it. I was caught up in one of those "too busy thinking if I could to care thinking if I should" moments ahah, and I hacked together a solution that allows a semi-hands-free live conversation with Duck.ai, with TTS and STT both done locally.
STT is the most worrying part of the whole process, seeing as a voice is as good a unique identifier as a fingerprint these days. I wanted a solution that would do both locally (this could be improved by using a local LLM as well, but my machine can't really handle those, especially for a real-time conversation, so... Duck.ai is the second-best option I guess; use it through Tor and be careful what you share with it).
So, first things first, we need Speech Note for STT and TTS. You can grab it from flathub:
flatpak install net.mkiol.SpeechNote
The rest are small command line tools that you probably already have installed in Trisquel (xdotool xclip sha256sum notify-send).
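If you want to check for those tools before running anything, here is a minimal sketch. It is not part of the attached scripts, and the function names `missing_tools` and `report_missing` are made up for illustration:

```bash
#!/usr/bin/env bash
# Print the name of every tool from the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Check the four tools named in the post; returns non-zero if any is missing.
report_missing() {
    missing=$(missing_tools xdotool xclip sha256sum notify-send)
    if [ -n "$missing" ]; then
        printf 'Missing tools:\n%s\n' "$missing" >&2
        return 1
    fi
}
```

Calling `report_missing` at the top of a script lets it stop early with a readable message instead of failing halfway through a conversation.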
The solution consists of two parts:
A userscript for ViolentMonkey in the browser, which monitors for new replies from Duck.ai and copies them to the clipboard;
A bash script that will:
1. Ask you to select the window where the chat is taking place (so you can then leave it in the background, allowing the conversation to happen while you work in other windows and programs);
2. Monitor the clipboard for changes and, taking turns, either paste the new content into the Duck.ai chat or tell Speech Note to read it aloud.
The order of a full round is: run the script, select the Duck.ai window, hit the Speech Note hotkey for the "listen into clipboard" action (usually ctrl+alt+shift+j), wait for your speech to be converted into clipboard text, which the script pastes into Duck.ai, wait for the reply, which the userscript copies into the clipboard, listen to Speech Note reading the clipboard aloud, then hit the Speech Note hotkey again and keep talking...
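The turn-taking described above can be sketched roughly like this. It is not the attached bash script, only an illustration under a few assumptions: all function names are hypothetical, the `--action start-reading-clipboard` call is my assumption about Speech Note's command-line actions (check its help output before relying on it), and `xdotool key --window` sends synthetic key events that some applications ignore, in which case the window would need to be activated first.

```bash
#!/usr/bin/env bash
# Hash whatever arrives on stdin; comparing hashes is a cheap change detector.
content_hash() { sha256sum | cut -d' ' -f1; }

# Turns alternate: "user" text gets pasted into the chat, "assistant" text is spoken.
next_turn() {
    case "$1" in
        user) echo assistant ;;
        *)    echo user ;;
    esac
}

read_clipboard() { xclip -selection clipboard -o 2>/dev/null; }

# $1 = X11 window id of the Duck.ai window. Paste the clipboard and press Enter.
paste_into_chat() { xdotool key --window "$1" ctrl+v Return; }

# ASSUMPTION: Speech Note accepts this action name on the command line.
speak_clipboard() { flatpak run net.mkiol.SpeechNote --action start-reading-clipboard; }

# $1 = window id. Poll the clipboard and act whenever its content changes.
conversation_loop() {
    win=$1
    turn=user
    last=$(read_clipboard | content_hash)
    while sleep 0.5; do
        current=$(read_clipboard | content_hash)
        [ "$current" = "$last" ] && continue
        last=$current
        notify-send -- "Duck.ai chat ($turn)" "$(read_clipboard)"
        if [ "$turn" = user ]; then
            paste_into_chat "$win"
        else
            speak_clipboard
        fi
        turn=$(next_turn "$turn")
    done
}
```

To try it, something like `conversation_loop "$(xdotool selectwindow)"` lets you click the Duck.ai window and then starts polling. The strict alternation is the fragile part: one stray copy to the clipboard flips the turns.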
Fully hands free would require some tweaking with constant listening in Speech Note and a different type of microphone that would mute itself when needed. Possible, but not worth my trouble for now. Anyone is free to improve on top of this though :)
So, yeah... install Speech Note, install the userscript into ViolentMonkey in the browser, chmod +x the bash script, run it, select the Duck.ai window, and the rest will do its own thing. Notifications are there just to help you check if something was misheard, but you can remove those if you don't need them. A more powerful machine than mine can run higher-quality TTS and STT engines at higher speed, so you probably won't need them. My T400 sometimes mistakes a word here and there ahah, so having the notifications helps!
I had a previous version cleaning things like * # and most emojis (so the audio reply wouldn't "read" those, making it harder to understand the actual message) but now I am only handling that at instruction level (forbidding the AI from using those symbols). Though if anyone wants, I can share that as well.
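For anyone who prefers to filter the text instead of instructing the AI, a cleanup step could look something like the sketch below. It is not the previous version mentioned above, just an illustration, and it assumes GNU sed (for the `\xHH` byte escapes). The function name `clean_for_speech` is made up. It removes `*`, `#` and backticks, then drops every 4-byte UTF-8 sequence starting with byte F0, which covers most emojis but also a few other rarely used characters.

```bash
#!/usr/bin/env bash
# Filter stdin so the TTS engine does not read out markup symbols or emojis.
clean_for_speech() {
    # LC_ALL=C makes sed work on raw bytes, so the emoji rule can match
    # the lead byte F0 followed by three continuation bytes (80-BF).
    LC_ALL=C sed \
        -e 's/[*#`]//g' \
        -e 's/\xF0[\x80-\xBF][\x80-\xBF][\x80-\xBF]//g' \
        -e 's/  */ /g' \
        -e 's/^ *//'
}
```

It would sit between reading the clipboard and handing the text to Speech Note, for example `xclip -selection clipboard -o | clean_for_speech | xclip -selection clipboard -i`.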
One VERY important detail: as you can see, this solution works entirely around the clipboard. So you cannot copy/paste other stuff while using it, right? Wrong ;)
You have the option of selecting text and using the "middle click" to paste it. On X11 that goes through the primary selection, which is separate from the regular clipboard, so the script never sees it (again, an improvement could be made using a clipboard manager, but that's too much trouble for my current goal of just "proving the concept" lol).
Not sure if anyone wants this, sharing for benefit of anyone who does :)
Have fun!
GNUser
| Attachment | Size |
|---|---|
| bashscript.txt | 2.13 KB |
| userscript.txt | 4.64 KB |

