OS X has voice output built in, usable from the shell by way of the say
command. You can use several voices in English or download more for other languages.
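As a minimal sketch of driving say from a script, the wrapper below speaks a text with a chosen voice. The function name speak and the DRY_RUN variable are assumptions made for this illustration, not part of say or gfsage. With DRY_RUN=1 the command is printed instead of executed, which is handy on machines without say.

```shell
# speak VOICE TEXT...: read TEXT aloud with the given voice via say.
# DRY_RUN is a hypothetical switch for this sketch: when set to 1,
# the command line is printed instead of being executed.
speak() {
  voice=$1
  shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'say -v %s %s\n' "$voice" "$*"
  else
    say -v "$voice" "$*"
  fi
}

# Usage on OS X:
#   say -v '?'                               # list the installed voices
#   speak Alex "the factorial of 5 is 120"   # speak with the Alex voice
```

Running say -v '?' first shows which voice names are installed on the machine.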
mgl/sage
as described previously.
gfsage              Use english
gfsage LANGUAGE     Use this language
gfsage [OPTIONS]    where OPTIONS are:
  -h         --help                 print this page
  -i INPUT   --input-lang=INPUT     Make queries in LANGUAGE
  -o OUTPUT  --output-lang=OUTPUT   Give answers in LANGUAGE
  -v[VOICE]  --voice[=VOICE]        use voice output. To list voices use ? as VOICE.
  -F         --with-feedback        Restate the query when answering.
The options relevant here are -v and -F. Use the first to select voice output. With no argument, it picks the first available voice for the selected OUTPUT language:
./gfsage -i english -v
Voiced by Agnes
... It will use Agnes as the English voice. Notice that if you do not give a -o option, the OUTPUT language is assumed to be the same as the INPUT language.
To list the available voices use:
./gfsage -i english -v?
Agnes, Albert, Alex, Bahh, Bells, Boing, Bruce, Bubbles, Cellos, Daniel, Deranged, Fred, Hysterical, Junior, Kathy, Princess, Ralph, Trinoids, Vicki, Victoria, Whisper, Zarvox
It will list the English voices. To use a specific voice write:
./gfsage -i german -vYannick
Voiced by Yannick
The option -F makes the system paraphrase your query when answering. First, get a simple answer:
./gfsage -i english
Login into localhost at port 9000
Session ID is df7ad7c769f2faac68b6bb9489bb97e2
waiting... EmptyBlock 3
sage> compute the factorial of 5.
(4) 120
answer: it is 120 .
... and now the same with paraphrasing:
./gfsage -i english -F
Login into localhost at port 9000
Session ID is 88549994a28940fe0657eb9e506a5e84
waiting... EmptyBlock 3
sage> compute the factorial of 5.
(4) 120
answer: the factorial of 5 is 120 .
So, to experience voice output in its full glory you have to use both -v and -F.
Following a suggestion from Aarne, I found a Google service for speech input, but the experiments were not encouraging:
I recorded "Compute this" into an m4a file using QuickTime Player on the Mac.
I converted it to FLAC using:
sox compute.m4a compute.flac rate 16k
And sent it to the service with:
curl -H "Content-Type:audio/x-flac; rate=16000" "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=en-US" -F "myfile=@compute.flac"
But got:
`{"status":0,"id":"56bdb158dd66b25fc2e221364004e620-1","hypotheses":[{"utterance":"coffee lol","confidence":0.46219563}]}`
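To compare several recordings quickly, it helps to pull just the recognized text and its confidence out of such a response. The following is a rough sketch using sed. It assumes the flat, single-line response shape shown above, with a single hypothesis and no escaped quotes in the utterance. The variable names are chosen for this illustration only, and a real JSON parser such as jq would be more robust.

```shell
# Response copied verbatim from the experiment above.
response='{"status":0,"id":"56bdb158dd66b25fc2e221364004e620-1","hypotheses":[{"utterance":"coffee lol","confidence":0.46219563}]}'

# Capture the quoted string that follows "utterance":
utterance=$(printf '%s\n' "$response" | sed -n 's/.*"utterance":"\([^"]*\)".*/\1/p')

# Capture the digits and dots that follow "confidence":
confidence=$(printf '%s\n' "$response" | sed -n 's/.*"confidence":\([0-9.]*\).*/\1/p')

printf '%s (confidence %s)\n' "$utterance" "$confidence"
```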
Other examples:
"I like pickles" ⇒ "I like turtles"
"The determinant of x" ⇒ "new york" (with confidence 0.88!)
"Compute this" ⇒ "coffee lol"
Of course, I'm not a native English speaker, but I expected better performance.