In my previous post I showed how I wrote a Python script to read out the latest news headlines using Google's text-to-speech API. As I commented in that post, voice recognition and talking devices seem to be the in thing with the release of the Amazon Echo and Google Home.
In this post I show how I created a Python script that records sound on your Raspberry Pi, invokes the Google Cloud Speech API to interpret what was said, and then performs a command on your Raspberry Pi - so a bit like a basic Amazon Echo.
Setting up your mic
Before I get into the Python code, you need a mic set up. As the Raspberry Pi does not have a sound card, you will need a USB mic or a webcam with an inbuilt mic. I went for the latter and used a basic webcam from Logitech.
Once you have your mic plugged in, follow the instructions under "Step 1: Checking Your Microphone" in: https://diyhacking.com/best-voice-recognition-software-for-raspberry-pi/
Install prerequisites
There is one Python library you need, pycURL, which is used to send data to the Google Cloud Speech API. Follow the instructions here: http://pycurl.io/docs/latest/install.html
You will also need to install SoX, an open-source tool for analysing sound files. The script uses it to detect whether there is any sound on the recorded audio before trying to send it to the Google API.
You can install this by running:
sudo apt-get install sox
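As an aside, the script later uses SoX's stat output to decide whether a recording contains any sound worth transcribing. A rough illustration of that check (Python 3 here, while the script itself is Python 2; the helper name, sample text, and threshold are mine, not the script's):

```python
import re

def max_amplitude(sox_stat_output):
    """Pull the 'Maximum amplitude' value out of sox stat output.

    Returns the amplitude as a float, or 0.0 if the line is missing.
    """
    match = re.search(r"Maximum amplitude:\s*([0-9.]+)", sox_stat_output)
    return float(match.group(1)) if match else 0.0

# Example: decide whether a clip is worth sending to the speech API.
sample_output = "Samples read: 32000\nMaximum amplitude: 0.312500\n"
if max_amplitude(sample_output) > 0.1:  # threshold is a guess; tune for your mic
    print("Sound detected - worth transcribing")
```

In the real script this text would come from running sox on the recorded file via subprocess; the amplitude threshold is what filters out silence and background hiss.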
One more thing to install: FLAC. FLAC is used to record your sound file in a lossless format, which is required by the Google API:
You can install this by running:
sudo apt-get install flac
Setup Google Cloud Speech API
To do the voice-to-text processing I am using the Speech API, which is part of Google Cloud. It is in beta at the moment and offers a free trial.
Follow the instructions on their site to get your API key, which will be needed in the script:
The current downside I've found with this API is the latency. It's currently taking 5-6 seconds to process a 2-second audio file. The Google help files say the response time should be similar to the length of the audio being processed.
Python Script
Now to the actual Python code.
All the files required can be downloaded from here:
The main file to look at is speechAnalyser.py.
This script does the following:
1. If no audio is playing (you don't want to record while something is playing on your speakers), records sound from your microphone for 2 seconds.
2. Uses SoX to check whether any sound on the file is above a certain amplitude - this avoids processing silence or background noise.
3. If there is sound at a sufficient amplitude, sends the audio to the Google API in a JSON message. As said earlier, the Google API takes 5-6 seconds and returns a JSON message with the words detected.
4. If the trigger word, in this case "Jarvis", is said during these two seconds, plays a beep sound.
5. Records another 3 seconds to listen for the user speaking a command and sends it to the Google API as in step 3.
6. Checks if a keyword is found in the returned text and executes the appropriate command. For example, if "news" is mentioned it invokes the getNews script which I described in my previous post.
7. Loops back to Step 1.
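The flow of those steps can be sketched as a loop (Python 3 here, with stub functions standing in for the real recording and API calls; all names are illustrative, not taken from the actual script):

```python
def assistant_loop(record, transcribe, run_command, max_iterations=1):
    """One pass per iteration: listen for the trigger word, then a command.

    record(seconds) -> audio, transcribe(audio) -> text,
    run_command(text) acts on the recognised command.
    """
    for _ in range(max_iterations):
        text = transcribe(record(2))          # steps 1-3: short listen
        if "jarvis" in text.lower():          # step 4: trigger word heard
            command = transcribe(record(3))   # step 5: longer listen
            run_command(command)              # step 6: act on keywords
        # step 7: loop back and listen again

# Stubbed example run: two "recordings" yield the trigger, then a command.
heard = iter(["hey jarvis", "what's the news"])
executed = []
assistant_loop(lambda s: None, lambda a: next(heard), executed.append)
print(executed)  # -> ["what's the news"]
```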
Remember to change the line below with the key which was provided when you set up the Google Cloud Speech API:
key = ''
stt_url = 'https://speech.googleapis.com/v1beta1/speech:syncrecognize?key=' + key
Also you should customise your commands in the following section of code:
def listenForCommand():
    command = transcribe(3)
    print time.strftime("%Y-%m-%d %H:%M:%S ") + "Command: " + command
    success = True
    if command.lower().find("light") > -1 and command.lower().find("on") > -1:
        subprocess.call(["/usr/local/bin/tdtool", "-n 1"])
    elif command.lower().find("light") > -1 and command.lower().find("off") > -1:
        subprocess.call(["/usr/local/bin/tdtool", "-f 1"])
    elif command.lower().find("news") > -1:
        os.system('python getNews.py')
    elif command.lower().find("weather") > -1:
        os.system('python getWeather.py')
    elif command.lower().find("pray") > -1:
        os.system('python sayPrayerTimers.py')
    elif command.lower().find("time") > -1:
        subprocess.call(["/home/pi/Documents/speech.sh", time.strftime("%H:%M")])
    elif command.lower().find("tube") > -1:
        os.system('python getTubeStatus.py')
    else:
        subprocess.call(["aplay", "i-dont-understand.wav"])
        success = False
    return success
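The chain of find() calls works, but the same dispatch could also be written as a keyword-to-action table. This is just a hypothetical refactor sketch (Python 3, with list-appending lambdas standing in for the real subprocess calls), not the script's actual code:

```python
def dispatch(command, actions):
    """Run the first action whose keywords all appear in the spoken command."""
    lowered = command.lower()
    for keywords, action in actions:
        if all(word in lowered for word in keywords):
            action()
            return True
    return False  # nothing matched - caller can play the "I don't understand" clip

# Hypothetical wiring; the real script shells out to tdtool, getNews.py, etc.
log = []
actions = [
    (("light", "on"),  lambda: log.append("lights on")),
    (("light", "off"), lambda: log.append("lights off")),
    (("news",),        lambda: log.append("news")),
]
dispatch("Turn the Light ON please", actions)
print(log)  # -> ['lights on']
```

A table like this makes it easier to add commands without growing the if/elif chain.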
The other interesting part of the script to look at is where it sends the data over to the Google Cloud Speech API.
It creates a JSON message, and then encodes the audio in base64.
Within the outgoing JSON message there is a phrases section, where I've included my trigger word "Jarvis", which makes it more likely the speech engine recognises it.
The final bit then gets the text from the response.
#Send sound to Google Cloud Speech Api to interpret
#----------------------------------------------------
print time.strftime("%Y-%m-%d %H:%M:%S ") + "Sending to google api"
# send the file to google speech api
c = pycurl.Curl()
c.setopt(pycurl.VERBOSE, 0)
c.setopt(pycurl.URL, stt_url)
fout = StringIO.StringIO()
c.setopt(pycurl.WRITEFUNCTION, fout.write)
c.setopt(pycurl.POST, 1)
c.setopt(pycurl.HTTPHEADER, ['Content-Type: application/json'])
with open(filename, 'rb') as speech:
    # Base64 encode the binary audio file for inclusion in the JSON request.
    speech_content = base64.b64encode(speech.read())
jsonContentTemplate = """{
'config': {
'encoding':'FLAC',
'sampleRate': 16000,
'languageCode': 'en-GB',
'speechContext': {
'phrases': [
'jarvis'
],
},
},
'audio': {
'content':'XXX'
}
}"""
jsonContent = jsonContentTemplate.replace("XXX",speech_content)
#print jsonContent
start = time.time()
c.setopt(pycurl.POSTFIELDS, jsonContent)
c.perform()
#Extract text from returned message from Google
#----------------------------------------------
response_data = fout.getvalue()
end = time.time()
#print "Time to run:"
#print(end - start)
#print response_data
c.close()
start_loc = response_data.find("transcript")
temp_str = response_data[start_loc + 14:]
#print "temp_str: " + temp_str
end_loc = temp_str.find("\""+",")
final_result = temp_str[:end_loc]
#print "final_result: " + final_result
return final_result
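The string searching above does the job, but since the response is JSON it could also be parsed with the json module. The sketch below (Python 3) assumes the v1beta1 syncrecognize response shape of results -> alternatives -> transcript; check the API reference for the exact structure:

```python
import json

def extract_transcript(response_data):
    """Pull the first transcript out of a speech API JSON response, or '' if absent."""
    try:
        parsed = json.loads(response_data)
        return parsed["results"][0]["alternatives"][0]["transcript"]
    except (ValueError, KeyError, IndexError):
        return ""

# Example with a response in the assumed shape:
sample = '{"results": [{"alternatives": [{"transcript": "jarvis", "confidence": 0.9}]}]}'
print(extract_transcript(sample))  # -> jarvis
```

Parsing the JSON properly also avoids the find() logic breaking if Google changes whitespace or field ordering in the response.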
I have to give a big shout-out to the following sites, which gave me ideas on how to write this script:
https://diyhacking.com/best-voice-recognition-software-for-raspberry-pi/ - This contains the instructions on how to set up a microphone on the Raspberry Pi
https://github.com/StevenHickson/PiAUISuite - A full application which does what the above script does but is configurable. Not sure if it still works with the new Google Speech API though.
Hey, this is really cool. I installed everything but when I try to run it I get this error output:
sh: 1: flac: not found
arecord: begin_wave:2516: write error
Traceback (most recent call last):
File "speechAnalyser.py", line 199, in
spokenText = transcribe(2) ;
File "speechAnalyser.py", line 57, in transcribe
maxAmpValue = float(maxAmpValueText)
ValueError: could not convert string to float: open i
Have you experienced this error? It seems like something really simple to fix but I'm pretty new with this.
Uncomment some of the print statements to debug (remove the '#' symbols).
Also, did SoX install properly? Check by running it on the command line. SoX is used to check if the file is silent by looking at the maximum amplitude.
Also is the test.flac file saved down?
Ok, so I tested Sox by converting a .wav file to a .au file, so it seems it's been installed properly. I uncommented the print statements but I'm not sure what you mean by the test.flac being "saved down". I now get this error:
listening ..
sh: 1: flac: not found
arecord: begin_wave:2516: write error
Popen outputsox FAIL formats: can't open input file `test.flac': No such file or directory
Max Amp Start: 23
Max Amop Endp: 30
Max Amp: open i
Traceback (most recent call last):
File "speechAnalyser.py", line 199, in
spokenText = transcribe(2) ;
File "speechAnalyser.py", line 57, in transcribe
maxAmpValue = float(maxAmpValueText)
ValueError: could not convert string to float: open i
Looks like the arecord command isn't creating the sound file as expected "test.flac".
Can you see a file called "test.flac" exists in the same folder as where speechAnalyser.py is stored?
Try running arecord from the command line to test if it is working.
I tested arecord on its own and it definitely works. I can't see "test.flac" stored in the folder so that line definitely isn't running properly. Are there meant to be apostrophes in line 34?
Add a line before line 33 with this:
print 'arecord -D plughw:1,0 -f cd -c 1 -t wav -d ' + str(duration) + ' -q -r 16000 | flac - -s -f --best --sample-rate 16000 -o ' + filename
This will print out the actual command being sent to arecord. Then you can try running that on the command line separately.
My suspicion is that flac isn't installed (which might be missing from my instructions).
You will need to run:
sudo apt-get install flac
You were right, I just needed to install flac. The only problem now is that it won't accept my API key for google cloud speech. Every time it tries to send the file to google it says "API key not valid. Please pass a valid API key."
Any idea what's going wrong there?
In the Google Cloud Platform console, get into the API Manager and select "Credentials". Click on create credentials and select "API Key".
Did you sign up fully to the Google Cloud Platform? You need to provide payment details even though it's a limited-time beta free trial.
Yeah, I'm sure I'm fully set up and everything. I'll regenerate the API key and have another go later.
Is it definitely an API key you're using, not a service account key?
ReplyDeleteYes , api key for sure. See this screenshot:
https://dl.dropboxusercontent.com/u/427946/Rpi%20Speech/googlecloudapikey.JPG
Hi! Great tutorial. I've set up everything without any problems and also changed google cloud speech to wit.ai - working like a charm :)
That's great. I had a play with wit.ai. It's very powerful at interpreting text, but I found its speech recognition wasn't as good as Google's.
After a few days I can say that you are 100% right :) Switched back to the Google API
What kind of latency do you see with Google Cloud Speech? For me it's consistently been ~5 seconds.
I'm now using Google Speech Recognition instead of the Cloud Speech API. It's much faster than Google Cloud Services. Check out https://github.com/Uberi/speech_recognition and the r.recognize_google(audio) method.
Turned out I hadn't actually enabled the API key. It works perfectly. Thanks for all your help!
Great news. Let us know what you use it for.
Is there anything other than the Google Cloud Platform Speech API I can use? Because I can't create an account.
Hi, you can try out these:
1) wit.ai: https://wit.ai/docs/http/20160330#get-intent-via-speech-link
2) Jasper: https://jasperproject.github.io/
3) Microsoft Bing Speech API: https://www.microsoft.com/cognitive-services/en-us/speech-api
I've only tried out wit.ai; it's good, but I didn't find its speech recognition as good as Google's. I want to try out the Microsoft Bing API, as the latency (~5 secs) is annoying with Google.
Hey, nice post dude. It would be awesome if you created a module for the new Google Cloud Speech API in the Jasper project. It would help a lot of us.
ReplyDeletehttps://github.com/jasperproject/jasper-client
I am getting this error while installing pycurl.
Using curl-config (libcurl 7.38.0)
running install
running build
running build_py
running build_ext
building 'pycurl' extension
arm-linux-gnueabihf-gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fno-strict-aliasing -D_FORTIFY_SOURCE=2 -g -fstack-protector-strong -Wformat -Werror=format-security -fPIC -I/usr/include/python2.7 -c src/pycurl.c -o build/temp.linux-armv7l-2.7/src/pycurl.o
src/pycurl.c: In function ‘PYCURL_OPT’:
src/pycurl.c:62:20: warning: typedef ‘compile_time_assert_fail__’ locally defined but not used [-Wunused-local-typedefs]
{ typedef int compile_time_assert_fail__[1 - 2 * !(expr)]; }
^
src/pycurl.c:69:5: note: in expansion of macro ‘COMPILE_TIME_ASSERT’
COMPILE_TIME_ASSERT(OPTIONS_SIZE == CURLOPT_HTTP200ALIASES - CURLOPTTYPE_OBJECTPOINT + 1);
^
src/pycurl.c: In function ‘do_curl_setopt’:
src/pycurl.c:1076:25: error: ‘CURLOPT_PASSWDDATA’ undeclared (first use in this function)
option == CURLOPT_PASSWDDATA))
^
src/pycurl.c:1076:25: note: each undeclared identifier is reported only once for each function it appears in
src/pycurl.c:1172:17: warning: implicit declaration of function ‘curl_formparse’ [-Wimplicit-function-declaration]
res = curl_formparse(str, &self->httppost, &last);
^
src/pycurl.c:1239:15: error: unknown type name ‘curl_passwd_callback’
const curl_passwd_callback pwd_cb = password_callback;
^
src/pycurl.c:1239:45: warning: initialization makes integer from pointer without a cast
const curl_passwd_callback pwd_cb = password_callback;
^
src/pycurl.c:1277:14: error: ‘CURLOPT_PASSWDFUNCTION’ undeclared (first use in this function)
case CURLOPT_PASSWDFUNCTION:
^
src/pycurl.c: In function ‘initpycurl’:
src/pycurl.c:2404:35: error: ‘CURLOPT_PASSWDFUNCTION’ undeclared (first use in this function)
insint_c(d, "PASSWDFUNCTION", CURLOPT_PASSWDFUNCTION);
^
src/pycurl.c:2405:31: error: ‘CURLOPT_PASSWDDATA’ undeclared (first use in this function)
insint_c(d, "PASSWDDATA", CURLOPT_PASSWDDATA);
^
error: command 'arm-linux-gnueabihf-gcc' failed with exit status 1
By default this is a female voice, so is there any chance to convert it into a male voice?