Link to GitHub:
https://github.com/erlendhel/smartmirror
In hindsight, the software part of the project has mostly gone according to plan. The biggest turnaround we had to make was changing from a touch screen to speech recognition, mainly because the foil we chose to provide the mirror effect wouldn’t support touch. This meant we had to implement speech recognition and adapt the smartmirror app for touchless navigation throughout. The user is shown a progress bar at the bottom of the screen, indicating when voice is being recorded. By saying a valid command in the time interval it takes the bar to go from 100% to 0%, the user makes the application perform the chosen command.
Speech Recognition
To implement speech recognition we used the Python package SpeechRecognition, which supports the Google Cloud Speech API, the one we ended up using. We chose the Google Cloud Speech API because its interface seemed the easiest to work with and because the services provided by the Google API are well tested. Since we had a limited budget ($1 sound card and $3 microphone), we tried to limit the negative factors by choosing a safe, proven service.
How we used the API
Our initial plan for the speech functionality was to have a thread which continuously listened for commands and returned them to the main program, which would then handle the different commands. While testing on Windows, the listen() function, which is part of SpeechRecognition, worked fine and was able to process the commands we gave it while running continuously. When we started testing the code on the Raspberry Pi, the functionality was weak at best: it worked fine once or twice, but usually it would listen endlessly without handling the commands it got. This resulted in a change of the original design.
Instead of using a listener, we decided to use recordings timed to three seconds, which for some reason were processed reliably. The short answer to how this works is that the smartmirror records three seconds of audio, sends it to the Google API for processing, gets back a return value which should be a word/command, and then uses that command to navigate the smartmirror.
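As a rough illustration, the timed-recording approach can be sketched with the SpeechRecognition package roughly like this (the function name and structure are illustrative, not taken from the repository; recognize_google() is the package’s default Google recognizer, while the Cloud variant, recognize_google_cloud(), additionally takes credentials):

import speech_recognition as sr

recognizer = sr.Recognizer()

def record_command(duration=3):
    # Record a fixed-length clip instead of relying on listen()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.record(source, duration=duration)
    try:
        # Send the clip to Google's speech API and get the transcribed text back
        return recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return None   # nothing intelligible was said
    except sr.RequestError:
        return None   # the API could not be reached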
Database of words
As mentioned earlier, this smartmirror has been made on a tight budget. Low-cost hardware, coupled with less than perfect pronunciation, gave some weird returns from the Google API. In order to work around this and provide as smooth a flow as possible for the user, we had to create our own database of keywords for the different commands used in the smartmirror. The more we tested, the more words we got. Although this might be a way of ‘cheating’ the system, the structure of the smartmirror and the set number of commands allowed us to use similar words to represent our actual commands (an example is ‘Logan’ for ‘login’).
To use the database of words which we built up throughout the testing period, the smartmirror sends the value returned from the API to functions which iterate through lists of ‘allowed keywords’ for a given command. When a keyword representing a command is found, the command is returned to the main program.
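A minimal sketch of that lookup could look like the following (the keyword lists and command names here are examples, not the actual lists from the repository):

# Example keyword lists; the real ones grew as testing revealed new misreadings
LOGIN_KEYWORDS = ['login', 'log in', 'sign in', 'logan']
WEATHER_KEYWORDS = ['weather', 'whether']

def match_command(transcript):
    # Map whatever the API returned onto one of the mirror's fixed commands
    if transcript is None:
        return None
    if any(word in transcript for word in LOGIN_KEYWORDS):
        return 'login'
    if any(word in transcript for word in WEATHER_KEYWORDS):
        return 'weather'
    return None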
Face Recognition
The way we decided to differentiate between users of the smartmirror was to use face recognition. We decided to use the OpenCV library together with a RaspiCam to provide this feature. In order to use the feature as we wanted in the smartmirror app, we had to do a lot of testing in terms of speed and accuracy to get to a level we felt was satisfactory.
Finding the right algorithm
The first thing we experienced when using OpenCV together with our RaspiCam was that it was very inaccurate in its predictions. After testing FisherFaces, EigenFaces and LBPH (Local Binary Pattern Histograms), we opted for LBPH. Using it, we had a much higher rate of success when recognizing different users compared to the other two. FisherFaces in particular seemed very sensitive to light, which basically meant we had to provide new training pictures for the face recognizer every time the lighting changed. In addition, both EigenFaces and FisherFaces use the entire database of images for a given user when searching for a match, whereas the LBPH algorithm only uses one at a time, which reduces processing time.
LBPH works by producing a matrix representing the face of a user/subject. The matrix consists of binary numbers representing different areas of the face. Both when setting up a profile of a given face and when comparing against the image fed to the algorithm by the smartmirror, LBPH looks at the relative difference between neighbouring tiles of the matrix in order to produce a final number representing the accuracy of the prediction.
When starting out with FisherFaces, we were happy with accuracy numbers in the range of 750-900. At the end of the project, with LBPH, we set the threshold for ‘accepted accuracy’ to 50 (lower numbers indicate higher accuracy).
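In OpenCV terms, applying that threshold amounts to something like the sketch below (the model and image file names are placeholders, not files from our repository):

import cv2

# Load a previously trained LBPH model (requires the opencv-contrib package)
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read('trained_lbph.yml')

gray_face = cv2.imread('captured_face.png', cv2.IMREAD_GRAYSCALE)
label, confidence = recognizer.predict(gray_face)   # lower confidence = closer match
if confidence < 50:
    print('Recognized user with index', label)
else:
    print('Face not recognized')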
Connecting with the database
For the OpenCV library to work, the smartmirror has to provide training data. The training data are the pictures of the smartmirror’s users, which are stored locally on the Raspberry Pi. In order to make a prediction when given a face, the algorithm needs a way to sort the different faces in the database. Locally on the Raspberry, the images taken during registration (which are used in the face recognition) are stored in folders named s1, s2, s3, etc. When training, the algorithm iterates through the folders, reads the integers and at the same time processes the images stored in the folders, binding all images in s1 to index 1, all images in s2 to index 2, and so on. This produces two vectors: one holding the profile of each face found in an image (which will depend on the algorithm used), and one holding the indexing number of the folder it came from. If, during login, the face which is processed corresponds to a profile indexed to 1, the face recognition knows that the face trying to log on belongs to index number 1.
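A rough sketch of such a loader, assuming the s1, s2, s3 folder layout described above (the directory name and helper function are illustrative):

import os
import cv2
import numpy as np

def load_training_data(data_dir='training_data'):
    faces, labels = [], []
    for folder in os.listdir(data_dir):
        if not folder.startswith('s'):
            continue
        label = int(folder[1:])                     # 's1' -> 1, 's2' -> 2, ...
        for filename in os.listdir(os.path.join(data_dir, folder)):
            image = cv2.imread(os.path.join(data_dir, folder, filename),
                               cv2.IMREAD_GRAYSCALE)
            if image is not None:
                faces.append(image)
                labels.append(label)
    return faces, labels

# Train the recognizer on the two vectors: face images and their folder indices
faces, labels = load_training_data()
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.train(faces, np.array(labels))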
We also decided to use this index as a reference to our database’s primary key. If a face is recognized, the index is stored in a variable and used as the referencing variable in the database throughout the smartmirror.
Database
To store the different users and their preferences we use an SQLite3 database stored locally on the Raspberry Pi. The ‘users’ table consists of an ID (primary key), name, path to images, three news sources, the user’s preferred destination and preferred type of travel. As mentioned earlier, the referencing from the smartmirror to the database is done using the ID returned from the face recognition at login.
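A lookup of a logged-in user’s preferences could then look roughly like this (the column and file names follow the description above, but may differ from the actual schema in the repository):

import sqlite3

def get_user(user_id, db_path='smartmirror.db'):
    # Fetch the row whose primary key matches the index from the face recognition
    connection = sqlite3.connect(db_path)
    cursor = connection.cursor()
    cursor.execute('SELECT * FROM users WHERE id = ?', (user_id,))
    row = cursor.fetchone()
    connection.close()
    return row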
Other modules
In addition to the named modules, we have utilized NewsAPI, WeatherAPI, and Google API for travel. These are used as described in earlier blog posts.
User specific data
When the application is started, the user sees a startup screen with three options: sign in, register and guest. By saying something along the lines of “sign in” or “log in”, the smartmirror will check if the face in front of it belongs to a registered user in the database. If a known face is found within 10 seconds, the user is met with a welcome screen and then redirected to the main screen.
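Purely as an illustration, the 10-second window could be implemented along these lines (camera capture via OpenCV here; the actual mirror uses the RaspiCam, and the model file and helper names are made up):

import time
import cv2

# Placeholder model file and the stock OpenCV frontal-face cascade
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read('trained_lbph.yml')
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
camera = cv2.VideoCapture(0)

def try_login(timeout=10, threshold=50):
    # Grab frames until a known face appears or the timeout runs out
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        ok, frame = camera.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        for (x, y, w, h) in face_cascade.detectMultiScale(gray, 1.3, 5):
            label, confidence = recognizer.predict(gray[y:y + h, x:x + w])
            if confidence < threshold:
                return label        # index of the recognized user
    return None                     # no known face within the time limit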
In the main screen, the user can see various personal information fetched from the database. The navigation part of the main screen will display the user’s chosen destination and travel type, and tell the user the time it will take to get there. The news part will show the user’s preferred news sources and display their icons. By saying the name of a source, e.g. “BBC News”, the user will be redirected to the spoken source, where the 10 latest titles from that news source are displayed.
Not user specific data
Regardless of which user is logged in, the main screen will always show the date and time. The main screen will also show the current temperature in Kongsberg, as well as an image describing the weather, e.g. sunny. By saying the word ‘weather’, the user will be redirected to the weather screen, where more detailed weather data will be displayed. This contains a daily forecast, a weekly forecast, as well as a more detailed current weather description.
Brushing your teeth
At any time in the main screen, the user can say the word ‘hello’ to get handed a toothbrush with toothpaste on it.
Navigating backwards
In any screen, the user can go back to the screen he/she came from by saying
something along the lines of ‘go back’, ‘return’ or ‘previous’. When in the
main screen, the user can also log out by saying for instance ‘log out’, ‘sign
out’ or ‘return’. The application will then go back to the startup screen.
Not finished modules
If we had had more time in the project, we would have continued on the registration and settings modules. Registration is one of the choices on the startup screen, and allows a new user to sign up by spelling their name, as well as letting the mirror take pictures of their face to use for the login. The settings module would allow an already registered user to adjust their preferences. This would include changing their chosen destination and travel type, as well as mixing up their preferred news sources.