By Md.Sahil Khan
This article covers two things: automation and speech recognition.
When we speak of "automation", people usually think of sweeping changes in technology and job losses. But automation has far more upsides than downsides. I am glad to say that it is a boon for expert procrastinators and lazy people.
Automation is the process of programming machines to perform actions, procedures, and operations with minimal human assistance. It avoids manual repetition of common tasks by offloading them to the system.
There is a huge range of tasks that you might opt to automate by writing a Python script. Python users can use their creativity to build automated solutions for the repetitive tasks they encounter in daily life.
When we speak of "speech recognition", movies and TV shows come to mind: they love to depict robots that can understand humans and talk back. Movies like Star Wars and Robots are filled with such marvels. But what if all of this existed today? It certainly does. With speech recognition in Python, you can write a program that picks up audio, understands what is being said, and responds to it.
Speech recognition draws on computer science and linguistics to identify spoken words and convert them into text. It allows computers to understand human language. In other words, speech recognition is a machine's ability to listen to spoken words and identify them. You can also use speech recognition in Python to reply to a question that was asked.
How does speech recognition work?
Speech recognition starts with a microphone taking the sound energy produced by the person speaking and converting it into electrical energy. This electrical signal is then converted from analog to digital, and finally the digital audio is converted to text.
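The analog-to-digital step can be pictured with a small sketch. It is only an illustration under simple assumptions: a pure sine wave stands in for the electrical signal, and the function name, the 16 kHz sample rate, and the 16-bit depth are choices made for this example, not something the speech_recognition library exposes.

```python
import math

def sample_and_quantize(frequency_hz, sample_rate_hz, duration_s, bit_depth=16):
    """Sample a sine wave and round each sample to a signed integer level."""
    max_level = 2 ** (bit_depth - 1) - 1              # 32767 for 16-bit audio
    sample_count = round(sample_rate_hz * duration_s)
    samples = []
    for n in range(sample_count):
        t = n / sample_rate_hz                        # time of the n-th sample, in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # "analog" value in [-1, 1]
        samples.append(round(amplitude * max_level))  # stored digital value
    return samples

# A 440 Hz tone sampled at 16 kHz for 10 ms gives 160 integer samples.
digital_samples = sample_and_quantize(440, 16000, 0.01)
print(len(digital_samples), digital_samples[:5])
```

A recognizer works on lists of numbers like this one; turning them into words is the part handled by the speech service.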
To give you an idea of how Python can automate tasks with speech recognition, I will demonstrate how to automate logging in to our college management system by voice.
What is Python?
Python is an interpreted, interactive, object-oriented programming language.
Why do I use Python?
Python offers great readability and an approachable syntax that resembles plain English, which makes it an excellent choice to start your journey with. Compared with other languages, Python clearly stands out as one of the simplest to learn.
The advantages of Python I mentioned above make the learning process fast and pleasant. With little time and effort, you will gain enough knowledge to write simple scripts. This smooth learning curve significantly speeds up development, even for experienced developers.
What will you need?
2. Some Python libraries installed on your system.
3. The Chrome browser installed on your system.
import speech_recognition as sr
from selenium import webdriver
from time import ctime
import time
import os
import playsound
import webbrowser
import random
from gtts import gTTS
import pyaudio
import pyautogui
import urllib.request
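
# Creates a microphone object for input device 1, but the result is not stored;
# record_audio() below opens the default microphone instead
sr.Microphone(device_index=1)
# Recognizer that turns captured audio into text
r = sr.Recognizer()
# Sounds quieter than this energy level are treated as silence
r.energy_threshold = 5000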
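
def record_audio(ask=False):
    # Optionally speak a prompt, then capture one phrase from the microphone
    with sr.Microphone() as source:
        if ask:
            aura_speak(ask)
        audio = r.listen(source)
        voice_data = ''
        try:
            # Send the captured audio to Google's speech recognition service
            voice_data = r.recognize_google(audio)
        except sr.UnknownValueError:
            # The speech could not be understood
            aura_speak('Sorry I did not get that')
        except sr.RequestError:
            # The recognition service could not be reached
            aura_speak('Sorry my Speech service is down')
        return voice_data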

def respond(voice_data):
    # Each phrase is checked independently against the recognized text
    if 'what is your name' in voice_data:
        aura_speak('My name is Aura')
    if 'what time is it' in voice_data:
        aura_speak(ctime())
    if 'search' in voice_data:
        search = record_audio('What do you want to search for?')
        url = 'https://google.com/search?q=' + search
        webbrowser.get().open(url)
        aura_speak('Here is what I found for ' + search)
    if 'exit' in voice_data:
        aura_speak('Thank you')
        exit()
    if 'college management system' in voice_data:
        aura_speak('Opening cms')
        cms()
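
def cms():
    # Replace the placeholders with your own CMS login
    username = 'your registration no'
    password = 'your password'
    url = 'https://cms.gift.edu.in/index.php?r=site%2Flogin'
    # Path to the ChromeDriver executable on this machine
    si = webdriver.Chrome('G:\\Program Files\\Python\\Python39\\Lib\\site-packages\\chromedriver_win32\\chromedriver.exe')
    si.get(url)
    # Fill in and submit the login form
    # (find_element_by_* are Selenium 3 calls; they were removed in Selenium 4.3)
    si.find_element_by_id('loginform-username').send_keys(username)
    si.find_element_by_id('loginform-password').send_keys(password)
    si.find_element_by_name('login-button').submit()
    # Give the page time to load, then dismiss the dialog that appears
    time.sleep(5)
    si.find_element_by_css_selector('#langModal > div > div > div.modal-footer > button').click()
    # Keep the browser open until the user presses Enter
    input('Press anything to quit')
    si.quit()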
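
def aura_speak(audio_string):
    # Convert the text to speech, play it, then delete the temporary file
    tts = gTTS(text=audio_string, lang='en')
    # Local name r; it does not affect the global recognizer r
    r = random.randint(1, 10000000)
    audio_file = 'audio-' + str(r) + '.mp3'
    tts.save(audio_file)
    playsound.playsound(audio_file)
    print(audio_string)
    os.remove(audio_file)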

# Greet the user, then listen and respond until 'exit' is spoken
time.sleep(1)
aura_speak('How can I help you?')
while True:
    voice_data = record_audio()
    respond(voice_data)
Understanding the Code
Before writing the code, download the PyAudio wheel and the latest ChromeDriver to your system. Then install the dependencies.
You have to install speech_recognition (published on PyPI as SpeechRecognition), selenium, playsound, gTTS, pyaudio, and pyautogui. The remaining libraries are built into Python, so there is no need to install them.
Now start writing the code by importing all the libraries.
1. speech_recognition - Speech recognition technology allows computers to take spoken audio, interpret it, and generate text from it.
2. selenium - Selenium is a powerful tool for controlling web browsers through programs and performing browser automation.
3. time - Python's built-in time module handles various time-related operations.
4. os - The os module in Python provides functions for interacting with the operating system.
5. playsound - The playsound module contains a single function, playsound(). It requires one argument: the path to the sound file to play, which can be a local file or a URL. An optional second argument, block, is set to True by default. It works with both WAV and MP3 files.
6. webbrowser - The webbrowser module provides a high-level interface for displaying Web-based documents to users.
7. random - Python's built-in random module generates random numbers.
8. gTTS - gTTS (Google Text-to-Speech) is a Python library and CLI tool to interface with the Google Translate text-to-speech API. We import the gTTS class from the gtts module and use it to convert text to speech.
9. pyaudio - PyAudio provides Python bindings for PortAudio, the cross-platform audio I/O library. With PyAudio, you can easily use Python to play and record audio on a variety of platforms.
10. pyautogui - PyAutoGUI is a Python automation library that controls the mouse and keyboard. It can click, drag, scroll, move the pointer, and click at an exact position.
11. urllib - urllib is Python's URL handling package. It is used to fetch URLs (Uniform Resource Locators).
Here Microphone() is used to get voice input from the microphone. r is a variable that holds a Recognizer() object, which does the actual speech recognition. energy_threshold sets how loud the incoming audio must be to count as speech; quieter sounds are treated as silence.
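To make the idea of an energy threshold concrete, here is a simplified illustration. It is not the library's own code: the helper names and the sample values are made up, and it only assumes that loudness is measured as the root-mean-square of a chunk of audio samples and compared with the threshold.

```python
import math

def rms_energy(samples):
    """Root-mean-square of a chunk of integer audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_speech(samples, energy_threshold):
    """Treat a chunk as speech only when its energy exceeds the threshold."""
    return rms_energy(samples) > energy_threshold

quiet_chunk = [120, -90, 60, -150]        # made-up background noise
loud_chunk = [9000, -8000, 7000, -9500]   # made-up speech

print(is_speech(quiet_chunk, 5000))   # False
print(is_speech(loud_chunk, 5000))    # True
```

With a threshold as high as 5000, only fairly loud input counts as speech. If the assistant seems to ignore you, a lower value may be worth trying.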
Here I have defined a function called aura_speak that takes a string parameter, audio_string. The tts variable stores a gTTS (Google text-to-speech) object created with two parameters: text is the audio_string to be converted from text to speech, and lang is the language of the speech, where 'en' means English. The r variable holds a random integer between 1 and 10000000, picked by randint() from the given range. audio_file holds the file name, built by concatenating 'audio-', str(r), and the extension '.mp3'. tts.save() then saves the audio under that file name, and playsound() plays it. After that the string is printed to the console for confirmation, and os.remove() deletes the saved audio file. So when aura_speak('How can I help you?') is called, the string 'How can I help you?' is passed to aura_speak(audio_string), which saves and plays the sound so that we can hear it.
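One possible variation on the save, play, and delete sequence is sketched below. It is an assumption-laden sketch rather than the article's method: speak_with_cleanup, synthesize, and play are hypothetical names, and stand-in functions replace gTTS and playsound so the example runs without a network connection or speakers. The point is that tempfile picks a unique file name and try/finally deletes the file even when playback fails.

```python
import os
import tempfile

def speak_with_cleanup(text, synthesize, play):
    """Save speech for `text` to a unique temp file, play it, always delete it.

    `synthesize(text, path)` and `play(path)` are hypothetical callables; in the
    article's script they would wrap gTTS(...).save(path) and playsound.playsound(path).
    """
    fd, path = tempfile.mkstemp(prefix='audio-', suffix='.mp3')
    os.close(fd)                 # only the unique path is needed
    try:
        synthesize(text, path)
        play(path)
    finally:
        os.remove(path)          # runs even if synthesis or playback raises
    return path

# Stand-ins so the sketch runs anywhere.
def fake_synthesize(text, path):
    with open(path, 'w') as handle:
        handle.write(text)

played = []
used_path = speak_with_cleanup('How can I help you?', fake_synthesize, played.append)
print(os.path.exists(used_path))   # False
```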
In the while loop, record_audio() is called and its result is stored in the variable voice_data; then respond(voice_data) is called with voice_data as its argument.
The record_audio(ask=False) function is defined above. Its ask parameter defaults to False; a caller can instead pass a string to be spoken as a prompt. Inside the function, with sr.Microphone() as source makes our microphone the audio source. The 'if' statement checks whether 'ask' was given; if so, it calls aura_speak(ask) to speak the prompt. 'audio' stores what the recognizer's listen() method captures from source, our microphone. voice_data is initialized as an empty string, and inside the try block it stores the text of whatever we say, which r.recognize_google(audio) returns. The first except block catches UnknownValueError, raised when the speech could not be understood, and calls aura_speak() with 'Sorry I did not get that'. The second catches RequestError, raised when the speech service cannot be reached, and calls aura_speak() with 'Sorry my Speech service is down'. Finally the function returns voice_data.
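The error-handling flow of record_audio() can be tried out without a microphone. In the sketch below, the two exception classes are stand-ins for sr.UnknownValueError and sr.RequestError, and transcribe_or_empty, recognize, and say are hypothetical names introduced only for this illustration.

```python
class UnknownValueError(Exception):
    """Stand-in for sr.UnknownValueError: the audio could not be understood."""

class RequestError(Exception):
    """Stand-in for sr.RequestError: the recognition service could not be reached."""

def transcribe_or_empty(recognize, audio, say=print):
    """Return the recognized text, or '' after announcing what went wrong."""
    try:
        return recognize(audio)
    except UnknownValueError:
        say('Sorry I did not get that')
    except RequestError:
        say('Sorry my Speech service is down')
    return ''

def unintelligible(audio):
    raise UnknownValueError()

def service_down(audio):
    raise RequestError()

messages = []
print(repr(transcribe_or_empty(lambda audio: 'what time is it', b'', messages.append)))
print(repr(transcribe_or_empty(unintelligible, b'', messages.append)))
print(repr(transcribe_or_empty(service_down, b'', messages.append)))
print(messages)
```

Because a failed attempt returns an empty string, none of the phrase checks in respond() match and the loop simply listens again.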
I have defined the respond() function, which takes voice_data. Inside it, a series of 'if' statements each check for a phrase; if the user has said that phrase, the function calls aura_speak() and performs the following tasks.
If the user says 'what is your name', it calls aura_speak() with 'My name is Aura'.
If the user says 'what time is it', it speaks the current time returned by ctime().
If the user says 'search', it calls record_audio() with the prompt 'What do you want to search for?' and stores the reply in the search variable. The url variable holds the Google search URL concatenated with search. webbrowser then opens the url in the web browser, and aura_speak() is called with 'Here is what I found for' concatenated with search.
If the user says 'exit', aura_speak() is called with 'Thank you' and then exit() closes the running program.
If the user says 'college management system', it calls aura_speak() with 'Opening cms' and then calls the cms() function.
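The phrase checks and the search URL can be explored on their own with a small sketch. It rests on a few assumptions: COMMANDS, match_commands, and build_search_url are hypothetical names, the text is lowercased before matching (the script above does not do this), and the query is URL-encoded with the standard library's quote_plus so that spaces and symbols survive in the address.

```python
from urllib.parse import quote_plus

def build_search_url(query):
    """Return a Google search URL with the query safely URL-encoded."""
    return 'https://google.com/search?q=' + quote_plus(query)

# Hypothetical command table: trigger phrase -> label of the action to run.
COMMANDS = [
    ('what is your name', 'name'),
    ('what time is it', 'time'),
    ('search', 'search'),
    ('exit', 'exit'),
    ('college management system', 'cms'),
]

def match_commands(voice_data):
    """Return the labels of every command whose phrase occurs in the text."""
    text = voice_data.lower()
    return [label for phrase, label in COMMANDS if phrase in text]

print(match_commands('Hey, what time is it'))          # ['time']
print(build_search_url('python speech recognition'))
# https://google.com/search?q=python+speech+recognition
```

Like the chain of separate 'if' statements, match_commands can return more than one label when a sentence contains several phrases.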
The cms() function opens your CMS in the web browser. Inside the function, the username variable stores your registration number, the password variable stores your CMS password, and the url variable stores the URL of your CMS. The si variable stores the Chrome driver object, created from the chromedriver file location, and get() loads the url in it.
To perform any action on a web page with Selenium, you need locators, which identify unique elements within the page. Web elements can be anything the user sees on the page, such as a title, table, link, button, toggle button, or any other HTML element. find_element_by_id() finds an element by ID, find_element_by_name() finds an element by NAME, and likewise find_element_by_css_selector() finds an element by CSS selector. (These helpers belong to Selenium 3; Selenium 4 replaces them with find_element(By.ID, ...) and its siblings.) The send_keys() method sends text to any field, such as an input field of a form; here it types the username and password into the page in your browser. The submit() method submits the form after you have filled it in, and click() clicks the element it is called on.
Together these calls put the username and password in the form and submit it in the browser. The time.sleep() call pauses the program for a few seconds while the page loads, after which your CMS opens. Finally, after an input from the user, quit() closes the browser.
This kind of automation should be built to ease our work; it harms nobody and causes no job losses.
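A fixed pause such as time.sleep(5) either wastes time or is too short on a slow connection. One common alternative is to poll until a condition holds, which is the idea behind Selenium's WebDriverWait. The sketch below shows that idea in plain Python only; wait_until and dialog_ready are hypothetical names, and the condition is a stand-in for "the dialog button has appeared".

```python
import time

def wait_until(condition, timeout=5.0, poll_interval=0.1):
    """Poll `condition()` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition was not met in time')
        time.sleep(poll_interval)

# Stand-in condition: becomes true on the third check.
checks = {'count': 0}

def dialog_ready():
    checks['count'] += 1
    return checks['count'] >= 3

wait_until(dialog_ready, timeout=2.0, poll_interval=0.01)
print(checks['count'])   # 3
```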