Baby Cry Detection and Music Player










Auto Baby Cry Detector Music Player


Akshay Gade[1], Ashutosh Koli[2], Samarth Misal[3], Sanskruti Kakade [4], Prof. Varsha Dange [5]

akshay.gade20@vit.edu[1], ashutosh.koli20@vit.edu[2], samarth.misal20@vit.edu[3], sanskruti.kakade20@vit.edu [4]

Department of Computer Engineering, [1][2][3][4][5]

Vishwakarma Institute of Technology, Bibwewadi, Pune, Maharashtra-411037

Abstract - Effective automatic detection of baby cry sounds plays a significant role in many smart baby-monitoring applications. In this paper, we compare machine-learning and classical approaches for detecting baby cry sounds in various domestic environments under challenging signal-to-noise ratio conditions. Automatic cry detection has applications in commercial products such as remote baby monitors as well as in medical research. We design a system that detects a baby's cry and then plays music to create a soothing environment for the baby. We also evaluate K-Nearest Neighbors (KNN) architectures for baby cry detection and compare their performance to that of classical machine-learning approaches. The evaluation includes feed-forward KNNs, which are able to capture the temporal behavior of acoustic events. In k-NN, the input consists of the k closest training examples in the data set, and the output depends on whether k-NN is used for classification or regression.

Keywords - Baby cry detection, Deep learning, Audio detection, K-Nearest Neighbors
 
I. Introduction

Busy modern lifestyles affect basic family life. Parents who are occupied with their professional lives do not get sufficient time to take care of their babies, and it is inconvenient for them to constantly watch over a newborn while working. We have therefore designed a simple system that helps parents take care of their baby.
We propose a simple sound detection system that can be applied in practice to build a device capable of detecting a baby's cry and automatically turning on baby sleep music. Whenever the baby cries, the system detects it through a microphone and, in response, turns on the music and sleep mechanism, which produces a soothing sound that helps the baby fall asleep gently. In this project, a program is implemented to detect an infant's crying while ignoring other sounds such as claps, sneezes, fans, sudden noises and environmental sounds.

Currently there are many types of baby monitoring systems, including wearable devices, Android applications and wirelessly controlled camera systems. Most of these systems only cover the home, using Wi-Fi or Bluetooth. As a result, employed parents (especially mothers) cannot ensure the safety of their babies because they are unable to connect with the child when they are at their workplaces. A few products have remote monitoring facilities. However, those are expensive, high-end products that are not affordable in developing countries like Sri Lanka. Further, these products are not easy to set up, and placing them near the baby may cause health hazards due to electromagnetic radiation. Our product is designed to be affordable and can be used for babies from birth to 12 months. It detects crying immediately and sends a notification to warn the parents or caregiver while playing a soft sound to soothe the baby.

Cry research involves data collection, cry signal processing, feature extraction and selection, and classification. Because cry data is sensitive, it has been difficult for researchers to acquire the data they need. Researchers either record cry clips themselves or ask other authors for permission to use their datasets. Signal processing is required to remove background noise and perform cry segmentation when building cry databases. Once the database is available, feature extraction draws features from different domains of the cry signals. Features extracted from the time domain, cepstral domain, prosodic domain and so on represent different aspects of the cry signal. Selecting the most appropriate features and reducing the feature dimensions is another task in building effective classification models. Applying appropriate machine learning models to specific cry features is vital for classification or detection accuracy.

The performance evaluation is carried out using an annotated database containing several hours of recordings of babies in domestic environments. We show a considerable performance gain compared to classical machine-learning approaches, especially in the low false-positive-rate regime.

To classify new data or make a prediction, KNN uses the already known (labeled) data as a kind of look-up table.



II. Cry Sound Recognition Method

In this study, we present cry sound recognition methods based on MFCC features and machine-learning classifiers such as KNN [3]. The performance of the cry sound detection methods is evaluated under different background sounds encountered in indoor home environments. The block diagram of the ML-based cry sound recognition method consists of four steps: preprocessing, feature extraction, cry modeling and identification. Each step is explained in the following subsections:


A. Preprocessing

In practice, recorded cry sound signals are commonly corrupted by low-frequency noise components such as microphone artifacts, recording-instrument bias and power-line interference, which arise from sensor movement and electromagnetic (EM) pickup because the audio sensor is exposed to the environment. Therefore, the recorded audio signal is passed through a high-pass filter (HPF) with a 60 Hz cutoff frequency. Next, the signal is split into frames; three frame lengths (100 ms, 250 ms and 500 ms) are considered for performance evaluation. The intensity of the cry sound varies because of the stochastic nature of cry production and because the distance between the sound source and the acoustic sensor changes over time. In practice, the location of the sound source may be unknown. Although the method is not sensitive to the amplitude level, amplitude normalization is performed on the zero-mean audio signal to limit the effect of changes in microphone sensitivity.
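The preprocessing chain described above (high-pass filtering, zero-mean amplitude normalization and framing) can be sketched as follows. This is only an illustration under stated assumptions: the paper does not specify the filter design, so a simple FFT-based filter that zeroes all bins below 60 Hz stands in for the HPF; normalization is applied to the whole signal before framing for brevity; and the function names and the synthetic test signal are hypothetical.

```python
import numpy as np

def highpass_fft(signal, sample_rate, cutoff_hz=60.0):
    """Remove components below cutoff_hz by zeroing FFT bins (stand-in for the HPF)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def normalize(signal):
    """Subtract the mean, then scale so the peak absolute amplitude is 1."""
    centered = signal - np.mean(signal)
    peak = np.max(np.abs(centered))
    return centered / peak if peak > 0 else centered

def split_into_frames(signal, sample_rate, frame_ms):
    """Cut the signal into non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)

# Synthetic input (assumption): 1 s at 8 kHz with a 50 Hz hum, a 450 Hz tone and a DC offset.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
raw = 0.5 * np.sin(2 * np.pi * 50 * t) + np.sin(2 * np.pi * 450 * t) + 0.1

clean = normalize(highpass_fft(raw, sample_rate))
frames = split_into_frames(clean, sample_rate, frame_ms=100)  # 100 ms frames
```

The same `split_into_frames` call can be repeated with `frame_ms=250` or `frame_ms=500` to obtain the other two frame lengths mentioned in the text.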
 
B. MFCC Feature Extraction

In this study, we use MFCC features to recognize cry sound signals. For each frame, we extract MFCC features for the cry, fan and air-conditioner, music, and speech signals.
Fourier Spectrum: The FFT of each windowed audio frame z[n] is computed, which gives the distribution of energy over frequency and thus the magnitude spectrum.
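As a small illustration of the Fourier spectrum step, the sketch below windows one frame and takes the magnitude of its FFT. The window type is not stated in the text, so the Hamming window is an assumption, as are the function name and the synthetic 500 Hz test frame.

```python
import numpy as np

def magnitude_spectrum(frame):
    """Apply a Hamming window (assumed) and return the one-sided FFT magnitude."""
    windowed = frame * np.hamming(len(frame))
    return np.abs(np.fft.rfft(windowed))

# Synthetic 100 ms frame at 8 kHz containing a single 500 Hz tone (assumption).
sample_rate = 8000
frame = np.sin(2 * np.pi * 500 * np.arange(800) / sample_rate)

spectrum = magnitude_spectrum(frame)
# Convert the index of the strongest bin back to a frequency in Hz.
peak_hz = np.argmax(spectrum) * sample_rate / len(frame)
```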

Mel-Frequency Spectrum:

The magnitude spectrum is passed through 26 band-pass filters placed at regular intervals on the Mel scale, which is related to the linear frequency f by:

$$\mathrm{Mel}(f) = 1125 \ln\left(1 + \frac{f}{700}\right) \qquad (3)$$

The Mel frequency is a logarithmic function of the linear frequency f that reflects how people perceive sound.
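A possible construction of the 26-filter Mel bank is sketched below. It assumes the commonly used form of the Mel mapping, Mel(f) = 1125 ln(1 + f/700), and triangular filter shapes; the filter shape, the FFT size of 512 and the 8 kHz sample rate are illustrative choices that the text does not specify. The later MFCC steps (logarithm and DCT) are not shown.

```python
import numpy as np

def hz_to_mel(f_hz):
    """Linear frequency (Hz) to Mel, using Mel(f) = 1125 ln(1 + f/700)."""
    return 1125.0 * np.log(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (np.exp(np.asarray(mel, dtype=float) / 1125.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank

# 26 filters as in the text; n_fft and sample_rate are assumptions.
bank = mel_filterbank(n_filters=26, n_fft=512, sample_rate=8000)
```

Multiplying `bank` by a frame's magnitude (or power) spectrum of length `n_fft // 2 + 1` gives one energy value per Mel band.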


C. Algorithm

Unsupervised learning is the problem of finding hidden structure in unlabeled data, and clustering is one approach to it. The goal of clustering is to separate the data into groups based on their similarity. Each cluster has a center called the centroid.



Steps involved in KNN classification:

  • KNN stores all the available cases and classifies a new case based on a similarity measure.
  • The extracted features are used to build the KNN training data.
  • When a new audio input is given, the KNN algorithm classifies it based on the training data.
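The steps above can be illustrated with a minimal k-NN classifier. This is a sketch, not the authors' implementation: it assumes Euclidean distance as the similarity measure and a simple majority vote, and the two-dimensional training vectors are made-up stand-ins for real MFCC feature vectors.

```python
import numpy as np
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest training vectors."""
    distances = np.linalg.norm(train_x - query, axis=1)  # Euclidean distance (assumed)
    nearest = np.argsort(distances)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy 2-D "features" standing in for MFCC vectors (values are made up).
train_x = np.array([[0.9, 1.1], [1.0, 0.8], [1.2, 1.0],
                    [3.0, 3.2], [3.1, 2.9], [2.8, 3.0]])
train_y = ["no cry", "no cry", "no cry", "cry", "cry", "cry"]

label = knn_predict(train_x, train_y, np.array([2.9, 3.1]), k=3)
```

Here the stored training set acts as the look-up table: nothing is fitted in advance, and all the work happens when a new vector is compared against the stored cases.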


III. Methodology of Infant Cry Detection

The aim of the detection algorithm is to classify each incoming segment of a stream of input audio as "cry" or "no cry". The algorithm analyzes the signal at various time scales (segments of several seconds, sections of about 1 second, and frames of several tens of milliseconds). The figure below shows the audio processing algorithm used to detect the cry signal.








Fig: Infant cry detection algorithm - block scheme


· A KNN is applied and the amount of activity is calculated for each segment.

· Each segment is further divided into sections of 1 second, with an overlap of 50%.

· If the activity duration of a given section is below a predefined threshold (30%), the section is considered to have insufficient activity and is classified as "no cry".

· If the activity is above the threshold, the section is divided into short-time frames (with a duration of 32 ms and a hop size of 16 ms).

· Each frame is classified as either "cry" or "no cry", based on its extracted features, using a k-NN classifier.

· For each section, if at least half of the frames are classified as "cry", the whole section is considered "cry"; otherwise, it is considered "no cry".

· The k-NN classifier labels each frame as a crying sound ("1") if it is close enough to the cry training samples, or as "no cry" ("0") otherwise. The signal is divided into consecutive, overlapping segments of 10 seconds each, with a step of 1 second.
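The section-level decision in the list above can be sketched as follows. Several parts are assumptions: the text does not say how the activity of a section is measured, so it is passed in as a precomputed ratio; the frame classifier is a hypothetical callable that returns 1 for "cry" and 0 for "no cry" (an energy threshold stands in for the k-NN classifier in the example); and the 8 kHz sample rate is illustrative.

```python
import numpy as np

def frame_signal(section, sample_rate, frame_ms=32, hop_ms=16):
    """Split a section into overlapping short-time frames (32 ms, 16 ms hop)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    starts = range(0, len(section) - frame_len + 1, hop)
    return np.array([section[s:s + frame_len] for s in starts])

def classify_section(section, sample_rate, classify_frame, activity_ratio,
                     activity_threshold=0.30):
    """Return 1 ("cry") or 0 ("no cry") for one section."""
    if activity_ratio < activity_threshold:
        return 0  # insufficient activity, so the section is "no cry"
    frames = frame_signal(section, sample_rate)
    frame_labels = [classify_frame(f) for f in frames]
    # The section is "cry" if at least half of its frames are "cry".
    return int(sum(frame_labels) >= len(frame_labels) / 2)

# Toy example: a 1 s section of constant amplitude and an energy-based stand-in classifier.
sample_rate = 8000
loud = np.ones(sample_rate)
is_loud = lambda frame: int(np.mean(frame ** 2) > 0.1)  # hypothetical frame classifier
decision = classify_section(loud, sample_rate, is_loud, activity_ratio=0.8)
```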


IV. Proposed System



The proposed system is a simple sound detection system that can be applied in practice to build a device capable of detecting a baby's cry and automatically turning on baby sleep music.

It monitors the child continuously, identifies an infant's cry, and notifies the parents with a text or alert message that their infant is crying, based on the urgency of the cry. It applies speech signal processing techniques on hardware components to build a real-time system. The proposed system processes the real-time audio signal with signal processing techniques, extracts features, and detects the infant cry signal.


V. Performance Evaluation
A. Dataset


The dataset for this study consists of audio recordings (sampled at 44,100 Hz) of babies. The babies were recorded continuously for several days in a domestic environment. The recordings were fully annotated with about 50 different event types, such as crying, parents talking and doors opening or closing.


B. Training and Test Process

Our training corpus contained 20% of the labeled data, whereas the test corpus contained the remaining 80%. We trained our KNN architectures using an adaptive learning-rate optimization algorithm, with an initial learning step of 0.00001. The gradient in each iteration was evaluated using mini-batches of 32 segments. Our loss function was the cross-entropy loss. The networks were trained on the training data.
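A 20%/80% division of the labeled data could be drawn as in the sketch below. The text does not say how the split was made, so the random shuffle, the fixed seed, the function name and the example size of 1000 items are all assumptions.

```python
import numpy as np

def split_indices(n_items, train_fraction=0.20, seed=0):
    """Shuffle item indices and return (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    n_train = int(round(n_items * train_fraction))
    return order[:n_train], order[n_train:]

# Example with 1000 labeled segments (the count is made up).
train_idx, test_idx = split_indices(1000)
```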
 
C. Result Analysis

Our system records audio for 6 seconds at a sample rate of 4100 Hz. Based on the extracted features, it detects whether or not the baby is crying and displays the result. If the baby is crying, the system sends a text message to the registered mobile number and plays the song with the highest priority. If the baby stops crying during a particular song, the system increases the priority of that song and plays it first the next time the baby cries.
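The song-priority behavior described above can be sketched with a small class. The class name, method names, integer priority counter and file names are hypothetical; the text only states that a song's priority increases when the baby stops crying during it and that the highest-priority song is played first.

```python
class SoothingPlaylist:
    """Keeps songs ordered by how often they stopped the baby crying."""

    def __init__(self, songs):
        # Every song starts with the same priority.
        self.priority = {song: 0 for song in songs}

    def ordered(self):
        # Higher priority first; sorted() is stable, so ties keep insertion order.
        return sorted(self.priority, key=lambda s: -self.priority[s])

    def next_song(self):
        """Song to play first the next time a cry is detected."""
        return self.ordered()[0]

    def report_success(self, song):
        """Call when the baby stops crying while this song is playing."""
        self.priority[song] += 1

playlist = SoothingPlaylist(["lullaby_a.mp3", "lullaby_b.mp3", "white_noise.mp3"])
playlist.report_success("lullaby_b.mp3")
first = playlist.next_song()
```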

Fig: Output



Fig: Received SMS


VI. Conclusion

The proposed cry detection algorithm can easily identify a cry, and we verified this using KNN with accurate results. Rather than using MFCC alone, combining pitch and MFCC gives a more promising approach to cry detection. All of these can improve the recognition accuracy. Cry detection is challenging because of the highly variable nature of the input signals. Signals in the training and testing sessions can differ for many reasons, such as:

• The baby's voice changes with time.

• Variations in the recording environment play a major role.

Therefore, adding more training samples with different noises and speech would give more accurate results. We compared deep learning and classical approaches for the detection of baby cry sounds in various domestic environments. Automatic cry detection has applications in commercial products (such as remote baby monitors) as well as in medical and psycho-social research. We designed and evaluated several architectures for baby cry detection and compared their performance to that of classical machine-learning approaches.







