“Singing is a universal language of beauty, accessible to everyone regardless of age or nationality. Beautiful vocals, where the sounds are in harmony, contain no false notes. A melody is a set of sounds, and notes are the conventional symbols that denote those sounds. To sing well means to ‘hit the notes’, that is, to reproduce a melody exactly or to create your own in which particular sounds are combined,” writes Ekaterina Karpenko, a vocal coach with many years of experience who developed the methodology for an online vocal coaching service. The program teaches users to ‘hit the notes’ and evaluates how accurately they do so.
Since 2015, EDISON programmers have been developing ‘Clever singing’, a service that displays singing as a two-dimensional graph. The abscissa represents time in milliseconds, and the ordinate, divided into notes, represents the frequency of the sound in hertz. The algorithm determines the pitch of the sound and plots it on this plane, producing a singing graph. The program analyzes the performance and compares it with a reference track recorded in advance by the teacher to find discrepancies. At the end of the exercise, the program assigns a score based on how accurately the notes were hit.
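The text does not say how the score is computed, so the following is only a minimal sketch of one plausible approach, under stated assumptions: the sung and reference pitch tracks are already aligned frame by frame, the note rows on the graph follow standard equal temperament with A4 = 440 Hz, and a frame counts as a "hit" when the sung pitch is within a hypothetical tolerance of 50 cents (half a semitone) of the reference. All function names are illustrative.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_midi(freq_hz):
    """Fractional MIDI note number for a frequency (A4 = 440 Hz = note 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def nearest_note(freq_hz):
    """Name of the nearest equal-tempered note, i.e. the graph row it falls on."""
    midi = round(hz_to_midi(freq_hz))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


def cents(sung_hz, reference_hz):
    """Signed distance between two pitches in cents (100 cents = 1 semitone)."""
    return 1200.0 * math.log2(sung_hz / reference_hz)


def hit_ratio(sung, reference, tolerance_cents=50.0):
    """Share of voiced reference frames that the singer 'hit' (hypothetical rule).

    Both lists hold one pitch in Hz per time frame, already aligned; None
    marks a frame with no pitch. Unvoiced reference frames are skipped,
    and a missing sung pitch counts as a miss.
    """
    hits = total = 0
    for s, r in zip(sung, reference):
        if r is None:
            continue
        total += 1
        if s is not None and abs(cents(s, r)) <= tolerance_cents:
            hits += 1
    return hits / total if total else 0.0


print(nearest_note(440.0), nearest_note(261.63))                # A4 C4
print(hit_ratio([440, 446, 470, None], [440, 440, 440, 440]))  # 0.5
```

Counting hits per frame weights long notes more heavily than short ones; scoring per note would first require segmenting the reference track into notes.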
“Because our task was unusual, we spent a long time looking for a contractor. EDISON's team was brave enough to take it on. The staff's promptness and politeness encouraged us to keep working together.”
Building an algorithm that determines the pitch of a sound digitized by a microphone proved to be a real challenge. Under ideal conditions, autocorrelation can be used. In real conditions, however, there is a lot of extraneous noise to deal with: rustling clothes, street sounds, and so on, and the accuracy of the algorithm drops to unacceptable levels. Noise reduction filters were applied to prepare the audio stream mathematically before autocorrelation, but even they did not solve the problem. Eventually, the engineers tried the Aubio library, written in C, which implements several pitch detection algorithms; one of them, YINFFT, turned out to be the best in terms of accuracy and speed. For the website, YINFFT was ported to JavaScript.
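To make the "ideal conditions" case concrete, here is a minimal Python sketch of autocorrelation pitch estimation: the lag at which a frame best matches a shifted copy of itself is taken as the period. The function name and the 80 to 1000 Hz search range are hypothetical choices, not details of the project.

```python
import math


def autocorrelation_pitch(frame, sample_rate, fmin=80.0, fmax=1000.0):
    """Estimate pitch in Hz as the lag with the largest autocorrelation
    inside the range of plausible periods (fmin..fmax, hypothetical)."""
    n = len(frame)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # r(lag): sum of x[i] * x[i + lag] over the overlapping samples
        value = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # No positive correlation at all (e.g. silence): no pitch
    return sample_rate / best_lag if best_lag else None


# Ideal conditions: a clean 200 Hz sine sampled at 8 kHz (period = 40 samples)
sr = 8000
clean = [math.sin(2 * math.pi * 200 * i / sr) for i in range(512)]
print(autocorrelation_pitch(clean, sr))  # 200.0
```

On a clean sine the highest correlation lands exactly on the period. Extraneous noise brings its own correlation structure, so the highest peak is no longer guaranteed to sit at the vocal period, which is consistent with the accuracy drop described above.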
The signal is divided into frames about 12 ms long. Sounds quieter than a minimum volume are filtered out, and then a Hanning window function is applied. The prepared signal is passed to the YINFFT pitch detection algorithm, which outputs the pitch of the sound and its accuracy level. Unreliable estimates are discarded, and the graph is built from the remaining data.
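The pipeline above can be sketched in Python as follows, as an illustration under stated assumptions rather than the project's code. It swaps YINFFT for plain time-domain YIN (YINFFT, as implemented in Aubio, computes the difference function in the frequency domain and adds spectral weighting, both omitted here). Only the ~12 ms frame length comes from the text; the silence level `SILENCE_DB`, `YIN_THRESHOLD`, `MIN_CONFIDENCE` and all function names are hypothetical.

```python
import math

# Hypothetical parameters: only the ~12 ms frame length comes from the text.
FRAME_MS = 12.0
SILENCE_DB = -50.0     # frames quieter than this RMS level (dBFS) are skipped
YIN_THRESHOLD = 0.1    # first dip of d'(tau) below this is taken as the period
MIN_CONFIDENCE = 0.8   # pitch estimates less confident than this are rejected


def split_frames(signal, sample_rate, frame_ms=FRAME_MS):
    """Cut the signal into consecutive, non-overlapping frames of ~frame_ms."""
    size = int(sample_rate * frame_ms / 1000)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def rms_db(frame):
    """RMS level of a frame in dBFS (full scale = 1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def hann(frame):
    """Multiply the frame by a symmetric Hann ("Hanning") window."""
    n = len(frame)
    return [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
            for i, x in enumerate(frame)]


def yin(frame, sample_rate):
    """Plain time-domain YIN: returns (pitch_hz, confidence)."""
    w = len(frame) // 2  # integration window; lags 1..w-1 are searched
    # Squared difference function d(tau)
    d = [sum((frame[j] - frame[j + tau]) ** 2 for j in range(w))
         for tau in range(w)]
    # Cumulative mean normalized difference d'(tau), with d'(0) = 1
    cmnd, running = [1.0] * w, 0.0
    for tau in range(1, w):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0
    # First dip below the threshold (else the global minimum) ...
    tau = next((t for t in range(1, w) if cmnd[t] < YIN_THRESHOLD), None)
    if tau is None:
        tau = min(range(1, w), key=lambda t: cmnd[t])
    # ... followed down to its local minimum
    while tau + 1 < w and cmnd[tau + 1] < cmnd[tau]:
        tau += 1
    # Parabolic interpolation for a sub-sample period estimate
    period = float(tau)
    if tau + 1 < w:
        a, b, c = cmnd[tau - 1], cmnd[tau], cmnd[tau + 1]
        if a - 2 * b + c != 0:
            period += 0.5 * (a - c) / (a - 2 * b + c)
    return sample_rate / period, 1.0 - cmnd[tau]


def analyze_frame(frame, sample_rate):
    """Silence gate -> Hann window -> YIN -> confidence filter."""
    if rms_db(frame) < SILENCE_DB:
        return None  # too quiet: no point on the graph
    pitch, confidence = yin(hann(frame), sample_rate)
    return pitch if confidence >= MIN_CONFIDENCE else None


def pitch_track(signal, sample_rate):
    """(time_ms, pitch_hz) points for the graph; rejected frames are dropped."""
    size = int(sample_rate * FRAME_MS / 1000)
    points = []
    for k, frame in enumerate(split_frames(signal, sample_rate)):
        pitch = analyze_frame(frame, sample_rate)
        if pitch is not None:
            points.append((k * size * 1000 / sample_rate, pitch))
    return points


# An 800 Hz tone at 16 kHz (four 12 ms frames), followed by 24 ms of silence
sr = 16000
tone = [0.5 * math.sin(2 * math.pi * 800 * i / sr) for i in range(4 * 192)]
for t_ms, hz in pitch_track(tone + [0.0] * 384, sr):
    print(f"{t_ms:5.1f} ms  {hz:6.1f} Hz")
```

Because the lag search in this sketch covers only half of a 12 ms frame, it cannot detect pitches below roughly 170 Hz; lower voices would need longer analysis windows.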
Scope of work:
- creating a website;
- creating a training program;
- sharing exercise scores on social networks;
- enabling ‘Robokassa’ payments.