Ideas about note pitch recognition

The intention is to devise an algorithm that will recognise the start,
end, amplitude and tuning of multiple notes, given mono sound sampled
44100 times per second as signed 16-bit values. This is often done
using methods based on the FFT.

The approach to be explored will use only integer arithmetic, and will
use a separate recogniser for each possible note. Each recogniser will
have a buffer containing a weighted average of the samples for recent
cycles of the note it is attempting to recognise. At 44100 samples per
second, a recogniser for A3 (the A below middle C), whose nominal
frequency is 220 Hz, would need a buffer of about 44100/220, ie about
200, samples. The buffer size is always chosen to be an even number.
For each octave the sample rate is doubled or halved a suitable number
of times to ensure that the buffer sizes of all the notes in that
octave are in the range 88 to 168.


The samples at the different sample rates are calculated by first
multiplying the given sample rate (typically 44100) by 4 and
interpolating the intermediate points using cubic splines. The other
sample rates are obtained by successively halving the rate, taking
alternate samples each time. This will be done about 7 times to give a
range from C1 to B6 or more, which is nearly the full range of a piano
keyboard.
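
The resampling step above can be sketched as follows. This is only an
illustration under stated assumptions: it uses a Catmull-Rom cubic
(a local interpolating cubic, standing in for whatever cubic spline is
finally chosen), it clamps at the ends of the input, and the names
upsample4, halve and octave_streams are hypothetical. Python's //
rounds towards minus infinity, which may differ from the division used
in a real integer implementation.

```python
def upsample4(samples):
    """Return 4x as many samples, filling the gaps by cubic interpolation."""
    n = len(samples)
    out = []
    for i in range(n):
        # Four neighbouring input points, clamped at the ends of the input.
        p0 = samples[max(i - 1, 0)]
        p1 = samples[i]
        p2 = samples[min(i + 1, n - 1)]
        p3 = samples[min(i + 2, n - 1)]
        # Catmull-Rom coefficients (assumed stand-in for a cubic spline).
        a = 2 * p1
        b = p2 - p0
        c = 2 * p0 - 5 * p1 + 4 * p2 - p3
        d = 3 * (p1 - p2) + p3 - p0
        for k in range(4):
            # Evaluate at t = k/4, scaled by 128 so only integers are used.
            out.append((64 * a + 16 * b * k + 4 * c * k * k + d * k ** 3) // 128)
    return out


def halve(samples):
    """Halve the sample rate by keeping alternate samples."""
    return samples[::2]


def octave_streams(samples, base_rate=44100, octaves=6):
    """Map each sample rate to its sample stream, highest rate first."""
    rate, stream = 4 * base_rate, upsample4(samples)
    streams = {}
    for _ in range(octaves):
        streams[rate] = stream
        rate, stream = rate // 2, halve(stream)
    return streams
```

With the default arguments the rates produced are 176400, 88200, 44100,
22050, 11025 and 5512, and the 44100 stream is the original input
because the interpolation leaves the given points unchanged.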

Each recogniser merges the next cycle's worth of samples into its
buffer by a formula such as:

          buf[i] := ((1000-a) * buf[i] + a * sample) / 1000

where a is a parameter (between 0 and 1000) controlling how quickly
older sample information decays, and sample is the incoming sample
that falls at position i of the cycle. This essentially causes the
buffer to tune into frequencies close to the recogniser's nominal
frequency, while frequencies other than multiples of the nominal
frequency are filtered out.
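
A minimal sketch of the merge, assuming the incoming samples have
already been cut into one cycle's worth (the name merge_cycle is
hypothetical, and Python's // rounds towards minus infinity):

```python
def merge_cycle(buf, cycle, a):
    """Blend one cycle of new samples into buf in place (a is in 0..1000)."""
    for i, sample in enumerate(cycle):
        buf[i] = ((1000 - a) * buf[i] + a * sample) // 1000


buf = [0, 0, 0, 0]
cycle = [0, 1000, 0, -1000]
merge_cycle(buf, cycle, 100)   # buf is now [0, 100, 0, -100]
merge_cycle(buf, cycle, 100)   # buf is now [0, 190, 0, -190]
```

A waveform that repeats exactly once per buffer length keeps adding in
at the same positions, so the buffer creeps towards it, while anything
that lands at different positions each cycle tends to average away.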

Some smoothing of the waveform might be applied by a transform such
as:

               buf[i] := (buf[i-1] + 2*buf[i] + buf[i+1]) / 4

This acts as a simple low-pass filter and should help to suppress
some unwanted high-frequency sounds.
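
A sketch of the smoothing pass, with two assumptions not stated above:
the buffer is treated as circular (it holds one cycle, so its two ends
are neighbours), and the new values are computed from a copy so that
no element is smoothed using an already smoothed neighbour. The name
smooth is hypothetical.

```python
def smooth(buf):
    """Return a 1-2-1 smoothed copy of buf, wrapping round at the ends."""
    n = len(buf)
    old = list(buf)
    # old[i - 1] wraps to the last element when i is 0.
    return [(old[i - 1] + 2 * old[i] + old[(i + 1) % n]) // 4 for i in range(n)]


print(smooth([0, 4, 0, 0]))   # prints [1, 2, 1, 0]
```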

An estimate of the amplitude and phase of the nominal pitch is
calculated by correlating the buffer with a square wave of the same
period, the phase being chosen to maximise the result. This
calculation can be done in time proportional to the size of the
buffer.


+1             ---------------------
              |                     |
 0 ---------------------------------------------
              |                     |
-1 -----------                       -----------
   ^          ^         ^                      ^
   0        phase       i                    period



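LET sq(i, phase, period) = VALOF
{ // Square wave of the given period: +1 for the half period that
  // follows phase, and -1 elsewhere, so exactly half the samples are +1.
  LET halfperiod = period/2
  LET val = +1
  // A phase in the second half is the first-half wave negated.
  IF phase>halfperiod DO val, phase := -1, phase - halfperiod
  IF phase < i <= phase+halfperiod RESULTIS val
  RESULTIS -val
}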

The amplitude is calculated by:

LET amplitude(buf, phase, period) = VALOF 
{ LET res = 0
  FOR i = 1 TO period DO res := res + buf[i] * sq(i, phase, period)
  RESULTIS res/period
}

The phase chosen is the one that maximises the result of the amplitude
call. The actual implementation calculates this value much more
efficiently.
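
One way the search over all phases might be done in time proportional
to the buffer size is sketched below. It is not the actual
implementation: it assumes 0-based indices, an even buffer length, and
a square wave that is +1 for the half period starting at phase. The
names correlation and best_phase_fast are hypothetical. The idea is
that the sum over the +1 half is a sliding window, so moving the phase
by one sample costs one addition and one subtraction, and a phase half
a period later simply negates the result.

```python
def sq(i, phase, period):
    """Square wave: +1 for the half period starting at phase, else -1."""
    return 1 if (i - phase) % period < period // 2 else -1


def correlation(buf, phase):
    """Direct (slow) sum of buf against the square wave at one phase."""
    period = len(buf)
    return sum(buf[i] * sq(i, phase, period) for i in range(period))


def best_phase_fast(buf):
    """Return (phase, amplitude) maximising the correlation, in linear time."""
    period = len(buf)
    half = period // 2
    total = sum(buf)
    window = sum(buf[:half])          # sum over the +1 half for phase 0
    best_phase, best_corr = 0, 2 * window - total
    for phase in range(half):
        corr = 2 * window - total     # (+1 half) minus (-1 half)
        # The phase half a period later gives the negated correlation.
        for p, c in ((phase, corr), (phase + half, -corr)):
            if c > best_corr:
                best_phase, best_corr = p, c
        # Slide the window one sample to the right.
        window += buf[phase + half] - buf[phase]
    return best_phase, best_corr // period


wave = [-100, -100, -100, 100, 100, 100, 100, -100]
print(best_phase_fast(wave))   # prints (3, 100)
```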

If the note received is not exactly in tune with the natural frequency
of the buffer length, the phase will (slowly) change.  The observed
rate of phase change can be used to calculate the pitch of the
received note more accurately. If the value of phase increases by one
for each cycle, each received cycle is one sample longer than the
buffer, so the received frequency will be:

nominal_frequency * period / (period+1)

The rate of phase shift could be used to improve the merging of
input samples into the buffer.
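
The pitch correction can be sketched in integer arithmetic as follows.
The assumptions here are that phase is the buffer position of the
rising edge, that sample n is merged into position n modulo the buffer
size, and that the drift is measured as drift_num samples over
drift_den cycles. Under those assumptions each received cycle spans
period + drift_num/drift_den samples. The function name and the choice
of millihertz as the unit are hypothetical.

```python
def received_millihertz(rate, period, drift_num, drift_den=1):
    """Estimate the received frequency in mHz from the observed phase drift.

    rate      -- sample rate of the recogniser's octave, in samples/second
    period    -- buffer size in samples
    drift_num -- phase change in samples (positive means the edge moves later)
    drift_den -- number of cycles over which that change was observed
    """
    return (1000 * rate * drift_den) // (period * drift_den + drift_num)


print(received_millihertz(22050, 100, 0))       # prints 220500
print(received_millihertz(22050, 100, 1))       # prints 218316
print(received_millihertz(22050, 100, -1, 10))  # prints 220720
```

The first call is the nominal case for A3 with a buffer of 100 at
22050 samples per second, the second is flat by one sample per cycle,
and the third is slightly sharp.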

The note recognisers are grouped in octaves as follows:

         Notes      Sample rate

Octave1  C1 to B1      5512
Octave2  C2 to B2     11025
Octave3  C3 to B3     22050
Octave4  C4 to B4     44100   Note that C4 is middle C
Octave5  C5 to B5     88200
Octave6  C6 to B6    176400

The buffer sizes for each of the 12 semitones of the scale
are octave independent and are as follows:

C  168
C# 158
D  150
D# 140
E  132
F  126
F# 118
G  112
G# 106
A  100
A#  94
B   88
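
The sizes above are consistent with taking the number of samples per
cycle in octave 4 (at 44100 samples per second, equal temperament with
A4 = 440 Hz) and rounding down to an even number. That derivation is
an assumption, not something stated in these notes, and the name
buffer_size is hypothetical. Floating point is used here only because
the table would be computed once, offline.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def buffer_size(semitone, rate=44100):
    """Even buffer size for a semitone of octave 4 (0 = C ... 11 = B)."""
    freq = 440.0 * 2 ** ((semitone - 9) / 12)   # A4 is semitone 9
    samples_per_cycle = rate / freq
    return 2 * int(samples_per_cycle // 2)      # round down to an even number


for semitone, name in enumerate(NOTE_NAMES):
    print(f"{name:<2} {buffer_size(semitone):3d}")
```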


note  freq    rate mul  upb

A0      27    5512      200
A1      55   11025      200
A2     110   22050      200
A3     220   44100   1  200
A4     440   88200   2  200
A5     880  176400   4  200
A6    1760  352800   8  200
A7    3520  705600  16  200
A8    7040 1411200      100

