384438 DSP work (note detection)

In progress

Posted over 14 years ago

Paid on delivery

This is work for an iPhone app that's already on the market, and we are now looking to improve its note detection accuracy. Here are the project specs as they stand right now.

1. General specs
- The voice data being analyzed is read from a prerecorded file (i.e. it doesn't need to be analyzed in real time).
- The file format is Apple AIFF (it will always be a single singer on a single channel; it is currently recorded at 44.1kHz, though this can be changed).
- Three functions are being used (all three are quoted in full at the end of this email). They start with the AIFF file and return an array of frequencies (one for each FFT window):
  a. getData() - reads the AIFF file and writes amplitudes scaled to be between -1.0 and 1.0 to the inData buffer. This buffer contains amplitudes with no processing applied to them at all.
  b. getFrequencies() - takes a pointer to the amplitude buffer as one of its arguments (the "inData" argument) and writes processed frequency values to the array pointed to by "outData". It currently just picks the partial with the highest amplitude.
  c. fft() - called by getFrequencies() within its "main processing loop".
- Some variables are hard coded where they shouldn't be, just for convenience.

2. Voice fluctuation issue with users with poor singing technique
- To deal with this problem we are currently detecting at 44.1kHz with a window of 2048, and then using several relatively straightforward algorithms to obtain the most valid note from among groups of 4 or more consecutive frequencies returned by the three functions quoted above.
- So one question we have is whether you know of a better approach to this problem.

3. Setup and context
- We would send you the full Xcode project file, ready to compile, so you would just need a provisioning profile from Apple. If you don't have a developer account with Apple, we can provide you with a membership so you can obtain the necessary profile for testing. It won't take any time at all to get you set up with that.

4. Performance
- Starting with a 30sec recording from the AIFF file, it currently takes approx. 6 seconds on an iPhone 3GS to read and analyze that data using the three functions mentioned [i.e. getData(), getFrequencies(), and fft()] and the filtering code.
- We could tolerate an analysis time of 15 secs for an equivalent recording.
- For performance testing you would simply be comparing the efficiency of your code to that of the app as bought from the Apple App Store. Anything matching that speed or exceeding it is fine.

5. Accuracy, time resolution, and range
- Basically, we would like to get as much accuracy and time resolution as is reasonably possible to achieve within the performance boundaries mentioned above.
- Of the two, accuracy is more important than time resolution.
- The range should cover the entire human concert voice (bass to soprano). We can skip a couple of the lowest notes if that's too taxing on the other parameters. If possible, we would like the upper range to go considerably higher than that (say to approx. 2100Hz).

6. APIs
- The only API requirement is that you don't use any private APIs in the form of a DLL, etc. (we need all code for this in the open, including the DFT/FFT code if used). Note that the FFT approach is not a requirement. If you know of a better approach in terms of accuracy and efficiency for voice, then you're free to use that.

7. The final product
- You can change the three base functions in any way you wish, or write new ones if that would be faster. The only thing we would like is that the data flow be clearly the same (i.e. start with the AIFF file, and return an array of processed frequencies for the time resolution being used).

Let me know when your DSP engineers could get started and their quote for this.
Thanks so much,
Kevin Falk

/*--------------------------- functions -------------------------------*/

void SoundProcessor::getData(){
    FILE *handle = fopen([[NSHomeDirectory() stringByAppendingPathComponent:@"/Library/Caches/[login to view URL]"] UTF8String], "rb");
    if (handle != NULL){
        UInt8 bytes[4];

        // move file pointer to SSND chunk
        fseek(handle, 4080, SEEK_SET);

        // get size of the audio data in bytes
        fseek(handle, 4, SEEK_CUR);
        fread(bytes, sizeof(UInt8), 4, handle);
        UInt32 nBytesData = uInt32FromByteArray(bytes) - 8;

        // initialize data arrays
        dataSize = (UInt32)round(nBytesData/2);
        if (inData) // inData and outData are both instance variables
            delete [] inData;
        inData = new float[dataSize];

        // size of output array should be equal to the number of windows
        if (outData)
            delete [] outData;
        outData = new double[(int)ceil(dataSize/FFT_FrameSize)];

        // get data offset from SSND ID
        fread(bytes, 4, sizeof(UInt8), handle);
        UInt32 dataOffset = uInt32FromByteArray(bytes);

        // read data to memory
        fseek(handle, 4096 + dataOffset, SEEK_SET); // total offset: 4080 + 16 + dataOffset
        UInt8 sBytes[2];
        for (int i=0; i < dataSize; i++){
            fread(sBytes, 2, sizeof(UInt8), handle);
            inData[i] = normFloatFromByteArray(sBytes);
        }
        fclose(handle);
    }
}

int SoundProcessor::getFrequencies(long numSampsToProcess, long fftFrameSize, long osamp, float sampleRate, float *indata, double *outData){
    static float gInFIFO[FFT_FrameSize];
    static float gFFTworksp[2*FFT_FrameSize];
    static float gLastPhase[FFT_FrameSize/2+1];
    static float gAnaFreq[FFT_FrameSize];
    static float gAnaMagn[FFT_FrameSize];
    static long gRover = false, gInit = false;
    double magn, phase, tmp, window, real, imag;
    double freqPerBin, expct;
    long i,k, qpd, inFifoLatency, stepSize, fftFrameSize2;
    int count = 0; // window counter
    double maxAmplitude = 0.; // tracks the max amplitude for the samples (used externally)

    // convenience vars
    fftFrameSize2 = fftFrameSize/2;
    stepSize = fftFrameSize/osamp;
    freqPerBin = sampleRate/(double)fftFrameSize;
    expct = 2.*M_PI*(double)stepSize/(double)fftFrameSize;
    inFifoLatency = fftFrameSize-stepSize;
    if (gRover == false) gRover = inFifoLatency;

    // initialize static arrays
    if (gInit == false) {
        memset(gInFIFO, 0, FFT_FrameSize*sizeof(float));
        memset(gFFTworksp, 0, 2*FFT_FrameSize*sizeof(float));
        memset(gLastPhase, 0, (FFT_FrameSize/2+1)*sizeof(float));
        memset(gAnaFreq, 0, FFT_FrameSize*sizeof(float));
        memset(gAnaMagn, 0, FFT_FrameSize*sizeof(float));
        gInit = true;
    }

    // main processing loop
    for (i = 0; i < numSampsToProcess; i++){

        // check if buffered data is sufficient
        gInFIFO[gRover] = indata[i];
        gRover++;
        if (gRover >= fftFrameSize){ // buffer is ready
            gRover = inFifoLatency;

            // do windowing and real/imaginary interleave
            for (k = 0; k < fftFrameSize;k++) {
                window = -.5*cos(2.*M_PI*(double)k/(double)fftFrameSize)+.5;
                gFFTworksp[2*k] = gInFIFO[k] * window;
                gFFTworksp[2*k+1] = 0.;
            }

            // transform
            fft(gFFTworksp, fftFrameSize, -1);

            for (k = 0; k <= 128; k++) { // cuts off at frequencies over ~2700

                // de-interlace FFT buffer
                real = gFFTworksp[2*k];
                imag = gFFTworksp[2*k+1];

                // compute magnitude and phase
                magn = 2.*sqrt(real*real + imag*imag);
                phase = atan2(imag,real);

                // compute phase difference
                tmp = phase - gLastPhase[k];
                gLastPhase[k] = phase;

                // subtract expected phase difference
                tmp -= (double)k*expct;

                // map delta phase into +/- Pi interval
                qpd = tmp/M_PI;
                if (qpd >= 0) qpd += qpd&1;
                else qpd -= qpd&1;
                tmp -= M_PI*(double)qpd;

                // get deviation from bin frequency from the +/- Pi interval
                tmp = osamp*tmp/(2.*M_PI);

                // compute the k-th partials' true frequency
                tmp = (double)k*freqPerBin + tmp*freqPerBin;

                // store magnitude and frequency in analysis arrays
                gAnaMagn[k] = magn;
                gAnaFreq[k] = tmp;
            }

            // get the fundamental frequency for the current frame/window
            // (for now just get the frequency with max amplitude)
            // both trackers start from zero so the first comparison is well defined
            int j, maxIndex = 0;
            double magnitude = 0.;
            for (j=0; j < 128; j++){
                // ignore frequencies above 2100Hz
                if (gAnaMagn[j] > magnitude && gAnaFreq[j] < 2100.0){
                    magnitude = gAnaMagn[j];
                    maxIndex = j;
                }
            }
            if (magnitude > 0.3) // treat windows with max amplitude < 0.3 as silence
                outData[count] = gAnaFreq[maxIndex];
            else
                outData[count] = 0.;
            count++; // next window

            // update the max amplitude recorded so far (convenience var for external code)
            double exMax = 0.;
            for (int m=0; m < 128; m++){
                if (gAnaMagn[m] > exMax)
                    exMax = gAnaMagn[m];
            }
            if (exMax > maxAmplitude)
                maxAmplitude = exMax;
        }
    }
    if (maxAmplitude > 3.0)
        return 3;
    else
        return 1;
}

void SoundProcessor::fft(float *fftBuffer, long fftFrameSize, long sign){
    float wr, wi, arg, *p1, *p2, temp;
    float tr, ti, ur, ui, *p1r, *p1i, *p2r, *p2i;
    long i, bitm, j, le, le2, k;
    long test1 = (long)(2*fftFrameSize-2),
         test2 = (long)(log(fftFrameSize)/log(2.)+.5),
         test3 = (long)(fftFrameSize/4);

    for (i = 2; i < test1; i += 2) {
        for (bitm = 2, j = 0; bitm < 2*fftFrameSize; bitm <<= 1) {
            if (i & bitm) j++;
            j <<= 1;
        }
        if (i < j) {
            p1 = fftBuffer+i; p2 = fftBuffer+j;
            temp = *p1; *(p1++) = *p2; *(p2++) = temp;
            temp = *p1; *p1 = *p2; *p2 = temp;
        }
    }
    for (k = 0, le = 2; k < test2; k++) {
        le <<= 1;
        le2 = le>>1;
        ur = 1.0;
        ui = 0.0;
        arg = M_PI / (le2>>1);
        wr = cos(arg);
        wi = sign*sin(arg);
        if (le2 <= 512){
            for (j = 0; j < le2; j += 2) {
                p1r = fftBuffer+j; p1i = p1r+1;
                p2r = p1r+le2; p2i = p2r+1;
                for (i = j; i < test3; i += le) {
                    tr = *p2r * ur - *p2i * ui;
                    ti = *p2r * ui + *p2i * ur;
                    *p2r = *p1r - tr; *p2i = *p1i - ti;
                    *p1r += tr; *p1i += ti;
                    p1r += le; p1i += le;
                    p2r += le; p2i += le;
                }
                tr = ur*wr - ui*wi;
                ui = ur*wi + ui*wr;
                ur = tr;
            }
        }
    }
}
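
As a rough check on the resolution figures implied by the code above, the following C++ sketch works through the arithmetic for the current settings (44.1kHz, window of 2048). The helper names binWidthHz, windowDurationMs and semitoneSpacingHz are hypothetical and not part of the project. The 82.41Hz figure used in the note below (a low bass E2) is an assumption about where the "entire human concert voice" starts.

```cpp
#include <cassert>
#include <cmath>

// Width of one FFT bin in Hz. This is the same formula as freqPerBin
// in getFrequencies(); the function name itself is hypothetical.
double binWidthHz(double sampleRate, long fftFrameSize) {
    assert(sampleRate > 0.0 && fftFrameSize > 0);
    return sampleRate / static_cast<double>(fftFrameSize);
}

// Duration of one analysis window in milliseconds (hypothetical helper).
double windowDurationMs(double sampleRate, long fftFrameSize) {
    assert(sampleRate > 0.0 && fftFrameSize > 0);
    return 1000.0 * static_cast<double>(fftFrameSize) / sampleRate;
}

// Distance in Hz from a note to the next semitone up, assuming
// twelve-tone equal temperament (hypothetical helper).
double semitoneSpacingHz(double noteHz) {
    return noteHz * (std::pow(2.0, 1.0 / 12.0) - 1.0);
}
```

With binWidthHz(44100.0, 2048) one bin is about 21.5Hz wide, so bin 128 sits at about 2756Hz, which matches the "~2700" cutoff comment in getFrequencies(). windowDurationMs(44100.0, 2048) gives roughly 46.4ms per window. semitoneSpacingHz(82.41) is only about 4.9Hz, well under one bin width, so raw bin positions alone cannot separate the lowest notes; that is presumably why the code refines each bin's frequency from the phase difference between windows.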
