What I need is simple. I have an algorithm implemented in C++ and I want to be able to run it in parallel using threads to improve overall performance.
The intended behavior is detailed below:
- A pre-fetching thread looks for the files that are about to be processed and places these files in RAM (I/O from RAM is faster than disk)
- If all the available RAM space has been used, it just stores file pointers to these files on disk
- Worker threads fetch files from the RAM location for processing and mark each file as "under process"
- Each worker thread processes its file, outputs the result to the screen and moves on to the next file
- Processed files are removed from the RAM list by the worker threads
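The steps above could be sketched roughly as follows. This is a minimal illustration, not a finished design: the names `Job`, `JobQueue`, `Algorithm`, `prefetch`, `work` and `run_pipeline` are all hypothetical, and popping a job from the queue stands in for both marking it "under process" and removing it from the RAM list. It uses only the C++11 standard library, so it should build on both Windows and Linux.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <fstream>
#include <functional>
#include <iostream>
#include <iterator>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// A queued file: bytes are preloaded while the RAM budget allows, otherwise only the path is kept.
struct Job {
    std::string path;
    std::vector<char> bytes;
    bool in_ram = false;
};

// Hypothetical algorithm hook: takes the file bytes, returns a printable result.
using Algorithm = std::function<std::string(const std::vector<char>&)>;

// Thread-safe FIFO shared by the prefetcher and the workers.
class JobQueue {
public:
    void push(Job job) {
        { std::lock_guard<std::mutex> lock(mutex_); jobs_.push_back(std::move(job)); }
        ready_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lock(mutex_); closed_ = true; }
        ready_.notify_all();
    }
    // Blocks until a job is available; returns false once the queue is closed and drained.
    bool pop(Job& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !jobs_.empty(); });
        if (jobs_.empty()) return false;
        out = std::move(jobs_.front());
        jobs_.pop_front();
        return true;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<Job> jobs_;
    bool closed_ = false;
};

std::vector<char> read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<char>(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}

// Prefetcher: loads files into RAM while the budget allows, else queues only the path.
void prefetch(const std::vector<std::string>& paths, std::size_t ram_budget,
              std::atomic<std::size_t>& ram_used, JobQueue& queue) {
    for (const std::string& path : paths) {
        Job job;
        job.path = path;
        std::ifstream probe(path, std::ios::binary | std::ios::ate);
        const std::streamoff size = probe ? static_cast<std::streamoff>(probe.tellg()) : -1;
        if (size >= 0 && ram_used.load() + static_cast<std::size_t>(size) <= ram_budget) {
            job.bytes = read_file(path);
            job.in_ram = true;
            ram_used += job.bytes.size();
        }
        queue.push(std::move(job));
    }
    queue.close();
}

// Worker: takes the next job, runs the algorithm, prints the result, releases the RAM budget.
void work(JobQueue& queue, std::atomic<std::size_t>& ram_used, const Algorithm& algo,
          std::mutex& out_mutex, std::vector<std::string>& results) {
    Job job;
    while (queue.pop(job)) {
        const bool was_in_ram = job.in_ram;
        const std::vector<char> bytes = was_in_ram ? std::move(job.bytes) : read_file(job.path);
        std::string line = job.path + ": " + algo(bytes);
        if (was_in_ram) ram_used -= bytes.size();
        std::lock_guard<std::mutex> lock(out_mutex);  // keep output lines from interleaving
        std::cout << line << '\n';
        results.push_back(std::move(line));
    }
}

std::vector<std::string> run_pipeline(const std::vector<std::string>& paths, unsigned n_workers,
                                      std::size_t ram_budget, const Algorithm& algo) {
    JobQueue queue;
    std::atomic<std::size_t> ram_used{0};
    std::mutex out_mutex;
    std::vector<std::string> results;
    std::thread prefetcher(prefetch, std::cref(paths), ram_budget, std::ref(ram_used), std::ref(queue));
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n_workers; ++i)
        workers.emplace_back(work, std::ref(queue), std::ref(ram_used), std::cref(algo),
                             std::ref(out_mutex), std::ref(results));
    prefetcher.join();
    for (std::thread& w : workers) w.join();
    return results;
}
```

A call such as `run_pipeline(paths, 4, 512u * 1024 * 1024, algo)` would start one prefetcher and four workers with a 512 MiB budget. Results come back in completion order, which is not necessarily the input order.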
Please use this image as a reference: [url removed, login to view] (also attached to this message)
The algorithm to implement is ssdeep, source code available at: [url removed, login to view]
My intention is that your code can later be used for other algorithms, so it needs to be generic enough to support them.
- It needs to compile on both Windows and Linux
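One possible way to keep the threading code independent of the algorithm is a small abstract interface, sketched below. The names `FileAlgorithm`, `ByteSum` and `run_on` are hypothetical, and `ByteSum` is only a stand-in checksum for illustration; an ssdeep adapter would implement the same interface. Only the standard library is used, so nothing here is platform-specific.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical plug-in point: the threading layer only ever sees this interface.
class FileAlgorithm {
public:
    virtual ~FileAlgorithm() = default;
    virtual std::string name() const = 0;
    // Takes the raw file bytes and returns a printable result.
    virtual std::string process(const std::vector<unsigned char>& bytes) const = 0;
};

// Stand-in algorithm (NOT ssdeep): sums all bytes modulo 2^32.
class ByteSum : public FileAlgorithm {
public:
    std::string name() const override { return "bytesum"; }
    std::string process(const std::vector<unsigned char>& bytes) const override {
        std::uint32_t sum = 0;
        for (unsigned char b : bytes) sum += b;
        return std::to_string(sum);
    }
};

// What a worker thread would call, whichever algorithm is plugged in.
std::string run_on(const FileAlgorithm& algo, const std::vector<unsigned char>& bytes) {
    return algo.name() + ":" + algo.process(bytes);
}
```

Because `process` is `const` and keeps no shared state, a single algorithm object could be shared by all worker threads without locking.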
In terms of configuration, I want to be able to specify:
- number of worker threads
- max RAM size used and/or percentage of RAM to use
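These settings could be captured along the following lines. This is a sketch under assumptions: the flag names `--threads`, `--max-ram-mb` and `--ram-percent` are invented for illustration, and when both RAM limits are given the stricter one wins. The total physical RAM is passed in as a parameter because querying it is platform-specific.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

struct Config {
    unsigned threads = 0;           // 0 means "pick a default"
    std::size_t max_ram_bytes = 0;  // 0 means "no absolute cap"
    double ram_percent = 0.0;       // 0 means "no percentage cap"
};

// Parses "--flag value" pairs; the flag names are hypothetical.
Config parse_args(const std::vector<std::string>& args) {
    Config cfg;
    for (std::size_t i = 0; i + 1 < args.size(); i += 2) {
        const std::string& key = args[i];
        const std::string& value = args[i + 1];
        if (key == "--threads")
            cfg.threads = static_cast<unsigned>(std::stoul(value));
        else if (key == "--max-ram-mb")
            cfg.max_ram_bytes = static_cast<std::size_t>(std::stoull(value)) * 1024 * 1024;
        else if (key == "--ram-percent")
            cfg.ram_percent = std::stod(value);
    }
    return cfg;
}

// Falls back to the hardware thread count, or 1 if that is unknown.
unsigned worker_count(const Config& cfg) {
    if (cfg.threads > 0) return cfg.threads;
    const unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? hw : 1;
}

// Applies whichever limits were given; the smaller one wins when both are set.
std::size_t ram_budget(const Config& cfg, std::size_t total_ram_bytes) {
    std::size_t budget = total_ram_bytes;
    if (cfg.max_ram_bytes > 0) budget = std::min(budget, cfg.max_ram_bytes);
    if (cfg.ram_percent > 0.0)
        budget = std::min(budget,
                          static_cast<std::size_t>(total_ram_bytes * (cfg.ram_percent / 100.0)));
    return budget;
}
```

For example, `--max-ram-mb 512 --ram-percent 25` on a machine with 1 GiB of RAM would give a budget of 256 MiB, since the percentage cap is the stricter of the two.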
For a professional developer, this should be quick work.
As a final test, your multi-threaded version needs to run faster than single-threaded ssdeep across a significant set of files.