Refactoring the reading of data from files by a C++ program

Skills and experience required

1. Expert in C++ programming, especially with the GNU C++ compiler and GNU C++ Standard Library: specifically with file input/output, string handling and regular expression syntax.
2. Basic grounding in mathematics (sufficient to understand the definitions of a matrix and a vector).
3. Experience with using GNU Concurrent Versions System for version control.

Details of project

We have developed a C++ program for statistical analysis named ADMIXMAP, that runs as a console application either on GNU/Linux or Windows. When started, ADMIXMAP reads data from up to five different files. We want to refactor the program so that reading data from files, and passing the data to program objects when required, is undertaken by a separate object. The work required is as follows:-

1. Create a new class “InputData” and edit the main subroutine to instantiate an object of class InputData when the program starts up.
2. Search the source code of ADMIXMAP to locate all code that reads data from files into program objects. Substitute this code with member functions of the class InputData. These functions should read data from files into arrays, and pass the contents of these arrays to program objects. Each input file should be read once only.
3. Some code to parse input data has been written using a locally-written C++ wrapper (liblshtm) for the GNU regex C library. This should be replaced by code that uses the String and Regex classes implemented in the GNU C++ Standard Library (as described in http://docs.freebsd.org/info/libg++/libg++.[url removed, login to view]).

If this project is completed satisfactorily, we may have further work on the program for the coder.
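To make steps 1 and 2 of the work above more concrete, here is a minimal sketch of what such a class might look like. Only the class name InputData comes from this brief. The member names (load, loadFile, table), the string keys and the row/table typedefs are assumptions made for illustration, and the sketch uses plain C++ standard library streams rather than the String and Regex classes named in step 3. Each table is read once and then held in memory, so program objects can be handed the stored rows instead of opening the file themselves.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical types: one row of text fields, and a table of rows.
typedef std::vector<std::string> Row;
typedef std::vector<Row> Table;

// Sketch only: the real ADMIXMAP interfaces are not shown in this brief.
class InputData {
public:
    // Read a whitespace-delimited table from a stream and store it under key.
    // Loading the same key twice is an error, so each input is read once only.
    void load(const std::string& key, std::istream& in) {
        assert(!key.empty());  // precondition: every table needs a name
        if (tables_.count(key))
            throw std::logic_error("already loaded: " + key);
        Table t;
        std::string line;
        while (std::getline(in, line)) {
            std::istringstream fields(line);
            Row row;
            std::string f;
            while (fields >> f) row.push_back(f);  // spaces, tabs and '\r' all separate fields
            if (!row.empty()) t.push_back(row);    // blank lines are skipped
        }
        tables_[key] = t;
    }

    // Convenience wrapper that opens a file and delegates to load().
    void loadFile(const std::string& key, const std::string& path) {
        std::ifstream in(path.c_str());
        if (!in) throw std::runtime_error("cannot open " + path);
        load(key, in);
    }

    // Read-only access for program objects that need the stored rows.
    const Table& table(const std::string& key) const {
        std::map<std::string, Table>::const_iterator it = tables_.find(key);
        if (it == tables_.end())
            throw std::out_of_range("not loaded: " + key);
        return it->second;
    }

private:
    std::map<std::string, Table> tables_;
};
```

In this arrangement the main subroutine would create one InputData object at start-up, call loadFile once per input file, and pass the result of table(...) to whichever program object needs it. Splitting load from loadFile is a design choice that lets the parsing be exercised on an in-memory stream without touching the file system.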
The contract will have been satisfactorily completed when

(a) Code for the new class InputData has been written and placed in a file [url removed, login to view] with an accompanying header file InputData.h. Documentation should be adequate for another programmer to use the class and its methods.

(b) Code to instantiate an object of class InputData has been added to ADMIXMAP, and all ADMIXMAP code that currently reads data from input files into program objects has been replaced by member functions of the InputData class.

(c) The program ADMIXMAP, refactored as above, works with the test script supplied, and is tolerant to variation of input data file formats as specified below:-

    (a) Use of tabs instead of spaces
    (b) Use of “/” instead of “,”
    (c) Use of any of the following strings to denote missing values: a hash symbol (“#”), a period (“.”), or “NA”.
    (d) Enclosing or not enclosing a string in quotes (except where a string variable contains spaces)
    (e) Extra spaces, or extra blank lines at the end of the file.

    Program development can be undertaken either in Windows using the MinGW port of the GNU C++ compiler, or in GNU/Linux. The final version must run on both platforms (as the current version does).

(d) The changes to the source code have been merged with the current working version in our CVS archive, and the merged version has been tested by running the Perl test script supplied. This test script compares the program output with the output from the previous version. Refactoring should not change the program output.

We estimate that this project is about five to seven days’ work for an experienced programmer. Before quoting a price, the coder should estimate the workload by browsing the source code.
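The format tolerances listed under (c) can be illustrated with a small field splitter. This is a sketch under stated assumptions, not the ADMIXMAP parser: the function names splitFields, isMissing and splitPair are hypothetical, quotes are assumed to be double quotes, and the “/” or “,” separator is assumed to join exactly two values inside a single field. It uses std::string only, so it does not demonstrate the String and Regex classes that the brief asks for.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a line on spaces or tabs. A double-quoted field may contain spaces;
// the quotes themselves are dropped, so quoted and unquoted fields compare equal.
std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> out;
    std::string cur;
    bool inQuotes = false;
    bool have = false;  // true once the current field has started
    for (std::string::size_type i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (c == '"') {
            inQuotes = !inQuotes;
            have = true;
        } else if (!inQuotes && (c == ' ' || c == '\t' || c == '\r')) {
            // Runs of separators (extra spaces, trailing '\r') yield no empty fields.
            if (have) { out.push_back(cur); cur.clear(); have = false; }
        } else {
            cur += c;
            have = true;
        }
    }
    if (have) out.push_back(cur);
    return out;
}

// The three missing-value codes named in the brief.
bool isMissing(const std::string& field) {
    return field == "#" || field == "." || field == "NA";
}

// Split "x/y" or "x,y" into its two parts. Returns false if there is no
// separator or if either side would be empty.
bool splitPair(const std::string& field, std::string& first, std::string& second) {
    const std::string::size_type p = field.find_first_of("/,");
    if (p == std::string::npos || p == 0 || p + 1 >= field.size()) return false;
    first = field.substr(0, p);
    second = field.substr(p + 1);
    assert(!first.empty() && !second.empty());  // guaranteed by the checks above
    return true;
}
```

A caller would skip any line for which splitFields returns an empty vector, which covers extra blank lines at the end of a file, and would test each field with isMissing before trying to interpret it.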
The source code can be downloaded from our CVS archive. To access the CVS repository, use the following settings, with a blank password:-

Authentication: :pserver
Path: /dsk3/cvs
Host address: [url removed, login to view]
User name: anonymous

There are three directories in this archive:-

genepi/admixmap: this directory contains the code for program ADMIXMAP.

genepi/gslwrap-ch: this directory contains C++ wrappers for matrix and vector C libraries. ADMIXMAP uses a method in the matrix class to read data from file; code in ADMIXMAP that uses this method should be replaced as outlined above.

genepi/liblshtm: this directory contains a locally-written C++ wrapper for the GNU regex C library. Code in ADMIXMAP that uses this library to parse input data should be replaced with code using the GNU Standard C++ library String and Regex classes as outlined above.

Clarification of how the program works, and what is required, will be available by email from the statistician who is currently developing the program.

All source libraries used by ADMIXMAP, and the ADMIXMAP program itself, are freely available under a GNU GPL. Any third-party source code used in this project should be available under a GPL. For work done under this project, the copyright owner (for purposes of upholding the GPL) is University College Dublin.
The program runs as a console application on either GNU/Linux or Windows XP. The refactored program must run on both Windows XP and GNU/Linux (as the current version does).