Postgres Programming / Matching Algorithm

We need some Postgres programming to build a matching algorithm that identifies duplicate customer records in our database of around 120 million customer records (which may or may not be unique).

I have a specific approach in mind: score potentially matching customers on their individual attributes, then choose the candidate with the highest score, provided that score is above a certain threshold.
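A minimal sketch of that scoring step, written in Python for illustration only (in Postgres it would become a PL/pgSQL or SQL function). The `Customer` fields, the `WEIGHTS` values and `MATCH_THRESHOLD` are hypothetical assumptions, not values from this posting. Each attribute on which two records agree adds its weight, and the best candidate is accepted only if it scores above the threshold.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Customer:
    first_name: str
    last_name: str
    street: str
    zip_code: str
    email: Optional[str] = None


# Hypothetical weights and threshold; real values would be tuned on pairs of
# records already confirmed as duplicates or non-duplicates.
WEIGHTS = {
    "first_name": 20,
    "last_name": 30,
    "street": 25,
    "zip_code": 15,
    "email": 40,
}
MATCH_THRESHOLD = 60


def _norm(value: Optional[str]) -> str:
    """Lower-case and collapse whitespace; None becomes the empty string."""
    return " ".join(value.lower().split()) if value else ""


def match_score(a: Customer, b: Customer) -> int:
    """Sum the weights of every attribute on which the two records agree."""
    score = 0
    for field, weight in WEIGHTS.items():
        va, vb = _norm(getattr(a, field)), _norm(getattr(b, field))
        if va and va == vb:  # empty values never count as agreement
            score += weight
    return score


def best_match(new: Customer, candidates: List[Customer],
               threshold: int = MATCH_THRESHOLD) -> Optional[Tuple[Customer, int]]:
    """Return (candidate, score) for the highest-scoring candidate if it
    scores above the threshold, otherwise None."""
    if not candidates:
        return None
    scored = [(c, match_score(new, c)) for c in candidates]
    top, top_score = max(scored, key=lambda pair: pair[1])
    return (top, top_score) if top_score > threshold else None
```

Exact equality is used here only to keep the sketch short; in practice each comparison would be nickname-aware and typo-tolerant, and could award partial credit rather than all or nothing.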

The second element of the project involves creating households for customers who likely live within the same home.
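One simple way to form households, sketched in Python under stated assumptions: normalize each street line (lower-case, strip punctuation, abbreviate common words) and group customers whose normalized street and 5-digit ZIP are identical. The `ABBREVIATIONS` table, `address_key` and `build_households` are hypothetical names, and the table is deliberately tiny.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical, deliberately tiny normalization table; a production version
# would use a full street-suffix and directional list (e.g. USPS-style).
ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "road": "rd",
    "north": "n", "south": "s", "east": "e", "west": "w",
    "first": "1st", "second": "2nd", "third": "3rd", "fourth": "4th",
}


def address_key(street: str, zip_code: str) -> str:
    """Normalize a street line and 5-digit ZIP into a single grouping key."""
    tokens = re.findall(r"[a-z0-9]+", street.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return " ".join(tokens) + "|" + zip_code.strip()[:5]


def build_households(customers: Iterable[Tuple[int, str, str]]) -> List[List[int]]:
    """Group (customer_id, street, zip_code) rows that share an address key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for customer_id, street, zip_code in customers:
        groups[address_key(street, zip_code)].append(customer_id)
    return list(groups.values())
```

Apartment or unit numbers would need their own handling, since two units in the same building are different homes.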

The third element is to create a table of nicknames so that common variants of a name (Bill / William, Robert / Bob, Jake / Jacob) are scored as equal.
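A minimal Python sketch of how such a table could be used: both first names are mapped to a canonical form before comparison. The `NICKNAMES` rows and the function names are hypothetical; in Postgres the dictionary would be a lookup table keyed on the nickname.

```python
# Hypothetical seed rows for a nickname table (nickname -> canonical name).
NICKNAMES = {
    "bill": "william", "billy": "william", "will": "william",
    "bob": "robert", "bobby": "robert", "rob": "robert",
    "jake": "jacob",
}


def canonical_first_name(name: str) -> str:
    """Map a first name to its canonical form; unknown names map to themselves."""
    n = name.strip().lower()
    return NICKNAMES.get(n, n)


def first_names_match(a: str, b: str) -> bool:
    """True when both names resolve to the same canonical name."""
    return canonical_first_name(a) == canonical_first_name(b)
```

A single canonical name cannot represent a nickname shared by several full names (e.g. "Alex" for Alexander or Alexandra); storing one row per (nickname, full name) pair and testing whether the two names' sets intersect handles that case.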

The fourth element is to create a table of invalid email addresses so that these addresses are not used for matching (even though two or more customers may share them).
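A small Python sketch of the filtering rule, assuming a hypothetical `INVALID_EMAILS` set standing in for the table: an email only counts toward a match when both sides are present, look like an address, are not on the invalid list, and are equal after normalization.

```python
from typing import Optional

# Hypothetical placeholder addresses that many unrelated customers share.
INVALID_EMAILS = {"none@none.com", "noemail@example.com", "test@test.com"}


def normalize_email(email: Optional[str]) -> str:
    """Trim and lower-case; None becomes the empty string."""
    return email.strip().lower() if email else ""


def usable_email(email: Optional[str]) -> bool:
    """An email is usable for matching only if present, plausible and not blacklisted."""
    e = normalize_email(email)
    return bool(e) and "@" in e and e not in INVALID_EMAILS


def emails_match(a: Optional[str], b: Optional[str]) -> bool:
    """Shared emails count as evidence only when neither is on the invalid list."""
    return usable_email(a) and usable_email(b) and normalize_email(a) == normalize_email(b)
```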

The fifth element of the project is to tolerate some degree of typos or misspellings, for example transpositions, so that Johnson would match Johnsno.
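Plain Levenshtein distance counts a swap of two adjacent letters as two edits (PostgreSQL's `fuzzystrmatch` `levenshtein` function uses insertions, deletions and substitutions only). The optimal string alignment distance, a restricted form of Damerau-Levenshtein, counts it as one. A Python sketch follows; `is_typo_match` and its `max_edits=1` default are hypothetical choices.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: Levenshtein plus adjacent transpositions."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            # Adjacent transposition, e.g. "on" vs "no", costs a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[m][n]


def is_typo_match(a: str, b: str, max_edits: int = 1) -> bool:
    """Case-insensitive match allowing up to max_edits edits."""
    return osa_distance(a.lower(), b.lower()) <= max_edits
```

With this measure "Johnson" and "Johnsno" are one edit apart, while "Johnson" and "Jackson" are three.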

In the past we attempted similar functionality using Postgres full-text search, but it didn't give us enough control over the criteria for a match.

The application that will consume this is written in Ruby on Rails, but because of the database size and our performance requirements, we cannot get the performance we need in Ruby.

Successful completion of this project will include 1) the code needed to create the table structures, functions, triggers, etc. for the matching algorithm; 2) documentation for that code; 3) the table structure for the nicknames; 4) documentation on how to test; and 5) the code needed to identify households.

Your proposal should include specific details about 1) this project and 2) your approach to solving the problem, specifically how your code will allow us to a) identify duplicates within the existing data and b) detect duplicates before inserting new customers.
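Both cases can share one idea, sketched here as an assumption rather than a requirement of the posting: blocking. Scoring every pair of 120 million records would mean roughly 7.2 × 10^15 comparisons, so records are first grouped by a cheap key and only records within the same group are scored. The key below (5-digit ZIP plus first letter of the last name), `blocking_key` and `BlockIndex` are hypothetical; in Postgres the key would be an indexed column, used for batch runs and by an insert-time lookup alike.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Dict, Iterator, List, Tuple

Record = Dict[str, Any]


def blocking_key(rec: Record) -> Tuple[str, str]:
    """Hypothetical key: 5-digit ZIP plus first letter of the last name."""
    return (rec["zip"][:5], rec["last_name"][:1].lower())


class BlockIndex:
    """In-memory stand-in for an indexed blocking-key column."""

    def __init__(self) -> None:
        self.blocks: Dict[Tuple[str, str], List[Record]] = defaultdict(list)

    def add(self, rec: Record) -> None:
        self.blocks[blocking_key(rec)].append(rec)

    def candidates(self, rec: Record) -> List[Record]:
        """Existing records to score against an incoming record (pre-insert check)."""
        return list(self.blocks.get(blocking_key(rec), []))

    def candidate_pairs(self) -> Iterator[Tuple[Record, Record]]:
        """All within-block pairs to score during a batch de-duplication run."""
        for recs in self.blocks.values():
            yield from combinations(recs, 2)
```

A single key misses records whose key field is itself mistyped (e.g. a wrong first letter); running several passes with different keys, such as ZIP plus canonical first name, or email, reduces those misses.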

I really don't need to know how many years of experience and in what languages and technologies you or your team has - I mainly want evidence that you understand the problem we're trying to solve and how you intend to solve it.

Here are some examples of customers we would like to match:

Robert Johnson
1335 Amble Way
Madison, WI 55008

Bob Johnson
1532 Fourth Street North
Madison WI 55008


Desired Skills

PostgreSQL Administration

Skills: PostgreSQL


About the employer:
(17 reviews) Malerkotla, India

Project ID: #8542759

Awarded to:


This is a typical "master data management" problem in which data sets from different sources are integrated properly into a single master database by removing duplicates and standardizing data. I've already done this t…

USD in 10 days
(0 reviews)

7 freelancers are bidding an average of $993 for this job


To whom it may concern, I already have such a score-based search algorithm (programmed as a DB function) at hand and am using it with MySQL for a municipality, though some adaptation will be needed for your requirements.

USD in 10 days
(1 review)

Sir, forgive me, but this is only a partial explanation of how I would tackle the problem, mainly because of the 1500-character limit. What you actually require is two very different things. The first is to perfo…

USD in 45 days
(0 reviews)

I have good experience with Postgres databases to complete this project. Here is what I understand of the project (correct me if I am wrong): implement a Postgres stored-procedure algorithm to give a matching score for…

USD in 14 days
(0 reviews)

Hello, as requested, here is my short solution plan: 1. Create the table for the invalid email addresses and a filter function. 2. Create a table for the nicknames and the synonymous full names. 3. Create a functio…

USD in 5 days
(0 reviews)

Hi, I think we can create some objects to store duplicate customers. It is not a good idea to do this online; instead, run a PL/pgSQL procedure to populate this data. For nicknames, we can use a table to store the data or use a p…

USD in 7 days
(0 reviews)

Proposal not yet submitted

USD in 15 days
(0 reviews)

Sun, 27 Sep 2015 19:49:43 +0530 Efficacy of changes presumes tests already exist for verification of duplicates. Rows marked as duplicates need confirmation (e.g. using known queries) to avoid false positives. Can…

USD in 10 days
(0 reviews)