We require some Postgres programming to create a matching algorithm that identifies duplicate customer records within our database of around 120 million customer records (which may or may not be unique).
I have a specific approach in mind that involves scoring potentially matched customers on their specific attributes and then choosing the one with the highest score over a certain threshold.
The second element of the project involves creating households for customers who likely live within the same home.
The third element is to create a table of nick-names such that common customer names (bill / william, robert / bob, jake / jacob) can be scored equally.
The fourth element is to create a table of invalid email addresses such that these email addresses will not be utilized for matching (even though two or more customers may share these invalid email addresses).
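As a rough illustration of how the scoring approach, the nick-name table, and the invalid email table could fit together, here is a minimal sketch. It is written in Python only to show the logic; the actual deliverable would be Postgres tables and functions. Every name, weight, threshold value, and sample email address below is a hypothetical placeholder, not part of the specification.

```python
# Hypothetical lookup data; in the real project these would be Postgres tables.
NICKNAMES = {"bill": "william", "bob": "robert", "jake": "jacob"}
INVALID_EMAILS = {"none@none.com", "noemail@example.com"}

# Hypothetical weights and threshold, chosen only for illustration.
WEIGHTS = {"first_name": 2, "last_name": 3, "email": 5}
THRESHOLD = 4


def canonical_name(name):
    """Map a nick-name to its formal name so both forms score equally."""
    key = name.strip().lower()
    return NICKNAMES.get(key, key)


def usable_email(email):
    """Return a normalized email, or None if it is blank or listed as invalid."""
    key = (email or "").strip().lower()
    if not key or key in INVALID_EMAILS:
        return None
    return key


def score(a, b):
    """Sum the weights of the attributes on which two customers agree."""
    total = 0
    if canonical_name(a["first_name"]) == canonical_name(b["first_name"]):
        total += WEIGHTS["first_name"]
    if a["last_name"].strip().lower() == b["last_name"].strip().lower():
        total += WEIGHTS["last_name"]
    email_a, email_b = usable_email(a["email"]), usable_email(b["email"])
    # A shared invalid email contributes nothing to the score.
    if email_a is not None and email_a == email_b:
        total += WEIGHTS["email"]
    return total


def best_match(customer, candidates):
    """Return the highest-scoring candidate whose score is over THRESHOLD, else None."""
    scored = [(score(customer, c), c) for c in candidates]
    scored = [pair for pair in scored if pair[0] > THRESHOLD]
    if not scored:
        return None
    return max(scored, key=lambda pair: pair[0])[1]
```

With these placeholder values, "Bill Smith" and "William Smith" match on name alone, while two differently named customers who share the same invalid email address do not match at all.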
The fifth element of the project is to allow for some degree of typos or misspellings - for example, transpositions within the data such that Johnson would match Johnsno.
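One way to tolerate transpositions is an edit distance that counts a swap of two adjacent characters as a single edit (the optimal string alignment variant of Damerau-Levenshtein). The sketch below is in Python purely for illustration and is not a required implementation; the function names and the `max_distance` cutoff are hypothetical.

```python
def osa_distance(a, b):
    """Optimal string alignment distance between two strings.

    Insertions, deletions, substitutions, and swaps of two adjacent
    characters each cost 1.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "on" vs "no".
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


def names_match(a, b, max_distance=1):
    """Treat two names as matching if they are within max_distance edits (hypothetical cutoff)."""
    return osa_distance(a.strip().lower(), b.strip().lower()) <= max_distance


print(osa_distance("johnson", "johnsno"))  # prints 1
```

Plain Levenshtein distance would count the same swap as two edits, which is why the transposition case is handled explicitly here.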
In the past we have attempted similar functionality using Postgres's full text search, but that didn't give us enough control over the qualifications for a match.
The application that will consume this will be Ruby on Rails, but because of the database size and performance requirements, we are not able to get the performance we need using Ruby.
The successful completion of this project will include 1) the code necessary to create whatever table structure, functions, triggers, etc. are needed for the matching algorithm, 2) documentation for that code, 3) the table structure for the nick-names, 4) documentation on how to test, and 5) the code necessary for identifying households.
Your proposal should include specific details about 1) this project, and 2) your approach to solving this problem - specifically how your code will allow us to a) identify duplicates within the existing data, and b) detect duplicates before inserting new customers.
I really don't need to know how many years of experience you or your team have, or in what languages and technologies - I mainly want evidence that you understand the problem we're trying to solve and how you intend to solve it.
Here are some examples of customers we would like to match:
1335 Amble Way
Madison, WI 55008
1532 Fourth Street North
Madison WI 55008
This is a typical "master data management" problem in which data sets from different sources are integrated properly into a single master database by removing duplicates and standardizing data. I've already done this t...
7 freelancers are bidding an average of $993 for this job
To whom it may concern, I already have such a score-based search algorithm (programmed as a DB function) at hand and am using it with MySQL for a municipality, though some adaptation work will be needed for your needs...
Sir, forgive me, but this is only a partial explanation of how I would tackle the problem, mainly because of the 1500 character constraint. What you actually require is two very different things. The first is to perfo...
I have good experience with Postgres databases to complete this project. Here is what I understand of the project, and correct me if I am wrong: implement a Postgres stored procedure algorithm to give a matching score for...
Hello, as requested, here is my short solution plan: 1. Create the table for the invalid email addresses and a filter function. 2. Create a table for the nick-names and the synonym full names. 3. Create a functio...
Hi, I think that we can create some objects to store duplicate customers. It is not a good idea to do this online; instead, run a PL/pgSQL routine to populate this data. About nicknames, we can use a table to store data or use a p...
Sun, 27 Sep 2015 19:49:43 +0530 Efficacy of changes presumes tests already exist for verification of duplicates. Rows marked duplicate need confirmation (e.g. using known queries) to avoid false positives. Can...