
In Progress
We are building Pharmelis Registry, a canonical database for pharmaceuticals. The goal is to make any pharmaceutical product understandable, anywhere, in any language.

Pharmaceutical data today is fragmented, inconsistent, and multilingual. There is no reliable way to identify and reconcile products globally across sources and markets. Pharmelis Registry aims to capture real-world data and resolve it into clear, unambiguous product identities.

Objective

Design a system that:
- ingests heterogeneous pharmaceutical data (CSV, APIs, websites, PDFs, images)
- works across countries and languages
- handles messy, inconsistent data
- resolves product identity across sources
- produces a consistent global reference
- includes an internal dashboard to operate the system

The dashboard must allow:
- inspecting ingested data
- reviewing identity decisions
- monitoring system coverage
- adding and managing data sources

Deliverables

Provide a concise architecture document covering:
- identity resolution strategy (core part)
- ingestion approach for different source types
- core data model (what is stored vs computed)
- system architecture and data flow
- dashboard design and capabilities
- recommended tech stack and trade-offs

Required Profile

Strong in:
- Python
- data engineering / pipelines
- web scraping / ingestion
- API/backend development
- dashboard or web app development

Experience with messy data, search systems, or entity resolution is expected.
Project ID: 40411582
29 proposals
Remote project
Active 17 days ago
29 freelancers are bidding on average $22 USD/hour for this job

Hi, This is Elias from Miami. I have gone through your project description and understand you’re looking to build Pharmelis Registry — a canonical database for pharmaceuticals. This system will serve as a global repository for pharmaceutical data. With over 10 years of experience in backend development and data management, I've successfully built scalable databases and integrated APIs for various applications. I’m confident I can help you design a robust architecture for this project. To approach this, I would start by defining the database structure to ensure efficient data storage and retrieval. Then, I’ll focus on creating secure APIs for data access and integration, while also considering scalability for future growth. I have a few questions to get a better understanding: Q1 – What specific data will be included in the Pharmelis Registry, and how will it be structured? Q2 – Are there any existing systems or databases that we need to integrate with? Q3 – What user roles do you envision for accessing the registry, and how will authentication be handled? I’d be happy to go through the details and suggest the best technical approach. Looking forward to hearing from you.
$50 USD in 10 days
7.9

Hi,

To design a system for Pharmelis Registry, I'll create a robust architecture that ingests and reconciles heterogeneous pharmaceutical data. This will include:
- Developing a clear identity resolution strategy
- Designing the ingestion approach for various source types
- Creating a core data model
- Outlining the system architecture and data flow
- Designing the internal dashboard for data management
- Recommending a suitable tech stack

I will approach this by analyzing the requirements, ensuring the system can handle messy data, and implementing a structured workflow for data ingestion and processing. Ready to start once you provide further details on existing data sources and any specific requirements.

Thanks!
$20 USD in 40 days
7.1

⭐⭐⭐⭐⭐ Design the Architecture of Pharmelis Registry (Global Pharmaceutical Data System) -- 4

❇️ Hello. After carefully reviewing the goals and challenges of your project, I am excited to propose a robust and scalable architecture for Pharmelis Registry. With my extensive background in data engineering, Python development, and handling complex, multilingual datasets, I am well equipped to lead this initiative to a successful completion.

➡️ Why Me? I hold a PhD in Information Technology with over 10 years of experience in data pipeline construction, API development, and web application creation. My proficiency in Python and various data processing technologies will directly contribute to designing a system that efficiently ingests, processes, and unifies pharmaceutical data from diverse sources worldwide.

➡️ Let's discuss your project in more detail. I can also provide access to case studies from previous projects that align closely with your requirements.

➡️ Some of my similar work:
✅ Development of a multilingual content aggregation system for a global news outlet
✅ Creation of a data reconciliation tool for mismatched financial records across multiple countries
✅ Design and implementation of a centralized medical records system for a network of European hospitals
✅ Automated data extraction and analysis platform that processes complex datasets from PDFs, images, and web sources
✅ Custom dashboard development for real-time data monitoring and management in the public health sector

These experiences have honed my skills in creating systems that not only meet the technical requirements but also adapt to the practical demands of international data variability and the need for clear, actionable insights.

Waiting for your response!

Best Regards,
Dr. Muhammad Asad
$18 USD in 30 days
6.9

Hey! We’re a team of 62 professionals specializing in data engineering and backend systems with 9+ years of experience building scalable pipelines, entity resolution systems, and internal dashboards.

Here's how we can help:
* Design robust identity resolution logic for messy multilingual pharma data
* Build scalable ingestion pipelines for APIs, files, web, and PDFs
* Define a clean data model separating raw, normalized, and resolved layers
* Develop an internal dashboard for review, monitoring, and source control

We’ll deliver a concise architecture covering resolution strategy, ingestion flows, data modeling, system design, dashboard structure, and clear tech stack trade-offs.

Could you clarify whether you want identity resolution to rely on deterministic rules first, or to include probabilistic matching and scoring from the beginning?
$20 USD in 40 days
5.4

Hello, this is James from Hollywood. I’ve carefully reviewed your project on designing the architecture for the Pharmelis Registry, and I'm excited about the opportunity to help you build this essential canonical database for pharmaceuticals. With over 15 years of experience in backend development, data management, and API integration, I understand the complexities involved in creating a robust data system. My expertise in Python, Django, and data processing ensures that we can deliver a scalable and efficient solution tailored to your needs.

To get started, I’d love to clarify a few points:
1. What specific data sources do you anticipate integrating into the Pharmelis Registry?
2. Are there any compliance requirements we should consider while designing the architecture?
3. What is the timeline you envision for the initial launch of the registry?

My approach involves structured milestones, ensuring open communication and timely delivery. I have developed similar projects, such as a pharmaceutical inventory management system and a patient data integration platform, which can serve as references for my work ethic and capabilities. Let’s connect and discuss how we can make the Pharmelis Registry a success!
$50 USD in 10 days
5.3

You want a canonical registry that can take CSVs, APIs, websites and scanned labels and resolve messy, multilingual product records into one unambiguous identity — that’s exactly the problem I enjoy solving. The hard part isn’t just ingesting formats; it’s preserving provenance, handling packaging/strength/formulation variants, and surfacing confidence so humans can correct the graph where automated matches are uncertain. I recently designed the ingestion and entity-resolution architecture for a global medical-supply registry that consolidated 60+ source types and reduced duplicate identities by ~85%. My deliverable will be a concise architecture doc covering connectors (CSV/API/scrapers/OCR), a normalization layer, a canonical entity graph with probabilistic + rule-based matching, separation of raw vs computed stores, and a Django dashboard for review, coverage monitoring, and source management. Recommended stack: Python, Airflow, PostgreSQL + Elasticsearch, Celery, Docker/K8s — with trade-offs and monitoring/metrics included. Quick question: which countries and source formats should I prioritize first, and do you already have any authoritative identifiers (GTIN, national codes) to seed matching?
$20 USD in 7 days
4.8

Dear Client, I’m an experienced full-stack developer with over 10 years of experience in web and mobile application development, specializing in building scalable, responsive, and high-performance solutions for diverse business needs. I understand you are looking for a reliable developer to build or improve your project, including web or mobile applications similar to CRM, dashboards, or APIs, and I have worked on similar solutions successfully. My skills in React, Vue, Laravel, PHP, Python, REST APIs, and database design ensure efficient and high-quality delivery. Feel free to share more details or ask questions. I’m ready to refine my approach to match your exact requirements. Looking forward to working with you. Best regards, Md Ruhul Ajom
$20 USD in 40 days
4.9

Hi,

With 26+ years of industry experience, we’ve designed data platforms that handle fragmented, multilingual, and inconsistent datasets across healthcare and compliance domains. We’ve carefully reviewed your requirements and fully ✅ understand that Pharmelis Registry is fundamentally an identity resolution system for pharmaceutical data at global scale.

⚡ ✨ Our Architecture Approach
We’ll design the system around a strong identity core with scalable ingestion:
1️⃣ Multi-Source Ingestion – CSV, APIs, web scraping, PDFs (OCR), images
2️⃣ Normalization Layer – Standardize names, units, languages, formats
3️⃣ Identity Resolution Engine (Core)
- Fuzzy matching + rule-based + ML-assisted linking
- Confidence scoring + human-in-the-loop validation
- Canonical entity creation with versioning
4️⃣ Data Model – Raw → normalized → resolved entities (separated layers)
5️⃣ Pipeline Architecture – Async processing (queue-based, scalable)
6️⃣ Search Layer – Fast lookup using indexed/global identifiers

Dashboard Design
- View raw vs resolved data side-by-side
- Review/override identity decisions
- Monitor ingestion coverage & quality
- Manage sources and pipelines

Why Us
- Strong experience in entity resolution & messy data systems
- Built pipelines for multi-source data reconciliation
- Focus on practical, scalable architecture (not theoretical)

Ready to design a system that can reliably standardize pharmaceutical data across markets and languages.

With regards,
Harshvir Singh
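The "raw → normalized → resolved" separation in this proposal can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the class names, the field names (`name`, `strength`), the `PHR-0001` style identifier, and the rule that a canonical entity's version increases whenever a link is added or its confidence changes. None of it comes from the proposal or the project brief.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib
import json


@dataclass(frozen=True)
class RawRecord:
    """Raw layer: the payload exactly as ingested, never modified."""
    source_id: str
    payload: dict
    fetched_at: datetime

    @property
    def record_id(self) -> str:
        # Deterministic ID derived from source and content (hypothetical scheme).
        blob = json.dumps(self.payload, sort_keys=True, ensure_ascii=False)
        digest = hashlib.sha256(f"{self.source_id}:{blob}".encode("utf-8"))
        return digest.hexdigest()[:16]


@dataclass(frozen=True)
class NormalizedRecord:
    """Normalized layer: computed from a raw record, keeps a pointer back to it."""
    raw_id: str
    name: str
    strength: str


def normalize(raw: RawRecord) -> NormalizedRecord:
    # Lowercase and collapse whitespace; real normalization would do far more.
    name = " ".join(str(raw.payload.get("name", "")).lower().split())
    strength = str(raw.payload.get("strength", "")).lower().replace(" ", "")
    return NormalizedRecord(raw_id=raw.record_id, name=name, strength=strength)


@dataclass
class CanonicalProduct:
    """Resolved layer: one identity linked to many raw records, versioned."""
    canonical_id: str
    links: dict[str, float] = field(default_factory=dict)  # raw_id -> confidence
    version: int = 1

    def link(self, rec: NormalizedRecord, confidence: float) -> None:
        # Bump the version only when the set of links actually changes.
        if self.links.get(rec.raw_id) != confidence:
            self.links[rec.raw_id] = confidence
            self.version += 1


raw = RawRecord(
    source_id="src-a",
    payload={"name": "  Aspirin  500 MG", "strength": "500 MG"},
    fetched_at=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
norm = normalize(raw)
product = CanonicalProduct(canonical_id="PHR-0001")
product.link(norm, confidence=0.9)
```

Keeping the raw payload immutable and deriving the other layers from it means normalization rules can be changed and re-run without re-ingesting any source.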
$20 USD in 40 days
5.0

Building a canonical database like Pharmelis Registry requires a precise balance between strict data normalization and the flexibility to handle international regulatory variances. I recently architected a high-concurrency master data management system for a global healthcare provider, successfully implementing a single source of truth for over 60,000 unique SKUs across multiple jurisdictions. My approach ensures every entry is globally unique and compliant with standards like GS1 and IDMP, preventing the data fragmentation that often plagues medical registries. I prioritize designing systems where data integrity is the core foundation for growth. To ensure Pharmelis Registry remains the gold standard, I propose a microservices architecture using PostgreSQL for transactional integrity, paired with an ElasticSearch layer for high-speed global discovery. I will design an extensible schema that captures complex drug relationships—active ingredients, dosage forms, and manufacturer metadata—while utilizing a high-performance RESTful API layer with OAuth2 for seamless integration. I will also incorporate an automated ETL pipeline with validation logic to ingest data from diverse international sources, ensuring the registry remains synchronized as your data volume grows. Are you prioritizing real-time synchronization with bodies like the EMA or FDA, or is the focus on batch ingestion for the pilot? I am curious about the expected scale for API consumers, as this will dictate whether we implement a multi-region database cluster from day one. I am available for a brief chat or a call to dive into the specific data structures and align on the technical roadmap. Let’s connect to discuss how we can build this to be both scalable and reliable.
$25 USD in 7 days
3.3

Hello, I understand Pharmelis Registry aims to unify global pharmaceutical data into a reliable, multilingual canonical reference. I’ll craft a robust architecture that seamlessly ingests heterogeneous data, resolves identities across sources, and presents a clear dashboard for governance and operations.

Solution highlights:
- Identity resolution strategy: canonical IDs with cross-source linking, probabilistic matching + rule-based disambiguation, multilingual normalization, audit trails, and performance-focused indexing (e.g., inverted index + fuzzy search).
- Ingestion approach: modular workers for CSV, APIs, PDFs/images (OCR for scanned docs), and web scraping with respect for data provenance; schema mapping with strict validation and schema-on-read capabilities for flexibility.
- Core data model: authoritative product_identity, cross_reference links, multilingual labels, provenance metadata, computed fields (score, confidence, coverage metrics).
- System architecture & data flow: ingest workers -> staging area -> identity engine -> canonical store -> dashboard API; event-driven processing, materialized views for dashboards.
- Dashboard design: data ingest inspection, identity decision reviews, coverage monitoring, source management, audit logs, role-based access, and export options.
- Tech stack and trade-offs: Python (FastAPI/Django) for APIs, pandas/pyarrow for ETL, Postgres with JSONB for flexibility, Elasticsearch for search and fuzzy matching, Celery for pipelines,
$25 USD in 28 days
2.6

Hello, I am Vishal Maharaj, a seasoned professional with 20 years of expertise in Python, Django, Software Architecture, API Development, Data Management, Backend Development, Data Integration, Web Scraping, and Data Modeling. I have thoroughly reviewed the project requirements for designing the architecture of Pharmelis Registry and am confident in delivering a robust solution. I propose to design a system that seamlessly ingests diverse pharmaceutical data, resolves product identities globally, and includes a comprehensive internal dashboard for system management. My approach will focus on implementing a sophisticated identity resolution strategy, versatile data ingestion methods, a structured data model, and a scalable system architecture. I will leverage my skills in Python, data engineering, web scraping, API development, and dashboard creation to meet the project objectives effectively. Please initiate a chat to discuss this project further. Cheers, Vishal Maharaj
$20 USD in 40 days
2.6

What is the optimal approach to designing a system that can effectively resolve pharmaceutical data globally, across various languages and sources, while maintaining consistency and clarity?

Hey, I understand the critical need for a robust system that can ingest diverse pharmaceutical data types and handle inconsistencies seamlessly.

My strengths lie in:
- Python expertise
- Data engineering proficiency
- Extensive experience in web scraping and API/backend development

My execution approach involves creating a structured architecture that prioritizes:
- Identity resolution strategies
- Comprehensive ingestion methods
- Core data model for efficient storage and computation
- Seamless system architecture and data flow
- Intuitive dashboard design for operational ease

Let's discuss how I can contribute to designing the architecture of Pharmelis Registry.

Best,
Daniela
$23 USD in 40 days
0.0

I can help you design a robust, scalable architecture for Pharmelis Registry as a canonical global pharmaceutical database. This aligns closely with my experience modeling complex, regulated data domains and building systems that must remain authoritative, auditable, and highly available. I’ve led architecture for healthcare and life-science data platforms, including drug catalogs, formulary systems, and reference data hubs that integrate with external regulatory sources. This includes designing domain models, data lineage, and access patterns that support search, analytics, and API access while respecting compliance and governance. My approach would start with clarifying your core entities, relationships, and use cases, then defining an architecture blueprint covering data model, storage strategy, ingestion pipelines, APIs, and security. From there, we can outline a phased implementation roadmap compatible with your current stack and team. I would love to chat more about your project! Regards
$20 USD in 7 days
0.0

Hi there! I’d love to help with building the Pharmelis Registry. Here’s my approach to the solution:

Data Ingestion & Processing: I’ll design a flexible system that can ingest pharmaceutical data from a variety of sources (CSV, APIs, websites, PDFs, and images) while handling inconsistent, messy data. This will include web scraping and API integrations to gather data and normalize it.

Identity Resolution: The core of the system will focus on resolving and reconciling product identities across different sources, ensuring that the data is mapped correctly to a global, consistent reference. This will involve entity resolution techniques, leveraging Python-based solutions for data matching.

Internal Dashboard: The dashboard will allow system operators to inspect ingested data, review product identities, and manage sources. It will provide monitoring tools to track data coverage and the integrity of resolved identities, ensuring full control over the system.

Tech Stack: I recommend using Python for backend development (specifically Pandas for data manipulation, BeautifulSoup for web scraping, and Flask/Django for the web app), with PostgreSQL or MongoDB for the database. For the frontend, React can power the dashboard UI, ensuring a responsive, modern user experience.

If this works for you, let's discuss in chat.

Best,
$20 USD in 40 days
0.0

I can design a robust architecture centered on **entity resolution and data normalization**, using Python-based pipelines to ingest and reconcile heterogeneous sources (CSV, APIs, PDFs, scraping).

My approach is as follows:
* **Identity Resolution:** Hybrid strategy combining deterministic rules (active ingredient, strength, dosage form) with probabilistic matching (fuzzy search, embeddings) to unify products across languages and markets.
* **Ingestion Layer:** Modular pipelines (Scrapy/APIs/OCR for PDFs/images) feeding into a staging layer for normalization.
* **Data Model:** Separation of raw ingested data, normalized attributes, and resolved “canonical product entities” with lineage tracking.
* **Architecture:** Scalable pipeline (Airflow + Python), search layer (Elasticsearch/OpenSearch), and API layer (FastAPI).
* **Dashboard:** Web app (React or lightweight Python UI) for reviewing matches, managing sources, and monitoring coverage.

I’ve worked with messy, multilingual datasets and built systems involving deduplication, search, and data reconciliation, with a focus on traceability and accuracy. I’ll deliver a **clear, practical architecture document** with trade-offs and implementation guidance so your team can execute confidently.

Let’s discuss your data sources and constraints to tailor this further.
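As a rough illustration of the hybrid strategy described above, the following Python sketch gates candidate pairs on an exact deterministic key (active ingredient, strength, dosage form) and only then applies a fuzzy score to the product names. The record fields, the thresholds of 0.85 and 0.5, and the three outcome labels are assumptions chosen for the example, and `difflib.SequenceMatcher` stands in for whatever fuzzy or embedding-based matcher a real system would use. The sketch also assumes that ingredient and dosage-form values were already mapped to a shared vocabulary upstream, since exact comparison would otherwise fail across languages.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
import unicodedata


@dataclass(frozen=True)
class ProductRecord:
    source: str
    name: str
    active_ingredient: str
    strength: str
    dosage_form: str


def normalize(text: str) -> str:
    """Lowercase, strip accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


def deterministic_key(rec: ProductRecord) -> tuple[str, str, str]:
    """Rule-based part: attributes that must agree exactly after normalization."""
    return (
        normalize(rec.active_ingredient),
        normalize(rec.strength).replace(" ", ""),
        normalize(rec.dosage_form),
    )


def match_score(a: ProductRecord, b: ProductRecord) -> float:
    """Probabilistic part: fuzzy name similarity, only if the keys agree."""
    if deterministic_key(a) != deterministic_key(b):
        return 0.0
    return SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()


def classify(a: ProductRecord, b: ProductRecord,
             auto_threshold: float = 0.85,
             review_threshold: float = 0.5) -> str:
    """Map a score to a decision; thresholds are illustrative, not tuned."""
    score = match_score(a, b)
    if score >= auto_threshold:
        return "match"
    if score >= review_threshold:
        return "review"  # would be queued for a human in the dashboard
    return "no_match"


fr = ProductRecord("fr-registry", "Doliprane 500 mg comprimé",
                   "Paracétamol", "500 mg", "tablet")
us = ProductRecord("us-feed", "Doliprane 500mg",
                   "paracetamol", "500mg", "tablet")
other = ProductRecord("us-feed", "Doliprane 1000mg",
                      "paracetamol", "1000mg", "tablet")

print(classify(fr, us))
print(classify(fr, other))
```

Pairs that pass the deterministic gate but score below the auto-accept threshold are the natural candidates for the match-review screen in the dashboard.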
$20 USD in 40 days
0.0

Paris, France
Payment method verified
Member since Jul 29, 2017