
Closed
Posted
Paid on delivery
I need a reusable ETL framework built inside Databricks notebooks, version-controlled in Bitbucket and promoted automatically through a Bitbucket Pipeline. All source data arrives via GraphQL APIs, so the job includes handling authentication, pagination, and schema inference before landing raw payloads in Delta tables. A dedicated cleaning stage must then standardise and validate the data before it moves on to the curated layer. The structure should be modular—ideally a bronze/silver/gold notebook hierarchy—so I can slot in new sources or extra transformations without touching the core logic. I also want a lightweight Python package (wheel) that wraps the GraphQL connector and can be attached to any cluster.

Acceptance criteria
• Parameter-driven notebooks organised by layer.
• Reusable GraphQL connector packaged as a .whl.
• Bitbucket Pipelines YAML that runs unit tests, uses the Databricks CLI to deploy notebooks, and executes an integration test on commit.
• Clear README detailing how to add a new API endpoint and where to place cleaning logic.

Leverage native tools—PySpark, SQL, Delta Lake, dbutils—while keeping external libraries to a minimum and fully documented. Please share a brief outline of your approach and any relevant Databricks + Bitbucket CI experience so we can move forward quickly.
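For context on what the connector piece typically involves, here is a minimal sketch of a token-authenticated GraphQL client with Relay-style cursor pagination. It is illustrative only; the class name, header handling, and pageInfo/endCursor pagination shape are assumptions, not part of the brief.

```python
# Illustrative sketch only: a token-authenticated GraphQL client with
# cursor-based pagination. Names and the pageInfo/endCursor shape are assumptions.
import requests


class GraphQLClient:
    def __init__(self, endpoint: str, token: str, timeout: int = 30):
        self.endpoint = endpoint
        self.timeout = timeout
        self.session = requests.Session()
        self.session.headers.update({"Authorization": f"Bearer {token}"})

    def execute(self, query: str, variables: dict = None) -> dict:
        """Run one GraphQL query and return the 'data' payload."""
        resp = self.session.post(
            self.endpoint,
            json={"query": query, "variables": variables or {}},
            timeout=self.timeout,
        )
        resp.raise_for_status()
        body = resp.json()
        if body.get("errors"):
            raise RuntimeError(f"GraphQL errors: {body['errors']}")
        return body["data"]

    def fetch_all_pages(self, query: str, page_path: str, page_size: int = 100):
        """Yield nodes across Relay-style pages (pageInfo.hasNextPage / endCursor)."""
        cursor = None
        while True:
            data = self.execute(query, {"first": page_size, "after": cursor})
            page = data[page_path]
            yield from page["nodes"]
            if not page["pageInfo"]["hasNextPage"]:
                break
            cursor = page["pageInfo"]["endCursor"]
```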
Project ID: 40214692
114 proposals
Remote project
Active 6 days ago
114 freelancers are bidding on average $468 CAD for this job

Hi there, I’ve read your Databricks ETL CI framework needs and I’m confident I can deliver a clean, reusable solution that scales as you add sources. I’ve built modular bronze/silver/gold pipelines in Databricks, with Delta tables and clean separations between raw ingestion, cleaning, and curated layers. I’ll package a lightweight GraphQL Python connector as a .whl that plugs into any cluster, keep dependencies to a minimum, and document how to extend it for new endpoints. The notebooks will be parameter-driven by layer, so you can swap in new sources without touching core logic. I’ll configure a Bitbucket Pipeline to run unit tests, deploy notebooks via the Databricks CLI, and execute an integration test on each commit. A clear README will explain how to add a new API, where to place cleaning logic, and how to slot in new transformations. The approach is pragmatic: authenticate, paginate, and infer schema from GraphQL responses, land raw payloads into Delta bronze, standardise and validate in silver, then move to gold. I’ll ensure secure handling of credentials, robust error handling, and thorough logging throughout. Next steps would be a quick kickoff on environment setup and a sample endpoint. What is your preferred authentication method for GraphQL (token-based, OAuth, or both), and do you have any preferred schema inference strategy or custom data quality rules you want enforced during the cleaning stage? What are the expected data volumes and latency targets for each layer?
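To illustrate the "land raw payloads into Delta bronze" step this bid describes, here is a minimal PySpark sketch. The table name, the audit column, and the use of inferred JSON schema are assumptions for illustration, not the bidder's actual code.

```python
# Minimal sketch (assumed table name and audit column) of landing raw GraphQL
# records in a bronze Delta table, letting Spark infer the schema from JSON.
import json
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_timestamp

spark = SparkSession.builder.getOrCreate()

def land_bronze(records, table="bronze.graphql_raw"):
    """Append raw API records to a bronze Delta table with an ingestion timestamp."""
    json_rdd = spark.sparkContext.parallelize([json.dumps(r) for r in records])
    raw_df = spark.read.json(json_rdd)  # schema inference happens here
    (raw_df.withColumn("_ingested_at", current_timestamp())
           .write.format("delta")
           .mode("append")
           .option("mergeSchema", "true")  # tolerate new fields from the API
           .saveAsTable(table))
```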
$750 CAD in 21 days
8.0

Drawing from over 13 years of experience in developing customized Python applications, data mining, and extraction, I am well-equipped to tackle your Databricks ETL CI Framework project. With specific emphasis on Python's suitability for high-impact, result-driven solutions, I will leverage native tools such as PySpark, SQL, and Delta Lake to create a reusable and modular ETL framework that specifically caters to your requirements. Moreover, I will provide in-depth documentation ensuring ease of future integration or adaptation without disturbing the core logic. As an accomplished coder, I have worked extensively with Bitbucket and possess a knack for creating effective CI/CD pipelines. My extensive knowledge of the Databricks CLI also ensures proficient deployment of notebooks. I plan to apply this expertise to closely align my work with your vision and keep the project stable by importing only essential libraries and minimizing external dependencies. In conclusion, by combining my proven Python skills with a deep understanding of the Databricks and Bitbucket platforms, you can be assured that this project will meet all acceptance criteria while prioritizing long-term qualities such as extensibility and maintainability. Let us join forces for an efficient delivery!
$500 CAD in 3 days
7.0

Hello, I specialize in Databricks data pipelines and have built and customized large-scale ETL systems that teams reuse every day. The main challenge here is keeping GraphQL data clean, flexible, and easy to promote without breaking existing flows. I am certified in Databricks and PySpark development, and I will solve this using bronze/silver/gold notebooks, Delta Lake, and a reusable GraphQL connector packaged as a Python wheel. Bitbucket Pipelines will handle tests and auto-deploy using the Databricks CLI. A few questions: Do APIs change schema often? Should failed records be stored or dropped? Do you want full reloads or incremental pulls? Will multiple teams add sources in parallel? Answering these up front will save setup time and keep data trusted. Best regards, Dev S.
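On the "should failed records be stored or dropped" question, one common answer is a quarantine table. A hedged sketch follows; the validation rules and table names are placeholders, not the bidder's design.

```python
# Hypothetical quarantine pattern: keep rows that fail validation instead of
# dropping them. Rule set and table names are placeholders.
from pyspark.sql import DataFrame
from pyspark.sql.functions import col

def split_valid_invalid(df: DataFrame, required_cols=("id", "updated_at")):
    """Return (valid, invalid) DataFrames based on simple not-null rules."""
    condition = None
    for c in required_cols:
        rule = col(c).isNotNull()
        condition = rule if condition is None else (condition & rule)
    return df.filter(condition), df.filter(~condition)

def write_with_quarantine(df, target="silver.orders", rejects="silver.orders_rejects"):
    valid, invalid = split_valid_invalid(df)
    valid.write.format("delta").mode("append").saveAsTable(target)
    invalid.write.format("delta").mode("append").saveAsTable(rejects)
```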
$1,000 CAD in 14 days
6.4

As an AI solutions expert with over a decade of experience, I'm confident in my ability to tackle your Databricks ETL CI Framework project. My team and I have extensive skills in Python and SQL, leveraging native tools like PySpark, SQL, Delta Lake, and dbutils, which aligns perfectly with your specific requirements. We specialize in building modular frameworks and constructing lightweight packages, ensuring your system remains flexible for any future additions without compromising its core logic. Our track record speaks for itself: we've achieved a 98% project completion rate with top ratings. Our client-centric approach centers on transparent communication and tailored solutions; we aim to adapt our work to meet your unique needs and ensure that you are fully satisfied throughout the project's journey. Furthermore, our breadth of expertise encompasses both Databricks and Bitbucket CI. We understand the nitty-gritty of these tools and consistently stay updated with their latest functionalities. This allows us to seamlessly integrate your system into Bitbucket Pipelines and automate key tasks while executing effective version control via Bitbucket. At the end of the day, we're driven by passion and innovation, ready to help you build the future you envision. Let's get started on this exciting project together!
$700 CAD in 5 days
6.5

Hi, I will design and implement a reusable, modular ETL framework fully native to Databricks, structured around a clear bronze, silver, and gold notebook hierarchy. The solution will ingest data from GraphQL APIs, handling authentication, pagination, and schema inference before landing raw payloads into Delta Lake tables. Each layer will be parameter-driven, allowing new data sources or transformations to be added without modifying core framework logic. A lightweight, well-documented Python package will be delivered as a wheel to encapsulate the reusable GraphQL connector, making it easy to attach to any Databricks cluster and reuse across pipelines. Data cleaning and validation logic will be isolated in the silver layer to standardize schemas and enforce quality checks before promotion to curated datasets. CI/CD will be implemented using Bitbucket Pipelines, including automated unit tests, Databricks CLI–based notebook deployment, and an integration test executed on each commit. The final delivery will include a clear README explaining framework structure, how to onboard new GraphQL endpoints, and where to extend cleaning logic, ensuring long-term maintainability and scalability. Regards, Asif Al Balushi
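As an illustration of the silver-layer standardisation this bid isolates, the sketch below normalises column names, trims and casts a couple of fields, and de-duplicates. Column names (source_name, event_ts, id) are assumptions for illustration.

```python
# Hedged sketch of a silver-layer standardisation step. Column names
# (source_name, event_ts, id) are assumptions for illustration.
from pyspark.sql import DataFrame
from pyspark.sql.functions import col, trim, to_timestamp

def standardise(df: DataFrame) -> DataFrame:
    # snake_case column names so downstream notebooks see a stable contract
    for old in df.columns:
        df = df.withColumnRenamed(old, old.strip().lower().replace(" ", "_"))
    return (df
            .withColumn("source_name", trim(col("source_name")))
            .withColumn("event_ts", to_timestamp(col("event_ts")))
            .dropDuplicates(["id", "event_ts"]))
```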
$750 CAD in 5 days
5.5

Hi there, I am a Data Scientist, a professional responsible for extracting actionable insights and knowledge from large volumes of data. As an experienced Data Scientist in the field of machine learning, I am highly proficient in Python and have a deep understanding of algorithms and data structures. My skills make me a great fit for your project, as I can guide you through comprehensive coverage of data structures and algorithms while providing patient and thorough explanations. I have over 12 years of experience with Python libraries and tools such as Pandas, Keras, TensorFlow, NumPy, PyCharm, PyTorch, OpenCV, NLP, and others. With over a decade's worth of experience under my belt, including expertise in NLP, neural networks, CNNs, RNNs, LSTMs, and GANs, to mention a few, I can provide you not only with knowledge but also with how to apply it efficiently. Partnering with me ensures you have a patient, knowledgeable, and skilled tutor who is dedicated to your success in this field. My top priority is to provide high-quality work: https://www.freelancer.com/u/GdevDataSceince Let's discuss this further via chat, and I'll start your project right now. Thanks, Gdev
$250 CAD in 7 days
5.7

Hello client, I'm Denis Redzepovic, an experienced developer with expertise in PySpark, Python, SQL and ETL. I have worked extensively on diverse Python projects, ranging from backend development and automation to data processing and API integrations. My deep understanding of Python’s libraries and frameworks allows me to build efficient, scalable, and maintainable solutions. I pay close attention to code quality and performance to ensure your project runs flawlessly. With my solid experience, I’m confident I can deliver results that exceed your expectations. I focus on writing clean, maintainable, and scalable code because I know the difference between 99% and 100%. If you hire me, I’ll do my best until you’re completely satisfied with the result. Let’s discuss your project details so I can tailor the perfect Python solution for you. Thanks, Denis
$300 CAD in 7 days
5.6

Hello, I’m excited about the opportunity to contribute to your project. With strong experience building modular Databricks ETL frameworks and shipping them through Bitbucket Pipelines, I can deliver a parameter-driven bronze/silver/gold notebook structure plus a lightweight Python wheel that handles GraphQL auth, pagination, schema inference, and reliable landing into Delta tables. I’ll tailor the CI/CD flow so Bitbucket runs unit tests, deploys notebooks via the Databricks CLI, and triggers an integration run on commit, with a clean README that makes adding new endpoints and cleaning rules straightforward without touching core logic. You can expect clear communication, fast turnaround, and a well-documented, production-ready framework that fits seamlessly into your existing Databricks and Bitbucket workflow. Best regards, Juan
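The parameter-driven notebook pattern mentioned here usually relies on dbutils widgets. A minimal sketch, assuming it runs inside a Databricks notebook where dbutils is predefined; widget names, defaults, and the example URL are placeholders.

```python
# Minimal parameter-driven notebook pattern (runs inside Databricks, where
# dbutils is predefined). Widget names and default values are placeholders.
dbutils.widgets.text("source_name", "orders")
dbutils.widgets.text("endpoint_url", "https://api.example.com/graphql")
dbutils.widgets.text("target_table", "bronze.orders_raw")

source_name = dbutils.widgets.get("source_name")
endpoint_url = dbutils.widgets.get("endpoint_url")
target_table = dbutils.widgets.get("target_table")

# An orchestration notebook (or job) can then reuse the same layer notebook per source:
# dbutils.notebook.run("bronze/ingest_graphql", 3600, {
#     "source_name": "customers",
#     "endpoint_url": "https://api.example.com/graphql",
#     "target_table": "bronze.customers_raw",
# })
```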
$500 CAD in 3 days
5.6

With my hands-on experience in Python and Full-stack development, I am confident in executing your ambitious Databricks ETL CI Framework project. I have a proven track record of delivering projects on time while maintaining a 100% job completion rate and receiving positive client reviews. My dynamic approach and availability for any emergency modifications will help us overcome any potential challenges efficiently. My work as a full-stack developer encompasses numerous technologies like WordPress, React, Svelte, Vue, Node.js with Express, Django, Laravel, etc. which will be fundamentally useful for building your desired modular structure that allows easy addition of new sources and transformations. My focus on performance and user experience aligns perfectly with your objective to keep external libraries minimal and well-documented while leveraging PySpark, SQL, Delta Lake, dbutils. Lastly, I want you to know that my proficiency extends beyond just technical capabilities. I always maintain close working relationships with my clients to ensure their needs are fully understood and met. You will receive more than just a reusable GraphQL connector packaged as a .whl or a Bitbucket Pipelines yaml; you will also gain the support of a dedicated professional providing clear documentation and instructions in the README file. Don't hesitate to choose me for this crucial project—I'm eager to create value for your data pipelines!
$500 CAD in 3 days
5.6

I’ve built modular, parameter-driven ETL pipelines in Databricks with layered bronze/silver/gold notebooks that make adding new sources seamless. For your GraphQL ingestion, I’ll develop a lightweight, reusable Python wheel handling auth, pagination, and schema inference, easily attachable to any cluster. The Bitbucket Pipeline will automate testing, notebook deployment via the Databricks CLI, and integration checks to ensure smooth CI/CD. I prioritize clear documentation and minimal dependencies while leveraging PySpark, Delta Lake, and dbutils for a clean, maintainable architecture. Looking forward to your positive response in the chat. Best Regards, Arbaz T
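For the wheel packaging referenced in this bid, a minimal setup.py along the following lines would let the connector be built with `pip wheel .` and attached to a cluster. Package name, src/ layout, and version pins are assumptions for illustration.

```python
# Assumed minimal setup.py for building the connector wheel (e.g. `pip wheel .`).
# Package name, src/ layout, and version pins are illustrative only.
from setuptools import setup, find_packages

setup(
    name="graphql_connector",
    version="0.1.0",
    description="Reusable GraphQL connector for Databricks ingestion notebooks",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    python_requires=">=3.9",
    install_requires=["requests>=2.28"],  # kept deliberately minimal
)
```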
$600 CAD in 7 days
5.2

Python HUB, I can build a reusable, modular ETL framework in Databricks with a bronze/silver/gold notebook hierarchy, version-controlled in Bitbucket, and promoted via Pipelines. It will handle GraphQL sources, include a Python connector (.whl), and come with unit tests, integration tests, and clear documentation. Ready to start immediately.
$795 CAD in 1 day
5.3

Hello, this is a well-structured requirement, and it aligns closely with how I typically design scalable data platforms in Databricks. I have built modular ETL frameworks using a bronze/silver/gold architecture, integrated API-based ingestion pipelines, and implemented CI/CD workflows through Bitbucket Pipelines with automated deployment to Databricks workspaces.

For CI/CD, I would configure a Bitbucket Pipelines YAML file that:
• Installs dependencies and builds the GraphQL connector wheel
• Runs unit tests (a minimal example follows this proposal)
• Uses the Databricks CLI to deploy notebooks and artifacts
• Executes an integration test job against a staging workspace
• Promotes artifacts automatically on successful commits

The repository would include a structured README explaining how to onboard a new GraphQL endpoint, how to configure parameters, and where to implement additional cleaning or transformation logic without breaking the framework. External libraries would be minimal and fully documented, leveraging native PySpark, Delta Lake, SQL, and dbutils wherever possible. I’ve previously worked on Databricks-based ingestion systems using API connectors (REST and GraphQL), Delta Lake pipelines, and Bitbucket-driven CI/CD, so I’m comfortable delivering both the technical framework and the deployment automation. Looking forward to your reply. Regards.
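As a sketch of the kind of unit test that pipeline step could run, the pytest below exercises cursor pagination against a stubbed client. The `graphql_connector` module path and the client's `fetch_all_pages` generator are assumptions, matching the illustrative connector sketched near the top of this listing.

```python
# Hypothetical pytest for the connector's pagination, suitable for the pipeline's
# unit-test step. Module path and client API are assumptions.
from graphql_connector.client import GraphQLClient  # hypothetical module path

def test_fetch_all_pages_follows_cursor(monkeypatch):
    pages = [
        {"items": {"nodes": [{"id": 1}],
                   "pageInfo": {"hasNextPage": True, "endCursor": "c1"}}},
        {"items": {"nodes": [{"id": 2}],
                   "pageInfo": {"hasNextPage": False, "endCursor": None}}},
    ]
    client = GraphQLClient("https://example.test/graphql", token="dummy")
    # Stub the network call so the test stays offline and deterministic.
    monkeypatch.setattr(client, "execute", lambda query, variables: pages.pop(0))
    nodes = list(client.fetch_all_pages("query { items { nodes { id } } }", "items"))
    assert [n["id"] for n in nodes] == [1, 2]
```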
$700 CAD in 2 days
5.2

✋ Hi there. I can build a modular Databricks ETL framework with reusable notebooks, automated Bitbucket CI/CD, and a packaged GraphQL connector for seamless ingestion, cleaning, and transformation of your data into Delta tables. ✔️ I have solid experience creating Databricks ETL pipelines, using PySpark, SQL, Delta Lake, and dbutils, combined with version-controlled deployment through Bitbucket Pipelines. In a recent project, I developed a bronze/silver/gold ETL hierarchy, packaged a reusable API connector as a Python wheel, and automated deployment and integration testing on commit. ✔️ For your framework, I will handle GraphQL authentication, pagination, and schema inference, landing raw data in Delta tables, followed by a cleaning stage for validation and standardization. Notebooks will be parameter-driven and modular, making it easy to add new sources or transformations without modifying core logic. ✔️ I will deliver the .whl connector, organized notebooks, Bitbucket Pipeline yaml for automated testing and deployment, and a clear README explaining how to extend the framework with new endpoints or cleaning logic. Everything will be documented and rely mainly on native Databricks tools. Let’s chat to review your current sources and CI preferences so we can plan the implementation. Best regards, Mykhaylo
$500 CAD in 7 days
5.0

Hi, I can build this Databricks ETL framework cleanly and production-ready. I’m a Machine Learning Engineer with 8+ years in production, designing bronze/silver/gold pipelines with PySpark and Delta Lake. On a recent GraphQL ingestion project, pagination and schema drift were the main challenges; I solved them with a reusable connector, schema inference with safeguards, and CI via Bitbucket Pipelines deploying notebooks through the Databricks CLI. I communicate clearly and deliver reliable systems.
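"Schema inference with safeguards" against drift could look like the sketch below: new columns are allowed through Delta schema evolution, but a changed type on an existing column fails the load. The table name is a placeholder, and this is one possible pattern rather than the bidder's actual solution.

```python
# Possible schema-drift safeguard: allow new columns (Delta schema evolution)
# but fail fast when an existing column's type changes. Table name is a placeholder.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def append_with_drift_check(df, table="bronze.graphql_raw"):
    existing = {f.name: f.dataType for f in spark.table(table).schema.fields}
    incoming = {f.name: f.dataType for f in df.schema.fields}
    changed = {c for c in incoming.keys() & existing.keys() if incoming[c] != existing[c]}
    if changed:
        raise ValueError(f"Type drift detected on columns: {sorted(changed)}")
    (df.write.format("delta")
       .mode("append")
       .option("mergeSchema", "true")  # new columns only; type changes were rejected above
       .saveAsTable(table))
```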
$500 CAD in 7 days
5.0

I will design a modular ETL framework in Databricks notebooks with a bronze/silver/gold hierarchy, utilizing PySpark, SQL, and Delta Lake, and create a reusable GraphQL connector as a Python package. I will also implement automated deployment and testing through Bitbucket Pipelines, meeting the specified acceptance criteria and working within the proposed budget. Waiting for your response in chat! Best Regards.
$525 CAD in 3 days
4.8

Hi there, I’m very interested in building your reusable Databricks ETL CI framework and I have professional experience as a freelancer delivering modular data pipelines with PySpark, Delta Lake, and automated CI/CD workflows. My approach would be to design a clean bronze/silver/gold notebook hierarchy with parameter-driven execution, where raw GraphQL payloads are ingested with proper authentication, pagination, and schema inference into Delta tables, followed by a dedicated cleaning/validation stage before publishing curated datasets. I can package the GraphQL connector into a lightweight Python wheel for easy cluster attachment, and set up Bitbucket Pipelines to run unit tests, deploy notebooks via the Databricks CLI, and trigger integration tests on each commit. I’ll also provide a clear README showing exactly how to add new API endpoints and extend transformations without touching core logic. Happy to discuss the details over DMs. With regards, Rojan Uprety
$365 CAD in 7 days
4.5

With my extensive background in software development and diverse skill set, I am confident in my ability to not only meet but exceed your expectations for this project. I understand the importance of modularity for long-term flexibility, and that's why I propose building your requested ETL framework with a bronze/silver/gold notebook hierarchy. This structure allows easy integration of future data sources and transformations without disrupting the core logic, saving you valuable time and effort. Finally, my proficiency in Python will allow me to provide you with a lightweight Python package (wheel). Built with versatility in mind, this package will enable efficient usage of the GraphQL connector across any cluster. My dedication to comprehensive documentation means that even someone unfamiliar with my work can easily integrate new API endpoints and perform cleaning operations efficiently using my solution. I look forward to working with you to create an ETL framework that reflects these values and fully meets your needs. Let's make your data work for you effortlessly!
$250 CAD in 7 days
6.2

Hello, I understand you need a reusable ETL framework built in Databricks notebooks, version-controlled in Bitbucket, and automated via a Bitbucket Pipeline. Here's how I can help:
- Modular ETL Framework: I will structure the ETL pipeline using a bronze/silver/gold notebook hierarchy, allowing you to easily add new data sources or transformations without disrupting core logic. Each notebook layer will be parameter-driven for flexibility.
- GraphQL Integration: I’ll handle authentication, pagination, and schema inference for the GraphQL APIs, ensuring raw payloads land in Delta tables. After that, I’ll implement a dedicated cleaning stage to standardize and validate the data before moving it to the curated layer (see the example after this proposal).
- Reusable GraphQL Connector: I will package the GraphQL connector as a lightweight Python wheel (.whl), which can be attached to any Databricks cluster.
- CI/CD with Bitbucket Pipelines: I’ll configure Bitbucket Pipelines to run unit tests, deploy notebooks using the Databricks CLI, and execute integration tests on each commit.
- Documentation: A clear README will outline how to add new API endpoints and where to place cleaning logic.
With my experience in Databricks, PySpark, and Bitbucket CI, I can deliver an efficient, scalable solution. Let’s get started! Best regards, Munib S.
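To round out the silver-to-curated step this bid mentions, here is an illustrative gold-layer publication: a daily aggregate built from a cleaned silver table. Table and column names are assumptions, not the bidder's design.

```python
# Illustrative gold-layer promotion from a cleaned silver table.
# Table and column names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, count, sum as spark_sum, to_date

spark = SparkSession.builder.getOrCreate()

daily_orders = (
    spark.table("silver.orders")
         .withColumn("order_date", to_date(col("event_ts")))
         .groupBy("order_date", "source_name")
         .agg(count("*").alias("order_count"),
              spark_sum("amount").alias("total_amount"))
)

(daily_orders.write.format("delta")
             .mode("overwrite")
             .saveAsTable("gold.daily_order_summary"))
```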
$500 CAD in 7 days
4.5

As a seasoned programmer with a wealth of experience in data analytics and computation, I am uniquely positioned to tackle the demands of your project. My expertise in Databricks, Bitbucket, and CI deployment is extensive, and I have successfully built numerous ETL frameworks in similarly complex scenarios. The bronze/silver/gold hierarchy you seek can be implemented effectively by leveraging my skills in Python, SQL, PySpark, and Delta Lake. Additionally, my proficiency extends to GraphQL APIs and their attendant considerations such as authentication, pagination, and schema inference. I have developed a lightweight Python package before that acts as a wrapper for connectors such as these and can gracefully accommodate new sources or additional transformations without changing core logic, making it an ideal fit for your needs.
$250 CAD in 7 days
4.4

Hello there! I have the breadth and depth of experience needed to execute this project successfully. My primary focus over the years has been on building robust and scalable applications; this includes tailored web and mobile applications for enterprises similar to the design structure you mentioned. This makes me particularly well-suited for your ETL CI Framework project. In terms of technical aptitude, I have extensive experience with Python, Databricks, Bitbucket, and deploying full CI/CD environments. Despite being skilled in diverse languages such as .NET Framework, TypeScript, and JavaScript among others, I can assure you that native tools will be the mainstay of your project. I completely agree with your desire to minimize dependence on external libraries wherever possible and foster a clean, well-structured codebase: PySpark, SQL, Delta Lake, and dbutils would be the core of the framework. In conclusion, by combining my extensive practical knowledge of similar data-driven projects with my deep understanding of Databricks and Bitbucket CI methodologies, I am poised to provide top-notch results while adhering to all your acceptance criteria. Together we can build an ETL framework that optimally utilizes the power of native tools like PySpark while remaining simple yet robust! Let's collaborate to build sophisticated software that empowers your operations!
$500 CAD in 7 days
4.6

Kanata, Canada
Payment method verified
Member since Aug 28, 2013