How To Set Up Zeppelin For Analytics And Visualization
In this article, you learn how to create and configure an Apache Zeppelin instance on EC2, store your notebooks on S3, and access the instance over SSH.
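Before walking through the individual steps, it helps to see the pieces end to end. The sketch below is a minimal, hedged example of the EC2 side using boto3: it assumes you already have a key pair and a security group that allows inbound SSH from your IP, and every resource ID shown (AMI, key name, security group) is a hypothetical placeholder to replace with your own values. It launches an instance for Zeppelin, waits for it to come up, and prints the SSH tunnel command for reaching the web UI.

```python
# Minimal sketch: launch an EC2 instance for Zeppelin with boto3.
# Assumptions: the AMI ID, key-pair name, and security-group ID below are
# placeholders; the security group must already allow inbound SSH (port 22).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",            # placeholder AMI
    InstanceType="t3.large",                    # Zeppelin + Spark want a few GB of RAM
    KeyName="zeppelin-key",                     # placeholder: your existing key pair
    SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder: allows inbound SSH
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "zeppelin"}],
    }],
)

instance_id = response["Instances"][0]["InstanceId"]

# Wait until the instance is running, then look up its public DNS name.
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
desc = ec2.describe_instances(InstanceIds=[instance_id])
public_dns = desc["Reservations"][0]["Instances"][0]["PublicDnsName"]

# Zeppelin's web UI listens on port 8080 by default; rather than opening that
# port to the internet, you can tunnel it over SSH:
print(f"ssh -i zeppelin-key.pem -L 8080:localhost:8080 ec2-user@{public_dns}")
```

Once Zeppelin is installed on the instance, notebook storage on S3 is a configuration change rather than code: per the Apache Zeppelin documentation, you set zeppelin.notebook.storage to org.apache.zeppelin.notebook.repo.S3NotebookRepo along with the bucket and user properties in zeppelin-site.xml, and give the instance's IAM role read/write access to that bucket. The sections that follow cover these steps in detail.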