
Closed
Posted
Paid on delivery
I need a senior Python developer to create a fully local, offline automation pipeline for an English tutor business.

Goal: Automate processing of lesson audio recordings:
.m4a (phone recording) → transcription (.txt + .json with timestamps) → diarization (speaker labels) → aligned transcript → lesson report via local Ollama → one-page PDF report → aggregated progress reports (monthly, half-year, yearly)

Hard requirements:
- Windows 11 + RTX 5070 GPU (compute capability sm_120)
- Transcription must use GPU (faster-whisper or WhisperX)
- Diarization must run on CPU only ([login to view URL] GPU is unreliable on this GPU) → If pyannote fails → fallback to simple VAD + clustering (label as “approximate”)
- All processing local (no cloud APIs except local Ollama instance)
- Reports in perfect English, parent-friendly, no sensitive data
- Tutor name: William (WG English School)
- Students age 14–18

Required outputs per lesson:
- transcripts/<lesson_id>.txt (plain)
- transcripts/<lesson_id>.json (segments, timestamps, speaker)
- diarization/<lesson_id>.rttm or .json
- reports/lessons/<lesson_id>.md + .pdf (exact one-page template)

Aggregated reports (on demand):
- reports/aggregate/<student>_monthly_<YYYY-MM>.pdf
- reports/aggregate/<student>_half-year_<YYYY-H1|H2>.pdf
- reports/aggregate/<student>_yearly_<YYYY>.pdf

Lesson report must follow this EXACT structure (Markdown first, then PDF):

**William - English Lesson Report**
Student: [Name] | Date: [DD MMM YYYY]
**Class Summary**
[1-2 paragraphs]
**What the Student Did Well**
- 3-5 bullets
**What Needs Improvement**
- 2-4 bullets
**Next Lesson Focus**
- 3-5 bullets
**5-10 Minute Home Practice Checklist**
- 3-5 tasks
**Target Vocabulary & Sentences for Next Lesson**
4-6 items + examples
Progress Note: [one positive sentence]

Ollama prompt must be very strict (no inventing facts, concise, English only, parent-friendly).

Tech stack:
- Python 3.10–3.12 (prefer 3.11 or 3.12)
- Poetry ([login to view URL] + lock)
- WhisperX / faster-whisper (GPU transcription)
- [login to view URL] (CPU diarization) + fallback
- Local Ollama (gemma2.9b-instruct-q4_K_M or similar)
- Weasyprint or ReportLab for PDF
- Typer CLI with subcommands:
  - transcribe
  - diarize
  - lesson-report
  - aggregate
- YAML config file
- Logging, progress bars, caching (skip if output exists), error handling

Deliverables:
- Full repo structure
- All source code (src/ layout, CLI, config, prompts, PDF renderer)
- Installation instructions for Windows 11 (Python, ffmpeg, Poetry, CUDA)
- Example commands
- Test guide with sample audio

Please show experience with WhisperX / faster-whisper, Pyannote, Ollama, and Weasyprint on Windows + GPU setups in your proposal.

Thank you!
Vladimir
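The output layout and the "skip if output exists" caching rule from the brief can be sketched with the standard library alone. This is only an illustration under stated assumptions: the function names (`lesson_outputs`, `aggregate_output`, `needs_run`) are hypothetical, the `.rttm` extension is picked from the two options the brief allows, and "exists" is taken to mean a non-empty file.

```python
from datetime import date
from pathlib import Path


def lesson_outputs(root: Path, lesson_id: str) -> dict[str, Path]:
    """Map each per-lesson artifact named in the brief to its path under root."""
    return {
        "transcript_txt": root / "transcripts" / f"{lesson_id}.txt",
        "transcript_json": root / "transcripts" / f"{lesson_id}.json",
        # Assumption: .rttm is used; the brief also allows .json here.
        "diarization": root / "diarization" / f"{lesson_id}.rttm",
        "report_md": root / "reports" / "lessons" / f"{lesson_id}.md",
        "report_pdf": root / "reports" / "lessons" / f"{lesson_id}.pdf",
    }


def aggregate_output(root: Path, student: str, day: date, period: str) -> Path:
    """Build the aggregate PDF path for 'monthly', 'half-year' or 'yearly'."""
    if period == "monthly":
        key = f"{day.year:04d}-{day.month:02d}"
    elif period == "half-year":
        key = f"{day.year:04d}-{'H1' if day.month <= 6 else 'H2'}"
    elif period == "yearly":
        key = f"{day.year:04d}"
    else:
        raise ValueError(f"unknown period: {period!r}")
    return root / "reports" / "aggregate" / f"{student}_{period}_{key}.pdf"


def needs_run(outputs: list[Path], force: bool = False) -> bool:
    """Return True unless every expected output already exists and is non-empty."""
    if force:
        return True
    return not all(p.is_file() and p.stat().st_size > 0 for p in outputs)
```

A stage such as `transcribe` would call `needs_run` on its own outputs before doing any GPU work, so re-running the CLI over a folder only processes new lessons.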
Project ID: 40248235
79 proposals
Remote project
Active 9 days ago
79 freelancers are bidding an average of $1,095 USD for this job

As a seasoned Python developer with extensive experience in automation pipelines and GPU setups, I understand the need for a fully local, offline solution for your English tutor business. The challenge of automating the processing of lesson audio recordings to create detailed reports while ensuring strict requirements like GPU transcription and CPU diarization is clear. In previous projects, I have successfully implemented similar automation pipelines for educational purposes, utilizing technologies such as WhisperX, Pyannote, Ollama, and Weasyprint on Windows environments with GPU setups. My expertise in creating efficient and reliable solutions aligns perfectly with the goals of your project. To bring your vision to life, I am equipped to deliver a comprehensive repository structure, meticulously crafted source code, detailed installation instructions, and thorough testing guides. My commitment to excellence and attention to detail will ensure that the final deliverables meet and exceed your expectations. I am eager to collaborate with you on this exciting project and look forward to discussing the next steps. Thank you for considering my proposal. Vladimir
$1,200 USD in 20 days
7.4

Hello, I trust you're doing well. I am well experienced in machine learning algorithms, with nearly a decade of hands-on practice. My expertise lies in developing various artificial intelligence algorithms, including the one you require, using Matlab, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications in the same subject. My portfolio, which showcases my past work, is available for your review. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss in detail. This is my work related to ASR using Whisper and Qwen 3: https://www.freelancer.com/portfolio-items/11262538-asr-and-sts
$1,125 USD in 7 days
7.2

Hey Vladimir, I will build the full local pipeline: WhisperX transcription on GPU, pyannote diarization on CPU with VAD fallback, Ollama-driven lesson reports following your exact template, and PDF rendering via Weasyprint. The Typer CLI will cover all four subcommands with YAML config, caching and progress logging. One flag on the RTX 5070: CTranslate2 (which faster-whisper depends on) may need a specific CUDA 12.x build for sm_120 support. I will verify this upfront and compile from source if needed, which avoids GPU debugging later. 1) Are lessons one-on-one or group? This affects the diarization speaker count. 2) How much VRAM does your 5070 have, 12GB? That determines whether WhisperX and Ollama can coexist in memory. Ready to start whenever you are. Best Regards, Kamran
$800 USD in 12 days
7.1

Hello,

I have hands-on experience with projects like this. With 15+ years of proven experience in Python, local AI pipelines, and GPU-accelerated workflows, I confidently understand your requirement: to build a fully offline, Windows-optimized transcription → diarization → structured lesson reporting system that is robust, deterministic, and parent-ready.

-->> GPU transcription via faster-whisper / WhisperX (RTX optimized)
-->> CPU diarization (pyannote + VAD fallback with “approximate” flag)
-->> Strict Ollama prompting (local, no hallucinations)
-->> One-page Markdown → PDF (WeasyPrint)
-->> Aggregated monthly / H1-H2 / yearly reports
-->> Typer CLI + Poetry + YAML config + caching & logging

I’ll implement a clean src/ architecture with modular stages (audio → transcript → diarization → alignment → LLM → PDF), deterministic prompts, structured JSON schemas, and safe fallback handling tailored specifically for Windows 11 + CUDA environments.

Let's connect in chat, as I have a few technical questions regarding the CUDA version installed, the expected lesson length range, and whether multi-student batch processing is required from day one.

I would begin by validating the GPU transcription stack on your RTX setup, then lock down diarization + fallback logic before integrating Ollama and the PDF renderer to ensure stability first, polish second, delivering a reproducible, maintainable system from start to finish.

Thanks & regards,
Julian
$750 USD in 7 days
6.5
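The brief asks for a very strict Ollama prompt (no invented facts, concise, English only, parent-friendly) that fills the fixed report template. A minimal sketch of how such a request body might be assembled follows. The rule wording, the function names, and the choice of `temperature: 0` with a fixed `seed` for repeatable output are assumptions for illustration, not part of the brief; the model tag is passed in by the caller and should match whatever `ollama list` shows locally.

```python
import json

# The section layout is copied from the brief's required report structure.
REPORT_TEMPLATE = """**William - English Lesson Report**
Student: {student} | Date: {lesson_date}
**Class Summary**
[1-2 paragraphs]
**What the Student Did Well**
- 3-5 bullets
**What Needs Improvement**
- 2-4 bullets
**Next Lesson Focus**
- 3-5 bullets
**5-10 Minute Home Practice Checklist**
- 3-5 tasks
**Target Vocabulary & Sentences for Next Lesson**
4-6 items + examples
Progress Note: [one positive sentence]"""

# Assumed wording of the strict rules; tune against real transcripts.
SYSTEM_RULES = (
    "You write lesson reports for parents of students aged 14-18. "
    "Use only facts present in the transcript; never invent events or quotes. "
    "Write in English only, concisely, in a parent-friendly tone. "
    "Do not include sensitive personal data. "
    "Keep every heading of the template exactly as given, in the same order."
)


def build_generate_payload(model: str, transcript: str,
                           student: str, lesson_date: str) -> dict:
    """Build a JSON body for Ollama's /api/generate endpoint (not sent here)."""
    prompt = (
        "Fill in the report template below using ONLY the transcript.\n"
        "If the transcript does not support a point, leave it out.\n\n"
        "TEMPLATE:\n"
        + REPORT_TEMPLATE.format(student=student, lesson_date=lesson_date)
        + "\n\nTRANSCRIPT:\n"
        + transcript
    )
    return {
        "model": model,
        "system": SYSTEM_RULES,
        "prompt": prompt,
        "stream": False,  # ask for one complete response instead of chunks
        "options": {"temperature": 0, "seed": 42},  # assumed settings
    }


def to_request_body(payload: dict) -> bytes:
    """Serialize the payload as UTF-8 JSON, ready for an HTTP POST."""
    return json.dumps(payload).encode("utf-8")
```

The body would be POSTed to the local Ollama server (by default `http://localhost:11434/api/generate`), and the returned text still needs checking against the template before it is rendered to PDF.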

Hello Dear! I write to introduce myself. I'm Engineer Toriqul Islam. I was born and grew up in Bangladesh. I speak and write in English like native people. I am a B.S.C. Engineer of Computer Science & Engineering. I completed my graduation from Rajshahi University of Engineering & Technology ( RUET). I love to work on Web Design & Development project. Web Design & development: I am a full-stack web developer with more than 10 years of experience. My design Approach is Always Modern and simple, which attracts people towards it. I have built websites for a wide variety of industries. I have worked with a lot of companies and built astonishing websites. All Clients have good reviews about me. Client Satisfaction is my first Priority. Technologies We Use: Custom Websites Development Using ======>Full Stack Development. 1. HTML5 2. CSS3 3. Bootstrap4 4. jQuery 5. JavaScript 6. Angular JS 7. React JS 8. Node JS 9. WordPress 10. PHP 11. Ruby on Rails 12. MYSQL 13. Laravel 14. .Net 15. CodeIgniter 16. React Native 17. SQL / MySQL 18. Mobile app development 19. Python 20. MongoDB What you'll get? • Fully Responsive Website on All Devices • Reusable Components • Quick response • Clean, tested and documented code • Completely met deadlines and requirements • Clear communication You are cordially welcome to discuss your project. Thank You! Best Regards, Toriqul Islam
$750 USD in 9 days
5.4

You are not looking for a coder. You are looking for someone who can build this properly. That is exactly why your project stood out. Your ambition to automate a fully local, GPU-accelerated lesson transcription and diarization pipeline with nuanced, parent-friendly reports reflects a commitment to reliability and precise system orchestration. This approach matches how we design future-proof, streamlined automation at DigitaSyndicate. At DigitaSyndicate, a UK-based digital systems agency, we build precision-engineered automation, modern web platforms, and AI-driven systems designed for performance and long-term scalability. Our expertise with WhisperX, faster-whisper, Pyannote audio fallback logic, Ollama integration, and Weasyprint on Windows 11 RTX environments ensures seamless offline execution aligned with your strict requirements. Having delivered similar end-to-end local AI pipelines for education technology clients, I am confident in mapping an execution plan that meets your priorities and timeline. Can you share your main priorities and timeline so I can map out the right execution plan for you? Casper M. Project Lead | DigitaSyndicate Precision-Built Digital Systems.
$1,150 USD in 14 days
5.3

Hello Vladimir, I understand the challenge of automating lesson processing for WG English School while keeping everything fully local and GPU-optimized. I’ve built similar AI-powered pipelines and offline transcription/diarization flows, including projects leveraging WhisperX/faster-whisper, Pyannote, and local LLMs like Ollama. I focus on precise, parent-friendly reports and scalable, maintainable Python code. Could you share a sample lesson audio or the YAML config you envision so I can ensure the pipeline and PDF reporting meet your exact one-page template requirements? Looking forward to building this end-to-end system for you! Regards, A Zain!
$1,125 USD in 7 days
4.8

Hello, As an experienced software engineer with a specialty in machine learning (ML) in Python, I am well-equipped to tackle the unique challenges posed by your tutoring pipeline project. My familiarity with GPU-intensive frameworks like WhisperX/faster-whisper and Pyannote will come in handy for lightning-fast transcript processing and diarization. Additionally, my background includes working on Windows + GPU setups, which is a hard requirement for this project. So you can count on me to seamlessly fuse all this technology, including Windows 11 with RTX 5070 GPU, into a cohesive local-first automation pipeline. Moreover, my skills go beyond just writing code. I am adept at creating holistic solutions that encompass every aspect of a project. Your local-solution requirement aligns perfectly with my architecture approach as I prioritize performance and scalability. With an eye towards the future, I make use of only cutting-edge technologies like Poetry and Typer CLI, which guarantee clean code and easy maintenance. This commitment to excellence extends right through to my documentation skills; not only will your repo structure be impeccable but the accompanying installation instructions for Windows 11 will be clear and comprehensive. In short, if you choose me, you're choosing a seamless integration of powerful technology and client-focused communication that will deliver fully integrated outputs efficiently while adhering meticulously to your project specifics. Best Regards.
$800 USD in 7 days
5.1

Hey, I’ve reviewed your project and understand you’re looking to build a fully local, offline Python automation pipeline for WG English School, processing lesson audio into detailed, parent-friendly reports. The focus will be on GPU-accelerated transcription (WhisperX/faster-whisper), CPU-only diarization with fallback, aligned transcripts, and one-page Markdown + PDF lesson reports, plus aggregated student progress. I can develop a modular Python solution using Poetry, Typer CLI, and a structured src/ layout, ensuring smooth subcommands for transcribe, diarize, lesson-report, and aggregate. The system will run entirely local, with Ollama generating strict, concise lesson summaries, and PDFs rendered via WeasyPrint or ReportLab. Logging, caching, and error handling will ensure reliability on Windows 11 with RTX 5070, including GPU/CPU-specific optimizations. You’ll receive a complete repo with config templates, prompts, CLI usage examples, installation guide, and a test workflow to verify transcription, diarization, and report generation. Let’s connect so I can share relevant pipelines and outline the fastest delivery path. Best regards, Muhammad Adil Portfolio: https://www.freelancer.com/u/webmasters486
$950 USD in 6 days
5.2

Hi, As per my understanding: You need a fully offline, GPU-accelerated automation pipeline on Windows 11 (RTX 5070) for an English tutor business. Flow: .m4a → GPU transcription (faster-whisper/WhisperX) → CPU diarization (pyannote with VAD+clustering fallback) → aligned transcript → strict local Ollama lesson report → one-page PDF → on-demand monthly/half-year/year aggregates. Outputs must follow exact folder/schema rules, Typer CLI structure, Poetry-managed env, caching, logging, and zero cloud usage. Implementation approach: I will build a modular src/ layout with services: audio, transcription (CUDA-enabled), diarization (CPU-only with fallback flag “approximate”), alignment, reporting (strict prompt templates), and PDF renderer (WeasyPrint). CLI via Typer (transcribe, diarize, lesson-report, aggregate) with YAML config. Deterministic Ollama prompts (no hallucination, parent-friendly). Caching checks skip completed stages. Poetry lockfile, Windows setup guide (CUDA, ffmpeg), and test dataset included. Aggregates computed from stored JSON segments + report metadata. A few quick questions: Expected avg lesson length (minutes)? Max monthly lesson volume? Preferred Ollama model beyond gemma2.9b? Deadline for V1 delivery?
$800 USD in 20 days
4.6
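The "aligned transcript" step in the brief combines transcription segments with diarization turns. One simple way to do this, sketched below, is to give each transcript segment the speaker whose turns overlap it the most in time. This is an assumed approach for illustration, not a method stated in the brief or in any proposal; the dictionary keys (`start`, `end`, `text`, `speaker`) and the `UNKNOWN` label are likewise hypothetical.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list, turns: list, unknown: str = "UNKNOWN") -> list:
    """Label each transcript segment with the speaker that overlaps it most.

    segments: [{"start": float, "end": float, "text": str}, ...]
    turns:    [{"start": float, "end": float, "speaker": str}, ...]
    """
    aligned = []
    for seg in segments:
        totals = {}  # speaker -> total overlapping seconds for this segment
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else unknown
        aligned.append({**seg, "speaker": speaker})
    return aligned
```

Segments that fall in a gap between turns keep the `UNKNOWN` label rather than being guessed, which fits the brief's rule against inventing facts downstream.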

Hi, I’m an Applied ML Engineer focused on Speech + NLP, and I’ve shipped end-to-end offline audio -> text/report pipelines: GPU Whisper-family transcription, CPU diarization/VAD, timestamp alignment & strict JSON outputs consumed by downstream apps. I’m comfortable hardening these systems for Windows, including GPU quirks, caching & deterministic re-runs.

Relevant experience:
* Built streaming/batch STT services using faster-whisper/WhisperX with GPU acceleration, chunked decoding & alignment to word/segment timestamps; delivered both human-readable TXT & machine JSON outputs for analytics.
* Implemented speaker diarization pipelines with pyannote and production fallbacks (VAD + embedding/clustering) when models/hardware were unstable; added “approximate” labeling & confidence/quality flags.
* Delivered PDF reporting systems (WeasyPrint/ReportLab) with fixed templates, one-page constraints, and batch aggregation (monthly/half-year/yearly) with reproducible CLI tools and logging.

I can match your required repo structure: Typer CLI subcommands, YAML config, strict Ollama prompt, caching (skip if outputs exist), robust error handling, and Windows 11 installation guide (Python 3.11/3.12, CUDA, ffmpeg, Poetry). The result will be a fully local workflow for WG English School (William), producing per-lesson TXT/JSON/RTTM + one-page PDF, plus aggregated progress PDFs on demand.
$750 USD in 3 days
4.1
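The brief's rule "if pyannote fails, fall back to simple VAD + clustering and label the result approximate" can be sketched as a thin wrapper that takes both diarizers as plain callables, followed by a writer for the `.rttm` output. The wrapper and its names are hypothetical and no real pyannote call is shown; the RTTM line uses the usual `SPEAKER` record layout with channel fixed to 1, which is an assumption about what downstream tools expect.

```python
from typing import Callable

# A turn is a dict: {"start": float, "end": float, "speaker": str}
Diarizer = Callable[[str], list]


def diarize_with_fallback(audio_path: str, primary: Diarizer,
                          fallback: Diarizer) -> dict:
    """Try the primary diarizer; on any error use the fallback and flag it."""
    try:
        return {"turns": primary(audio_path), "method": "primary",
                "approximate": False}
    except Exception as exc:  # broad on purpose: model and runtime errors vary
        return {"turns": fallback(audio_path), "method": "fallback",
                "approximate": True, "error": repr(exc)}


def to_rttm(lesson_id: str, turns: list) -> str:
    """Render turns as RTTM text, one SPEAKER record per turn."""
    lines = []
    for turn in turns:
        duration = turn["end"] - turn["start"]
        lines.append(
            f"SPEAKER {lesson_id} 1 {turn['start']:.3f} {duration:.3f} "
            f"<NA> <NA> {turn['speaker']} <NA> <NA>\n"
        )
    return "".join(lines)
```

Carrying the `approximate` flag in the result lets the report stage mention that speaker labels are approximate instead of silently presenting fallback output as exact.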

I specialize in optimizing local AI pipelines for RTX-powered Windows environments, ensuring high-throughput transcription and zero-latency inference. My experience with WhisperX’s word-level timestamping and Ollama’s API integration allows me to build a seamless, privacy-focused English tutoring engine that runs entirely on your hardware. I recently completed a project converting raw audio into structured feedback using Llama 3, achieving significant speed gains by leveraging FP16 quantization directly on a local RTX-series setup to maximize tokens-per-second while maintaining high linguistic accuracy. I will implement WhisperX for VAD-assisted transcription, utilizing its alignment model to map text accurately to the audio timeline for precise feedback. For the tutor logic, I’ll configure an Ollama-based agent using a custom Modelfile tailored for linguistic analysis, processing transcripts to extract complex grammar and vocabulary insights. The final stage uses ReportLab to generate professional PDF reports summarizing student progress. This entire workflow will be wrapped in a robust Python script with async handling to prevent GPU bottlenecks, including a detailed setup guide to ensure CUDA and CuDNN dependencies are perfectly aligned with your Windows drivers. Do you have a preferred Ollama model, or should I benchmark which performs best for English assessment on your specific hardware? Also, will the PDF reports require a specific branding template or custom progress visualizations? I’m available to discuss hardware optimization further and can jump on a quick call if you’d like to align on the technical requirements and project timeline.
$1,339 USD in 21 days
3.9

Hello Vladimir, I hope you are doing well. I’m a senior Python developer who builds fully local, offline automation pipelines. I design robust, GPU‑accelerated workflows for audio transcription, diarization, and reporting, all without cloud dependencies. My focus is creating reliable, maintainable systems that stay fast on Windows 11 with RTX GPUs, while keeping outputs polished and parent‑friendly for English tutoring contexts. In prior work I’ve shipped end‑to‑end local pipelines using WhisperX / faster‑whisper for GPU transcription, CPU diarization with fallback strategies, and a self‑contained report generator using Ollama and WeasyPrint. I’ve structured the stack with Poetry, a clean src layout, and a Typer CLI (transcribe, diarize, lesson-report, aggregate), plus YAML config and caching to avoid rework when inputs are unchanged. The result is reproducible, testable, and ready for monthly/half‑yearly/yearly aggregates. I can handle the complete build end‑to‑end based on these capabilities: GPU‑accelerated transcription, CPU diarization with robust fallbacks, strict local reporting templates, and a polished one‑page PDF per lesson plus aggregated PDFs on demand. I’ll deliver a clean repo (src/, CLI, prompts, PDF renderer), Windows install steps, example commands, and a test guide. Best regards, Billy Bryan
$750 USD in 7 days
4.0

Greetings! I’m a top-rated freelancer with 16+ years of experience and a portfolio of 750+ satisfied clients. I specialize in delivering high-quality, professional local-first English tutor automation pipeline building services tailored to your unique needs. Please feel free to message me to discuss your project and review my portfolio. I’d love to help bring your ideas to life! Looking forward to collaborating with you! Best regards, Revival
$750 USD in 14 days
3.7

Hi Vladimir, I can build your fully local Windows 11 automation pipeline using faster-whisper (GPU), CPU-only diarization with fallback logic, strict Ollama reporting, Typer CLI, and one-page PDF generation—delivering a clean Poetry-managed repo with full setup and test documentation.
$950 USD in 5 days
3.5

As the senior Python developer you need for your English tutor business, I understand the necessity of a seamless automation pipeline. With 5 years of experience and similar projects, I've honed my skills in creating local, offline solutions like the one you're seeking. My expertise in WhisperX, Pyannote, Ollama, and Weasyprint on Windows + GPU setups aligns perfectly with your requirements. I guarantee a clean, professional, and user-friendly automation process, ensuring high-quality, parent-friendly reports. Let's discuss how my approach can enhance the performance and efficiency of your lesson processing while maintaining scalability and reliability over time. Looking forward to providing you with exceptional service and results. Regards, Chirag Pipal
$1,150 USD in 7 days
2.9

With a Master’s in Software Engineering and over 10 years of full-stack experience, I’m well prepared to build your local-first English tutor automation pipeline. I understand the technical requirements, including Windows 11 and an RTX GPU, and I’m experienced with the tools you mentioned—Python, WhisperX, Pyannote, Ollama, and Weasyprint. I’ve successfully worked in constrained environments and can ensure stable, efficient performance. I’m comfortable configuring GPU setups on Windows and optimizing CPU/GPU usage, particularly for libraries like pyannote.audio. My strong problem-solving skills allow me to quickly diagnose and resolve performance or compatibility issues. Over the past decade, I’ve delivered scalable web and mobile solutions using Python, Django, React, and Vue, always prioritizing quality and maintainability. For your project, I aim not only to meet the requirements but to design a clean, efficient microservices structure that balances advanced technology with simplicity. You can expect professional communication, timely delivery, and a strong commitment to your vision. I value honesty, transparency, and long-term collaboration, and I’m excited about the opportunity to work together.
$1,125 USD in 7 days
3.0

We propose a fully local, offline pipeline for English tutoring using WhisperX and Ollama on your Windows RTX GPU. Our technical solution ensures privacy and eliminates API latency by orchestrating these models into a robust, modular Python system. The implementation plan covers the entire workflow—from audio ingestion and speech-to-text to LLM analysis and structured PDF report delivery—optimized for your local hardware. The budget for this complete, tested solution is 1150.0 USD, delivered in phases with a functional prototype as the first milestone. To finalize the architecture, what is the typical duration and format of the audio sessions you'll analyze?
$1,150 USD in 5 days
2.9

Hi there! Have you thought about how to handle variations in audio quality across different phone recordings for consistent transcription accuracy? Regardless, this is definitely something that I feel confident delivering on, given my past experience. I would love to discuss your project further! Looking forward to hearing from you. Kind Regards, Corné
$750 USD in 14 days
3.0

Hello Vladimir, I’ve carefully reviewed your fully local lesson-automation pipeline requirements. I previously built GPU-based transcription systems using faster-whisper/WhisperX on Windows with CUDA, CPU-only pyannote diarization with VAD fallback, and structured Ollama-powered report generation rendered to one-page PDFs via WeasyPrint, all packaged with Poetry and Typer CLI. Your strict offline processing, RTX GPU usage, deterministic prompts, exact report template, caching, and structured outputs will be implemented through a modular src/ architecture, YAML config, robust logging, and reproducible Windows setup documentation. I’m ready to begin immediately and will deliver a stable, production-grade system quickly. Best regards, Mauricio
$1,200 USD in 7 days
3.0

Zrenjanin, Serbia
Member since Feb 21, 2026