
Closed
Posted
Paid on delivery
I am looking for a senior AI/Backend Engineer to build a high-performance voice-ordering system for restaurants. The system must handle "Code-switching" (mixed Arabic and Hebrew dialects) with ultra-low latency.

Key Requirements:
STT Engine: Implement Faster-Whisper (Large-v3-Turbo) on a GPU-accelerated environment (RunPod/AWS/Lambda Labs).
Contextual Accuracy: Use Initial Prompting techniques to ensure high accuracy for specific restaurant menus (dishes, modifiers, doneness levels).
Real-time Processing: Experience with WebSockets for streaming audio and achieving <1s end-to-end latency.
Parsing: Integrate an LLM (GPT-4o-mini or local Llama-3) to convert raw transcribed text into structured JSON.
Infrastructure: Knowledge of Serverless GPUs or Dockerized GPU deployments to ensure scalability for 100+ concurrent restaurants.

Specific Challenges:
The system must be optimized for "Arabic-Hebrew" hybrid speech (local dialect).
Experience with VAD (Voice Activity Detection) like Silero is a plus to handle background noise in kitchens.

Deliverables:
A working backend API that accepts audio and returns structured JSON.
Latency optimization report.
Scalability plan for multi-tenant architecture.
Project ID: 40248359
62 bids
Remote project
Active 16 days ago
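The "Initial Prompting" requirement in the project description can be illustrated with a minimal sketch. Faster-Whisper's `WhisperModel.transcribe` accepts an `initial_prompt` string, and Whisper-family models only condition on a limited prompt window, so the menu vocabulary has to be trimmed to fit. The helper name `build_initial_prompt`, the character budget (a crude stand-in for a token count), and the sample menu terms are all assumptions made for illustration, not part of the posting.

```python
from typing import Iterable


def build_initial_prompt(menu_terms: Iterable[str], max_chars: int = 600) -> str:
    """Join de-duplicated menu terms into one prompt string of at most max_chars.

    max_chars is a hypothetical budget; a real system would count tokens
    with the model's tokenizer instead of characters.
    """
    seen = set()
    kept = []
    used = 0
    for term in menu_terms:
        term = term.strip()
        key = term.casefold()
        if not term or key in seen:
            continue  # skip blanks and repeated terms
        cost = len(term) + (2 if kept else 0)  # account for the ", " separator
        if used + cost > max_chars:
            break  # stop at the first term that would exceed the budget
        seen.add(key)
        kept.append(term)
        used += cost
    return ", ".join(kept)


# Hypothetical mixed Hebrew/Arabic/English menu vocabulary, with one duplicate.
menu = ["שווארמה", "شاورما", "חומוס", "حمص", "well done", "medium", "בלי בצל", "חומוס"]
prompt = build_initial_prompt(menu)
print(prompt)
# The resulting string would be passed as the initial_prompt argument of the STT call.
```

Putting the most frequently ordered items first is a sensible default, since terms past the budget are dropped.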
62 freelancers are bidding an average of $477 USD for this job

Hi there, I’m Muhammad Awais, and I’ll help you build a fast, reliable voice-ordering system that handles mixed Arabic-Hebrew speech and can scale to 100+ restaurants. I’ll start with a GPU-accelerated STT pipeline using Faster-Whisper (Large-v3-Turbo) on a serverless or Dockerized setup, tuned for kitchen noise with Silero VAD as needed. We'll apply initial prompting on menu data to boost contextual accuracy for dishes, modifiers, and doneness levels, and stream audio via WebSockets to keep end-to-end latency under 1 second. Transcripts will be parsed by an LLM (GPT-4o-mini or local Llama-3) to produce structured JSON ready for downstream apps. The backend API will accept audio chunks, return JSON, and run in a multi-tenant architecture with per-restaurant isolation and autoscaling. I will also deliver a latency optimization report and a scalable deployment plan. What is your preferred cloud/edge mix for GPU-accelerated inference (RunPod, AWS, Lambda Labs) and max concurrent streams per restaurant? What menu data format do you provide (JSON schema, YAML) and how often does it change? Do you require streaming transcription with sub-second chunk latency or full utterance latency? Which LLM option should be used for JSON parsing (GPT-4o-mini or local Llama-3), and are there any constraints on model size? What languages/dialects besides Arabic-Hebrew might appear, and how should code-switching be handled? Any preferred VAD strategy or noise profiles for kitchen environments? What is your securit
$750 USD in 21 days
7.4

Hello, I trust you're doing well. I have nearly a decade of hands-on experience with machine learning algorithms. My expertise lies in developing various artificial intelligence algorithms, including the one you require, using Matlab, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications in the same subject. My portfolio, which showcases my past work, is available for your review. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss in detail. Warm regards. Please check my portfolio link: https://www.freelancer.com/u/sajjadtaghvaeifr
$500 USD in 7 days
7.1

As a Senior AI Engineer with over 6 years of experience, I have successfully built and deployed several high-performance, real-time voice systems using state-of-the-art technologies similar to what you require. My expertise extends to the implementation of the STT engine, contextual accuracy through Initial Prompting, and deployment on GPU-accelerated environments like AWS, RunPod, and Lambda Labs. My in-depth knowledge and practical experience with WebSockets will ensure ultra-low latency during audio streaming for seamless voice ordering in your restaurant system. With regard to the challenge of code-switching, my familiarity with VAD systems like Silero will allow me to efficiently handle mixed Arabic and Hebrew dialects in the presence of background noise. Additionally, I am proficient in integrating language models such as GPT-4o-mini or Llama-3 for parsing and transforming raw transcribed text into structured JSON. Scalability remains a crucial aspect of any large-scale system, and I specialize in deploying solutions on Serverless GPUs or Dockerized GPU instances, ensuring your system can seamlessly handle over 100 concurrent restaurants. With my proven mastery in these areas aligned with your project's key requirements, I can guarantee pioneering solutions that meet your unique needs. Let's get started on making your voice-ordering system excel!
$251 USD in 2 days
5.6

With an in-depth understanding of your requirements and decades of collective experience, my team and I can confidently offer you the best possible solution for this voice-ordering system. My AI specialization in speech recognition and ML allows me to leverage powerful tools such as the Faster-Whisper engine and VAD like Silero, which work well with mixed Arabic and Hebrew dialects. Contextual accuracy is a challenging but crucial aspect for restaurant menus. I have employed Initial Prompting techniques in various projects to ensure our models fully comprehend menu items, modifiers, and nuances like doneness levels. This, coupled with my extensive knowledge of WebSockets for real-time audio streaming and end-to-end latency reduction, ensures your system will have a robust, high-performing architecture. Moreover, I possess excellent infrastructure engineering skills using AWS (specifically serverless GPUs) and Dockerized GPU deployments, allowing seamless scalability for 100+ concurrent users. My proficiency with LLMs like GPT-4o-mini or local Llama-3 helps convert raw transcribed text into structured JSON for efficient parsing. With this skillset, I assure an optimized backend API that accepts audio and returns structured JSON with ultra-low latency. Lastly, my attentiveness to clients' needs ensures thorough deliverables like the latency optimization report and a detailed plan for multi-tenant scalability.
$500 USD in 7 days
5.3

⭐⭐⭐⭐⭐ We at CnELIndia, led by Raman Ladhani, are well-positioned to deliver this high-performance voice-ordering system. Leveraging our expertise in GPU-accelerated ML deployments, we can implement Faster-Whisper (Large-v3-Turbo) on RunPod, AWS, or Lambda Labs to achieve ultra-low latency for Arabic-Hebrew code-switching. Using Initial Prompting and contextual embeddings, we ensure menu-specific STT accuracy, while WebSocket streaming combined with VAD techniques like Silero will maintain <1s end-to-end latency even in noisy kitchens. Our team will integrate GPT-4o-mini or Llama-3 for robust JSON parsing of transcribed text and deploy scalable serverless or Dockerized GPU infrastructure to handle 100+ concurrent restaurants. We will also provide a detailed latency optimization report and a multi-tenant scalability plan to ensure reliability and rapid adoption.
$500 USD in 7 days
5.2

I recently engineered a low-latency conversational AI for a high-volume restaurant kiosk, where I reduced order friction by nearly 40% through optimized STT-to-LLM handoffs and asynchronous state management. Your voice-ordering system requires more than just a transcription layer; it needs a robust, real-time state machine capable of handling mid-sentence interruptions and ambient noise while maintaining high accuracy in intent extraction. I am ready to implement a production-grade solution that bridges the gap between raw audio and structured order data with sub-second response times, ensuring a frictionless experience for every customer. To achieve this high-performance threshold, I will utilize a modular pipeline featuring Deepgram or a distilled Whisper model for ultra-low latency STT conversion. This will be coupled with a context-aware LLM, such as GPT-4o or a fine-tuned Llama 3, specifically optimized for multi-turn reasoning and dynamic menu navigation. I will implement a robust Menu Logic engine using Pydantic for strict schema validation and vector-based searching to ensure the agent accurately identifies complex modifiers and upsell opportunities in real-time. The infrastructure will be built on FastAPI with high-concurrency WebSockets to maintain a persistent bi-directional stream, while ElevenLabs or AWS Polly will be integrated for natural, low-latency TTS responses. How do you plan to handle edge cases like multi-item deletions or complex dietary modifications mid-order? Furthermore, will this system be deployed in high-noise environments like a drive-thru, where we might need to implement specialized neural noise-suppression layers to maintain clarity? I would appreciate the opportunity to discuss your technical architecture in more detail and explore how we can optimize the total latency budget to create a truly seamless, human-like ordering experience. 
I am available for a brief technical alignment call at your convenience to discuss the next steps.
$571 USD in 21 days
4.2
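The bid above proposes vector-based search to map what the customer said onto menu items. As a rough illustration of that matching step only, the sketch below swaps in plain string similarity from Python's standard `difflib` module rather than embeddings, so it is a simpler stand-in and not the bidder's method. The function name `match_menu_item`, the cutoff value, and the sample menu are hypothetical.

```python
import difflib
from typing import List, Optional


def match_menu_item(heard: str, menu: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the menu entry closest to the transcribed text, or None if nothing is close.

    Uses difflib string similarity as a stand-in for vector search;
    cutoff is an assumed threshold that would need tuning on real data.
    """
    lookup = {name.casefold(): name for name in menu}
    hits = difflib.get_close_matches(heard.casefold(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


menu = ["Shawarma", "Hummus", "Falafel"]
print(match_menu_item("shawrma", menu))  # a misspelled transcript still resolves
print(match_menu_item("pizza", menu))    # no close match on this menu
```

Returning `None` instead of the nearest item lets the caller ask the customer to repeat rather than guess.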

Hi, I’m a seasoned Applied ML Engineer focused on Speech + NLP, and I’ve built real-time voice/LLM pipelines (streaming STT -> normalization -> structured intent) with GPU deployment, VAD, and multi-tenant routing. I can deliver your restaurant voice-ordering backend optimized for Arabic–Hebrew code-switching and <1s perceived latency.

Approach
* Streaming STT: Faster-Whisper on GPU with chunked WebSocket audio (20–40 ms frames). Use VAD (Silero/WebRTC VAD) + partial hypotheses to reduce turnaround time
* Menu contextualization: initial prompting via dynamic hotwords/phrase biasing (menu items, modifiers, common slang) + post-STT normalization (spelling variants, Arabic/Hebrew mixing, etc.)
* Robust parsing: LLM layer converts transcript -> JSON schema (item, qty, modifiers, special notes). Add guardrails: schema validation, retries, and confidence thresholds
* Latency engineering: GPU warm pools, batching where safe, async I/O, token streaming, and short-circuit on end-of-utterance from VAD. Provide an E2E latency breakdown + optimizations
* Multi-tenant scale: per-restaurant configs with isolation, rate limits, and observability. Deploy on RunPod/AWS GPU via Docker; plan for autoscaling to 100+ concurrent tenants

Relevant experience
* Built multilingual voice bots with streaming STT, VAD, and WebSocket transport under tight latency
* Implemented NLU extraction (LLM + rules) into validated JSON for support workflows
* Deployed GPU inference services (Docker, autoscaling, monitoring) with tenant-specific config management
$500 USD in 4 days
4.0
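The "robust parsing" step in the bid above (transcript to JSON with schema validation and retries) could look roughly like the following sketch. It covers validation and retries only, not confidence thresholds. The field names follow the bid's list (item, qty, modifiers, notes), while the `OrderItem` class, the `{"items": [...]}` envelope, and the stub LLM are assumptions; a real deployment would call an actual model where the stub is.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OrderItem:
    item: str
    qty: int
    modifiers: List[str] = field(default_factory=list)
    notes: str = ""


def parse_order(raw: str) -> List[OrderItem]:
    """Check raw LLM output against the assumed order shape; raise ValueError on any mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or not isinstance(data.get("items"), list):
        raise ValueError("expected an object with an 'items' list")
    items = []
    for entry in data["items"]:
        if not isinstance(entry, dict):
            raise ValueError("each item must be an object")
        name, qty = entry.get("item"), entry.get("qty")
        modifiers, notes = entry.get("modifiers", []), entry.get("notes", "")
        if not isinstance(name, str) or not name:
            raise ValueError("item must be a non-empty string")
        if isinstance(qty, bool) or not isinstance(qty, int) or qty < 1:
            raise ValueError("qty must be a positive integer")
        if not isinstance(modifiers, list) or not all(isinstance(m, str) for m in modifiers):
            raise ValueError("modifiers must be a list of strings")
        if not isinstance(notes, str):
            raise ValueError("notes must be a string")
        items.append(OrderItem(name, qty, modifiers, notes))
    return items


def extract_with_retries(transcript: str, llm: Callable[[str], str],
                         max_attempts: int = 3) -> List[OrderItem]:
    """Call the LLM until its output validates, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_order(llm(transcript))
        except ValueError as exc:
            last_error = exc  # remember why this attempt failed, then retry
    raise ValueError(f"no valid order after {max_attempts} attempts: {last_error}")


def make_flaky_llm() -> Callable[[str], str]:
    """Stub standing in for a real model: fails once, then returns valid JSON."""
    replies = iter(["not json",
                    '{"items": [{"item": "shawarma", "qty": 2, "modifiers": ["no onion"]}]}'])
    return lambda transcript: next(replies)


order = extract_with_retries("two shawarma, no onion", make_flaky_llm())
print(order)
```

Validating before anything reaches the kitchen means a malformed reply costs one extra LLM call instead of a wrong order.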

Hello, I understand you are looking to build a high-performance, multilingual voice-ordering system for restaurants, capable of handling code-switching between Arabic and Hebrew with ultra-low latency. The objective is a robust backend that transcribes audio, interprets menu-specific commands, and outputs structured JSON in real time, optimized for noisy kitchen environments and scalable to multiple concurrent tenants. My approach begins with deploying Faster-Whisper (Large-v3-Turbo) on a GPU-accelerated environment (RunPod/AWS/Lambda Labs) for low-latency speech-to-text. I will implement initial prompting and contextual adaptation to improve menu-specific accuracy, and integrate a local or GPT-4o-mini LLM to parse transcriptions into structured JSON. Real-time streaming will be handled via WebSockets, complemented by VAD using Silero to reduce background noise interference. The architecture will leverage Dockerized GPU deployments or serverless GPU instances to ensure scalability across 100+ restaurants while maintaining <1s end-to-end latency. Deliverables include a fully functional backend API, detailed latency and accuracy optimization report, and a comprehensive scalability plan. All components will be modular, documented, and production-ready for rapid integration with front-end ordering systems. Thanks, Asif.
$750 USD in 11 days
4.2

You are not looking for a coder. You are looking for someone who can build this properly. That is exactly why your project stood out. Your ambition to create a high-performance voice-ordering system capable of ultra-low latency code-switching between Arabic and Hebrew dialects reflects an advanced understanding of multilingual AI challenges and real-time processing demands. At DigitaSyndicate, a UK-based digital systems agency, we build precision-engineered automation and scalable AI-driven platforms designed for reliable performance and future-proof architecture. Our experience implementing GPU-accelerated Faster-Whisper models alongside contextual initial prompting aligns seamlessly with your need for audio-to-JSON parsing, including expertise in WebSockets streaming and serverless GPU infrastructures. Having delivered real-time multilingual systems optimized for voice activity detection and sub-second latency, I’m keen to understand your priorities and timeline to tailor an execution strategy that meets your operational goals. Casper M. Project Lead | DigitaSyndicate Precision-Built Digital Systems.
$550 USD in 14 days
4.3

Hello, I understand you need a high-performance voice-ordering backend capable of handling Arabic-Hebrew code-switching, ultra-low latency streaming, and menu-specific contextual accuracy. At SEO Global, we have 10+ years building scalable, GPU-accelerated systems, real-time APIs with WebSockets, and AI-driven pipelines for structured data extraction, perfectly aligning with Faster-Whisper and LLM integrations. We will implement a Dockerized GPU deployment with VAD optimization, ensuring sub-1s latency and multi-tenant scalability while delivering a structured JSON API and latency report. Could you clarify the expected audio formats and max file sizes? Do you have existing menu datasets for initial prompting, or should we generate them? Looking forward to collaborating, SEO Global Team
$500 USD in 7 days
3.1

Hello There!!! ★★★★ ( Senior AI Engineer for Voice-Ordering System ) ★★★★

I understand you need a high-performance voice-ordering backend that handles Arabic-Hebrew code-switching with sub-second latency. The system must transcribe audio using Faster-Whisper, parse menu-specific commands via an LLM, and scale across multiple restaurants while being robust to kitchen noise.

⚜ GPU-accelerated STT engine using Faster-Whisper Large-v3-Turbo
⚜ Real-time audio streaming with WebSockets for <1s latency
⚜ Contextual accuracy with initial prompting for menu-specific terms
⚜ LLM integration (GPT-4o-mini or local Llama-3) to output structured JSON
⚜ Voice Activity Detection (Silero) to filter background noise
⚜ Multi-tenant scalable architecture via serverless or Dockerized GPUs
⚜ Detailed latency optimization and scalability report

With 9+ years of AI and backend experience, I’ve built real-time multilingual audio systems and scalable voice apps. I can deliver a robust, low-latency solution tailored to your restaurant workflow. Excited to bring your voice-ordering system to life efficiently.

Warm Regards, Farhin B.
$256 USD in 12 days
4.0

Hello, I specialize in AI voice systems and have built and customized large-scale real-time speech pipelines. The main challenge here is handling Arabic–Hebrew code-switching with under 1s latency while keeping menu accuracy high in noisy kitchens. I am certified in Python and GPU-based AI deployment, and I will solve this using Faster-Whisper Large-v3-Turbo on CUDA (RunPod or AWS), Silero VAD for noise filtering, and WebSocket streaming for real-time flow. I will use GPT-4o-mini or Llama-3 to convert transcripts into clean structured JSON with menu-aware prompting. A few things I’d like to clarify: Do you already have recorded samples of real kitchen audio? Should each restaurant have its own tuned prompt or share one model-wide prompt? What peak concurrent calls do you expect per location? Do you need POS integration in phase one? I’ll deliver a scalable Dockerized backend, a latency report, and a multi-tenant plan. Best regards, Dev S.
$1,500 USD in 10 days
2.3

Hello! I am a US-based full stack developer with extensive experience in building high-performance systems. I carefully read your project description regarding the voice-ordering system and I believe I can deliver exactly what you need. With about 10 years of experience in AI and backend engineering, I specialize in integrating LLMs and building intelligent workflow automation tools. I understand the importance of creating a seamless user experience, especially in voice-ordering systems, where clarity and accuracy are crucial. Could you please clarify the following questions to help me better understand the project? 1. What specific features do you envision for the voice-ordering system? 2. Are there any preferred AI technologies or frameworks you would like to utilize? 3. What is your timeline for the project completion? In the past, I've worked on various relevant projects, such as developing an AI-driven order management system and creating a voice-enabled customer service platform. I believe my skills align well with your project goals, and I approach every task with a commitment to quality and detail. I look forward to the opportunity to discuss your project further. Let's make this voice-ordering system a success together! Best, James Zappi
$700 USD in 5 days
2.0

Hello, I can architect and develop a high-performance voice-ordering system that meets your requirements for handling mixed Arabic and Hebrew dialects with ultra-low latency. The main difficulty in similar projects usually lies in optimizing real-time processing and ensuring efficient GPU utilization. With over five years of experience in building scalable AI systems, I've successfully implemented STT engines and integrated advanced LLMs for various applications. For your project, I will leverage Faster-Whisper in a GPU-accelerated environment (AWS or RunPod) to ensure rapid, contextually accurate transcription, utilizing Initial Prompting techniques tailored to specific restaurant menus. My approach involves implementing WebSocket connections for real-time audio streaming and integrating VAD for effective noise management in kitchen environments. I will also ensure that the backend API efficiently processes audio input and outputs structured JSON data, alongside a comprehensive latency optimization report and a scalability plan for supporting over 100 concurrent users. To clarify, do you have specific restaurant menus ready for initial prompting? What audio formats will the system need to accept? I'm ready to start immediately and look forward to bringing your vision to life.
$250 USD in 7 days
1.8

Hello, I’m Dinesh Kumar. With 14+ years of experience across multiple platforms, I’ve helped build numerous startups through dedication and hard work. I’m committed to delivering high-quality work that ensures 100% client satisfaction. Your success is my priority, and I focus on building long-term relationships based on trust and excellence. Expertise: Web & App Development (React.js, Node.js, JavaScript, PHP, MySQL, WordPress, Magento, CodeIgniter, Shopify, .NET, Flutter, FoxPro); strong knowledge of frameworks, software design, and development methodologies; proven ability to deliver custom, scalable, and reliable solutions for diverse industries. I work with clients globally, providing end-to-end solutions that meet unique project needs while maintaining the highest quality standards.
$500 USD in 7 days
1.8

For implementing the voice-ordering system for restaurants, would it be fine if I analyze the requirements thoroughly, then propose a strategic approach to integrate the Faster-Whisper STT Engine on a GPU-accelerated environment and ensure high contextual accuracy using Initial Prompting techniques? So I believe that by leveraging WebSockets for real-time processing and integrating an LLM for parsing, we can achieve a seamless voice-ordering system that meets the specific needs of your project. By the way, do you have a preferred timeline for the implementation of the voice activity detection feature to handle background noise efficiently in kitchen environments?
$300 USD in 4 days
1.4

Hi there, I can architect a low-latency, GPU-accelerated voice-ordering backend optimized for Arabic–Hebrew code-switching using Faster-Whisper (large-v3-turbo) with streaming inference. I’ve built real-time speech pipelines with WebSockets, VAD (Silero), and LLM-based semantic parsers, achieving sub-second end-to-end latency in noisy environments like QSR kitchens.

Proposed approach:
• GPU STT microservice (Dockerized, CUDA) with Faster-Whisper + Silero VAD for clean segmentation
• WebSocket streaming API for real-time audio ingestion and partial transcripts
• Contextual prompting layer injecting dynamic menu vocab (dishes, modifiers, doneness)
• LLM parsing (GPT-4o-mini or Llama-3) to convert transcripts into validated structured JSON
• Multi-tenant architecture with tenant-specific prompts, menus, and rate limits
• Scalable deployment on RunPod/AWS GPU autoscaling with load-balanced workers

Deliverables include: production-ready API, latency benchmarking (<1s target), and a scaling blueprint for 100+ concurrent restaurants with cost/performance trade-offs.

Clarification Questions:
• Expected audio format/bitrate from POS or mobile devices?
• Do menus change frequently and require dynamic prompt updates?
• Preferred LLM (hosted vs local) considering latency vs cost?
• Peak concurrent streams per restaurant?
• Any existing POS/KDS integration requirements for order dispatch?
$250 USD in 7 days
1.4
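The multi-tenant point in the bid above (tenant-specific menus and rate limits) can be sketched with a small in-memory registry. This is one possible shape under stated assumptions, not a design taken from the bid: the class names, the token-bucket algorithm, and the default rate and burst values are all hypothetical, and a production system would likely keep this state in a shared store.

```python
import time
from typing import Callable, Dict, List


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, now: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new tenant can burst immediately
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Tenant:
    def __init__(self, menu_terms: List[str], bucket: TokenBucket) -> None:
        self.menu_terms = menu_terms
        self.bucket = bucket


class TenantRegistry:
    """Keeps each restaurant's menu vocabulary and rate limit separate."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._tenants: Dict[str, Tenant] = {}

    def register(self, tenant_id: str, menu_terms: List[str],
                 rate: float = 5.0, burst: float = 10.0) -> None:
        bucket = TokenBucket(rate, burst, self._clock())
        self._tenants[tenant_id] = Tenant(list(menu_terms), bucket)

    def admit(self, tenant_id: str) -> Tenant:
        """Return the tenant's config, or raise if unknown or over its rate limit."""
        tenant = self._tenants.get(tenant_id)
        if tenant is None:
            raise KeyError(f"unknown tenant: {tenant_id}")
        if not tenant.bucket.allow(self._clock()):
            raise RuntimeError(f"rate limit exceeded for tenant: {tenant_id}")
        return tenant


registry = TenantRegistry()
registry.register("restaurant-1", ["hummus", "shawarma"])
print(registry.admit("restaurant-1").menu_terms)
```

Injecting the clock keeps the limiter testable without sleeping, and one noisy restaurant exhausting its bucket does not affect the others.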

✋ Hi There!!! ✋ The goal of the project: Build a high-performance, low-latency voice-ordering backend system capable of handling Arabic-Hebrew code-switching with structured JSON output and scalable architecture. I have carefully read and understood your complete project description and I am confident in delivering a robust AI-driven solution. I am the best fit for this project because I have extensive experience in audio processing, NLP, and GPU-accelerated backend deployments for real-time applications. I can provide:
1. Implementation of Faster-Whisper Large-v3-Turbo on GPU environments for accurate STT.
2. Integration of an LLM to parse transcribed audio into structured JSON with contextual accuracy.
3. Real-time audio streaming via WebSockets with latency optimization and scalable serverless GPU deployment.
I offer services including database management, testing, and full source code delivery at project completion, ensuring maintainability. I have 9+ years of experience as a full stack developer and have successfully built real-time AI-powered voice systems with multi-language support and low-latency performance. Looking forward to chatting with you and agreeing on a deal. Best Regards, Elisha Mariam!
$254 USD in 14 days
1.5

Hello, I’m a Senior AI/Backend Engineer specializing in real-time speech systems, Whisper optimization, and LLM-based structured pipelines. I’ve built streaming ASR services with GPU acceleration, VAD, and sub-second latency, well aligned with your voice-ordering use case.

Approach
1. STT: GPU deployment of Faster-Whisper (Large-v3-Turbo) on RunPod/AWS with FP16 optimization
2. Real-time: WebSocket streaming with chunked inference (300–500 ms) targeting <1s end-to-end latency
3. Noise Handling: Silero VAD for speech detection and reduced background noise (kitchen environments)
4. Code-switching Accuracy: Menu-grounded initial prompts (Arabic–Hebrew dialect, items, modifiers, doneness levels)
5. Parsing: LLM-based structured extraction using GPT-4o-mini or local Llama-3 → validated JSON output

Architecture
1. FastAPI + async WebSockets
2. Dockerized GPU services
3. Multi-tenant design (restaurant-specific menus/prompts)
4. Scalable for 100+ concurrent restaurants via GPU worker pooling

I focus on production-grade, low-latency AI systems, not prototypes. Happy to discuss traffic expectations and deployment preferences.

Best regards, Ayush
Senior Research Engineer (Speech + LLM Systems)
$334 USD in 14 days
1.7
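The 300–500 ms chunking and the <1s target in the bid above imply some simple arithmetic, sketched below. Whisper models take 16 kHz audio; the 16-bit mono PCM encoding is an assumption, and the per-stage timings are made-up placeholders for illustration, not measurements from any system.

```python
SAMPLE_RATE_HZ = 16_000  # Whisper models expect 16 kHz input
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM on the WebSocket
TARGET_MS = 1_000        # the <1s end-to-end goal from the project description


def chunk_bytes(chunk_ms: int) -> int:
    """Size in bytes of one audio chunk of the given duration."""
    return SAMPLE_RATE_HZ * chunk_ms // 1000 * BYTES_PER_SAMPLE


def remaining_budget_ms(stage_ms: dict, target_ms: int = TARGET_MS) -> int:
    """Headroom left under the target after summing every pipeline stage."""
    return target_ms - sum(stage_ms.values())


# Placeholder stage timings (hypothetical, to be replaced by real measurements).
PLACEHOLDER_STAGES_MS = {
    "chunk buffering": 400,
    "network round trip": 60,
    "STT inference": 250,
    "LLM parsing": 200,
}

for ms in (300, 500):
    print(f"{ms} ms chunk = {chunk_bytes(ms)} bytes")
print(f"headroom: {remaining_budget_ms(PLACEHOLDER_STAGES_MS)} ms")
```

With these placeholder numbers the chunk buffer alone consumes a large share of the budget, which is why chunk duration is usually the first knob to tune in a latency report.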

Hello! I've been recommended by a Freelancer Recruiter. Nice to meet you. I've just completed a similar high-performance voice-ordering system for restaurants that needed to handle code-switching with ultra-low latency. As a seasoned AI expert with experience in GPU-accelerated environments, I'm the perfect fit to tackle the complexities of Arabic-Hebrew hybrid speech and achieve sub-1s end-to-end latency. With expertise in implementing Faster-Whisper on GPU hardware, integrating LLMs like GPT-4o-mini, and utilizing WebSockets for real-time processing, I'm confident in delivering a scalable and accurate voice-ordering system that meets your requirements. In a previous project, I reduced manual work by 80% by implementing Initial Prompting techniques and optimizing the STT engine for specific restaurant menus. Multiple 5-star reviews on AI-driven voice apps and OpenAI API integrations speak to my expertise. Happy to hop on a quick call (no obligation) to discuss architecture, timeline, and a clear plan + quote. Chris | Lead Developer | Novatech
$750 USD in 7 days
1.2

Jerusalem, Israel
Member since Jan 30, 2026