erek
Supreme [H]ardness
- Joined
- Dec 19, 2005
- Messages
- 5,027
Man, this is baller. First Facebook Research Group's Parl.AI / Blender and now NVIDIA Jarvis!! How fantastic it is to be alive right now!!!
"Real-time conversational AI is a complex and challenging task. To allow real-time, natural interaction with an end user, the models need to complete computation in under 300 milliseconds. Natural interactions are challenging requiring multimodal sensory integration. Model pipelines are also complex and require coordination across multiple services:
- Automatic speech recognition (ASR)
- Natural language understanding (NLU)
- Domain-specific fulfillment services
- Text-to-speech (TTS)
You can fuse these skills to form multimodal skills in your applications. You can fine-tune services and models on your datasets to get the highest accuracy possible using NVIDIA NeMo. And you can use other tools in the NVIDIA AI Toolkit to optimize and build services that can run at scale.
Jarvis is designed to help you access conversational AI functionality easily and quickly. With a few commands, you can access the high-performance services through API operations and try multimodal demos.
Jarvis framework
Jarvis is a fully accelerated, application framework for building multimodal conversational AI services that use an end-to-end deep learning pipeline (Figure 1).
The Jarvis framework includes pretrained conversational AI models, tools, and optimized end-to-end services for speech, vision, and NLU tasks. In addition to AI services, Jarvis enables you to fuse vision, audio, and other sensor inputs simultaneously to deliver capabilities such as multi-user, multi-context conversations in applications such as virtual assistants, multi-user diarization, and call center assistants.
Using Jarvis, you can easily fine-tune state-of-art-models on your data to achieve a deeper understanding of their specific contexts. Optimize for inference to offer real-time services that run in 150 ms compared to the 25 seconds required on CPU-only platforms."
https://devblogs.nvidia.com/introducing-jarvis-framework-for-gpu-accelerated-conversational-ai-apps/
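For anyone wondering what "coordination across multiple services" under a 300 ms budget looks like in practice, here is a rough Python sketch of the four stages from the quote chained together with per-stage timing. To be clear, this is not the Jarvis API: `asr`, `nlu`, `fulfill`, `tts`, and `run_pipeline` are hypothetical stand-ins I made up for illustration, and the stages do trivial string work instead of running real models. Only the stage order and the 300 ms figure come from the NVIDIA post.

```python
import time
from typing import Any, Callable, List, Tuple

# Latency target quoted in the NVIDIA post; everything else here is assumed.
LATENCY_BUDGET_MS = 300.0


def asr(audio: bytes) -> str:
    """Hypothetical speech recognizer: pretends the audio is UTF-8 text."""
    return audio.decode("utf-8")


def nlu(text: str) -> dict:
    """Hypothetical intent classifier based on a single keyword."""
    intent = "weather" if "weather" in text.lower() else "unknown"
    return {"intent": intent, "text": text}


def fulfill(parsed: dict) -> str:
    """Hypothetical domain-specific fulfillment: canned replies per intent."""
    if parsed["intent"] == "weather":
        return "It is sunny."
    return "Sorry, I did not get that."


def tts(reply: str) -> bytes:
    """Hypothetical synthesizer: returns the reply as bytes instead of audio."""
    return reply.encode("utf-8")


def run_pipeline(audio: bytes) -> Tuple[bytes, List[Tuple[str, float]]]:
    """Run ASR -> NLU -> fulfillment -> TTS and record each stage's time in ms."""
    stages: List[Tuple[str, Callable[[Any], Any]]] = [
        ("asr", asr),
        ("nlu", nlu),
        ("fulfillment", fulfill),
        ("tts", tts),
    ]
    timings: List[Tuple[str, float]] = []
    data: Any = audio
    for name, stage in stages:
        start = time.perf_counter()
        data = stage(data)
        timings.append((name, (time.perf_counter() - start) * 1000.0))
    return data, timings


if __name__ == "__main__":
    out, timings = run_pipeline(b"What is the weather today?")
    total_ms = sum(ms for _, ms in timings)
    print(out.decode("utf-8"))
    for name, ms in timings:
        print(f"{name}: {ms:.3f} ms")
    print(f"total: {total_ms:.3f} ms, within budget: {total_ms < LATENCY_BUDGET_MS}")
```

The point of timing each stage separately is that the budget covers the whole chain, so a slow ASR or TTS step eats into what is left for everything else. With real models the stages would be network or GPU calls rather than plain functions.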