New HYBRID AI Model Just Shocked The Open-Source World - Jamba 1.5

AI21 Labs has released two new open-source AI models, Jamba 1.5 Mini and Jamba 1.5 Large, featuring a unique hybrid architecture called SSM-Transformer that combines the strengths of Transformers and structured state space models.
Transcript
00:00So, AI21 Labs, the brains behind the Jurassic language models, has just dropped two brand
00:08new open source LLMs called Jamba 1.5 Mini and Jamba 1.5 Large.
00:13And these models are designed with a unique hybrid architecture that incorporates cutting-edge
00:17techniques to enhance AI performance.
00:20And since they're open source, you can try them out yourself on platforms like HuggingFace
00:25or run them on cloud services like Google Cloud Vertex AI, Microsoft Azure, and NVIDIA
00:31NIM.
00:32Definitely worth checking out.
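If you want to kick the tires on the Mini model yourself, a minimal sketch with the Hugging Face transformers library looks roughly like this (the repo id and generation settings are assumptions on my part, so double-check the official model card first):

```python
# Minimal sketch: load Jamba 1.5 Mini from Hugging Face and generate text.
# The repo id "ai21labs/AI21-Jamba-1.5-Mini" is assumed; verify it on the
# model card, and note the full model needs serious GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread layers across whatever GPUs are available
    torch_dtype="auto",  # keep the checkpoint's native precision
)

prompt = "Summarize the main idea behind hybrid SSM-Transformer models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```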
00:33All right, so what's this hybrid architecture all about?
00:36Okay, let's break it down in simple terms.
00:39Most of the language models you know, like the ones used in ChatGPT, are based on the
00:44transformer architecture.
00:46These models are awesome for a lot of tasks, but they've got this one big limitation.
00:51They struggle when it comes to handling really large context windows.
00:55Think about when you're trying to process a super long document or a full transcript
01:00from a long meeting.
01:01Regular transformers get kind of bogged down because they have to deal with all that data
01:06at once.
01:07And that's where these new Jamba models from AI21 Labs come into play with a totally new,
01:13game-changing approach.
01:14So AI21 has cooked up this new hybrid architecture they're calling the SSM transformer.
01:20Now what's cool about this is it combines the classic transformer model with something
01:24called a structured state space model, or SSM.
01:28The SSM builds on older, more efficient techniques like recurrent neural networks and convolutional
01:33neural networks.
01:35Basically, these handle computation over long sequences much more efficiently.
01:38So by using this mix, the Jamba models can handle much longer sequences of data without
01:43slowing down.
01:44That's a massive win for tasks that need a lot of context, like if you're doing some
01:48complex generative AI reasoning or trying to summarize a super long document.
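To make that concrete, here's a toy sketch of what a hybrid stack can look like: mostly cheap SSM-style layers with an occasional full-attention layer mixed in. The 1-in-8 ratio and the block internals are illustrative assumptions, not AI21's exact layout (a GRU stands in for the Mamba layer here, purely to show the recurrent shape):

```python
# Toy sketch of a hybrid SSM-Transformer stack. Most layers are SSM-style
# blocks (cheap, fixed-size recurrent state); every so often a full-attention
# block is interleaved. Ratio and block internals are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # Full self-attention: every token looks at every other token.
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)

class SSMBlock(nn.Module):
    """Stand-in for a Mamba layer: a recurrent scan with O(1) state per step."""
    def __init__(self, dim: int):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # placeholder recurrence
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.norm(x + out)

class HybridStack(nn.Module):
    def __init__(self, dim: int, num_layers: int = 16, attn_every: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(dim) if (i + 1) % attn_every == 0 else SSMBlock(dim)
            for i in range(num_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(1, 1024, 256)     # (batch, sequence, hidden)
print(HybridStack(256)(x).shape)  # torch.Size([1, 1024, 256])
```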
01:53Now why is handling a long context window such a big deal?
01:57Well, think about it.
01:58When you're using AI for real-world applications, especially in businesses, you're often dealing
02:03with complex tasks.
02:05Maybe you're analyzing long meeting transcripts, or summarizing a giant policy document, or
02:09even running a chatbot that needs to remember a lot of past conversations.
02:14The ability to process large amounts of context efficiently means these models can give you
02:19more accurate and meaningful responses.
02:21Or Dagan, the VP of product at AI21 Labs, actually nailed it when he said an AI model
02:27that can effectively handle long context is crucial for many enterprise generative AI
02:33applications.
02:34And he's right.
02:35Without this ability, AI models often tend to hallucinate or just make stuff up because
02:39they're missing out on important information.
02:42But with the Jamba models and their unique architecture, they can keep more relevant
02:47info in memory, leading to way better outputs and less need for repetitive data processing.
02:52And you know what that means.
02:54Better quality and lower cost.
02:55Alright, let's get into the nuts and bolts of what makes this hybrid architecture so
02:59efficient.
03:00So there's one part of the model called Mamba, which is actually very important.
03:04It's developed with insights from researchers at Carnegie Mellon and Princeton, and it has
03:08a much lower memory footprint and a more efficient attention mechanism than your typical
03:13transformer.
03:14This means it can handle longer context windows with ease.
03:17Unlike transformers, which have to look at the entire context every single time, slowing
03:21things down, Mamba keeps a smaller state that gets updated as it processes the data.
03:27This makes it way faster and less resource intensive.
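Here's a stripped-down sketch of that difference. A linear state space recurrence carries one fixed-size state vector forward, so each new token costs the same no matter how long the context is (the A, B, C matrices are random placeholders, purely to show the shape of the computation):

```python
# Stripped-down linear SSM recurrence: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
# The state h stays the same size however long the sequence gets, which is why
# per-token cost is flat. A, B, C are random placeholders, not learned weights.
import numpy as np

state_dim, in_dim, seq_len = 16, 8, 100_000
rng = np.random.default_rng(0)
A = 0.01 * rng.normal(size=(state_dim, state_dim))  # small values keep it stable
B = rng.normal(size=(state_dim, in_dim))
C = rng.normal(size=(in_dim, state_dim))

h = np.zeros(state_dim)
for t in range(seq_len):
    x_t = rng.normal(size=in_dim)  # stand-in for the next token's features
    h = A @ h + B @ x_t            # constant-size state update
    y_t = C @ h                    # this step's output
# Context "memory" is one 16-dim vector, regardless of seq_len; a transformer
# would instead attend over all 100,000 previous tokens at every step.
```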
03:30Now, you might be wondering, how do these models actually perform?
03:33Well, AI21 Labs didn't just hype them up, they put them to the test.
03:38They evaluated the models on the RULER benchmark, which covers tasks like multi-hop
03:44tracing, retrieval, aggregation, and question answering.
03:47And guess what?
03:48The Jamba models came out on top, consistently outperforming other models like Llama 3.1 70B,
03:53Llama 3.1 405B, and Mistral Large 2.
03:57On the Arena Hard benchmark, which is all about testing models on really tough tasks,
04:02Jamba 1.5 Mini and Large outperformed some of the biggest names in AI.
04:07Jamba 1.5 Mini scored an impressive 46.1, beating models like Mixtral 8x22B and Command
04:15R+, while Jamba 1.5 Large scored a whopping 65.4, outshining even the big guns like Llama
04:253.1 70B and 405B.
04:25One of the standout features of these models is their speed.
04:29In enterprise applications, speed is everything.
04:32Whether you're running a customer support chatbot or an AI-powered virtual assistant,
04:37the model needs to respond quickly and efficiently.
04:40The Jamba 1.5 models are reportedly up to 2.5 times faster on long contexts than their
04:45competitors, so not only are they powerful, but they're also super practical for high-scale
04:50operations.
04:51And it's not just about speed.
04:53The Mamba component in these models allows them to operate with a lower memory footprint,
04:57meaning they're not as demanding on hardware.
04:59For example, Jamba 1.5 Mini can handle context lengths up to 140,000 tokens on a single GPU.
05:06That's huge for developers looking to deploy these models without needing a massive infrastructure.
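To get a feel for why that matters, here's a rough back-of-envelope calculation of the KV cache a plain transformer would need at that context length (the layer and head dimensions are made-up but typical for a mid-size model):

```python
# Back-of-envelope KV-cache size for a plain transformer at 140K context.
# The dimensions below are made-up but typical for a mid-size model.
layers, kv_heads, head_dim = 32, 8, 128
tokens = 140_000
bytes_per_value = 2  # fp16/bf16

# Keys + values cached for every layer and every token:
kv_bytes = 2 * layers * kv_heads * head_dim * tokens * bytes_per_value
print(f"Full transformer KV cache: {kv_bytes / 1e9:.1f} GB")  # ~18.4 GB

# A hybrid that keeps, say, 1 in 8 layers as attention only caches for those:
print(f"Hybrid (1-in-8 attention): {kv_bytes / 8 / 1e9:.1f} GB")  # ~2.3 GB
```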
05:12Here's where it gets even cooler.
05:13To make these massive models more efficient, AI21 Labs developed a new quantization technique
05:18called ExpertsInt8.
05:21Now I know that might sound a bit technical, but here's the gist of it.
05:25Quantization is basically a way to reduce the precision of the numbers used in the model's
05:31computations.
05:32This can save on memory and computational costs without really sacrificing quality.
05:37ExpertsInt8 is special because it specifically targets the weights in the mixture of experts,
05:42or MoE layers, of the model.
05:44These layers account for about 85% of the model's weights in many cases.
05:48By quantizing these weights to an 8-bit precision format and then dequantizing them directly
05:54inside the GPU during runtime, AI21 Labs managed to cut down the model size and speed
06:00up its processing.
06:02The result?
06:03Jamba 1.5 Large can fit on a single 8-GPU node while still using its full context length
06:09of 256k.
06:11This makes Jamba one of the most resource-efficient models out there, especially if you're working
06:16with limited hardware.
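In spirit, the trick looks something like this simplified sketch of int8 weight quantization with per-row scales. The real ExpertsInt8 kernel does the dequantization inside the GPU as part of the matmul and is considerably more involved; this just shows the arithmetic:

```python
# Simplified sketch of int8 weight quantization with per-output-row scales,
# the general idea behind quantizing MoE expert weights. ExpertsInt8 itself
# dequantizes on-GPU during the matmul; this only illustrates the math.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4096, 1024)).astype(np.float32)  # one expert's weight matrix

# Quantize: one scale per output row so big rows don't flatten small ones.
scales = np.abs(W).max(axis=1, keepdims=True) / 127.0
W_int8 = np.round(W / scales).astype(np.int8)  # stored at 1 byte per weight

# Dequantize at runtime and apply (4x less weight memory than fp32).
x = rng.normal(size=1024).astype(np.float32)
y = (W_int8.astype(np.float32) * scales) @ x

y_ref = W @ x
print("max abs error:", float(np.abs(y - y_ref).max()))  # small rounding error
```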
06:18Now, besides English, these models also support multiple languages, including Spanish, French,
06:23Portuguese, Italian, Dutch, German, Arabic, and Hebrew, which makes them super versatile
06:27for global applications.
06:29And here's a cherry on top, AI21 Labs made these models developer-friendly.
06:34Both Jamba 1.5 Mini and Large come with built-in support for structured JSON output, function
06:40calling, and even citation generation.
06:43This means you can use them to create more sophisticated AI applications that can perform
06:47tasks like calling external tools, digesting structured documents, and providing reliable
06:53references, all of which are super useful in enterprise settings.
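Just to give a flavor, a structured-output request could look something like the hypothetical sketch below. The endpoint URL, model name, and response_format field are all assumptions modeled on common chat-completion APIs, so treat AI21's docs as the source of truth:

```python
# Hypothetical sketch of asking the model for structured JSON output, modeled
# on common chat-completion APIs. The URL, model name, and "response_format"
# field are assumptions; check AI21's documentation for the real parameters.
import requests

payload = {
    "model": "jamba-1.5-mini",
    "messages": [
        {
            "role": "user",
            "content": "Extract vendor, amount, and due date from this "
                       "invoice as JSON: ...",
        }
    ],
    "response_format": {"type": "json_object"},  # ask for machine-readable output
}
resp = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=30,
)
print(resp.json())
```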
06:57One of the coolest things about Jamba 1.5 is AI21 Labs' commitment to keeping these
07:02models open.
07:03They're released under the Jamba Open Model License, which means developers, researchers,
07:09and businesses can experiment with them freely.
07:11And with availability on multiple platforms and cloud partners like AI21 Studio, Google
07:16Cloud, Microsoft Azure, NVIDIA NIM, and soon on Amazon Bedrock, Databricks Marketplace,
07:22and more, you've got tons of options for how you want to deploy and experiment with
07:27these models.
07:28Looking ahead, it's pretty clear that AI models that can handle extensive context windows
07:32are going to be a big deal in the future of AI.
07:35As Or Dagan from AI21 Labs pointed out, these models are just better suited for complex,
07:41data-heavy tasks that are becoming more common in enterprise settings.
07:44They're efficient, fast, and versatile, making them a fantastic choice for developers and
07:49businesses looking to push the boundaries in AI.
07:52So if you haven't checked out Jamba 1.5 Mini or Large yet, now's the perfect time
07:56to dive in and see what these models can do for you.
07:59Alright, if you found this video helpful, smash that like button, hit subscribe, and
08:04stay tuned for more updates on the latest in AI tech.
08:07Thanks for watching, and I'll catch you in the next one.
