New HYBRID AI Model Just Shocked The Open-Source World - Jamba 1.5

AI21 Labs has released two new open-source AI models, Jamba 1.5 Mini and Jamba 1.5 Large, featuring a unique hybrid architecture called SSM-Transformer that combines the strengths of Transformers and structured state space models.
Transcript
00:00So, AI21 Labs, the brains behind the Jurassic language models, has just dropped two brand
00:08new open source LLMs called Jamba 1.5 Mini and Jamba 1.5 Large.
00:13And these models are designed with a unique hybrid architecture that incorporates cutting-edge
00:17techniques to enhance AI performance.
00:20And since they're open source, you can try them out yourself on platforms like HuggingFace
00:25or run them on cloud services like Google Cloud Vertex AI, Microsoft Azure, and NVIDIA
00:31NIM.
00:32Definitely worth checking out.
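If you want to kick the tires on the Mini model yourself, a minimal sketch with the Hugging Face transformers library looks roughly like this (the repo id and generation settings are assumptions on my part, so double-check the official model card first):

```python
# Minimal sketch: load Jamba 1.5 Mini from Hugging Face and generate text.
# The repo id "ai21labs/AI21-Jamba-1.5-Mini" is assumed; verify it on the
# model card, and note the full model needs serious GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # spread layers across whatever GPUs are available
    torch_dtype="auto",  # keep the checkpoint's native precision
)

prompt = "Summarize the main idea behind hybrid SSM-Transformer models."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```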
00:33All right, so what's this hybrid architecture all about?
00:36Okay, let's break it down in simple terms.
00:39Most of the language models you know, like the ones used in ChatGPT, are based on the
00:44transformer architecture.
00:46These models are awesome for a lot of tasks, but they've got this one big limitation.
00:51They struggle when it comes to handling really large context windows.
00:55Think about when you're trying to process a super long document or a full transcript
01:00from a long meeting.
01:01Regular transformers get kind of bogged down because they have to deal with all that data
01:06at once.
01:07And that's where these new Jamba models from AI21 Labs come into play with a totally new,
01:13game-changing approach.
01:14So AI21 has cooked up this new hybrid architecture they're calling the SSM transformer.
01:20Now what's cool about this is it combines the classic transformer model with something
01:24called a structured state space model, or SSM.
01:28The SSM builds on older, more efficient techniques like recurrent neural networks and convolutional
01:33neural networks.
01:35Basically, these handle computation over long sequences much more efficiently.
01:38So by using this mix, the Jamba models can handle much longer sequences of data without
01:43slowing down.
01:44That's a massive win for tasks that need a lot of context, like if you're doing some
01:48complex generative AI reasoning or trying to summarize a super long document.
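To make that concrete, here's a toy sketch of what a hybrid stack can look like: mostly cheap SSM-style layers with an occasional full-attention layer mixed in. The 1-in-8 ratio and the block internals are illustrative assumptions, not AI21's exact layout (a GRU stands in for the Mamba layer here, purely to show the recurrent shape):

```python
# Toy sketch of a hybrid SSM-Transformer stack. Most layers are SSM-style
# blocks (cheap, fixed-size recurrent state); every so often a full-attention
# block is interleaved. Ratio and block internals are illustrative assumptions.
import torch
import torch.nn as nn

class AttentionBlock(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        # Full self-attention: every token looks at every other token.
        out, _ = self.attn(x, x, x)
        return self.norm(x + out)

class SSMBlock(nn.Module):
    """Stand-in for a Mamba layer: a recurrent scan with O(1) state per step."""
    def __init__(self, dim: int):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # placeholder recurrence
        self.norm = nn.LayerNorm(dim)

    def forward(self, x):
        out, _ = self.rnn(x)
        return self.norm(x + out)

class HybridStack(nn.Module):
    def __init__(self, dim: int, num_layers: int = 16, attn_every: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(dim) if (i + 1) % attn_every == 0 else SSMBlock(dim)
            for i in range(num_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(1, 1024, 256)     # (batch, sequence, hidden)
print(HybridStack(256)(x).shape)  # torch.Size([1, 1024, 256])
```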
01:53Now why is handling a long context window such a big deal?
01:57Well, think about it.
01:58When you're using AI for real-world applications, especially in businesses, you're often dealing
02:03with complex tasks.
02:05Maybe you're analyzing long meeting transcripts, or summarizing a giant policy document, or
02:09even running a chatbot that needs to remember a lot of past conversations.
02:14The ability to process large amounts of context efficiently means these models can give you
02:19more accurate and meaningful responses.
02:21Or Dagan, the VP of product at AI21 Labs, actually nailed it when he said an AI model
02:27that can effectively handle long context is crucial for many enterprise generative AI
02:33applications.
02:34And he's right.
02:35Without this ability, AI models often tend to hallucinate or just make stuff up because
02:39they're missing out on important information.
02:42But with the Jamba models and their unique architecture, they can keep more relevant
02:47info in memory, leading to way better outputs and less need for repetitive data processing.
02:52And you know what that means.
02:54Better quality and lower cost.
02:55Alright, let's get into the nuts and bolts of what makes this hybrid architecture so
02:59efficient.
03:00So there's one part of the model called Mamba, which is actually very important.
03:04It's developed with insights from researchers at Carnegie Mellon and Princeton, and it has
03:08a much lower memory footprint and a more efficient attention mechanism than your typical
03:13transformer.
03:14This means it can handle longer context windows with ease.
03:17Unlike transformers, which have to look at the entire context every single time, slowing
03:21things down, Mamba keeps a smaller state that gets updated as it processes the data.
03:27This makes it way faster and less resource intensive.
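Here's a stripped-down sketch of that difference. A linear state space recurrence carries one fixed-size state vector forward, so each new token costs the same no matter how long the context is (the A, B, C matrices are random placeholders, purely to show the shape of the computation):

```python
# Stripped-down linear SSM recurrence: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
# The state h stays the same size however long the sequence gets, which is why
# per-token cost is flat. A, B, C are random placeholders, not learned weights.
import numpy as np

state_dim, in_dim, seq_len = 16, 8, 100_000
rng = np.random.default_rng(0)
A = 0.01 * rng.normal(size=(state_dim, state_dim))  # small values keep it stable
B = rng.normal(size=(state_dim, in_dim))
C = rng.normal(size=(in_dim, state_dim))

h = np.zeros(state_dim)
for t in range(seq_len):
    x_t = rng.normal(size=in_dim)  # stand-in for the next token's features
    h = A @ h + B @ x_t            # constant-size state update
    y_t = C @ h                    # this step's output
# Context "memory" is one 16-dim vector, regardless of seq_len; a transformer
# would instead attend over all 100,000 previous tokens at every step.
```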
03:30Now, you might be wondering, how do these models actually perform?
03:33Well, AI21 Labs didn't just hype them up, they put them to the test.
03:38They evaluated the models on the RULER benchmark, which covers tasks like multi-hop
03:44tracing, retrieval, aggregation, and question answering.
03:47And guess what?
03:48The Jamba models came out on top, consistently outperforming other models like Llama 3.1 70B,
03:53Llama 3.1 405B, and Mistral Large 2.
03:57On the Arena Hard benchmark, which is all about testing models on really tough tasks,
04:02Jamba 1.5 Mini and Large outperformed some of the biggest names in AI.
04:07Jamba 1.5 Mini scored an impressive 46.1, beating models like Mixtral 8x22B and Command
04:15R+, while Jamba 1.5 Large scored a whopping 65.4, outshining even the big guns like Llama
04:253.1 70B and 405B.
04:25One of the standout features of these models is their speed.
04:29In enterprise applications, speed is everything.
04:32Whether you're running a customer support chatbot or an AI-powered virtual assistant,
04:37the model needs to respond quickly and efficiently.
04:40The Jamba 1.5 models are reportedly up to 2.5 times faster on long contexts than their
04:45competitors, so not only are they powerful, but they're also super practical for high-scale
04:50operations.
04:51And it's not just about speed.
04:53The Mamba component in these models allows them to operate with a lower memory footprint,
04:57meaning they're not as demanding on hardware.
04:59For example, Jamba 1.5 Mini can handle context lengths up to 140,000 tokens on a single GPU.
05:06That's huge for developers looking to deploy these models without needing a massive infrastructure.
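To get a feel for why that matters, here's a rough back-of-envelope calculation of the KV cache a plain transformer would need at that context length (the layer and head dimensions are made-up but typical for a mid-size model):

```python
# Back-of-envelope KV-cache size for a plain transformer at 140K context.
# The dimensions below are made-up but typical for a mid-size model.
layers, kv_heads, head_dim = 32, 8, 128
tokens = 140_000
bytes_per_value = 2  # fp16/bf16

# Keys + values cached for every layer and every token:
kv_bytes = 2 * layers * kv_heads * head_dim * tokens * bytes_per_value
print(f"Full transformer KV cache: {kv_bytes / 1e9:.1f} GB")  # ~18.4 GB

# A hybrid that keeps, say, 1 in 8 layers as attention only caches for those:
print(f"Hybrid (1-in-8 attention): {kv_bytes / 8 / 1e9:.1f} GB")  # ~2.3 GB
```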
05:12Here's where it gets even cooler.
05:13To make these massive models more efficient, AI21 Labs developed a new quantization technique
05:18called ExpertsInt8.
05:21Now I know that might sound a bit technical, but here's the gist of it.
05:25Quantization is basically a way to reduce the precision of the numbers used in the model's
05:31computations.
05:32This can save on memory and computational costs without really sacrificing quality.
05:37ExpertsInt8 is special because it specifically targets the weights in the mixture of experts,
05:42or MoE layers, of the model.
05:44These layers account for about 85% of the model's weights in many cases.
05:48By quantizing these weights to an 8-bit precision format and then dequantizing them directly
05:54inside the GPU during runtime, AI21 Labs managed to cut down the model size and speed
06:00up its processing.
06:02The result?
06:03Jamba 1.5 Large can fit on a single 8-GPU node while still using its full context length
06:09of 256k.
06:11This makes Jamba one of the most resource-efficient models out there, especially if you're working
06:16with limited hardware.
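In spirit, the trick looks something like this simplified sketch of int8 weight quantization with per-row scales. The real ExpertsInt8 kernel does the dequantization inside the GPU as part of the matmul and is considerably more involved; this just shows the arithmetic:

```python
# Simplified sketch of int8 weight quantization with per-output-row scales,
# the general idea behind quantizing MoE expert weights. ExpertsInt8 itself
# dequantizes on-GPU during the matmul; this only illustrates the math.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4096, 1024)).astype(np.float32)  # one expert's weight matrix

# Quantize: one scale per output row so big rows don't flatten small ones.
scales = np.abs(W).max(axis=1, keepdims=True) / 127.0
W_int8 = np.round(W / scales).astype(np.int8)  # stored at 1 byte per weight

# Dequantize at runtime and apply (4x less weight memory than fp32).
x = rng.normal(size=1024).astype(np.float32)
y = (W_int8.astype(np.float32) * scales) @ x

y_ref = W @ x
print("max abs error:", float(np.abs(y - y_ref).max()))  # small rounding error
```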
06:18Now, besides English, these models also support multiple languages, including Spanish, French,
06:23Portuguese, Italian, Dutch, German, Arabic, and Hebrew, which makes them super versatile
06:27for global applications.
06:29And here's a cherry on top, AI21 Labs made these models developer-friendly.
06:34Both Jamba 1.5 Mini and Large come with built-in support for structured JSON output, function
06:40calling, and even citation generation.
06:43This means you can use them to create more sophisticated AI applications that can perform
06:47tasks like calling external tools, digesting structured documents, and providing reliable
06:53references, all of which are super useful in enterprise settings.
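Just to give a flavor, a structured-output request could look something like the hypothetical sketch below. The endpoint URL, model name, and response_format field are all assumptions modeled on common chat-completion APIs, so treat AI21's docs as the source of truth:

```python
# Hypothetical sketch of asking the model for structured JSON output, modeled
# on common chat-completion APIs. The URL, model name, and "response_format"
# field are assumptions; check AI21's documentation for the real parameters.
import requests

payload = {
    "model": "jamba-1.5-mini",
    "messages": [
        {
            "role": "user",
            "content": "Extract vendor, amount, and due date from this "
                       "invoice as JSON: ...",
        }
    ],
    "response_format": {"type": "json_object"},  # ask for machine-readable output
}
resp = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=30,
)
print(resp.json())
```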
06:57One of the coolest things about Jamba 1.5 is AI21 Labs' commitment to keeping these
07:02models open.
07:03They're released under the Jamba Open Model License, which means developers, researchers,
07:09and businesses can experiment with them freely.
07:11And with availability on multiple platforms and cloud partners like AI21 Studio, Google
07:16Cloud, Microsoft Azure, NVIDIA NIM, and soon on Amazon Bedrock, Databricks Marketplace,
07:22and more, you've got tons of options for how you want to deploy and experiment with
07:27these models.
07:28Looking ahead, it's pretty clear that AI models that can handle extensive context windows
07:32are going to be a big deal in the future of AI.
07:35As Or Dagan from AI21 Labs pointed out, these models are just better suited for complex,
07:41data-heavy tasks that are becoming more common in enterprise settings.
07:44They're efficient, fast, and versatile, making them a fantastic choice for developers and
07:49businesses looking to push the boundaries in AI.
07:52So if you haven't checked out Jamba 1.5 Mini or Large yet, now's the perfect time
07:56to dive in and see what these models can do for you.
07:59Alright, if you found this video helpful, smash that like button, hit subscribe, and
08:04stay tuned for more updates on the latest in AI tech.
08:07Thanks for watching, and I'll catch you in the next one.
