<![CDATA[Maginative]]>https://www.maginative.com/https://www.maginative.com/favicon.pngMaginativehttps://www.maginative.com/Ghost 5.82Sat, 18 May 2024 12:48:21 GMT60<![CDATA[ElevenLabs Unveils Audio Native: Automated, Human-like Narration for Websites]]>https://www.maginative.com/article/elevenlabs-unveils-audio-native-automated-human-like-narration-for-websites/664794bc4e08580001c7783eFri, 17 May 2024 17:40:45 GMT

ElevenLabs has launched Audio Native, an embeddable audio player that automatically generates human-like narration for blog posts, news sites, and other web content. This innovative tool aims to enhance reader engagement and make content more accessible to a wider audience.

Audio Native uses ElevenLabs' text-to-speech technology to create an automated voiceover for any article, blog, or newsletter. The player is customizable, allowing users to select a default voice, customize the player's appearance, and even add a pronunciation dictionary for unique brand terms.

Setting up Audio Native is super simple and the company provides starter guides for popular CMS platforms. Overall, we were able to get Audio Native up and running within a few minutes. There are three ways to deploy it:

  1. Embed the player and let it automatically voice the site's content
  2. Embed audio from an existing ElevenLabs project
  3. Use the API to programmatically create an Audio Native player for existing content (see the sketch below)
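
For the third option, here is a minimal sketch of what a programmatic call might look like. The endpoint path, form fields, and response shape are assumptions for illustration; consult ElevenLabs' API reference for the actual contract.

```typescript
// Hypothetical sketch: create an Audio Native player for an existing HTML page.
// Endpoint path, form fields, and response fields are assumed, not confirmed.
const ELEVENLABS_API_KEY = process.env.ELEVENLABS_API_KEY ?? "";

async function createAudioNativeProject(title: string, html: string): Promise<string> {
  const form = new FormData();
  form.append("name", title);
  // Assumed field: the article body to narrate, supplied as an HTML file.
  form.append("file", new Blob([html], { type: "text/html" }), "article.html");

  const res = await fetch("https://api.elevenlabs.io/v1/audio-native", {
    method: "POST",
    headers: { "xi-api-key": ELEVENLABS_API_KEY },
    body: form,
  });
  if (!res.ok) throw new Error(`Audio Native request failed: ${res.status}`);

  // Assumed response field: an HTML snippet to embed on the page.
  const { html_snippet } = (await res.json()) as { html_snippet: string };
  return html_snippet;
}
```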

After deploying Audio Native, users can track audience engagement through a built-in listener dashboard, which provides valuable metrics and insights. To use Audio Native, you must subscribe to a Creator plan or higher from ElevenLabs.


While Audio Native is a valuable tool for enhancing content accessibility and engagement, the landscape of AI technology is rapidly evolving. As on-device AI models become more prevalent in laptops and mobile devices, and browsers like Chrome begin leveraging these models, we can expect solutions like Audio Native to transition to local implementations. This shift will likely provide even faster, more efficient, and more personalized user experiences.

]]>
<![CDATA[Google Introduces Frontier Safety Framework to Identify and Mitigate Future AI Risks]]>https://www.maginative.com/article/google-introduces-frontier-safety-framework-to-identify-and-mitigate-future-ai-risks/664789304e08580001c7781bFri, 17 May 2024 17:03:00 GMT

Google has announced the Frontier Safety Framework, a set of protocols designed to identify and mitigate potential harms from future AI systems. This framework aims to stay ahead of potential risks by putting in place mechanisms to detect and address them before they materialize.

The Frontier Safety Framework focuses on severe risks posed by advanced AI models, such as those with exceptional autonomy or sophisticated cyber capabilities. It is designed to complement Google's existing AI safety practices and alignment research, which ensures AI acts in accordance with human values.

The framework is built around three main components:

  1. Identifying Capabilities: Google will research how advanced AI models could potentially cause harm. They will define "Critical Capability Levels" (CCLs) that indicate the minimum capability a model must have to pose a severe risk. These CCLs guide the evaluation and mitigation approach.
  2. Evaluating Models: Google will periodically test their AI models to detect when they approach these critical capability levels. They will develop "early warning evaluations" to alert them before a model reaches a CCL.
  3. Mitigation Plans: When a model passes the early warning evaluations, Google will apply a mitigation plan. This plan will balance the benefits and risks of the model, focusing on security and preventing misuse of critical capabilities.

Initially, the framework focuses on four domains: autonomy, biosecurity, cybersecurity, and machine learning R&D. For each domain, Google has outlined specific CCLs and corresponding security and deployment mitigations.

For example, in the domain of autonomy, a critical capability might be an AI model that can autonomously acquire resources and sustain additional copies of itself. In cybersecurity, a critical capability might be a model that can automate opportunistic cyberattacks.
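
To make the structure concrete, here is a minimal sketch of how an early-warning check against Critical Capability Levels might be organized. The domains, thresholds, and scoring function are illustrative assumptions, not Google's actual evaluation suite.

```typescript
// Illustrative sketch only: thresholds and scores are made up, not Google's real CCLs.
type Domain = "autonomy" | "biosecurity" | "cybersecurity" | "ml_rnd";

interface CriticalCapabilityLevel {
  domain: Domain;
  description: string;
  threshold: number; // benchmark score at which the capability is considered critical
  earlyWarningMargin: number; // trigger mitigation planning this far below the threshold
}

interface EvalResult {
  domain: Domain;
  score: number; // assumed 0-100 score from a capability evaluation
}

function checkEarlyWarnings(
  ccls: CriticalCapabilityLevel[],
  results: EvalResult[],
): CriticalCapabilityLevel[] {
  // Return every CCL whose early-warning band the model has entered.
  return ccls.filter((ccl) =>
    results.some(
      (r) => r.domain === ccl.domain && r.score >= ccl.threshold - ccl.earlyWarningMargin,
    ),
  );
}

const triggered = checkEarlyWarnings(
  [{ domain: "cybersecurity", description: "Automates opportunistic cyberattacks", threshold: 80, earlyWarningMargin: 10 }],
  [{ domain: "cybersecurity", score: 72 }],
);
console.log(triggered.length > 0 ? "Apply mitigation plan" : "Continue periodic evaluation");
```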

Research labs like OpenAI and Anthropic have also been investing in AI safety research. OpenAI released their Preparedness Framework last year, and recently outlined key security measures they believe are necessary to safeguard AI technology from misuse. Anthropic is also actively pursuing AI safety research across multiple fronts, including Mechanistic Interpretability, Scalable Oversight, Testing for Dangerous Failure Modes, and Societal Impacts and Evaluations. Collectively, these efforts indicate a growing recognition in the AI research community of the importance of proactively addressing potential risks associated with advanced AI systems.

Google's framework is exploratory and expected to evolve as they learn from its implementation and collaborate with industry, academia, and government. They aim to have the initial framework fully implemented by early 2025.

]]>
<![CDATA[ChatGPT's Data Analysis Capabilities Just got a Major Upgrade]]>https://www.maginative.com/article/chatgpts-data-analysis-capabalities-just-got-a-major-upgrade/6646987b4e08580001c77796Thu, 16 May 2024 23:53:20 GMT

OpenAI has announced significant improvements to data analysis in ChatGPT, powered by their new GPT-4o model. This update includes several features that streamline the data analysis process for users.

First, ChatGPT now allows direct file uploads from Google Drive and Microsoft OneDrive. This means users can access and analyze data stored in cloud services without downloading and uploading files, making the process more efficient and convenient.

Once the files are uploaded, users can interact with tables and charts in a new expandable view, enabling a more immersive analysis experience. The expandable view allows users to focus on the data and have a dynamic back-and-forth with ChatGPT, asking follow-up questions and digging deeper into the insights revealed.

Additionally, the ability to customize and download charts enhances the presentation-readiness of the output. Users can now select colors, hover over chart elements, and ask additional questions to refine their visualizations. This ensures that the charts are not just informative but also aesthetically pleasing and ready for use in professional presentations or documents.

These enhancements further establish ChatGPT as a powerful tool for data analysis, combining natural language processing with an intuitive understanding of datasets. By generating code and handling various data tasks, from merging datasets to creating charts, ChatGPT assists both beginners exploring data and experts seeking time-saving solutions for routine tasks.

As with all ChatGPT features, OpenAI emphasizes comprehensive security and privacy measures.

As OpenAI notes: "We don’t train on data from ChatGPT Team and Enterprise customers, and ChatGPT Plus users can opt out of training through their Data Controls."

The rollout of these data analysis improvements to ChatGPT Plus, Team, and Enterprise users over the coming weeks will undoubtedly be welcomed by professionals seeking to leverage the power of AI in their data-driven endeavors.

]]>
<![CDATA[Reddit Secures Partnership with OpenAI]]>https://www.maginative.com/article/openai-partners-with-reddit-to-enhance-ai-capabilities-and-user-experience/6646920d4e08580001c7777eThu, 16 May 2024 23:14:07 GMT

OpenAI and Reddit have announced a partnership that will bring Reddit's vast trove of user-generated content to OpenAI's products, including ChatGPT. This move follows a similar deal between Reddit and Google in February, highlighting the growing demand for diverse and timely data to train AI models.

Under the partnership, OpenAI will access Reddit's Data API, allowing its AI tools to better understand and showcase Reddit content, particularly on recent topics. This integration aims to provide OpenAI users with more relevant and up-to-date information, fostering improved human learning and community-building experiences.

In return, Reddit will leverage OpenAI's AI platform to introduce new AI-powered features for its users and moderators. The social platform also stands to benefit financially, as OpenAI will become a Reddit advertising partner.

Brad Lightcap, OpenAI's COO, expressed enthusiasm for the partnership, stating that it will "enhance ChatGPT with uniquely timely and relevant information" and "explore the possibilities to enrich the Reddit experience with AI-powered features." Reddit Co-Founder and CEO Steve Huffman echoed this sentiment, emphasizing the importance of a connected internet and the potential for AI to help users find more of what they're looking for.

This partnership comes at a time when AI companies are willing to pay significant sums for quality training data. With over 70 million daily active users and a diverse range of content, Reddit is well-positioned to capitalize on this demand.

Related: Reddit Signs Deal With Google to License User Content. This deal leverages surging investor appetite for opportunities in the AI space. As models like ChatGPT and Claude continue to capture headlines, many startups are trying to leverage the AI boom in some capacity to inflate their value.

This partnership follows a similar agreement between Reddit and Google that was announced in February. Reddit has been leveraging its vast user-generated content to meet the growing demand for data to train AI systems. Both deals underscore Reddit’s strategy to monetize its data, providing valuable resources for AI companies while enhancing the user experience on its platform.

]]>
<![CDATA[New AI Jobs: Meet the Prompt Designer and the Prompt Engineer]]>https://www.maginative.com/article/new-ai-jobs-meet-the-prompt-designer-and-the-prompt-engineer/664609144e08580001c775f4Thu, 16 May 2024 19:28:55 GMT

With AI, it's easy to get lost in the jargon—especially when terms become buzzwords that people throw around arbitrarily. One term you may have seen a lot lately is "prompt engineering." Has a prompt engineer popped up in your social media feed yet, telling you how 95% of people are using ChatGPT wrong?

I often see AI influencers and self-proclaimed AI experts calling themselves "prompt engineers" or claiming to practice "prompt engineering." But is that really what they're doing? What does the term even mean? In my experience, when most people say prompt engineering, they are actually talking about prompt design.

Of course, we are at a very nascent stage in this new AI era, and it's exciting to see new jobs and emerging roles. However, it's crucial to have clear definitions and a shared understanding of these terms. Accurately distinguishing between prompt design and prompt engineering not only improves communication but also helps in hiring the right talent and ensuring everyone in the workplace is on the same page.

In this first installment of a two-part series, we'll clarify the terms 'prompt design' and 'prompt engineering.' Stay tuned for our next article, where we'll dive deeper into the job descriptions and responsibilities of these exciting new AI roles.

Prompt Design

Simply put, prompt design is writing an effective prompt. It's about creating clear and structured instructions, often including specific words, context, input data, and examples, to guide language models (like ChatGPT, Gemini, Claude, et al.) towards the desired output.

Prompt design requires a creative and intuitive understanding of language, psychology, and communication. There are many techniques and styles that a prompt designer can use, and every model is unique. For example, people have been able to get better responses by telling a model it will be rewarded, threatening it, or asking it to take a deep breath. My favorite—if you ask a model to respond as if it were a Star Trek character, it is apparently more accurate at math.
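
As an illustration, a deliberately designed prompt might look something like the sketch below. The wording and structure are just one possible style, not a canonical template.

```typescript
// One possible structured prompt: role, context, task constraints, and an example output format.
const article = "..."; // placeholder for the text you want summarized
const prompt = `
You are an editor for a technology newsletter.

Context: The summary will appear in a weekly digest read by non-technical executives.

Task: Summarize the article below in exactly three bullet points, each under 20 words.

Example output:
- Point one...
- Point two...
- Point three...

Article:
${article}
`.trim();
```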

Importantly, I'll stress that it's called prompt design because writing an effective prompt is a deliberate process. But how do you know how effective a prompt is? Which technique should you use? What changes will make your output better? This is where prompt engineering comes in.

Prompt Engineering

Prompt engineering is the science of optimizing prompts through rigorous testing and iteration. It's an empirical process that involves developing evaluations, testing prompts against those evaluations, analyzing results, and refining the prompts accordingly. Here’s how it works:

  1. Evaluation Development: Before writing prompts, prompt engineers create a strong set of evaluation criteria. These benchmarks will establish what a good response looks like.
  2. Testing: Prompt engineers then test prompts against these criteria, seeing how well the model performs.
  3. Iteration: Based on the test results, prompts are tweaked and tested again. This cycle repeats until the desired performance is achieved.

The majority of time in prompt engineering is spent on creating robust evaluations and iterating based on the findings, not on writing the prompts themselves.
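
Below is a minimal sketch of that evaluate-test-iterate loop. The model call, scoring checks, and pass threshold are placeholders; real evaluation harnesses are considerably more involved.

```typescript
// Sketch of a prompt-engineering loop: evaluate, analyze, refine, repeat.
// `callModel` is a placeholder for whatever LLM API you use.
type CallModel = (prompt: string, input: string) => Promise<string>;

interface EvalCase {
  input: string;
  check: (output: string) => boolean; // encodes what a "good response" looks like
}

async function evaluatePrompt(prompt: string, cases: EvalCase[], callModel: CallModel): Promise<number> {
  let passed = 0;
  for (const c of cases) {
    const output = await callModel(prompt, c.input);
    if (c.check(output)) passed += 1;
  }
  return passed / cases.length; // pass rate across the evaluation set
}

async function iteratePrompts(candidates: string[], cases: EvalCase[], callModel: CallModel, target = 0.9) {
  for (const prompt of candidates) {
    const passRate = await evaluatePrompt(prompt, cases, callModel);
    console.log(`pass rate ${(passRate * 100).toFixed(0)}% for: ${prompt.slice(0, 40)}...`);
    if (passRate >= target) return prompt; // good enough; otherwise keep refining
  }
  return null; // none met the bar; write new candidates and repeat the cycle
}
```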

Summary

So, while prompt design and prompt engineering are closely related, they are not the same. In summary:

  • Prompt design focuses on creating detailed and specific instructions to elicit desired responses. It's a blend of creativity and technical know-how.
  • Prompt engineering involves the empirical testing and iteration of prompts to optimize their performance. It's more about the process and methodology than just writing prompts.

Think of it this way: if prompt design is about writing the perfect recipe for cooking Jamaican jerk chicken, prompt engineering is about testing that recipe, tasting the chicken, and adjusting the ingredients until you achieve the ideal flavor profile.

So, the next time you hear someone mention "prompt engineering," take a moment to consider whether they're actually referring to the design process or the iterative optimization cycle. By understanding and using these terms correctly, we can have more precise, productive conversations about this exciting frontier of AI development.

In the next article, we'll cover job descriptions and the responsibilities associated with both roles.

]]>
<![CDATA[Google Search is Getting an AI Makeover]]>https://www.maginative.com/article/google-search-is-getting-an-ai-makeover/66466c1e4e08580001c7771dWed, 15 May 2024 22:01:00 GMT

Google has gone ahead and done it! After 25 years, they are reimagining the core Google search experience around generative AI. At its annual I/O developer conference, the tech giant unveiled a range of new features that leverage AI to simplify and enhance the search process, offering users an "AI-first" approach to finding information. Of course, this raises a lot of questions about how it will impact web traffic and content creators.

At the core of the redesign are AI-generated summaries (Search Generative Experiences), or "AI Overviews," that appear at the top of search results. These summaries provide a quick snapshot of key information on a topic, with links to relevant sources, allowing users to dig deeper if desired. These summaries leverage Google's Gemini model, which combines the company's Knowledge Graph with advanced AI capabilities.

The new AI capabilities will enable users to perform more complex searches and planning tasks. For instance, users can ask detailed questions about local services or create meal plans with specific requirements. These features aim to save users time and effort by providing organized and relevant information quickly.

For users who want the traditional search experience, it has been relegated to a "Web" tab under the search settings.

The shift to AI-driven search has raised concerns about the impact on web traffic and advertising revenue. AI Overviews could reduce the number of clicks on traditional search results, potentially affecting website traffic and ad revenue. Despite these concerns, Google believes the benefits of AI will lead to more searches and engagement.

However, Liz Reid, head of Search at Google, argues that early data shows AI Overviews could actually lead to more clicks on the open web, particularly for websites that offer unique perspectives or expertise. Google says they aim to strike a balance between providing helpful AI-generated summaries and directing users to valuable content across the web.

Ultimately, Google seems very committed to its AI-first approach, believing it will benefit users by making search more intuitive, efficient, and accessible. The company is promising even more agentic AI capabilities: "Soon, Google will do the searching, simplifying, researching, planning, brainstorming and so much more."

Starting this week, AI Overviews will be available to all users in the U.S., with plans to expand globally. Google expects AI Overviews to reach over a billion users by the end of the year.

]]>
<![CDATA[Instagram Co-Founder Mike Krieger Joins Anthropic as Chief Product Officer]]>https://www.maginative.com/article/instagram-co-founder-mike-krieger-joins-anthropic-as-chief-product-officer/66450b5877bb5700013f81d3Wed, 15 May 2024 21:51:26 GMT

Anthropic has announced that Mike Krieger will be its new Chief Product Officer. Krieger co-founded Instagram with Kevin Systrom in 2010, which was subsequently acquired by Meta.

He brings valuable experience to Anthropic, having scaled Instagram to over a billion users and grown its engineering team to more than 450 people. Most recently, he co-founded Artifact, an AI news app that was acquired by Yahoo.

In a post on X, Krieger expressed his excitement about joining Anthropic, praising the team's exceptional talent, empathy, and pragmatism. He says he sees immense potential in pairing Anthropic's cutting-edge AI research with thoughtful product development to positively impact how people and companies work.

As Chief Product Officer, Krieger will oversee Anthropic's product engineering, management, and design efforts. His primary focus will be expanding the company's suite of enterprise applications and making their AI assistant, Claude, accessible to a broader audience.

Anthropic CEO Dario Amodei welcomed Krieger, "Mike's background in developing intuitive products and user experiences will be invaluable as we create new ways for people to interact with Claude, particularly in the workplace."

Krieger's appointment comes at a crucial time for Anthropic, as the company recently released the Claude app for iOS and announced support for Spanish, French, Italian, and German. With Krieger's expertise, Anthropic aims to accelerate its product development and compete with established AI giants in the industry.

]]>
<![CDATA[Project IDX, Google's Cloud-Based IDE Now in Open Beta]]>https://www.maginative.com/article/project-idx-googles-cloud-based-ide-now-in-open-beta/6644953177bb5700013f8148Wed, 15 May 2024 11:58:35 GMT

Google unveiled major updates to Project IDX at its I/O developer conference this week. Project IDX is their next-generation development environment, accessible through your web browser, and designed to streamline the app-building process with the help of artificial intelligence.

The biggest news? Project IDX is now in open beta, meaning anyone with a Google account can sign up and start using it for free. Previously, it was invite-only.

The open beta introduces several new integrations, including Google Maps Platform for adding geolocation features, Chrome Dev Tools and Lighthouse for seamless debugging, and upcoming support for deploying apps to Cloud Run, Google's serverless platform. IDX will also integrate with Checks, Google's AI-driven compliance platform, which is transitioning from beta to general availability.


Project IDX isn't just about building AI-enabled applications; it's also about leveraging AI within the coding process itself. The IDE offers standard features like code completion and a chat assistant sidebar, as well as innovative capabilities like using Google's Gemini model to modify code snippets, similar to generative fill in Photoshop. Importantly, whenever Gemini suggests code, it links back to the original source and its associated license.

Project IDX is built on the open-source Visual Studio Code foundation, which means that most developers will instantly be familiar with it. It also integrates with GitHub for seamless version control.

It is worth noting that Project IDX is entering a competitive market dominated by established players and innovative startups. Cursor offers novel AI functionalities with a focus on code completion and debugging. Devin is an "AI software engineer" from Cognition AI that can autonomously complete complex coding projects. And of course, Microsoft is leading the space with their GitHub Copilot, which offers the most comprehensive and integrated AI solutions within the familiar GitHub workflow.

There are several reasons why Google is now prioritizing its own AI-powered development environment. Firstly, control over the development experience allows for deeper integration with Google Cloud services, potentially creating a more seamless workflow for developers building cloud-based applications. Secondly, Project IDX serves as a platform for Google to showcase its advancements in AI, particularly its Gemini model. Finally, an open and successful Project IDX could attract more developers to the Google ecosystem, fostering innovation and potentially leading to a future where Google Cloud becomes the go-to platform for AI-powered development.

While Google has some catching up to do, its vast resources and expertise in AI put it in a strong position to quickly evolve and enhance Project IDX. Today's open beta launch marks a significant step forward for the company. Whether it becomes the go-to platform remains to be seen, but it certainly offers a feature-rich and innovative option for developers seeking to leverage AI in their workflows.

Head over to the Project IDX website and sign up with your Google account to start building your next app. 

]]>
<![CDATA[Google is Building Gemini Nano into Chrome]]>https://www.maginative.com/article/google-is-building-gemini-nano-into-chrome/66442ddf222e440001837eb0Wed, 15 May 2024 10:57:28 GMT

Google is bringing its small but powerful Gemini Nano large language model directly into desktop Chrome. This means starting with Chrome 126, developers and websites can leverage AI features without needing to build their own models.

For developers, this is a game-changer. Imagine adding features like summarizing webpages, translating text, or even helping users write content – all without the hassle of managing and updating large AI models. Chrome will handle that for them. This translates to faster development, lower costs, and a wider range of AI-powered features for Chrome users.

Additionally, this integration offers several benefits for web developers, including ease of deployment, access to hardware acceleration, and the ability to process sensitive data locally. On-device AI can also provide a snappier user experience, greater access to AI features, and offline AI usage.


Gemini Nano, the most compact model in the Gemini family of LLMs, is designed to run locally on most modern desktop and laptop computers. Google's recent investments in WebGPU and WebAssembly (WASM) support in Chrome have enabled these models to run efficiently on a wide range of hardware.

Developers will be able to access built-in AI capabilities primarily through task APIs, such as translation, summarization, and categorization. Google plans to provide exploratory APIs for local prototyping and soliciting feedback on potential future task APIs.
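
The exact API surface has not been finalized publicly, but conceptually a task API could look something like the hypothetical shape below. Every name here (`ai`, `createSummarizer`, `summarize`) is an assumption for illustration, not Chrome's actual interface.

```typescript
// Purely hypothetical shape of a built-in summarization task API.
// None of these names are confirmed Chrome APIs; they illustrate the idea of
// calling an on-device model without shipping your own weights.
interface HypotheticalSummarizer {
  summarize(text: string): Promise<string>;
}
interface HypotheticalBuiltInAI {
  createSummarizer(): Promise<HypotheticalSummarizer>;
}

async function summarizeArticle(text: string): Promise<string | null> {
  const ai = (globalThis as { ai?: HypotheticalBuiltInAI }).ai;
  if (!ai) return null; // built-in AI not available in this browser

  // The model would run locally via Gemini Nano, so no text leaves the device.
  const summarizer = await ai.createSummarizer();
  return summarizer.summarize(text);
}
```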

And that is just the beginning. Google is offering experimental features that could let developers do even more. Imagine a Chrome DevTools that uses AI to explain coding errors and even suggest solutions – a developer's dream come true!

While Google is aiming to make Chrome the go-to platform for AI on the web, it is also working with other browser companies to potentially bring similar features to other browsers. This could open doors for a new wave of AI-powered web experiences for everyone.

With the combination of WebGPU, WebAssembly, and Gemini built into Chrome, Google believes the web is now AI-ready. Developers can join the early preview program to experiment with early-stage built-in AI APIs and help shape the future of AI in Chrome.

]]>
<![CDATA[OpenAI Co-Founder and Chief Scientist Ilya Sutskever Departs, Jakub Pachocki Named Successor]]>https://www.maginative.com/article/openai-co-founder-and-chief-scientist-ilya-sutskever-departs-jakub-pachocki-named-successor/66441942222e440001837e4eWed, 15 May 2024 03:15:18 GMT

Ilya Sutskever, one of the original co-founders of OpenAI and its chief scientist, has announced his departure from the company. This news comes just months after a dramatic leadership dispute that saw the brief ouster of CEO Sam Altman. However, the news has been met with an outpouring of appreciation, gratitude, and warm wishes from his colleagues and the AI community at large.

In his announcement, Sutskever reflected on his time at OpenAI, expressing confidence in the company's trajectory and the leadership of CEO Sam Altman, CTO Mira Murati, and now, Chief Scientist Jakub Pachocki. Sutskever's post struck a sentimental tone, indicating his honor and privilege to have worked with the team and his fond farewell.

Sutskever also shared a photo of himself with Sam Altman, Greg Brockman, Mira Murati, and Jakub Pachocki (who will be taking over as Chief Scientist). The photo was likely aimed at dispelling rumors of any animosity or discord between them in light of the resignation.

Related: Turmoil at OpenAI: CEO Ousted, President Quits, and Key Employees Depart. OpenAI’s board faces a crisis of credibility that threatens to undermine the organization’s mission. While change may have been needed, the chaotic, opaque nature of Friday’s events raised red flags.

For context, Sutskever's departure comes after a tumultuous period at OpenAI, where he played a key role in the controversial removal of Altman last year. Sutskever later expressed regret for his actions and threatened to quit if the board did not resign. While Altman was reinstated, Sutskever's position at the company remained uncertain, with sources indicating internal debates about his future role.

This marks the second high-profile departure OpenAI has seen this year. In February, Andrej Karpathy, a founding member and prominent researcher at OpenAI, confirmed his exit from the company. Earlier this month, two other executives also resigned, including VP of People Diane Yoon and Head of Nonprofit and Strategic Initiatives Chris Clark.

Sutskever's exit is a significant loss for OpenAI, as he was instrumental in shaping the company's direction and research initiatives. His contributions to the field of AI are well-recognized, dating back to his work on neural networks at the University of Toronto and his stint at Google Brain. His work at OpenAI included heading the Superalignment team, which focused on ensuring AI safety by reserving computing power to manage AI risks. While details about Sutskever's next venture remain undisclosed, he hinted at a personally meaningful project that he will share more about in due course. 

Ultimately, his departure marks the end of an era at OpenAI. His co-founder, Greg Brockman, reminisced about their early days, spending countless hours shaping the company's culture, technical direction, and strategy. Brockman credited Sutskever's infectious artistry and gusto for helping him understand the field of AI when he was just starting out.

Jakub Pachocki, the incoming Chief Scientist, also expressed his gratitude for Sutskever's mentorship and collaboration over the years. He credited Sutskever with introducing him to the world of deep learning research and praised his incredible vision, which has been foundational to the field of AI.

Altman expressed his sadness at Sutskever's departure, describing him as "easily one of the greatest minds of our generation, a guiding light of our field, and a dear friend." Altman emphasized Sutskever's warmth and compassion, qualities that are less known but equally important to his brilliance and vision.

The consistent theme across all messages from the leadership team is one of immense gratitude and ongoing commitment to the mission that Sutskever helped to define. The positive reactions to his departure illustrate a collective belief in OpenAI's enduring mission and the solid foundations laid by its original leaders.

]]>
<![CDATA[Google Unveils LearnLM: AI Models Tailored for Enhanced Learning Experiences]]>https://www.maginative.com/article/google-unveils-learnlm-ai-models-tailored-for-enhanced-learning-experiences/6644059a222e440001837df9Wed, 15 May 2024 01:23:46 GMT

Google has introduced LearnLM, a new family of models based on Gemini and fine-tuned for learning and education. The goal is to make Google products, such as Search, Gemini, and YouTube, more interactive, personalized, and engaging for learners.

LearnLM is underpinned by educational research and tailored to how people learn. The models are designed to inspire active learning, manage cognitive load, adapt to learners' needs, stimulate curiosity, and deepen metacognition. If you are an educator, I highly recommend that you take a look at their technical report. They detail their approach and highlight how they are working with the AI and EdTech communities to maximize the positive impact of generative AI in education.

Google is already integrating LearnLM technology into its products to help users to deepen their understanding of complex topics. For example, Google Search will soon allow users to adjust AI Overviews into more useful formats, while Android's Circle to Search will help solve complex math and physics problems. Gemini Gems will allow users to create a custom AI "expert" on any topic that can act as a learning coach to provide personalized study guidance. Additionally, YouTube will feature a conversational AI tool, enabling users to ask clarifying questions or take quizzes during educational videos.

Google is also using LearnLM to develop AI experiences for schools. In Google Classroom, a pilot program is helping teachers with lesson planning, giving them more time to focus on teaching.

In addition to LearnLM, Google is introducing two experimental tools that push the boundaries of learning even further. The first tool, Illuminate, breaks down complex research papers into engaging, bite-sized audio conversations. In minutes, it generates audio with two AI voices discussing the key insights from these papers. Google says users will soon be able to ask follow-up questions as well. I can't wait to try this on all the arXiv papers that I read daily!

The second tool, Learn About, is an immersive Labs experience that seeks to transform information into understanding. By combining high-quality content, learning science principles, and interactive chat experiences, Learn About guides users through any topic at their own pace. Learners can upload files, take notes, and ask clarifying questions along the way, creating a truly personalized and enriching learning journey.

Google is collaborating with institutions like MIT RAISE, Columbia Teachers College, Arizona State University, NYU Tisch, and Khan Academy to improve and extend LearnLM beyond its own products. The company has extended an open invitation for interested parties to work with them in defining educational benchmarks and exploring the possibilities of applying generative AI to teaching and learning.

]]>
<![CDATA[A First Look at Project Astra, Google's Vision for The Future of AI Assistants]]>https://www.maginative.com/article/project-astra-is-googles-vision-for-the-future-of-ai-assistants/6643e9d9222e440001837d9cWed, 15 May 2024 00:10:32 GMT

One of the more impressive demos at Google I/O this year was Project Astra—a real-time, multimodal AI assistant that can see the world, understand context, and respond to user queries in a natural, conversational manner. Powered by Google's Gemini 1.5 model and other task-specific models, Astra processes video and speech input continuously, enabling it to understand and remember its surroundings.

In the demo, a Google employee showed off Astra's capabilities using just a smartphone camera. The AI assistant effortlessly identified objects, answered questions about code snippets, and even recognized the King's Cross area of London by looking out the window. But that's not all: the demo also showed Astra being used with smart glasses, hinting at potentially renewed hardware ambitions.

Demis Hassabis, head of Google DeepMind, emphasized that the goal is to create a "universal assistant" that is helpful in everyday life. "Imagine agents that can see and hear what we do, better understand the context we're in and respond quickly in conversation, making the pace and quality of interactions feel much more natural," he said.

In another impressive demo on X, Google shared a video of Project Astra "watching" the keynote alongside an employee. This suggests that users can expect some sort of desktop integration down the road.

Google says it plans to bring Project Astra to the Gemini app and its other products, starting later this year. While there's no specific launch date yet, Google seems committed to making these capabilities available. After all, CEO Sundar Pichai is calling Astra their "vision for the future of AI assistants."

For those attending I/O in person, Google provided a Project Astra demo station to try out some of the new capabilities. Check out the hands-on video from CNET’s Lexy Savvides below:

P.S. If you are wondering, Astra stands for "advanced seeing and talking responsive agent."

]]>
<![CDATA[Google Unveils Veo: An Advanced AI Video Generation Model]]>https://www.maginative.com/article/google-unveils-veo-an-advanced-ai-video-generation-model/6643c07c222e440001837d29Tue, 14 May 2024 22:02:50 GMT

At I/O 2024, Google introduced Veo, its latest and most advanced generative AI video model. Veo is capable of generating high-quality 1080p videos that exceed 60 seconds in length. This is essentially their answer to Sora, which OpenAI unveiled in February.

One of Veo's key strengths is its understanding of natural language and visual semantics. It can interpret complex text prompts, accurately grasping the nuance and tone of a phrase, and then generate video content that closely aligns with the creator's vision. This includes the ability to interpret and implement cinematic terms and techniques, such as "timelapse" or "aerial shots," offering an unprecedented level of creative control to users.

The model also ensures consistency and coherence in the generated footage. People, animals, and objects move realistically and maintain their integrity throughout the shots, creating a smooth and immersive viewing experience.

Videos created by Veo are watermarked using SynthID, Google's tool for identifying AI-generated content. It's not yet known if Veo will support the emerging C2PA metadata standard. Google says the model also undergoes safety filters and memorization checking processes to mitigate privacy, copyright, and bias risks.

Veo is not yet publicly available—there is a waitlist that you can sign up for if you are interested. The company also plans to integrate some of Veo's capabilities into YouTube Shorts and other products in the future, offering exciting new possibilities for content creation and storytelling.

How does Veo compare to OpenAI's Sora? Well, they're pretty close—both models can generate videos over 60 seconds in length with high-quality visuals and temporal continuity. However, based on the examples Google has shared, Veo's realism doesn't quite match that of Sora. Here is an example of similar content from both of them for reference.

Veo

Prompt: An aerial shot of a lighthouse standing tall on a rocky cliff, its beacon cutting through the early dawn, waves crash against the rocks below

Sora

Prompt: Drone view of waves crashing against the rugged cliffs along Big Sur’s garay point beach. The crashing blue waters create white-tipped waves, while the golden light of the setting sun illuminates the rocky shore. A small island with a lighthouse sits in the distance, and green shrubbery covers the cliff’s edge. The steep drop from the road down to the beach is a dramatic feat, with the cliff’s edges jutting out over the sea. This is a view that captures the raw beauty of the coast and the rugged landscape of the Pacific Coast Highway.

What do you think?

]]>
<![CDATA[Google Expands SynthID to Watermark AI-Generated Text and Video]]>https://www.maginative.com/article/google-expands-synthid-to-watermark-ai-generated-text-and-video/6644bf0e77bb5700013f818fTue, 14 May 2024 21:15:00 GMT

Google has expanded the capabilities of SynthID to now include watermarking AI-generated text and video content. This move comes as the tech giant aims to address the potential harms and ethical concerns surrounding generative AI, particularly the risk of misinformation and phishing attempts.

SynthID, first introduced last year, is a digital toolkit designed to unobtrusively watermark AI-generated content, providing a way to identify its origin. The latest update applies this technology to text generated through the Gemini app and web experience and videos created with Veo, Google's advanced generative video model.

Image: A piece of text generated by Gemini with the watermark highlighted in blue.

Google's approach to watermarking AI-generated text is designed to work with most large language models and can be deployed at scale. It works by subtly adjusting the probability scores of tokens, which are the building blocks of generated text, without impacting the quality or creativity of the output. These adjustments create a unique pattern of scores that can be used to identify AI-generated content, even when it is mildly paraphrased or modified.
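
As a rough illustration of the general idea (not Google's actual SynthID algorithm), a text watermark can be applied by deterministically nudging token scores based on the preceding context, and detected later by measuring how often the chosen tokens fall in the favored set:

```typescript
// Toy illustration of score-adjustment watermarking, not the real SynthID scheme.
// A seeded hash of the previous token partitions the vocabulary; tokens in the
// "favored" half get a small score boost, leaving a statistical fingerprint.
function hash(s: string, seed: number): number {
  let h = seed;
  for (const ch of s) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
  return h;
}

function isFavored(prevToken: string, token: string, seed: number): boolean {
  return hash(prevToken + "|" + token, seed) % 2 === 0;
}

// During generation: add a small bias to favored tokens before sampling.
function biasScores(prevToken: string, scores: Map<string, number>, seed: number, bias = 0.5): Map<string, number> {
  const out = new Map<string, number>();
  for (const [token, score] of scores) {
    out.set(token, isFavored(prevToken, token, seed) ? score + bias : score);
  }
  return out;
}

// During detection: an unusually high favored-token rate suggests watermarked text.
function favoredRate(tokens: string[], seed: number): number {
  let favored = 0;
  for (let i = 1; i < tokens.length; i++) {
    if (isFavored(tokens[i - 1], tokens[i], seed)) favored += 1;
  }
  return favored / Math.max(1, tokens.length - 1);
}
```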

This technique is most effective for longer and more diverse text generations, such as essays or scripts, and may be less accurate for short, factual responses. It is important to note that SynthID is not a perfect solution and may struggle when text is heavily rewritten or translated. However, it offers a promising approach to identifying AI-generated text that can be combined with other detection methods to improve reliability.

For videos, the process involves embedding a digital marker into the pixels of each video frame. This approach, inspired by their image watermarking tool, ensures that even as video generation technologies evolve, the origin of AI-created videos can be identified.

While SynthID is not a complete solution for identifying AI-generated content, it is an essential building block for developing more reliable detection tools. Digital watermarking tools provide an additional layer of trust and enable users to make informed decisions about the content they engage with online.

Google says it plans to open-source SynthID text watermarking and publish a research paper later this year.

]]>
<![CDATA[OpenAI Announces ChatGPT App Coming to Mac]]>https://www.maginative.com/article/openai-announces-chatgpt-app-coming-to-mac/66429811222e440001837c5dTue, 14 May 2024 02:01:02 GMT

Today, OpenAI unveiled their latest multimodal model, GPT-4o, which offers improved speed and enhanced capabilities across text, voice, and vision. Their new model is now being rolled out to ChatGPT, bringing advanced features to both free and paid users.

One of the standout features of the new GPT-4o model is its ability to understand and discuss images shared by users. For example, you can take a picture of a menu in a foreign language and ChatGPT will translate it, provide insights into the food's history, and offer recommendations. In the future, OpenAI says GPT-4o will enable even more natural, real-time voice conversations and the ability to converse via live video.

Related: OpenAI Unveils GPT-4o, A New State-of-the-Art Multimodal AI Model. GPT-4o is an update to OpenAI’s previous GPT-4 model, which was launched just over a year ago. The latest iteration improves capabilities across text, vision, and audio, and is said to be much faster.

To complement the new model, OpenAI is also rolling out new tools and updated experiences. Firstly, ChatGPT is getting a new, friendlier look and feel with an updated home screen, message layout, and more. These changes aim to make the user experience more conversational and intuitive.

Secondly, OpenAI has announced a new ChatGPT desktop app for macOS. Users can now access ChatGPT with a universal keyboard shortcut (Option + Space) from anywhere on their Mac. You can ask questions, discuss screenshots, and even have voice conversations directly from your desktop.

The app also enables voice conversations with ChatGPT. Simply tap the headphone icon, and you can start discussing ideas, preparing for interviews, or even brainstorming with ChatGPT. However, this feature currently uses the older Voice Mode, with plans to incorporate GPT-4o's new audio and video capabilities in the future.

The macOS app is being rolled out to Plus users first, with broader availability in the coming weeks. A Windows version is planned for later this year.

]]>