AI News Roundup: OpenAI's New Model, Apple's AI Features, and More

From Thoughtful AI to Digital Creativity: This Week's Breakthroughs Reshaping the Future of Technology

Sep 15, 2024

In this week's AI news roundup, we're covering some major developments from industry leaders like OpenAI and Apple, as well as exciting breakthroughs in robotics, game development, and content creation. Let's dive into the latest innovations shaping the world of artificial intelligence.

OpenAI Unveils o1-preview Model

OpenAI has released its highly anticipated o1-preview model, marking a significant shift in its approach to AI development. The model, previously known by the codenames Q* and Strawberry, resets OpenAI's naming convention. Instead of continuing the GPT series (GPT-5, GPT-6, etc.), the company has started over with OpenAI o1, suggesting future models may follow a similar naming scheme (o2, o3, and so on).

Key Points:

- Available to ChatGPT Plus and Team users at launch, with Enterprise and Edu access to follow

- Focuses on advanced reasoning, mathematics, and logic

- Uses chain-of-thought reasoning to work through a problem before answering

- Slower response time but potentially more accurate results

The o1-preview model introduces a novel approach to AI reasoning. When given a prompt, it spends more time "thinking" before it responds, often taking 30 seconds or more to generate an answer. A dropdown menu lets users view a step-by-step summary of that reasoning as the model formulates its response.

Performance Improvements:

- 89th percentile on competitive programming questions (Codeforces)

- Ranks among the top 500 US students on a Math Olympiad qualifier (AIME)

- Exceeds PhD-level accuracy on a benchmark of physics, biology, and chemistry problems (GPQA)

OpenAI has also released o1-mini, a faster and more cost-effective alternative to o1-preview. While it may not match the full capabilities of its larger counterpart, o1-mini aims to balance performance and efficiency.

Pricing and Accessibility:

The new models are currently available only to paid members, with plans to make o1-mini accessible to free users in the future. However, the pricing for API access has raised some concerns in the developer community because it is higher than that of existing options.

Apple's AI Features in New Devices

Apple's recent "It's Glowtime" event, primarily focused on the iPhone 16 and Apple Watch Series 10, also shed light on the company's ongoing AI initiatives. While many of the AI features were previously announced at WWDC, the event provided more context on how these capabilities will be integrated into Apple's latest devices.

Key AI Features:

1. Email and document summarization and proofreading

2. Photo editing with background removal

3. Notification prioritization

4. Siri improvements with a new visual indicator

5. AI art generation in Notes app

New AI-Powered Capabilities:

- Apple Watch translation: Built-in AI translation for the Apple Watch

- Gesture control for AirPods: Nod or shake your head to respond to Siri

- Private cloud compute: Allows use of larger AI models while maintaining privacy

- Visual intelligence (coming in 2025): Provides contextual information about objects in photos

It's worth noting that many of these AI features won't be available immediately upon the release of the new devices. Apple plans to roll out these capabilities gradually through iOS updates, with some features not arriving until early 2025.

Adobe Firefly Text-to-Video Generation

Adobe has unveiled a new text-to-video generation feature for its Firefly AI tool. This development puts Adobe in direct competition with other text-to-video AI models like OpenAI's Sora.

Key Features:

- Generates 5-second video clips from text prompts

- Claims to use ethically sourced video content

- Trained on openly licensed, public domain, and Adobe Stock footage

Examples of generated videos include:

- A galaxy zooming out to reveal an eyeball

- Slow-motion footage of a volcanic landscape

- Stop-motion animation of an egg cooking in a frying pan

- Drone shots of landscapes and natural scenes

While the tool is not yet publicly available, the previews suggest impressive capabilities that could revolutionize video content creation for marketers, educators, and creatives.

Mistral's Pixtral 12B Multimodal Model

Mistral, known for both open-source and closed-source language models, has released Pixtral 12B, their first model capable of accepting images as input. This open-source multimodal model allows developers to build upon, iterate, and fine-tune it for various applications.

Google's NotebookLM and Audio Overview Feature

Google has introduced an intriguing new feature in its NotebookLM tool. This AI-powered notebook lets users upload multiple documents and then ask questions about their content. The standout addition is the "Audio Overview" feature, which generates a podcast-style discussion based on the uploaded documents.

How it works:

1. Upload documents to NotebookLM

2. Click on the "Audio Overview" button

3. AI generates a conversation between two speakers discussing the document contents

This feature has proven effective even with complex scientific papers, making it a potentially valuable tool for researchers and students looking to grasp difficult concepts more easily.

Amazon's AI Voice Cloning for Audible Narrators

Amazon is venturing into AI voice cloning for audiobook production on its Audible platform. The company is inviting a select group of Audible narrators to train AI-generated voice clones of themselves.

Key Points:

- Aims to speed up audiobook production

- Narrators can edit pronunciation and pacing of their AI voice replica

- Compensation will be based on a royalty share model

- Beta test to be extended to rights holders later this year

This move could significantly impact the audiobook industry, potentially allowing for faster production of audio content while maintaining the unique qualities of popular narrators.

Suno's New "Covers" Feature

AI music generation platform Suno has introduced a new feature called "Covers." This tool allows users to transform simple voice recordings or fully produced tracks into entirely new styles while preserving the original melody.

How it works:

1. Upload or record an audio clip

2. Choose the "Cover Song" option

3. AI generates multiple cover versions in different styles

The feature is currently available to paid Suno members, with a limited number of free cover generations per month.

Facebook and Instagram's AI Content Labeling Changes

Meta has announced changes to how AI-generated content is labeled on Facebook and Instagram. In response to user feedback, the platforms are making AI labels less prominent on AI-edited content.

Key Changes:

- AI information now requires clicking into a menu to view

- Aims to reduce frustration for users whose non-AI content was incorrectly labeled

Additionally, Meta admitted during a hearing in Australia that they are scraping users' photos and posts to train their AI systems, with no opt-out option available for public posts.

Roblox's 3D Generative AI for Game Creation

Roblox is developing a 3D foundation model to power generative creation on its platform. This open-source, multimodal model will allow creators to generate 3D content using text, video, and 3D prompts.

Key Features:

- Generate full 3D scenes from text descriptions

- Focuses on enabling more people to develop and create games

- Aims to enhance rather than replace the creative process

Cybever's 3D World Creation Platform

Cybever has unveiled a promising 3D world creation platform that allows users to generate and customize 3D environments through text prompts and drawing tools.

Features:

- Generate maps through text prompts

- Adjust terrain and world style

- Create 3D previews in less than a minute

- Add custom assets to the generated worlds

While the platform looks impressive in demos, its real-world performance remains to be seen.

Daz 3D's Character Generation Plugin

Daz 3D, in collaboration with Yellow 3D, has introduced a plugin for generating 3D character meshes from text prompts. This tool could significantly speed up the character creation process for game developers and 3D artists.

Features:

- Generate character meshes based on text descriptions

- Create diverse character types (e.g., warriors, aliens, vampires)

- Potentially includes texturing and clothing generation

Meshy v4 for 3D Object Generation

Meshy has released version 4 of its 3D object generation tool, which can create 3D models from text prompts or images. The tool offers various features for both free and paid users.

Key Features:

- Text-to-3D object generation

- Image-to-3D model conversion

- Multiple topology options (quad and triangle)

- Texturing capabilities

While the results can be impressive, some generated models may require additional refinement, particularly in areas like facial features.

PS5 Pro's AI Upscaling for Video Quality

Sony has announced the upcoming PS5 Pro, which will use AI upscaling to sharpen the image in games. The feature aims to raise visual fidelity without rendering every frame at full resolution.

Key Points:

- AI-powered upscaling for improved graphics

- No built-in disc drive (requires external drive for physical games)

- Priced at around $700

DeepMind's Dexterous Robots

DeepMind has made significant strides in robotics, demonstrating robots capable of performing complex, dexterous tasks:

1. Tying shoelaces

2. Hanging clothes on a hanger

3. Performing intricate repairs on other robots

These advancements bring us closer to robots that can perform everyday household tasks, potentially revolutionizing home automation and assistance for those with limited mobility.

Conclusion

The AI landscape continues to evolve at a rapid pace, with innovations spanning from natural language processing and image generation to robotics and game development. As these technologies become more sophisticated and accessible, we can expect to see increasingly transformative applications across various industries.

Key takeaways:

1. OpenAI's o1-preview model represents a shift towards more thoughtful AI responses

2. Apple is gradually integrating AI features across its device ecosystem

3. Text-to-video and 3D content generation tools are becoming more powerful and accessible

4. AI is enhancing creative processes in music, audiobooks, and game development

5. Robotic dexterity is improving, bringing us closer to more capable household robots

As we move forward, it will be crucial to balance the potential of these AI advancements with ethical considerations, particularly regarding data usage and content generation. The coming years promise to be an exciting time for AI development, with the potential to revolutionize how we work, create, and interact with technology.

The Week In AI
