Apple avoids “AI” hype at WWDC keynote by baking ML into products

Jun 13, 2023

Amid impressive new products like the Apple Silicon Mac Pro and the Apple Vision Pro revealed at Monday's WWDC 2023 keynote, Apple presenters never once mentioned the term "AI," a notable omission given that competitors like Microsoft and Google are currently focusing heavily on generative AI. Still, AI was part of Apple's presentation, just under other names.

While "AI" is a very ambiguous term these days, surrounded by both astounding advancements and extreme hype, Apple chose to avoid that association and instead focused on terms like "machine learning" and "ML." For example, during the iOS 17 demo, SVP of Software Engineering Craig Federighi talked about improvements to autocorrect and dictation:

Autocorrect is powered by on-device machine learning, and over the years, we've continued to advance these models. The keyboard now leverages a transformer language model, which is state of the art for word prediction, making autocorrect more accurate than ever. And with the power of Apple Silicon, iPhone can run this model every time you tap a key.

Notably, Apple did mention the AI term "transformer" during the keynote. The company specifically talked about a "transformer language model," meaning its model uses the transformer architecture that has been powering many recent generative AI innovations, such as the DALL-E image generator and the ChatGPT chatbot.

A transformer model (a concept first introduced in 2017) is a type of neural network architecture used in natural language processing (NLP) that employs a self-attention mechanism, allowing it to prioritize different words or elements in a sequence. Its ability to process inputs in parallel has led to significant efficiency improvements and powered breakthroughs in NLP tasks such as translation, summarization, and question-answering.
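To make that concrete, here is a minimal, illustrative sketch of scaled dot-product self-attention in NumPy. The toy dimensions and random weights are our own stand-ins, not anything from Apple's model:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of token vectors.

    x: (seq_len, d_model) input embeddings
    w_q, w_k, w_v: (d_model, d_head) learned projection matrices
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: how much attention each token gets
    return weights @ v                               # weighted mix of value vectors

# Toy example: 4 tokens, 8-dimensional embeddings, a single attention head
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (4, 8)
```

Because the score matrix covers all token pairs at once, the whole sequence can be processed in parallel rather than word by word, which is the efficiency win described above.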

Apparently, Apple's new transformer model in iOS 17 allows sentence-level autocorrections that can finish either a word or an entire sentence when you press the space bar. It learns from your writing style as well, which guides its suggestions.

All this on-device AI processing is relatively easy for Apple to pull off because Apple Silicon chips (and earlier Apple chips, going back to the A11 in 2017) include a dedicated section called the Neural Engine, which is designed to accelerate machine learning applications. Apple also said that dictation "gets a new transformer-based speech recognition model that leverages the Neural Engine to make dictation even more accurate."
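For context (and as our own illustration, not anything Apple showed on stage), developers typically reach the Neural Engine indirectly through Core ML: a model is converted with the coremltools package, and the operating system then decides at runtime whether to run it on the CPU, GPU, or Neural Engine. A minimal sketch, assuming a placeholder PyTorch model:

```python
import torch
import coremltools as ct  # Apple's model-conversion package

# Placeholder model; a real app would convert something like a speech or
# language model instead of this tiny linear layer.
model = torch.nn.Linear(16, 4).eval()
example = torch.rand(1, 16)
traced = torch.jit.trace(model, example)

# Convert to a Core ML "ML program." ComputeUnit.ALL lets Core ML schedule
# work across the CPU, GPU, and Neural Engine as it sees fit.
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=example.shape)],
    convert_to="mlprogram",
    compute_units=ct.ComputeUnit.ALL,
)
mlmodel.save("TinyModel.mlpackage")
```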

During the keynote, Apple also mentioned "machine learning" several other times: while describing a new iPad lock screen feature ("When you select a Live Photo, we use an advanced machine learning model to synthesize additional frames"); iPadOS PDF features ("Thanks to new machine learning models, iPadOS can identify the fields in a PDF so you can use AutoFill to quickly fill them out with information like names, addresses, and emails from your contacts."); an AirPods Adaptive Audio feature ("With Personalized Volume, we use machine learning to understand your listening preferences over time"); and an Apple Watch widget feature called Smart Stack ("Smart Stack uses machine learning to show you relevant information right when you need it").

Apple also debuted a new app called Journal that allows personal text and image journaling (kind of like an interactive diary), locked and encrypted on your iPhone. Apple said that AI plays a part, but it didn't use the term "AI."

"Using on-device machine learning, your iPhone can create personalized suggestions of moments to inspire your writing," Apple said. "Suggestions will be intelligently curated from information on your iPhone, like your photos, location, music, workouts, and more. And you control what to include when you enable Suggestions and which ones to save to your Journal."

Finally, during the demo for the new Apple Vision Pro, the company revealed that the moving image of a user's eyes on the front of the goggles comes from a special 3D avatar created by scanning your face—and you guessed it, machine learning.

"Using our most advanced machine learning techniques, we created a novel solution," Apple said. "After a quick enrollment process using the front sensors on Vision Pro, the system uses an advanced encoder-decoder neural network to create your digital Persona."

An encoder-decoder neural network is a type of neural network that first compresses an input into a compact numerical form called a "latent-space representation" (the encoder) and then reconstructs the data from that representation (the decoder). We're speculating, but the encoder part might analyze and compress facial data captured during the scanning process into a more manageable, lower-dimensional latent representation. Then, the decoder part might use that condensed information to generate the 3D model of the face.
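As a purely illustrative sketch of that structure (our toy dimensions, nothing to do with Vision Pro's actual network), here is a minimal encoder-decoder in PyTorch:

```python
import torch
from torch import nn

class EncoderDecoder(nn.Module):
    """Toy encoder-decoder: squeeze the input down to a small latent vector,
    then reconstruct something the size of the original input from it."""

    def __init__(self, input_dim=1024, latent_dim=32):
        super().__init__()
        # Encoder: compress the input into a low-dimensional latent representation
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim),
        )
        # Decoder: reconstruct the output from the latent representation
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, input_dim),
        )

    def forward(self, x):
        latent = self.encoder(x)            # the "latent-space representation"
        return self.decoder(latent), latent

model = EncoderDecoder()
scan = torch.rand(1, 1024)                  # stand-in for flattened face-scan features
reconstruction, latent = model(scan)
print(latent.shape, reconstruction.shape)   # torch.Size([1, 32]) torch.Size([1, 1024])
```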

During the WWDC keynote, Apple also unveiled its most powerful Apple Silicon chip yet, the M2 Ultra, which features up to 24 CPU cores, 76 GPU cores, and a 32-core Neural Engine that reportedly delivers 31.6 trillion operations per second, which Apple says is 40 percent faster than the M1 Ultra.

Interestingly, Apple directly said that this power might come in handy for training "large transformer models," which to our knowledge is the most prominent mention of AI in an Apple keynote (albeit only in passing):

And M2 Ultra can support an enormous 192GB of unified memory, which is 50% more than M1 Ultra, enabling it to do things other chips just can't do. For example, in a single system, it can train massive ML workloads, like large transformer models that the most powerful discrete GPU can't even process because it runs out of memory.

This development has some AI experts excited. On Twitter, frequent AI pundit Perry E. Metzger wrote, "Whether by accident or intent, the Apple Silicon unified memory architecture means high end Macs are now really amazing machines for running big AI models and doing AI research. There really aren't many other systems at this price point that offer 192GB of GPU accessible RAM."

Here, larger RAM means that bigger and ostensibly more capable AI models can fit in memory. The systems in question are the new Mac Studio (starting at $1,999) and the new Mac Pro (starting at $6,999), which could put AI training within reach of many more people, and in desktop- and tower-sized form factors.
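To put some rough numbers on that (our back-of-envelope math, not Apple's): a transformer's weights alone take parameter count times bytes per parameter, so a hypothetical 70-billion-parameter model in 16-bit precision needs about 140GB before counting activations or optimizer state. That fits in the M2 Ultra's 192GB of unified memory but not in a single 80GB H100.

```python
def weight_memory_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights only (fp16 = 2 bytes/param).
    Ignores activations, gradients, and optimizer state, which add much
    more during actual training."""
    return params_billion * 1e9 * bytes_per_param / 1e9

for params in (7, 13, 70):
    print(f"{params}B params @ fp16: ~{weight_memory_gb(params):.0f} GB")
# 7B params @ fp16: ~14 GB
# 13B params @ fp16: ~26 GB
# 70B params @ fp16: ~140 GB
```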

Only rigorous evaluations will tell how the performance of these new M2 Ultra-powered machines will stack up against AI-tuned Nvidia GPUs like the H100. For now, it looks like Apple has openly thrown its hat into the generative-AI-training hardware ring.
