9/24/2024

NEURAL -ARCHITECTURE- NUANCE : REVISION GLOBAL ESSAY


 

GALAXY BRAINS : In 2022 researchers at Stanford University published a modified version of the '' attention algorithm, '' which allows LLMs to learn connections between words and ideas.

The idea was to modify the code to take account of what is happening on the chip that is running it, and especially to keep track of when a given piece of information needs to be looked up or stored.

Their algorithm was able to speed up the training of GPT-2, an older large language model, threefold. It also allowed the model to respond to longer queries.
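To give a sense of what taking account of the chip means, the sketch below contrasts a naive attention computation, which writes out the full table of word-to-word scores in memory, with PyTorch's fused attention routine, which can dispatch to a memory-aware kernel in the spirit of the Stanford work. It is a minimal illustration only : the shapes and sizes are arbitrary placeholders, not anything from the paper.

```python
# Illustrative contrast: naive attention materialises the whole
# (seq_len x seq_len) score matrix, while PyTorch's fused kernel can
# work tile by tile so that matrix never has to sit in slow memory.
import math
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # Builds every pairwise score, softmaxes, then mixes the values.
    # The repeated reads and writes of this large matrix are the
    # memory traffic that a hardware-aware rewrite avoids.
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    return torch.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 8, 1024, 64)   # batch, heads, seq_len, head_dim

out_naive = naive_attention(q, k, v)
out_fused = F.scaled_dot_product_attention(q, k, v)  # fused attention kernel

print(torch.allclose(out_naive, out_fused, atol=1e-5))  # same answer, less memory traffic
```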

Sleeker code can also come from better tools. In 2023 Meta released an updated version of PyTorch, an AI-programming framework. By letting coders think more about how computations are arranged on the actual chip, the update can double a model's training speed with the addition of just one line of code.
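The one line in question is torch.compile, introduced with PyTorch 2.0. Below is a minimal sketch with a placeholder model and training step; the doubling figure is the article's, and actual gains depend on the model and the chip.

```python
# Minimal sketch of the PyTorch 2.0 "one line" speed-up: torch.compile.
# The model, data and training step here are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model = torch.compile(model)  # the single added line: fuses and reorders ops for the chip

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 512)
y = torch.randint(0, 10, (32,))

loss = loss_fn(model(x), y)   # first call triggers compilation; later calls run the fused code
loss.backward()
optimizer.step()
```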

Modular, a startup founded just last year by former engineers at Apple and Google, released a new AI-focused programming language called Mojo, which is based on Python.

It too gives coders control over all sorts of fine details that were previously hidden. In some cases, code written in Mojo can run '' thousands of times faster '' than the same code in Python.
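The '' thousands of times faster '' comparisons start from code like the pure-Python loop below, where every arithmetic step passes through the interpreter; Mojo's pitch is that the same Python-like syntax can instead be compiled with explicit types, vectorisation and parallelism. What follows is only the slow baseline, written in ordinary Python, with an arbitrary matrix size.

```python
# Interpreter-bound matrix multiply in plain Python: the per-element
# overhead of loops like this is what a compiled, lower-level language
# eliminates.
import random
import time

def matmul_pure_python(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]
    return c

n = 128
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [[random.random() for _ in range(n)] for _ in range(n)]

start = time.perf_counter()
matmul_pure_python(a, b, n)
print(f"pure Python: {time.perf_counter() - start:.2f}s for a {n}x{n} matmul")
```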

A final option is to improve the chips on which the code runs. GPUs are only accidentally good at running AI software - they were originally designed to process the fancy graphics in modern video games.

In particular, says a hardware researcher at Meta, GPUs are imperfectly designed for '' inference '' work (ie, actually running a model once it has been trained).
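Inference, in other words, is just the forward pass of a model that has already been trained - no gradients, no weight updates - and that is the workload the specialised chips described next are built around. A minimal PyTorch sketch with a placeholder model :

```python
# Inference sketch: run an already-trained (here, placeholder) model
# forward, with no gradient bookkeeping and no weight updates.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
model.eval()                       # switch layers such as dropout to inference behaviour

with torch.no_grad():              # skip the bookkeeping needed only for training
    x = torch.randn(1, 512)
    logits = model(x)
    prediction = logits.argmax(dim=-1)

print(prediction)
```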

Some firms are therefore designing their own, more specialised hardware. Google already runs most of its AI projects on its in-house '' TPU '' chips.

Meta, with its MTIA chips, and Amazon, with its Inferentia chips, are pursuing a similar path.

That such big performance increases can be extracted from relatively simple changes like rounding numbers or switching programming languages might seem surprising.
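For the curious, '' rounding numbers '' refers to storing a model's weights at lower numerical precision, so each one takes less memory and the chip moves less data. A minimal sketch of the idea in PyTorch, using a placeholder layer; real deployments choose the precision to suit the hardware and check that accuracy survives.

```python
# "Rounding numbers": halve the storage of each weight by moving from
# 32-bit to 16-bit floats. The layer below is a placeholder.
import torch
import torch.nn as nn

layer = nn.Linear(512, 512)
print(layer.weight.dtype, layer.weight.element_size())  # float32: 4 bytes per weight

layer = layer.to(torch.bfloat16)   # round every weight to a 16-bit format
print(layer.weight.dtype, layer.weight.element_size())  # bfloat16: 2 bytes per weight
```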

But it reflects the breakneck speed with which LLMs have been developed. For many years they were research projects, and simply getting them to work well was more important than making them elegant.

Only recently have they graduated to commercial, mass-market products. Most experts think there remains plenty of room for improvement.

As Chris Manning, a computer scientist at Stanford University, puts it : '' There's absolutely no reason to believe ... that this is the ultimate neural architecture, and we will never find anything better.''

This Revision Publishing and Essay continues. The World Students Society thanks The Economist.
