The building blocks of AI, artificial neural networks (ANNs), or simply neural networks, are a class of machine learning algorithms based on the structure of neurons in the human brain. They are non-linear statistical data modeling tools used when the exact relationship between an input and an output is unknown. In his article Machine Learning Applications for Data Center Optimization, Jim Gao explains:
“Neural networks are a class of machine learning algorithms that mimic cognitive behavior via interactions between artificial neurons. They are advantageous for modeling intricate systems because neural networks do not require the user to predefine the feature interactions in the model, which assumes relationships within the data. Instead, the neural network searches for patterns and interactions between features to automatically generate a best fit model.”
Neural networks are especially useful for speech recognition, image processing, and autonomous software agents like chatbots, says Gao. “As with most learning systems, the model accuracy improves over time as new training data is acquired,” he adds.
Neural networks learn by mimicking how neurons work: they recognize patterns and make decisions based on those patterns, much as humans do when they recognize objects or solve problems. A neural network consists of layers of nodes tied together by weighted connections, or links. Each node’s output is an activation applied to the weighted sum of its inputs, and the weights are adjusted during training so the network learns to recognize patterns in data without those patterns being programmed in by hand. A simple example would be a computer vision system in which one layer recognizes an image as a car or a train based on a set of features (e.g., shape, color, size). Another layer might then use those same features to classify the image into categories such as “sedan” or “SUV”.
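To make the weighted-sum idea concrete, here is a minimal sketch of a forward pass through a tiny two-layer network. The feature values and weights are made up for illustration; in a real network the weights would be learned, not hand-set.

```python
import numpy as np

def sigmoid(x):
    # Squash the weighted sum into (0, 1), a common activation choice.
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical feature vector for one image: shape, color, size (already numeric).
features = np.array([0.8, 0.2, 0.5])

# Made-up weights: each row maps the inputs to one hidden node.
W_hidden = np.array([[0.4, -0.6, 0.9],
                     [-0.3, 0.7, 0.1]])
W_output = np.array([[0.5, -0.8]])

# Each node's output is an activation applied to the weighted sum of its inputs.
hidden = sigmoid(W_hidden @ features)
output = sigmoid(W_output @ hidden)
print(output)  # e.g. a "car vs. train" score; training adjusts the weights
```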
Neural nets can be used to recognize patterns in data, perform tasks such as image classification and speech recognition, and even play games like Go or chess. There are also many different types of neural networks, each solving different problems in different ways.
Neural nets engage in three types of training: supervised learning, unsupervised learning, and reinforcement learning, with the first being the most common. They thrive at finding patterns in data, learning to decipher the relationships between inputs and outputs through training.
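As a rough illustration of supervised learning, the sketch below trains a tiny network on a toy labeled dataset (the XOR function) with plain gradient descent. The architecture, seed, and learning rate are arbitrary choices for the demonstration, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised data: inputs X paired with known labels y (the XOR function).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A 2-4-1 network with randomly initialized weights and biases.
W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros(4)
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros(1)

lr = 1.0
for step in range(5000):
    # Forward pass: predictions from the current weights.
    h = sigmoid(X @ W1 + b1)
    pred = sigmoid(h @ W2 + b2)

    # Backward pass: gradients of the squared error (backpropagation).
    d_out = (pred - y) * pred * (1 - pred)
    d_hid = (d_out @ W2.T) * h * (1 - h)

    # Nudge every parameter downhill; this is the "learning" step.
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_hid;  b1 -= lr * d_hid.sum(axis=0)

print(pred.round(2))  # converges toward [[0], [1], [1], [0]]
```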
Use Cases
Today, much of the hard work of image classification has been done by others, and libraries of open-source datasets can be found at sites like Kaggle. As of this writing, Kaggle hosted 10,658 datasets, everything from a pistachio image dataset to COVID-19 X-ray image datasets, facial recognition datasets, and even a Yoga Pose Image Classification dataset, so it truly runs the gamut.
In his article A Beginner’s Guide to AI: Neural Networks, Tristan Greene states, “Scientists believe that a living creature’s brain processes information using a biological neural network. The human brain has as many as 100 trillion synapses—gaps between neurons—which form specific patterns when activated.” When a person first learns to read, she might have to sound out the letters of each word. After reading the word cat enough times, though, she no longer has to slow down and sound it out, because she is accessing a part of her brain associated with memory rather than with problem-solving; a different set of synapses fires because she has trained her biological neural network to recognize the word “cat,” claims Greene.
With neural networks, researchers have taught computers to recognize at least what a picture of a cat looks like by feeding them an enormous number of cat images. The neural network analyzes these images and “tries to find out everything that makes them similar so that it can find cats in other pictures,” adds Greene.
In marketing, neural networks can help classify a consumer’s spending habits, analyze a new product, identify a customer’s characteristics, and forecast sales. They can be highly accurate, tolerate noisy data, and are simple to use and update, which makes them useful in dynamic data environments.
Neural nets have been applied to computer vision tasks such as image recognition, as well as to speech recognition and natural language processing (NLP). They also have applications in areas such as robotics, autonomous vehicles, and healthcare. NLP is a common application area for deep learning, where it is used for text analytics and sentiment analysis, for example to identify spam emails or gauge the sentiment of social media posts.
GANs, CNNs, and RNNs
Although there are many types of neural networks, three of the most important are generative adversarial networks (GANs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). Invented by Ian Goodfellow, generative adversarial networks (GANs) are a type of machine learning algorithm that can generate realistic images, videos, and voices. A GAN comprises two arguing sides, a generator and an adversary, that fight among themselves until the generator wins, explains Greene. For example, “If you wanted to create an AI that imitates an art style, like Picasso’s for example, you could feed a GAN a bunch of his paintings.”
GANs consist of two neural networks: a generator network that produces samples (typically from random noise), and a discriminator network that tries to distinguish real data samples from the fake samples produced by the generator. The goal is to train the generator to generate realistic samples while simultaneously training the discriminator to tell real data from fake. The simplest way of doing this is a loss function that penalizes each network for misclassified samples: the discriminator for mislabeling real and fake, the generator for failing to fool the discriminator. A common choice is the cross-entropy loss function, which measures how much information is lost when classifying a sample as real or fake; the more information lost, the higher the penalty.
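Here is a minimal sketch of that adversarial loop, assuming PyTorch and a toy one-dimensional dataset (samples from a Gaussian the generator must learn to mimic). The network sizes and learning rates are illustrative, not tuned.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy "real" data: samples from a Gaussian the generator must learn to mimic.
def real_batch(n):
    return torch.randn(n, 1) * 0.5 + 3.0

G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))  # generator
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))  # discriminator

bce = nn.BCEWithLogitsLoss()  # cross-entropy over real/fake labels
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(2000):
    real = real_batch(64)
    fake = G(torch.randn(64, 8))  # generator maps noise to candidate samples

    # Discriminator: penalized for calling real samples fake and vice versa.
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: penalized when the discriminator spots its samples as fake.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())  # drifts toward 3.0 as G improves
```

Note that the generator never sees the real data directly; it improves only through the discriminator’s feedback.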
To train GANs effectively, two things are required:
- a large dataset of labeled images for training; and
- enough data for generating new images/videos/audio with high-quality results.
In practice, these two requirements are often in tension, especially if the models are expected to generate realistic images without a large pre-existing dataset at hand. Here GANs can help with their own training problem: used in conjunction with an existing dataset, they can create synthetic datasets that contain enough labeled examples for training. Models trained on these synthetic datasets can then be applied back to the original data to obtain better results than would otherwise be possible. This is a form of transfer learning, because knowledge learned on one task (generating synthetic examples) is transferred to another (the downstream training task).
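One hedged way the augmentation step might look is sketched below, with a stand-in for a generator trained as in the earlier sketch. A basic GAN does not label its outputs, so the per-class labeling here is an assumption (e.g., one generator trained per class).

```python
import torch
import torch.nn as nn

# Stand-in for a generator trained as in the GAN sketch above (hypothetical).
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))

# Assumption: one generator per class, so its outputs inherit that class's label.
def synthetic_dataset(generator, n, label):
    with torch.no_grad():  # generation only; no gradients needed
        return generator(torch.randn(n, 8)), torch.full((n,), label)

# Augment a small real dataset with plentiful synthetic, labeled samples.
real_x = torch.randn(100, 1) * 0.5 + 3.0   # the scarce "real" class-0 data
real_y = torch.full((100,), 0)
fake_x, fake_y = synthetic_dataset(G, 400, 0)
train_x = torch.cat([real_x, fake_x])
train_y = torch.cat([real_y, fake_y])
print(train_x.shape, train_y.shape)  # torch.Size([500, 1]) torch.Size([500])
```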
According to Greene, the process works by one side of the network trying to create new images that fool the other side into thinking they are works painted by Picasso. The AI learns everything it can about the famous artist’s work by examining each painting down to the individual pixel. While one side creates the image, the other side determines how close the final image is to a real Picasso. Once the AI fools itself, a human can view the results to determine whether the algorithm requires tweaking or the results can be considered a success.
Convolutional neural networks (CNNs) are among the most common and robust neural networks around and are used mostly in image recognition and natural language processing. “Where a GAN tries to create something that fools an adversary, a CNN has several layers through which data is filtered into categories,” explains Greene.
A CNN can sift through a billion hours of video, examining each frame to figure out what’s going on in the scene, explains Greene. It is trained by being fed complex images that humans have tagged. “AI learns to recognize things like stop signs, cars, trees, and butterflies by looking at pictures that humans have labeled, comparing the pixels in the image to the labels it understands and then organizing everything it sees into the categories it’s been trained on,” says Greene.
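A hedged sketch of that filtering pipeline, again assuming PyTorch: a toy CNN whose stacked convolution and pooling layers funnel pixels into category scores, with random tensors standing in for human-labeled images. The four labels are hypothetical.

```python
import torch
import torch.nn as nn

# A small CNN: stacked filter layers funnel pixels toward category scores.
cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 4),  # four made-up labels: stop sign, car, tree, butterfly
)

# Stand-ins for human-labeled training images: 32x32 RGB tensors plus labels.
images = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 4, (8,))

loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(cnn.parameters(), lr=1e-3)

logits = cnn(images)             # forward pass: pixels in, category scores out
loss = loss_fn(logits, labels)   # compare scores against the human labels
opt.zero_grad(); loss.backward(); opt.step()  # one training step
print(logits.argmax(dim=1))      # predicted category for each image
```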
Recurrent neural networks (RNNs) are used in AI that requires nuance and context to understand its input. An example would be a natural language processing AI that interprets human speech, like Amazon’s Alexa or Google’s Assistant. RNNs can also be used to generate original music from human input: when a series of notes is played for the AI, it tries to figure out what the next sequence of notes should be, anticipating what the song should sound like. “Each piece of context provides information for the next step, and an RNN continuously updates itself based on its continuing input—hence the recurrent part of the name,” explains Greene.
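A minimal next-note sketch under the same assumptions (PyTorch; the note IDs and melody are made up). The point is the hidden state that carries context forward from step to step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

NUM_NOTES = 12  # a hypothetical vocabulary of note IDs

# The RNN carries a hidden state forward, so each step sees its context.
embed = nn.Embedding(NUM_NOTES, 16)
rnn = nn.RNN(16, 32, batch_first=True)
head = nn.Linear(32, NUM_NOTES)

# Toy training melody: the model learns to predict each next note.
melody = torch.tensor([[0, 4, 7, 4, 0, 4, 7, 4]])
inputs, targets = melody[:, :-1], melody[:, 1:]

params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.Adam(params, lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):
    hidden_states, _ = rnn(embed(inputs))  # state updates note by note
    logits = head(hidden_states)           # score for every possible next note
    loss = loss_fn(logits.reshape(-1, NUM_NOTES), targets.reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()

print(logits.argmax(dim=-1))  # the anticipated continuation of the melody
```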
One-Trick Pony?
In his article Is AI Riding a One-Trick Pony?, James Somers argues, “AI today is deep learning, and deep learning is backprop—which is amazing, considering that backprop is more than 30 years old.” In 1986, Geoffrey Hinton published a breakthrough paper with colleagues David Rumelhart and Ronald Williams that described a technique called “backpropagation.” Almost every achievement in the last decade of AI, in translation, speech recognition, image recognition, and game-playing, leads back to Hinton’s work in some way or another. For decades, backprop was cool math that did little, but once computers got fast enough and graphics processing units (GPUs) entered the scene, the cool math suddenly amounted to much, much more, says Somers.
With backpropagation, a big layer of neurons is connected to another big layer above it, and so on for several layers; the network is then trained so that, say, the first neuron in the final layer is excited only if there is a cat in the picture, and the second only if there isn’t. The remarkable thing is that when the network is trained on millions or even billions of images, it gets exceptionally good at recognizing them. Even more amazing, “the individual layers of these image-recognition nets start being able to ‘see’ images in sort of the same way our own visual system does,” says Somers. The net arranges itself into hierarchical layers without being explicitly programmed to do so, notes Somers.

Neural nets seem to build representations of ideas. With text, you can feed Wikipedia into a neural net and train it to produce, for each word, a big list of numbers corresponding to the excitement of each neuron in a layer. What you are doing is finding a point, or vector, for each word somewhere in that space. “Now, train your network in such a way that words appearing near one another on Wikipedia pages end up with similar coordinates, and voilà, something crazy happens: words that have similar meanings start showing up near one another in the space,” says Somers.
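A toy illustration of that effect, substituting co-occurrence counts plus an SVD for an actual trained network; the five-sentence corpus and the three-dimensional coordinates are made up for the demonstration.

```python
import numpy as np

# Toy corpus standing in for Wikipedia text.
sentences = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "stocks rose as markets rallied",
    "markets fell as stocks dropped",
]
words = sorted({w for s in sentences for w in s.split()})
idx = {w: i for i, w in enumerate(words)}

# Count how often each pair of words appears in the same sentence.
cooc = np.zeros((len(words), len(words)))
for s in sentences:
    toks = s.split()
    for a in toks:
        for b in toks:
            if a != b:
                cooc[idx[a], idx[b]] += 1

# Compress the counts into a few coordinates per word (an SVD stand-in
# for what a trained network's neuron activations would provide).
U, S, _ = np.linalg.svd(cooc)
vectors = U[:, :3] * S[:3]

def similarity(w1, w2):
    v1, v2 = vectors[idx[w1]], vectors[idx[w2]]
    return v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))

print(similarity("cat", "dog"))     # words from similar contexts: high
print(similarity("cat", "stocks"))  # words from different contexts: low
```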
A Dance of Vectors
Mathematically, neural nets can take images, words, voices, and video recordings and assemble them into what mathematicians call “high-dimensional vector spaces,” “where the closeness or distance of the things reflects some important feature of the actual world,” notes Somers. Many AI researchers believe this is how the brain works. “Big patterns of neural activity, if you’re a mathematician, can be captured in a vector space, with each neuron’s activity corresponding to a number, and each number to a coordinate of a really big vector,” says Somers. Hinton, whom many consider the father of AI, believes this is what thought is: a dance of vectors.
However impressive this process sounds, one should not forget that deep learning systems are still incredibly dumb, no matter how smart they appear on the surface. Neural nets are “just thoughtless fuzzy pattern recognizers,” contends Somers, however useful fuzzy pattern recognizers can be. A deep neural net that recognizes cat images won’t have a clue how to spot a train in a photo, and visual noise easily throws it off. The limitations of neural nets are becoming highly apparent. Self-driving cars were once promised to be roaming the streets of U.S. cities by 2020, but they remain years away from anything close to full autonomy, limited to small local enclaves that avoid taxing their systems with too much novel stimulation.