Industry Ready Java Spring Boot, React & Gen AI — Live Course
AI Engineering: The Core Concepts of AI and LLMs

Introduction to AI, ML, and DL

AI (Artificial Intelligence)

  • AI is the field of making computers behave in ways that we consider smart or intelligent.
  • It focuses on enabling machines to perform tasks such as decision-making, problem-solving, and understanding human language.

ML (Machine Learning)

  • Machine Learning is a subset of AI that learns patterns directly from data instead of being explicitly programmed with fixed rules.
  • ML models are trained on data so they can make predictions or decisions on new, unseen data.
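The idea of "learning patterns from data instead of hard-coding rules" can be sketched with the simplest possible model: fitting a line to example points. The data and the closed-form least-squares fit below are illustrative assumptions, not from the notes.

```python
# Toy ML example: "learn" the rule y = w*x + b from data, then use it
# to predict on unseen input. Pure Python, no libraries.

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (w, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Training data generated by a hidden rule, y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
w, b = fit_line(xs, ys)
print(w, b)        # the model recovers w = 2.0, b = 1.0 from the data
print(w * 10 + b)  # prediction on new, unseen input x = 10 -> 21.0
```

The program was never told the rule "multiply by 2 and add 1"; it recovered it from examples, which is the core distinction between ML and explicit programming.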

DL (Deep Learning)

  • Deep Learning is a subset of ML that uses neural networks with multiple layers.
  • These layers include an input layer, hidden layers, and an output layer.
  • DL is especially effective for complex data such as images, audio, and natural language, as it automatically learns useful features from raw data.

Deep Learning Structure and Training

Layers (Input, Hidden, Output)

  • The input layer takes raw data into the neural network.
  • Hidden layers perform the main computations and learn features from the data.
  • The output layer generates the final prediction or result.

Forward Pass

  • A forward pass sends input data from the input layer through the hidden layers to the output layer.
  • At each layer, calculations are performed using current weights and biases.
  • The value produced at the output layer is the model’s prediction.
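The forward pass described above can be sketched in a few lines. The layer sizes, random weights, and ReLU activation are illustrative assumptions.

```python
import numpy as np

def relu(z):
    # Common activation function: passes positives, zeroes out negatives
    return np.maximum(0, z)

rng = np.random.default_rng(0)
x  = rng.normal(size=3)        # input layer: 3 features
W1 = rng.normal(size=(4, 3))   # hidden layer: 4 neurons (weights)
b1 = np.zeros(4)               # hidden layer biases
W2 = rng.normal(size=(1, 4))   # output layer: 1 value (weights)
b2 = np.zeros(1)               # output layer bias

h = relu(W1 @ x + b1)          # hidden layer: weights * inputs + biases
y_hat = W2 @ h + b2            # output layer produces the prediction
print(y_hat)
```

Each layer is just "multiply by current weights, add biases, apply an activation", exactly as the bullets describe.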

Loss Function

  • The loss function measures how far the model’s prediction is from the correct answer.
  • A higher loss indicates poor predictions, while a lower loss indicates better predictions.
  • Training aims to minimize the loss.
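A concrete loss function makes the "higher loss = worse prediction" idea tangible. Mean squared error, sketched below, is one common choice (the numbers are made up for illustration).

```python
# Mean squared error: average squared gap between predictions and targets.
def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

print(mse([2.5, 0.0], [3.0, 0.0]))  # off by 0.5 on one example -> 0.125
print(mse([3.0, 0.0], [3.0, 0.0]))  # perfect predictions -> 0.0
```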

Backpropagation

  • Backpropagation uses the loss value to update the model’s parameters.
  • The error is propagated backward from the output layer through the hidden layers.
  • Parameters are adjusted so that the loss becomes smaller in future predictions.
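The backpropagation-then-update loop can be shown on the smallest possible model: one weight, one training example, squared-error loss. The numbers and learning rate below are assumptions chosen for illustration.

```python
# Tiny model: y_hat = w * x, loss L = (y_hat - y)^2.
# Chain rule gives the gradient: dL/dw = 2 * (y_hat - y) * x.
x, y = 2.0, 10.0   # one training example
w = 1.0            # initial weight
lr = 0.05          # learning rate

for step in range(50):
    y_hat = w * x                # forward pass
    loss = (y_hat - y) ** 2      # measure the error
    grad = 2 * (y_hat - y) * x   # backward pass: how loss changes with w
    w -= lr * grad               # adjust parameter to shrink future loss

print(round(w, 3))  # converges toward 5.0, since 5.0 * 2 = 10
```

Each iteration nudges the weight in the direction that reduces the loss, which is exactly what the bullets describe at network scale.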

Parameters (Weights and Biases)

  • Weights determine how strongly one neuron influences another.
  • Biases are additional values added to neurons to shift outputs.
  • More parameters allow better tuning, and with sufficient data, this can improve model performance.
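The parameter count of a fully connected network follows directly from the definitions above: each layer has one weight per input-neuron pair plus one bias per neuron. A small sketch (the layer sizes are illustrative):

```python
# A layer with n_in inputs and n_out neurons has
# n_in * n_out weights plus n_out biases.
def count_params(layer_sizes):
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total

# 3 inputs -> 4 hidden neurons -> 1 output:
print(count_params([3, 4, 1]))  # (3*4 + 4) + (4*1 + 1) = 21
```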

Transformers, GPT, and Types of LLMs

Transformers and GPT

  • Transformers are deep learning models designed to process and generate sequential data such as text.
  • GPT (Generative Pre-trained Transformer) is a transformer-based model specialized for language tasks.
  • GPT models are known for generating human-like text.

LLMs and Their Types

  • Large Language Models (LLMs) are neural networks trained on very large amounts of text so they can understand and generate language.
  • They are commonly built using transformer architectures, such as GPT.
  • Two important types of LLMs are masked LLMs and autoregressive LLMs.

Masked vs Autoregressive LLMs

  • Masked LLMs predict missing words within a sentence, using context on both sides of the blank.
    • Example: “my favorite __ is blue.”
  • Autoregressive LLMs predict the next word based only on the previous words.
    • Example: “my favorite color is __.”
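Autoregressive prediction can be demonstrated at toy scale by counting which word follows which in a small corpus and always guessing the most common follower. The corpus below is an assumption for illustration; real LLMs learn these statistics over billions of tokens with neural networks rather than raw counts.

```python
from collections import Counter, defaultdict

# Tiny "autoregressive" next-word predictor based on bigram counts.
corpus = "my fav color is blue and your fav color is green".split()

following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1      # count: word -> words that followed it

def predict_next(word):
    # Guess the most common word seen after `word` in the corpus
    return following[word].most_common(1)[0][0]

print(predict_next("fav"))  # "color" — it followed "fav" both times
```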

Tokens and Vocabulary

Tokens

  • Tokens are the basic units of text processed by language models.
  • Tokens can be whole words or parts of words (for example, “cooking” may be split into “cook” and “ing”).
  • A larger vocabulary increases model size, while a smaller vocabulary reduces it.
  • On average, 1 token ≈ ¾ of a word.
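The ¾-word rule of thumb gives a quick way to estimate how many tokens a piece of text will cost. The sketch below applies that rule directly; real tokenizers split text differently, so treat this as a rough estimate only.

```python
# Rough token estimate from the rule of thumb "1 token ~ 3/4 of a word",
# i.e. about 4 tokens for every 3 words.
def estimate_tokens(text):
    words = len(text.split())
    return round(words * 4 / 3)

print(estimate_tokens("Deep learning models process text as tokens"))  # 7 words -> ~9 tokens
```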

Ways to Connect with LLMs

Using an API

  • LLMs can be accessed through an API.
  • Text is sent to the model, and a response is returned.
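A minimal sketch of the API pattern: build an HTTP request carrying the prompt as JSON, send it, read the response. The endpoint, model name, and payload shape below follow the OpenAI chat-completions style, but treat them as assumptions and check your provider's documentation for the exact contract.

```python
import json
import urllib.request

API_URL = "https://api.openai.com/v1/chat/completions"

def build_request(prompt, model="gpt-4o-mini", api_key="YOUR_API_KEY"):
    # Text goes to the model as a JSON payload over HTTPS
    payload = {"model": model,
               "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = build_request("Explain transformers in one sentence.")
# response = urllib.request.urlopen(req)  # uncomment with a real API key
print(json.loads(req.data)["model"])
```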

Running Locally with Ollama

  • Models can also be run locally using Ollama.
  • This allows working with LLMs directly on your own machine.
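A running Ollama instance exposes an HTTP API on localhost (port 11434), so local models can be queried the same way as a hosted API. The model name `llama3` below is an assumption; use whichever model you have pulled with `ollama pull`.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(prompt, model="llama3"):
    # stream=False asks Ollama for a single complete JSON response
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

req = build_request("What is a token?")
# body = json.load(urllib.request.urlopen(req))  # needs `ollama serve` running
# print(body["response"])
print(json.loads(req.data)["model"])
```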
