Learn & Review: CS50x 2025 - Lecture 5 - Data Structures

Jan 23, 2026

CS50x 2025 - Lecture 5 - Data Structures

audio

Media preview

Transcript

Transcript will appear once available.

summarize_document

CS50 Week 5: Data Structures - Capping Off C

This week marks the final week on C in CS50, focusing on data structures, a crucial concept for organizing information in computer memory. While arrays were introduced earlier as a simple, contiguous data structure, this week delves into more complex structures like queues, stacks, linked lists, hash tables, and tries, highlighting the trade-offs between time and space complexity.

1. Abstract Data Types vs. Data Structures

  • Data Structures: Concrete ways to organize data in memory (e.g., arrays). They involve specific implementations using variables, pointers, and functions.
  • Abstract Data Types (ADTs): Broader descriptions of how information is structured and manipulated, independent of the underlying implementation. Different data structures can implement the same ADT.

2. Queues: First-In, First-Out (FIFO)

  • Concept: A linear data structure where elements are added at one end (enqueue) and removed from the other (dequeue). The first element added is the first one removed.
  • Real-world Analogy: A line of people waiting for a service (e.g., at a movie theater, for dinner).
  • Operations:
    • enqueue: Adds an element to the rear of the queue.
    • dequeue: Removes and returns the element from the front of the queue.
  • Implementation Example (C): Can be implemented using arrays, requiring tracking of size and capacity.

3. Stacks: Last-In, First-Out (LIFO)

  • Concept: A linear data structure where elements are added and removed from the same end (the top). The last element added is the first one removed.
  • Real-world Analogy: A stack of plates, a pile of sweaters, or an email inbox where new emails appear at the top.
  • Operations:
    • push: Adds an element to the top of the stack.
    • pop: Removes and returns the element from the top of the stack.
  • Implementation Example (C): Similar to queues, can be implemented using arrays, also requiring size and capacity tracking.

4. Arrays: Limitations and Dynamic Resizing

  • Contiguous Memory: Arrays store elements back-to-back in memory.
  • Fixed Size: The size of an array must be declared in advance.
  • Challenges with Growth:
    • Reallocation: To grow an array, a new, larger array must be allocated, and all elements copied over. This is time-consuming (linear time).
    • Over-allocation: Allocating a much larger array than needed upfront wastes space.
  • malloc, realloc, free: C functions for dynamic memory allocation.
    • malloc: Allocates a block of memory.
    • realloc: Resizes an existing memory block.
    • free: Deallocates memory.
  • Memory Leaks: Failure to free allocated memory leads to memory leaks.

5. Linked Lists: Dynamic and Flexible Memory Allocation

  • Concept: A data structure where elements (nodes) are linked together using pointers. Each node contains data and a pointer to the next node.
  • Decoupled Memory: Nodes do not need to be contiguous in memory, allowing for efficient use of available space.
  • Structure of a Node: Typically includes the data (e.g., an integer) and a pointer (next) to the subsequent node.
  • typedef struct: Used to define custom data types like node.
  • Arrow Operator (->): A shorthand for dereferencing a pointer and accessing a member of the structure it points to (e.g., ptr->next is equivalent to (*ptr).next).
  • Operations:
    • Insertion:
      • Prepending (Adding to the beginning): Efficient (O(1) time complexity) as it only involves updating the head pointer and the new node's next pointer. However, it results in a reversed order of insertion.
      • Appending (Adding to the end): Requires traversing the list to find the end (O(n) time complexity), but maintains the insertion order. Special handling is needed for an empty list.
    • Searching: Requires traversing the list from the beginning (O(n) time complexity).
    • Deletion: Requires finding the node and updating the previous node's next pointer (O(n) time complexity).
    • Memory Management: Requires careful use of malloc and free to avoid memory leaks.

6. Trees: Hierarchical Data Structures

  • Concept: A non-linear, hierarchical data structure resembling an upside-down tree, with a root node and branches leading to children, grandchildren, etc.
  • Binary Search Trees (BSTs): A specific type of tree where each node has at most two children (left and right).
    • Property: For any given node, all values in its left subtree are less than the node's value, and all values in its right subtree are greater than or equal to the node's value.
    • Advantage: Allows for efficient searching (O(log n) time complexity on average) similar to binary search, due to its balanced structure.
    • Implementation: Nodes contain data and two pointers: left and right.
    • Challenge: Maintaining balance is crucial. Unbalanced trees (e.g., inserting elements in sorted order) can degenerate into linked lists, resulting in O(n) performance. Self-balancing algorithms (like AVL trees or Red-Black trees) are needed for guaranteed logarithmic performance, but are complex to implement.

7. Dictionaries (Key-Value Pairs)

  • Concept: An abstract data type that stores associations between unique keys and their corresponding values.
  • Real-world Analogy: A phone book (name -> number), a dictionary (word -> definition).
  • Goal: Fast retrieval of values based on their keys.
  • Implementation Methods: Can be implemented using arrays, linked lists, trees, hash tables, or tries.

8. Hash Tables: Towards Constant Time Access

  • Concept: A data structure that implements a dictionary using a hash function to map keys to indices in an array (buckets).
  • Hash Function: A function that takes an input (key) and produces a fixed-size output (hash value, typically an integer index). A good hash function distributes keys uniformly across the available buckets.
  • Collisions: Occur when two different keys hash to the same index.
  • Collision Resolution:
    • Separate Chaining: Each bucket in the array points to a linked list containing all elements that hash to that index. This is a common implementation ("array of linked lists").
    • Open Addressing: If a bucket is full, probe for the next available bucket (e.g., linear probing, quadratic probing).
  • Time Complexity:
    • Average Case: Close to O(1) (constant time) if the hash function is good and collisions are minimized.
    • Worst Case: Can degrade to O(n) if many collisions occur (e.g., all keys hash to the same bucket).
  • Trade-off: Requires a good hash function and potentially more space to reduce collisions.

9. Tries (Retrieval Trees)

  • Concept: A specialized tree-like data structure used for efficient retrieval of strings. It's essentially a "tree of arrays."
  • Structure: Each node represents a character, and an array within the node contains pointers to subsequent nodes representing the next characters in a string.
  • Operation: Traversal follows the characters of the input string. The depth of the traversal depends on the length of the string, not the total number of strings stored.
  • Time Complexity: O(1) (constant time) for search, insertion, and deletion, as the time is dependent only on the length of the key (string), which is bounded.
  • Trade-off: Can consume significant space due to the arrays at each node, especially if the alphabet is large and strings are long, leading to many unused pointers.

This week emphasizes the importance of understanding these data structures and their associated time/space complexities to choose the most appropriate one for a given problem. The concepts learned in C, particularly pointers and memory management, are foundational for implementing these structures.

Ask Sia for quick explanations, examples, and study support.

Let's Get in Touch

AskSia on InstagramAskSia on TikTokAskSia on DiscordAskSia on FacebookAskSia on LinkedInAskSia on Reddit