Lab: JSON Serialization and Deserialization with REST APIs

Estimated Time: 30-45 minutes
Difficulty: Intermediate
Prerequisites: Basic Python, HTTP requests, JSON concepts

Learning Objectives

By the end of this lab, you will be able to:

Fetch data from REST APIs and deserialize JSON responses
Transform and validate complex nested JSON structures
Serialize custom Python objects to JSON format
Handle API errors and edge cases
Work with datetime serialization challenges

Setup

import requests
import json
from datetime import datetime
from typing import List, Dict, Optional

You'll be working with the GitHub API (no authentication required for these endpoints).

Part 1: Basic Deserialization (10 minutes)

Task 1.1: Fetch and Parse Repository Data

Write a function that fetches information about a GitHub repository and extracts specific fields.

def get_repo_info(owner: str, repo: str) -> Dict:
    """
    Fetch repository information from GitHub API and return a dictionary
    containing: name, description, stars, forks, language, created_at, and open_issues.
    
    API Endpoint: https://api.github.com/repos/{owner}/{repo}
    
    Args:
        owner: Repository owner username
        repo: Repository name
    
    Returns:
        Dictionary with simplified repository information
    """
    # YOUR CODE HERE
    pass

Test your function:

# Should return info about the Python requests library
info = get_repo_info("psf", "requests")
print(json.dumps(info, indent=2))

Task 1.2: Handle Missing Data

Modify your function to handle cases where fields might be None or missing. Use default values:

Missing description → "No description provided"
Missing language → "Unknown"
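The difference between a null field and a missing field can be seen on a small hand-written dictionary. This is only a sketch: `sample` is a made-up stand-in for an API payload, not real GitHub output.

```python
# Hand-written sample (hypothetical), mimicking a payload where
# "description" is present but null and "language" is absent.
sample = {"name": "demo", "description": None}

# Key present with a null value: the default passed to .get() is NOT used.
naive = sample.get("description", "No description provided")

# `or` swaps any falsy value (None, "") for the default,
# so it covers both the null case and the missing case.
description = sample.get("description") or "No description provided"
language = sample.get("language") or "Unknown"

print(naive, description, language, sep=" | ")
```

Note that `or` also replaces other falsy values such as `0`, so it suits text fields better than numeric ones like star counts.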

Part 2: Complex Nested Deserialization (15 minutes)

Task 2.1: Process Multiple Repositories

Write a function that fetches a user's repositories and returns a list of dictionaries with selected fields.

def get_user_repos_summary(username: str, max_repos: int = 5) -> List[Dict]:
    """
    Fetch a user's public repositories and return summary information.
    
    API Endpoint: https://api.github.com/users/{username}/repos
    
    For each repo, extract:
    - name
    - description
    - stars (stargazers_count)
    - primary_language (language)
    - last_updated (updated_at, converted to readable format: "YYYY-MM-DD")
    
    Args:
        username: GitHub username
        max_repos: Maximum number of repos to return
    
    Returns:
        List of repository summaries, sorted by stars (descending)
    """
    # YOUR CODE HERE
    pass

Test your function:

repos = get_user_repos_summary("torvalds", max_repos=3)
for repo in repos:
    print(json.dumps(repo, indent=2))
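The date conversion and the sort order can be tried offline before wiring in the API call. The sketch below assumes timestamps arrive in the form `2024-03-05T14:30:00Z`; `to_date_string` and `sample_repos` are hypothetical names and data used only for illustration.

```python
from datetime import datetime


def to_date_string(timestamp: str) -> str:
    """Convert a timestamp like '2024-03-05T14:30:00Z' to '2024-03-05'."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.strftime("%Y-%m-%d")


# Hand-written sample entries (not real API output).
sample_repos = [
    {"name": "a", "stargazers_count": 3, "updated_at": "2024-03-05T14:30:00Z"},
    {"name": "b", "stargazers_count": 10, "updated_at": "2023-12-31T23:59:59Z"},
]

# Sort by stars, highest first, then keep only the fields of interest.
ranked = sorted(sample_repos, key=lambda r: r["stargazers_count"], reverse=True)
summary = [
    {"name": r["name"], "last_updated": to_date_string(r["updated_at"])}
    for r in ranked
]
print(summary)
```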

Task 2.2: Aggregate Statistics

Create a function that calculates aggregate statistics across a user's repositories:

def calculate_user_stats(username: str) -> Dict:
    """
    Calculate statistics about a user's repositories.
    
    Return a dictionary containing:
    - total_repos: Total number of public repos
    - total_stars: Sum of stars across all repos
    - languages: Dictionary mapping language names to count of repos using that language
    - most_popular_repo: Name of repo with most stars
    - avg_stars_per_repo: Average stars per repository (rounded to 2 decimals)
    
    Args:
        username: GitHub username
    
    Returns:
        Dictionary of statistics
    """
    # YOUR CODE HERE
    pass

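Challenge: Handle users with many repositories efficiently. The API paginates results (30 per page by default, up to 100 with the per_page query parameter), so a single request may not return every repository.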
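One possible shape for the pagination loop is sketched below. To keep it runnable without network access, the page source is passed in as a callable; `iter_all_items`, `fake_fetch`, and `FAKE_PAGES` are hypothetical names. Against the real API, the callable could wrap something like `requests.get(url, params={"per_page": 100, "page": page}).json()`.

```python
from typing import Callable, Dict, Iterator, List


def iter_all_items(fetch_page: Callable[[int], List[Dict]],
                   max_pages: int = 50) -> Iterator[Dict]:
    """Yield items page by page until an empty page comes back."""
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        if not items:
            break  # an empty page means there is nothing left
        yield from items


# Fake page source standing in for the HTTP call (illustration only).
FAKE_PAGES = {1: [{"name": "a"}, {"name": "b"}], 2: [{"name": "c"}]}


def fake_fetch(page: int) -> List[Dict]:
    return FAKE_PAGES.get(page, [])


names = [item["name"] for item in iter_all_items(fake_fetch)]
print(names)
```

The `max_pages` cap is a design choice to avoid burning through the rate limit if something goes wrong; the value 50 is arbitrary.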

Part 3: Custom Serialization (15 minutes)

Task 3.1: Create a Repository Class

Define a Python class to represent a GitHub repository, then implement custom JSON serialization:

class Repository:
    """Represents a GitHub repository with relevant metadata."""
    
    def __init__(self, name: str, owner: str, stars: int, 
                 forks: int, language: Optional[str], 
                 created_at: datetime, description: Optional[str] = None):
        self.name = name
        self.owner = owner
        self.stars = stars
        self.forks = forks
        self.language = language
        self.created_at = created_at
        self.description = description
    
    def to_dict(self) -> Dict:
        """
        Convert Repository object to a JSON-serializable dictionary.
        
        Format created_at as ISO 8601 string.
        Include a computed field 'popularity_score' = stars + (forks * 2)
        """
        # YOUR CODE HERE
        pass
    
    @classmethod
    def from_api_response(cls, api_data: Dict) -> 'Repository':
        """
        Create a Repository instance from GitHub API response data.
        
        Parse the 'created_at' string to a datetime object.
        Handle missing/null values appropriately.
        """
        # YOUR CODE HERE
        pass
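The datetime round trip that `to_dict` and `from_api_response` rely on can be tested in isolation. This is a minimal sketch, assuming timestamps end in a literal `Z` (UTC); `parse_github_timestamp` is a hypothetical helper name. `strptime` is used because `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward.

```python
from datetime import datetime, timezone


def parse_github_timestamp(value: str) -> datetime:
    """Parse e.g. '2011-02-13T18:38:17Z' into a timezone-aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)  # the trailing Z means UTC


created = parse_github_timestamp("2011-02-13T18:38:17Z")

# Serialize back to an ISO 8601 string; an aware UTC datetime
# renders its offset as +00:00 rather than Z.
serialized = created.isoformat()

# The serialized form parses back to an equal datetime.
restored = datetime.fromisoformat(serialized)
print(serialized, restored == created)
```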

Task 3.2: Batch Processing

Write a function that fetches repositories for a user, converts them to Repository objects, and saves them to a JSON file:

def save_user_repos_to_file(username: str, filename: str, max_repos: int = 10):
    """
    Fetch user's repositories, convert to Repository objects, and save to JSON file.
    
    The JSON file should contain:
    - metadata: username, fetch_timestamp (ISO format), repo_count
    - repositories: list of repository dictionaries
    
    Args:
        username: GitHub username
        filename: Output JSON filename
        max_repos: Maximum repos to fetch
    """
    # YOUR CODE HERE
    pass

Test your implementation:

save_user_repos_to_file("octocat", "octocat_repos.json", max_repos=5)

# Verify by reading back
with open("octocat_repos.json", "r") as f:
    data = json.load(f)
    print(f"Saved {data['metadata']['repo_count']} repositories")
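The file layout described in the docstring might look like the sketch below. The repository list is a placeholder and the username is made up; only the `metadata` / `repositories` structure comes from the task description. A temporary directory is used so the example does not overwrite your own output file.

```python
import json
import os
import tempfile
from datetime import datetime, timezone

# Placeholder data standing in for Repository.to_dict() results.
repo_dicts = [{"name": "demo", "stars": 1}]

payload = {
    "metadata": {
        "username": "example-user",  # hypothetical username
        "fetch_timestamp": datetime.now(timezone.utc).isoformat(),
        "repo_count": len(repo_dicts),
    },
    "repositories": repo_dicts,
}

# Write to a scratch location, then read it back to confirm the round trip.
path = os.path.join(tempfile.mkdtemp(), "repos.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(payload, f, indent=2)

with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)

print(loaded["metadata"]["repo_count"])
```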

Part 4: Error Handling and Edge Cases (5-10 minutes)

Task 4.1: Robust API Client

Enhance one of your earlier functions with proper error handling:

def get_repo_info_robust(owner: str, repo: str) -> Dict:
    """
    Fetch repository info with comprehensive error handling.
    
    Handle:
    - Network errors (requests.RequestException)
    - 404 Not Found (repo doesn't exist)
    - 403 Forbidden or 429 Too Many Requests (rate limit exceeded)
    - Invalid JSON responses
    
    Returns:
        Dictionary with repo info or error information
    """
    # YOUR CODE HERE
    pass

Test with invalid inputs:

# Should handle gracefully
result1 = get_repo_info_robust("nonexistent_user_xyz", "fake_repo_abc")
result2 = get_repo_info_robust("psf", "requests")
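One way to keep the error handling testable is to separate the decisions from the HTTP call. The sketch below is not a full solution: `classify_status` and `parse_body` are hypothetical helpers, the error labels are arbitrary, and treating 429 as a rate-limit response alongside 403 is an assumption.

```python
import json
from typing import Dict, Optional


def classify_status(status_code: int) -> Optional[Dict]:
    """Map an HTTP status code to an error dict, or None on success."""
    if status_code == 404:
        return {"error": "not_found"}
    if status_code in (403, 429):  # assumption: both can signal rate limiting
        return {"error": "forbidden_or_rate_limited"}
    if status_code >= 400:
        return {"error": f"http_{status_code}"}
    return None


def parse_body(text: str) -> Dict:
    """Decode a response body, reporting invalid JSON instead of raising."""
    try:
        return {"data": json.loads(text)}
    except json.JSONDecodeError as exc:
        return {"error": "invalid_json", "detail": str(exc)}


print(classify_status(404))
print(parse_body("{oops"))
```

Both helpers take plain values, so they can be exercised without spending any of the hourly request quota.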

Bonus Challenges

If you finish early, try these:

Bonus 1: Commit History Analysis

Fetch a repository's recent commits and analyze:

Most active contributor (by commit count)
Average commits per day
Most common commit hour

API Endpoint: https://api.github.com/repos/{owner}/{repo}/commits
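The counting part of this analysis can be sketched with `collections.Counter`. The sample below is hand-written and assumes each commit entry nests the author name and date under `commit.author`; verify the shape against a real response before relying on it.

```python
from collections import Counter
from datetime import datetime

# Hand-written sample entries (not real API output).
sample_commits = [
    {"commit": {"author": {"name": "alice", "date": "2024-01-01T09:15:00Z"}}},
    {"commit": {"author": {"name": "bob", "date": "2024-01-01T09:45:00Z"}}},
    {"commit": {"author": {"name": "alice", "date": "2024-01-02T17:05:00Z"}}},
]

# Count commits per author name.
authors = Counter(c["commit"]["author"]["name"] for c in sample_commits)

# Count commits per hour of day (UTC, as given in the timestamps).
hours = Counter(
    datetime.strptime(c["commit"]["author"]["date"], "%Y-%m-%dT%H:%M:%SZ").hour
    for c in sample_commits
)

top_author, top_count = authors.most_common(1)[0]
busiest_hour = hours.most_common(1)[0][0]
print(top_author, top_count, busiest_hour)
```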

Bonus 2: Custom JSON Encoder

Create a custom JSONEncoder class that automatically handles:

datetime objects
Custom Repository objects
Sets (convert to sorted lists)

class GitHubEncoder(json.JSONEncoder):
    def default(self, obj):
        # YOUR CODE HERE
        pass

# Usage:
json.dumps(repository_object, cls=GitHubEncoder, indent=2)
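The `default` hook works as follows: `json` calls it only for objects it cannot serialize itself, and anything the hook does not recognize should be passed to the parent class so a `TypeError` is still raised. The sketch below handles sets only; `SetEncoder` is a hypothetical name, and extending the pattern to datetime and Repository objects is left as the exercise.

```python
import json


class SetEncoder(json.JSONEncoder):
    """Minimal encoder that serializes sets as sorted lists."""

    def default(self, obj):
        if isinstance(obj, set):
            return sorted(obj)
        # Unknown types fall through to the base class, which raises TypeError.
        return super().default(obj)


encoded = json.dumps({"topics": {"python", "api"}}, cls=SetEncoder)
print(encoded)
```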

Bonus 3: Data Validation

Add data validation to your Repository class using assertions or a validation method:

Stars and forks must be non-negative
Name and owner cannot be empty
created_at must be in the past
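A validation method could collect problems rather than stop at the first one. This is a sketch under stated assumptions: `find_problems` is a hypothetical standalone function (not part of the Repository class), and `created_at` is assumed to be timezone-aware.

```python
from datetime import datetime, timezone
from typing import List


def find_problems(name: str, owner: str, stars: int, forks: int,
                  created_at: datetime) -> List[str]:
    """Return a list of human-readable problems; empty means valid."""
    problems = []
    if not name.strip():
        problems.append("name cannot be empty")
    if not owner.strip():
        problems.append("owner cannot be empty")
    if stars < 0 or forks < 0:
        problems.append("stars and forks must be non-negative")
    # Assumes created_at is timezone-aware; comparing a naive datetime
    # with an aware one raises TypeError.
    if created_at >= datetime.now(timezone.utc):
        problems.append("created_at must be in the past")
    return problems


ok = find_problems("requests", "psf", 10, 2,
                   datetime(2011, 2, 13, tzinfo=timezone.utc))
bad = find_problems("", "psf", -1, 0,
                    datetime(9999, 1, 1, tzinfo=timezone.utc))
print(ok, bad)
```

Explicit checks are used instead of `assert` because assertions are skipped when Python runs with the `-O` flag.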

Submission Checklist

All functions are implemented and tested
Code handles missing/null values appropriately
Error handling is implemented for API calls
datetime objects are properly serialized/deserialized
Code follows Python naming conventions
Test outputs are included in comments or a separate test file

Helpful Resources

GitHub API Documentation: https://docs.github.com/en/rest
Python requests library: https://requests.readthedocs.io/
Python json module: https://docs.python.org/3/library/json.html
Python datetime: https://docs.python.org/3/library/datetime.html

Common Pitfalls

Datetime serialization: datetime objects are not JSON serializable by default. Use .isoformat() or convert to strings.
Rate limiting: GitHub limits unauthenticated requests to 60/hour. Space out your tests.
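None vs. missing keys: A key can be absent, or present with a null value. .get() with a default only covers the absent case. GitHub returns description and language as null when they are unset, so use data.get(key) or default to cover both.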
List comprehensions: Perfect for transforming API response lists into your custom format.
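For the rate-limiting pitfall, GitHub reports the remaining quota in response headers. The sketch below reads `X-RateLimit-Remaining` and `X-RateLimit-Reset` (a Unix timestamp in seconds) from a plain mapping; `seconds_until_reset` is a hypothetical helper and the header values shown are made up.

```python
from typing import Mapping


def seconds_until_reset(headers: Mapping[str, str], now: int) -> int:
    """Return how long to wait before retrying, or 0 if quota remains."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    reset_at = int(headers.get("X-RateLimit-Reset", "0"))
    if remaining > 0:
        return 0
    return max(0, reset_at - now)


# Made-up header values for illustration.
exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1700000060"}
wait = seconds_until_reset(exhausted, now=1700000000)
print(wait)
```

With requests, `response.headers` can be passed directly (its lookups are case-insensitive) and `int(time.time())` can serve as `now`.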

Sample Solution Structure

Your final code should look something like this:

# Part 1
def get_repo_info(owner: str, repo: str) -> Dict:
    url = f"https://api.github.com/repos/{owner}/{repo}"
    response = requests.get(url)
    response.raise_for_status()
    data = response.json()
    
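    return {
        "name": data["name"],
        # GitHub sends null for an unset description, so .get()'s default
        # alone would not apply; `or` covers both null and missing.
        "description": data.get("description") or "No description provided",
        # ... more fields
    }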

# Part 2
def get_user_repos_summary(username: str, max_repos: int = 5) -> List[Dict]:
    url = f"https://api.github.com/users/{username}/repos"
    ...  # implementation

# Part 3
class Repository:
    ...  # implementation

# Part 4
def get_repo_info_robust(owner: str, repo: str) -> Dict:
    try:
        ...  # implementation with error handling
    except requests.RequestException as e:
        return {"error": str(e)}

Good luck!
