@pgandla
Last active January 29, 2025 08:04
AskTim Label studio setup

Label-Studio Setup

System Overview

This document describes the automated feedback import system that transfers user feedback from AskTim to LabelStudio using AWS services. The system is event-driven: each feedback file uploaded to S3 triggers processing, so feedback reaches LabelStudio shortly after it is submitted, and events that fail are captured for later analysis and retry.

Architecture Components

%%{init: {"theme": "default", "layout": "elk"} }%%

flowchart TB
    subgraph "EC2 Server-1"
        direction TB
        ATApp["AskTim App"]
    end

    subgraph "AWS Services"
        direction TB
        S3[("S3 Bucket")]
        Lambda["AWS Lambda"]
    end

    subgraph "EC2 Server-2"
        direction TB
        LSApp["LabelStudio App"]
        LSDb[("PostgreSQL DB")]
    end

    subgraph "Monitoring"
        CW["CloudWatch"]
        DLQ[("Dead Letter Queue")]
    end

    subgraph "Notifications"
        direction TB
        SNS["SNS Topic"]
        subgraph "Alert Destinations"
            Email["Email Alerts"]
            Slack["Slack Channel"]
        end
    end

    ATUser["User"] -- Interaction --> ATApp
    ATApp -- Feedback JSON --> S3
    S3 -- ObjectCreated Event --> Lambda
    LSApp --> LSDb
    Lambda -- Import Data --> LSApp
    Lambda -- Logs & Metrics --> CW
    Lambda -. Failed Events .-> DLQ

    CW -- Trigger Alarm --> SNS
    DLQ -- Trigger Alarm --> SNS
    SNS --> Email
    SNS -- Lambda --> Slack

1. AskTim Application

  • Location: EC2 server 1
  • Purpose: Collects user interactions and feedback
  • Output: JSON feedback files
  • Storage: Automatically uploads feedback data to designated S3 bucket

2. AWS S3 Bucket

  • Purpose: Central storage for feedback data
  • Configuration:
    • Event notifications enabled for ObjectCreated events
    • Versioning enabled so that overwritten or deleted feedback files can be recovered
    • Lifecycle policies for data retention
  • File Format: JSON structure containing feedback data
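The event-notification setting above can be sketched as a configuration payload. This is a minimal sketch, not the deployed configuration: the `feedback/` prefix, the `.json` suffix, and both function names are assumptions introduced for illustration. The `boto3` call is imported lazily because it needs AWS credentials to run.

```python
def build_notification_config(lambda_arn: str,
                              prefix: str = "feedback/",
                              suffix: str = ".json") -> dict:
    """Build a NotificationConfiguration that routes ObjectCreated events to Lambda.

    The prefix and suffix filters are hypothetical; adjust or drop them to
    match how AskTim actually names its uploads.
    """
    return {
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": lambda_arn,
                "Events": ["s3:ObjectCreated:*"],
                "Filter": {
                    "Key": {
                        "FilterRules": [
                            {"Name": "prefix", "Value": prefix},
                            {"Name": "suffix", "Value": suffix},
                        ]
                    }
                },
            }
        ]
    }


def apply_notification_config(bucket: str, config: dict) -> None:
    """Apply the configuration to a bucket (requires boto3 and AWS credentials)."""
    import boto3

    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )
```

Filtering on a suffix keeps non-JSON objects in the same bucket from invoking the function. S3 also needs permission to invoke the Lambda function before this configuration will be accepted.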

3. AWS Lambda Function

  • Trigger: S3 ObjectCreated events

  • Runtime: Python 3.10

  • Purpose: Processes feedback data and imports into LabelStudio

  • Configuration:

    {
      "Runtime": "python3.10",
      "Timeout": 300,
      "Memory": 256,
      "Environment": {
        "Variables": {
          "LABELSTUDIO_API_URL": "http://your-labelstudio-url",
          "LABELSTUDIO_TOKEN": "{{secret}}"
        }
      }
    }
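Inside the function, the two environment variables from the configuration above could be read once at startup. The sketch below is one way to do that; the `Settings` class and `load_settings` helper are hypothetical names, not part of the original setup.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    api_url: str
    token: str


def load_settings(env=None) -> Settings:
    """Read the LabelStudio settings and fail fast if either one is missing."""
    env = os.environ if env is None else env
    required = ("LABELSTUDIO_API_URL", "LABELSTUDIO_TOKEN")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Strip a trailing slash so URL joining stays predictable.
    return Settings(
        api_url=env["LABELSTUDIO_API_URL"].rstrip("/"),
        token=env["LABELSTUDIO_TOKEN"],
    )
```

Failing at startup makes a misconfigured deployment visible on the first invocation instead of surfacing later as an opaque API error.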

4. LabelStudio Application

  • Location: EC2 server 2
  • Deployment: Docker compose
  • Database: PostgreSQL
  • Access: LabelStudio UI

Data Flow

  1. Feedback Generation

    • User interacts with AskTim application
    • Feedback is collected and formatted as JSON
    • JSON file is uploaded to S3 bucket
  2. Event Triggering

    • S3 ObjectCreated event triggers Lambda function
    • Event contains bucket and object details
  3. Data Processing

    • Lambda retrieves feedback file from S3
    • Transforms data to LabelStudio format
    • Imports data via LabelStudio API
  4. Data Storage

    • LabelStudio stores imported data in PostgreSQL
    • Data becomes available for review in LabelStudio UI
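Steps 2 and 3 above can be sketched as a Lambda handler. This is an illustration under stated assumptions, not the deployed function: the feedback field names (`question`, `answer`, `rating`), the `LABELSTUDIO_PROJECT_ID` variable, and all helper names are hypothetical. It posts to LabelStudio's `/api/projects/{id}/import` endpoint and assumes token-style authentication; check both against the LabelStudio version in use.

```python
import json
import os
import urllib.parse
import urllib.request


def extract_s3_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event; object keys arrive URL-encoded."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record["s3"]
        key = urllib.parse.unquote_plus(s3["object"]["key"])
        pairs.append((s3["bucket"]["name"], key))
    return pairs


def to_labelstudio_task(feedback: dict, source_key: str) -> dict:
    """Wrap one feedback record in a task; the field names are hypothetical."""
    return {
        "data": {
            "question": feedback.get("question", ""),
            "answer": feedback.get("answer", ""),
            "rating": feedback.get("rating"),
            "source_key": source_key,  # keeps the task traceable to its S3 object
        }
    }


def import_tasks(tasks: list, api_url: str, token: str, project_id: int) -> int:
    """POST the tasks to the import endpoint and return the HTTP status code."""
    request = urllib.request.Request(
        f"{api_url.rstrip('/')}/api/projects/{project_id}/import",
        data=json.dumps(tasks).encode("utf-8"),
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.status


def lambda_handler(event, context):
    import boto3  # provided by the Lambda Python runtime

    s3 = boto3.client("s3")
    tasks = []
    for bucket, key in extract_s3_objects(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        tasks.append(to_labelstudio_task(json.loads(body), key))
    if not tasks:
        return {"imported": 0}
    status = import_tasks(
        tasks,
        os.environ["LABELSTUDIO_API_URL"],
        os.environ["LABELSTUDIO_TOKEN"],
        int(os.environ["LABELSTUDIO_PROJECT_ID"]),  # hypothetical variable
    )
    return {"imported": len(tasks), "status": status}
```

Any exception raised here (a bad file, an HTTP error from LabelStudio) fails the invocation, which is what allows the event to end up in the DLQ rather than being silently dropped.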

Monitoring and Error Handling

CloudWatch Monitoring

  • Metrics Tracked:
    • Number of files processed
    • Processing duration
    • Success/failure rates
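One way to publish these three metrics from the function is the CloudWatch Embedded Metric Format, where a structured JSON log line is turned into metrics. The sketch below assumes that approach; the namespace, dimension value, and metric names are hypothetical.

```python
import json
import time


def emf_record(files_processed: int, duration_ms: float, failures: int,
               namespace: str = "AskTim/FeedbackImport",
               function_name: str = "feedback-import") -> str:
    """Build one Embedded Metric Format log line as a JSON string."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # milliseconds since epoch
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["FunctionName"]],
                "Metrics": [
                    {"Name": "FilesProcessed", "Unit": "Count"},
                    {"Name": "ProcessingDuration", "Unit": "Milliseconds"},
                    {"Name": "Failures", "Unit": "Count"},
                ],
            }],
        },
        # Dimension and metric values live at the top level of the record.
        "FunctionName": function_name,
        "FilesProcessed": files_processed,
        "ProcessingDuration": duration_ms,
        "Failures": failures,
    })


# In Lambda, writing the line to stdout sends it to CloudWatch Logs.
print(emf_record(files_processed=3, duration_ms=812.5, failures=0))
```

Because the metrics travel through the log stream, this needs no permissions beyond the `logs:*` actions already granted to the Lambda role.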

Error Handling

  • Dead Letter Queue (DLQ):
    • Captures failed processing attempts
    • Allows failed events to be retried once the underlying issue is fixed
    • Preserves failed events for analysis
  • Error Notifications:
    • CloudWatch alarms for critical errors
    • Slack notifications via SNS
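The SNS-to-Slack hop could be handled by a small forwarding function. The sketch below assumes a Slack incoming webhook and a CloudWatch alarm message delivered through SNS; the helper names and the message layout are hypothetical.

```python
import json
import urllib.request


def slack_payload_from_sns(event: dict) -> dict:
    """Turn the first SNS record into a one-line Slack message."""
    record = event["Records"][0]["Sns"]
    try:
        alarm = json.loads(record["Message"])
    except json.JSONDecodeError:
        alarm = {}  # plain-text notification, not a CloudWatch alarm
    if not isinstance(alarm, dict):
        alarm = {}
    name = alarm.get("AlarmName", record.get("Subject") or "Notification")
    state = alarm.get("NewStateValue", "UNKNOWN")
    reason = alarm.get("NewStateReason", record["Message"])
    return {"text": f"[{state}] {name}: {reason}"}


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to a Slack incoming webhook and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the formatting separate from the HTTP call makes the message layout easy to check without a live webhook.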

Security

IAM Permissions

  • Lambda Role:

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "s3:GetObject",
          "Resource": "arn:aws:s3:::your-bucket/*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "logs:CreateLogGroup",
            "logs:CreateLogStream",
            "logs:PutLogEvents"
          ],
          "Resource": "arn:aws:logs:*:*:*"
        }
      ]
    }

Deployment and Maintenance

Deployment Steps

  1. Configure S3 bucket and event notifications
  2. Deploy Lambda function
  3. Configure CloudWatch monitoring
  4. Set up error handling and DLQ

Maintenance Tasks

  • Monitor CloudWatch metrics
  • Review DLQ messages
  • Update Lambda function as needed
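Reviewing DLQ messages can be partly scripted. The sketch below assumes the DLQ is an SQS queue attached to the Lambda function's asynchronous invocations, in which case the message body carries the original event and the error details arrive as message attributes. The helper names are hypothetical.

```python
import json
import urllib.parse


def summarize_dlq_message(message: dict) -> dict:
    """Extract the failed S3 keys and error details from one SQS message."""
    event = json.loads(message["Body"])
    attrs = message.get("MessageAttributes", {})
    keys = [
        urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        for record in event.get("Records", [])
        if "s3" in record
    ]
    return {
        "request_id": attrs.get("RequestID", {}).get("StringValue"),
        "error": attrs.get("ErrorMessage", {}).get("StringValue"),
        "keys": keys,
    }


def fetch_dlq_summaries(queue_url: str, max_messages: int = 10) -> list:
    """Read up to 10 messages from the queue (requires boto3 and AWS credentials)."""
    import boto3

    response = boto3.client("sqs").receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=max_messages,
        MessageAttributeNames=["All"],
        WaitTimeSeconds=5,
    )
    return [summarize_dlq_message(m) for m in response.get("Messages", [])]
```

The messages are read but not deleted, so they reappear in the queue after the visibility timeout and remain available for a later retry.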

Troubleshooting

Common Issues and Solutions

  1. Lambda Timeouts

    • Check processing duration
    • Increase Lambda timeout if needed
    • Optimize code performance
  2. API Failures

    • Verify LabelStudio availability
    • Check API token validity
    • Review network connectivity
  3. Data Format Issues

    • Validate input JSON format
    • Check transformation logic
    • Review LabelStudio API response
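For data format issues, a small validator run against a suspect file narrows down the problem quickly. The sketch below is illustrative only: the required fields are a hypothetical schema, since the actual feedback structure is not specified here.

```python
import json

# Hypothetical schema: replace with the fields AskTim actually writes.
REQUIRED_FIELDS = {"question": str, "answer": str}


def validate_feedback(raw: bytes) -> list:
    """Return a list of problems; an empty list means the file looks importable."""
    try:
        doc = json.loads(raw)
    except (json.JSONDecodeError, UnicodeDecodeError) as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            problems.append(f"{field} must be {expected.__name__}")
    return problems
```

Returning every problem at once, rather than stopping at the first, saves a round trip when a file has several defects.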