Skip to content

Instantly share code, notes, and snippets.

@lindesvard
Last active October 19, 2025 19:55
Show Gist options
  • Select an option

  • Save lindesvard/1f2e8e22198e1ff98e7300be26880865 to your computer and use it in GitHub Desktop.

Select an option

Save lindesvard/1f2e8e22198e1ff98e7300be26880865 to your computer and use it in GitHub Desktop.
Guide how to import data into OpenPanel

Manual Event Import Guide for OpenPanel

This guide explains how to manually import historical analytics events into OpenPanel using TypeScript. Use this when you need full control over the import process or have a custom data format.

Table of Contents

  1. Introduction
  2. Understanding the Event Schema
  3. Session Reconstruction Logic
  4. Session Aggregation Table
  5. Inserting into ClickHouse
  6. Complete Example
  7. Important Considerations
  8. Testing & Validation

Introduction

OpenPanel's Event Architecture

OpenPanel stores analytics data in ClickHouse using two main tables:

  • events: Individual user interactions (page views, clicks, custom events)
  • sessions: Aggregated view of user sessions (grouped events with 30-minute timeout)

Each event is linked to a device (anonymous visitor), a profile (identified user), and a session (group of related events).

When to Use Manual Import

Use this guide when:

  • When you're self-hosting and your data is not from a common provider

Prerequisites

  • Node.js 18+ with TypeScript
  • Access to your OpenPanel ClickHouse instance
  • Your OpenPanel project ID
  • Historical event data in CSV, JSON, or similar format

Understanding the Event Schema

OpenPanel events follow a specific schema in ClickHouse. Here's what each field means and how to populate it:

Critical Identity Fields

These fields establish user and session identity:

interface EventIdentity {
  // Anonymous device identifier (like a browser cookie)
  // Use a consistent hash of IP + User-Agent, or your platform's visitor ID
  device_id: string;
  
  // User identifier for logged-in users
  // Set to device_id if user is anonymous
  // Set to actual user ID (email, UUID, etc.) if user is identified
  profile_id: string;
  
  // Session identifier - groups related events together
  // Generate using UUID v4, must be consistent for events in same session
  // See "Session Reconstruction Logic" section for how to assign this
  session_id: string;
  
  // Your OpenPanel project ID (get from Settings > Details)
  project_id: string;
}

Key relationships:

  • device_id tracks anonymous visitors across sessions
  • profile_id tracks identified users (can be same as device_id for anonymous users)
  • session_id groups events within a 30-minute window
  • Multiple sessions can have the same device_id or profile_id

Core Event Fields

interface CoreEventFields {
  // Unique event ID - generate with UUID v4
  id: string;
  
  // Event name (e.g., "screen_view", "button_click", "purchase")
  // Use "screen_view" for page views
  // Use "session_start" and "session_end" for session boundaries
  name: string;
  
  // Event timestamp in ISO format: "2024-01-15 14:30:00"
  // ClickHouse expects: YYYY-MM-DD HH:MM:SS format (no T, no Z)
  created_at: string;
  
  // URL path without domain (e.g., "/blog/article-1")
  path: string;
  
  // Full origin URL (e.g., "https://example.com")
  origin: string;
  
  // Where the user came from (full URL)
  // Set to empty string if direct traffic or same domain
  referrer: string;
  
  // Human-readable referrer source (e.g., "Google", "Facebook", "Direct")
  referrer_name: string;
  
  // Referrer category: "search", "social", "referral", "email", "paid", "direct"
  referrer_type: string;
  
  // Event duration in milliseconds (usually 0 for most events)
  // For session_end events, set to: last_event_time - first_event_time
  duration: number;
  
  // Custom event data as flat key-value object
  // Nested objects are flattened using dot notation
  // Example: { "button_id": "signup", "user.plan": "premium", "price": 99.99 }
  properties: Record<string, string | number | boolean | null>;
}

Properties Object Structure

The properties field is a flexible key-value store for custom event data. Understanding its structure is crucial for proper data import.

Basic Structure

// Properties is always a flat object - nested data uses dot notation
properties: {
  "button_id": "signup",           // Simple string
  "price": 99.99,                  // Number
  "is_premium": true,              // Boolean
  "user.name": "John",             // Nested using dot notation
  "metadata.source": "mobile_app"  // Multi-level nesting
}

Important: ClickHouse stores properties as a flat key-value object. If your source data has nested objects, flatten them using dot notation:

// ❌ Wrong: Nested objects won't work in ClickHouse
properties: {
  user: {
    name: "John",
    plan: "premium"
  }
}

// βœ… Correct: Flatten using dot notation
properties: {
  "user.name": "John",
  "user.plan": "premium"
}

// Helper function to flatten objects
function flattenObject(obj: any, prefix = ''): Record<string, string | number | boolean | null> {
  const flattened: Record<string, any> = {};
  
  for (const [key, value] of Object.entries(obj)) {
    const newKey = prefix ? `${prefix}.${key}` : key;
    
    if (value && typeof value === 'object' && !Array.isArray(value)) {
      Object.assign(flattened, flattenObject(value, newKey));
    } else {
      flattened[newKey] = value;
    }
  }
  
  return flattened;
}

Reserved Property Keys

OpenPanel uses specific property keys for special purposes. Always prefix these with __ (double underscore):

properties: {
  // URL query parameters (extracted from page URL)
  "__query": {
    "utm_source": "google",
    "utm_medium": "cpc",
    "utm_campaign": "summer_sale",
    "ref": "newsletter"
  },
  
  // Page metadata
  "__title": "Product Page - Item XYZ",
  "__screen": "1920x1080",
  "__language": "en-US",
  
  // Your custom properties (no prefix needed)
  "product_id": "xyz-123",
  "category": "electronics"
}

Properties you SHOULD include for imports:

  • __query: URL query parameters (essential for attribution)
  • __title: Page title
  • __screen: Screen resolution
  • __language: User language
  • Any custom event properties

Properties you should NOT include (these are auto-generated by real-time system):

  • __path, __referrer: Use top-level path, origin, referrer fields instead
  • __user_agent: Use top-level device/browser fields instead
  • __hash: URL hash fragment (not typically preserved in imports)
  • __reqId: Internal request tracking ID

Query Parameters and UTM Tracking

OpenPanel stores URL query parameters in a special __query nested object within properties. This is essential for marketing attribution:

// Example: User visits from Google Ads
// URL: https://example.com/product?utm_source=google&utm_medium=cpc&utm_campaign=sale&product_id=123

properties: {
  "__query": {
    "utm_source": "google",      // Traffic source
    "utm_medium": "cpc",          // Marketing medium
    "utm_campaign": "sale",       // Campaign name
    "utm_content": "banner_ad",   // Ad content (optional)
    "utm_term": "shoes",          // Search term (optional)
    "product_id": "123"           // Other query params
  }
}

How to extract query parameters from URLs:

function extractQueryParams(url: string): Record<string, string> {
  try {
    const urlObj = new URL(url);
    const params: Record<string, string> = {};
    
    // Extract all query parameters
    for (const [key, value] of urlObj.searchParams.entries()) {
      params[key] = value;
    }
    
    return params;
  } catch {
    return {};
  }
}

// Usage in event transformation
const query = extractQueryParams(raw.page_url);
const properties: Record<string, any> = {};

// Store query params if present
if (Object.keys(query).length > 0) {
  properties.__query = query;
}

// Alternative: Store individual UTM params at root level (they'll be nested automatically)
// This approach matches the Umami provider pattern
if (query.utm_source) {
  if (!properties.__query) properties.__query = {};
  properties.__query.utm_source = query.utm_source;
}
if (query.utm_medium) {
  if (!properties.__query) properties.__query = {};
  properties.__query.utm_medium = query.utm_medium;
}
// ... repeat for utm_campaign, utm_content, utm_term

UTM Parameter Importance

UTM parameters are critical for OpenPanel's attribution system. The sessions table automatically extracts these for aggregation:

// Session aggregate will have top-level UTM fields
interface SessionAggregate {
  // ... other fields
  
  // These are extracted from the first event's properties.__query
  utm_source: string;    // From properties.__query.utm_source
  utm_medium: string;    // From properties.__query.utm_medium
  utm_campaign: string;  // From properties.__query.utm_campaign
  utm_content: string;   // From properties.__query.utm_content
  utm_term: string;      // From properties.__query.utm_term
}

// Helper function to extract UTM from properties
function extractUTMFromProperties(properties: Record<string, any>): {
  utm_source: string;
  utm_medium: string;
  utm_campaign: string;
  utm_content: string;
  utm_term: string;
} {
  const query = properties.__query || {};
  
  return {
    utm_source: String(query.utm_source || ''),
    utm_medium: String(query.utm_medium || ''),
    utm_campaign: String(query.utm_campaign || ''),
    utm_content: String(query.utm_content || ''),
    utm_term: String(query.utm_term || ''),
  };
}

Common Property Patterns

Here are recommended patterns for different types of events:

// Page view event
{
  name: "screen_view",
  properties: {
    "__query": { "utm_source": "google", "ref": "homepage" },
    "__title": "Product Details",
    "__screen": "1920x1080",
    "__language": "en-US"
  }
}

// E-commerce event
{
  name: "purchase",
  properties: {
    "order_id": "ORD-12345",
    "revenue": 149.99,
    "currency": "USD",
    "items": "3",  // Store count, not array
    "item.1.id": "PROD-A",  // Use dot notation for item details
    "item.1.price": 49.99,
    "item.2.id": "PROD-B",
    "item.2.price": 100.00
  }
}

// Custom interaction event
{
  name: "button_click",
  properties: {
    "button_id": "signup_cta",
    "button_text": "Start Free Trial",
    "page_section": "hero",
    "user.is_logged_in": false
  }
}

Geo and Device Fields

Location and device information for analytics:

interface GeoDeviceFields {
  // Geographic location
  country: string;        // ISO country code (e.g., "US", "GB")
  city: string;           // City name
  region: string;         // State/province code (e.g., "CA", "NY")
  longitude: number | null;  // GPS coordinates
  latitude: number | null;
  
  // Device information
  os: string;             // Operating system (e.g., "Windows", "macOS", "iOS")
  os_version: string;     // OS version (e.g., "11.0")
  browser: string;        // Browser name (e.g., "Chrome", "Safari", "Firefox")
  browser_version: string; // Browser version (e.g., "120.0")
  device: string;         // Device type: "desktop", "mobile", "tablet", "smarttv"
  brand: string;          // Device brand (e.g., "Apple", "Samsung")
  model: string;          // Device model (e.g., "iPhone 14 Pro")
}

Metadata Fields

interface MetadataFields {
  // Timestamp when this event was imported (ISO format)
  imported_at: string;
  
  // SDK identifier (use "manual-import" or your platform name)
  sdk_name: string;
  
  // SDK version (use "1.0.0" or your import script version)
  sdk_version: string;
}

Complete Event Interface

interface OpenPanelEvent {
  // Identity
  id: string;
  device_id: string;
  profile_id: string;
  session_id: string;
  project_id: string;
  
  // Core
  name: string;
  created_at: string;
  path: string;
  origin: string;
  referrer: string;
  referrer_name: string;
  referrer_type: string;
  duration: number;
  properties: Record<string, string | number | boolean | null>;
  
  // Geo/Device
  country: string;
  city: string;
  region: string;
  longitude: number | null;
  latitude: number | null;
  os: string;
  os_version: string;
  browser: string;
  browser_version: string;
  device: string;
  brand: string;
  model: string;
  
  // Metadata
  imported_at: string;
  sdk_name: string;
  sdk_version: string;
}

Session Reconstruction Logic

Sessions group related events together using a 30-minute inactivity timeout. If a user is inactive for more than 30 minutes, a new session starts.

Pseudo-code Overview

1. Sort all events by (device_id, created_at) - chronological order per device
2. Initialize empty session tracking map
3. For each event in sorted order:
   a. Look up last event for this device_id
   b. Calculate time gap since last event
   c. If gap > 30 minutes OR no previous event:
      - Generate new session_id (UUID)
      - Create session_start event (1 second before first event)
      - Store session metadata (first event, counts, etc.)
   d. Else (continuing existing session):
      - Use existing session_id
      - Update session metadata (last event, counts)
4. After processing all events:
   - Create session_end event for each session (1 second after last event)

TypeScript Implementation

import { randomUUID } from 'crypto';

interface SessionState {
  sessionId: string;
  firstEvent: OpenPanelEvent;
  lastEvent: OpenPanelEvent;
  eventCount: number;
  screenViewCount: number;
  screenViews: Set<string>; // unique paths visited
  revenue: number; // sum of revenue from events
}

const SESSION_TIMEOUT_MS = 30 * 60 * 1000; // 30 minutes in milliseconds

function reconstructSessions(events: OpenPanelEvent[]): {
  eventsWithSessions: OpenPanelEvent[];
  sessionStarts: OpenPanelEvent[];
  sessionEnds: OpenPanelEvent[];
  sessionAggregates: SessionAggregate[];
} {
  // Step 1: Sort events by device and time
  const sortedEvents = [...events].sort((a, b) => {
    const deviceCompare = a.device_id.localeCompare(b.device_id);
    if (deviceCompare !== 0) return deviceCompare;
    return new Date(a.created_at).getTime() - new Date(b.created_at).getTime();
  });
  
  // Step 2: Track active sessions per device
  const deviceSessions = new Map<string, SessionState>();
  const sessionStarts: OpenPanelEvent[] = [];
  const sessionEnds: OpenPanelEvent[] = [];
  const eventsWithSessions: OpenPanelEvent[] = [];
  
  // Step 3: Process each event
  for (const event of sortedEvents) {
    const lastSession = deviceSessions.get(event.device_id);
    const eventTime = new Date(event.created_at).getTime();
    
    let currentSession: SessionState;
    
    if (!lastSession) {
      // First event for this device - start new session
      currentSession = startNewSession(event, eventTime, sessionStarts);
      deviceSessions.set(event.device_id, currentSession);
    } else {
      // Check if we should continue existing session or start new one
      const lastEventTime = new Date(lastSession.lastEvent.created_at).getTime();
      const timeSinceLastEvent = eventTime - lastEventTime;
      
      if (timeSinceLastEvent > SESSION_TIMEOUT_MS) {
        // Timeout reached - close old session and start new one
        sessionEnds.push(createSessionEndEvent(lastSession));
        currentSession = startNewSession(event, eventTime, sessionStarts);
        deviceSessions.set(event.device_id, currentSession);
      } else {
        // Continue existing session
        currentSession = lastSession;
        currentSession.lastEvent = event;
        currentSession.eventCount++;
        
        if (event.name === 'screen_view') {
          currentSession.screenViewCount++;
          currentSession.screenViews.add(event.path);
        }
        
        // Track revenue if present in properties
        if (event.properties.revenue && typeof event.properties.revenue === 'number') {
          currentSession.revenue += event.properties.revenue;
        }
      }
    }
    
    // Assign session_id to event
    const eventWithSession = { ...event, session_id: currentSession.sessionId };
    eventsWithSessions.push(eventWithSession);
  }
  
  // Step 4: Create session_end events for all remaining sessions
  for (const session of deviceSessions.values()) {
    sessionEnds.push(createSessionEndEvent(session));
  }
  
  // Step 5: Build session aggregates for sessions table
  const sessionAggregates = Array.from(deviceSessions.values()).map(session => 
    buildSessionAggregate(session)
  );
  
  return {
    eventsWithSessions,
    sessionStarts,
    sessionEnds,
    sessionAggregates,
  };
}

function startNewSession(
  event: OpenPanelEvent,
  eventTime: number,
  sessionStarts: OpenPanelEvent[]
): SessionState {
  const sessionId = randomUUID();
  
  // Create session_start event (1 second before first event)
  const sessionStart: OpenPanelEvent = {
    ...event,
    id: randomUUID(),
    name: 'session_start',
    session_id: sessionId,
    created_at: formatClickHouseDate(new Date(eventTime - 1000)),
    duration: 0,
  };
  sessionStarts.push(sessionStart);
  
  const session: SessionState = {
    sessionId,
    firstEvent: event,
    lastEvent: event,
    eventCount: 1,
    screenViewCount: event.name === 'screen_view' ? 1 : 0,
    screenViews: new Set(event.name === 'screen_view' ? [event.path] : []),
    revenue: typeof event.properties.revenue === 'number' ? event.properties.revenue : 0,
  };
  
  return session;
}

function createSessionEndEvent(session: SessionState): OpenPanelEvent {
  const firstTime = new Date(session.firstEvent.created_at).getTime();
  const lastTime = new Date(session.lastEvent.created_at).getTime();
  const duration = lastTime - firstTime;
  
  return {
    ...session.lastEvent, // Inherit most fields from last event
    id: randomUUID(),
    name: 'session_end',
    session_id: session.sessionId,
    created_at: formatClickHouseDate(new Date(lastTime + 1000)), // 1 second after
    duration, // Total session duration
  };
}

function formatClickHouseDate(date: Date): string {
  return date.toISOString().replace('T', ' ').replace(/\.\d{3}Z$/, '');
}

Important Session Logic Notes

  1. 30-minute timeout: Measured from the timestamp of the last event in the session, not wall-clock time
  2. Session boundaries: session_start is 1 second before first event, session_end is 1 second after last event
  3. Event order: Must process events in chronological order per device for correct session assignment
  4. Duration field:
    • session_start: always 0
    • session_end: time difference between first and last event in milliseconds
    • Regular events: usually 0 (unless you're tracking page view time)

Session Aggregation Table

The sessions table stores aggregated session metadata for fast querying. Track this inline during event processing:

interface SessionAggregate {
  // Identity
  id: string;                    // Same as session_id
  profile_id: string;
  device_id: string;
  project_id: string;
  
  // Counts
  event_count: number;           // Total events in session
  screen_view_count: number;     // Number of page views
  screen_views: string[];        // Unique paths visited (as array)
  
  // Entry/Exit
  entry_path: string;            // Path of first event
  entry_origin: string;          // Origin of first event
  exit_path: string;             // Path of last event
  exit_origin: string;           // Origin of last event
  
  // Timing
  created_at: string;            // Timestamp of first event
  ended_at: string;              // Timestamp of last event
  duration: number;              // Session length in milliseconds
  
  // Referrer (from first event)
  referrer: string;
  referrer_name: string;
  referrer_type: string;
  
  // Device info (from first event)
  os: string;
  os_version: string;
  browser: string;
  browser_version: string;
  device: string;
  brand: string;
  model: string;
  
  // Location (from first event)
  country: string;
  region: string;
  city: string;
  longitude: number | null;
  latitude: number | null;
  
  // Marketing attribution (extracted from first event's query params)
  utm_source: string;
  utm_medium: string;
  utm_campaign: string;
  utm_content: string;
  utm_term: string;
  
  // Analytics
  is_bounce: boolean;            // true if only 1 event in session
  revenue: number;               // Sum of revenue from all events
  
  // ClickHouse-specific (use these exact values)
  sign: 1;                       // Always 1 for imports
  version: number;               // Use current timestamp
  properties: Record<string, string>; // Additional session properties
}

Building Session Aggregates

function buildSessionAggregate(session: SessionState): SessionAggregate {
  const firstEvent = session.firstEvent;
  const lastEvent = session.lastEvent;
  
  // Extract UTM parameters from first event's properties.__query
  const query = firstEvent.properties.__query || {};
  const utmSource = String(query.utm_source || '');
  const utmMedium = String(query.utm_medium || '');
  const utmCampaign = String(query.utm_campaign || '');
  const utmContent = String(query.utm_content || '');
  const utmTerm = String(query.utm_term || '');
  
  return {
    // Identity
    id: session.sessionId,
    profile_id: firstEvent.profile_id,
    device_id: firstEvent.device_id,
    project_id: firstEvent.project_id,
    
    // Counts
    event_count: session.eventCount,
    screen_view_count: session.screenViewCount,
    screen_views: Array.from(session.screenViews),
    
    // Entry/Exit
    entry_path: firstEvent.path,
    entry_origin: firstEvent.origin,
    exit_path: lastEvent.path,
    exit_origin: lastEvent.origin,
    
    // Timing
    created_at: firstEvent.created_at,
    ended_at: lastEvent.created_at,
    duration: new Date(lastEvent.created_at).getTime() - 
              new Date(firstEvent.created_at).getTime(),
    
    // Referrer (from first event)
    referrer: firstEvent.referrer,
    referrer_name: firstEvent.referrer_name,
    referrer_type: firstEvent.referrer_type,
    
    // Device info (from first event)
    os: firstEvent.os,
    os_version: firstEvent.os_version,
    browser: firstEvent.browser,
    browser_version: firstEvent.browser_version,
    device: firstEvent.device,
    brand: firstEvent.brand,
    model: firstEvent.model,
    
    // Location (from first event)
    country: firstEvent.country,
    region: firstEvent.region,
    city: firstEvent.city,
    longitude: firstEvent.longitude,
    latitude: firstEvent.latitude,
    
    // Marketing attribution
    utm_source: utmSource,
    utm_medium: utmMedium,
    utm_campaign: utmCampaign,
    utm_content: utmContent,
    utm_term: utmTerm,
    
    // Analytics
    is_bounce: session.eventCount === 1,
    revenue: session.revenue,
    
    // ClickHouse metadata
    sign: 1,
    version: Date.now(),
    properties: {},
  };
}

Inserting into ClickHouse

Use the official ClickHouse client to insert events and sessions in batches:

import { createClient } from '@clickhouse/client';

// Initialize ClickHouse client
const ch = createClient({
  url: process.env.CLICKHOUSE_URL || 'http://localhost:8123',
  // Optional: add authentication
  username: process.env.CLICKHOUSE_USER,
  password: process.env.CLICKHOUSE_PASSWORD,
});

async function insertEvents(events: OpenPanelEvent[]) {
  const BATCH_SIZE = 5000;
  
  for (let i = 0; i < events.length; i += BATCH_SIZE) {
    const batch = events.slice(i, i + BATCH_SIZE);
    
    await ch.insert({
      table: 'events',
      values: batch,
      format: 'JSONEachRow',
    });
    
    console.log(`Inserted events ${i} to ${i + batch.length}`);
  }
}

async function insertSessions(sessions: SessionAggregate[]) {
  await ch.insert({
    table: 'sessions',
    values: sessions,
    format: 'JSONEachRow',
  });
  
  console.log(`Inserted ${sessions.length} sessions`);
}

Batch Size Recommendations

  • Events: 5,000-10,000 per batch for optimal performance
  • Sessions: Can insert all at once (usually much fewer than events)
  • Memory: Monitor memory usage with large imports
  • Progress tracking: Log after each batch for monitoring

Complete Example

Here's a full working example that imports events from a CSV file:

import { createClient } from '@clickhouse/client';
import { parse } from 'csv-parse/sync';
import { readFileSync } from 'fs';
import { randomUUID } from 'crypto';

// Configuration
const PROJECT_ID = 'your-project-id-here';
const CLICKHOUSE_URL = process.env.CLICKHOUSE_URL || 'http://localhost:8123';
const CSV_FILE_PATH = './events-export.csv';

// Initialize ClickHouse
const ch = createClient({ url: CLICKHOUSE_URL });

interface RawEvent {
  timestamp: string;      // Your CSV timestamp column
  visitor_id: string;     // Your CSV visitor ID column
  user_id?: string;       // Your CSV user ID column (optional)
  page_url: string;       // Your CSV URL column
  event_name: string;     // Your CSV event name column
  country?: string;
  city?: string;
  browser?: string;
  os?: string;
  device?: string;
  // ... other columns from your CSV
}

async function main() {
  console.log('Starting import...');
  
  // Step 1: Read and parse CSV
  const csvContent = readFileSync(CSV_FILE_PATH, 'utf-8');
  const rawEvents: RawEvent[] = parse(csvContent, {
    columns: true,
    skip_empty_lines: true,
  });
  
  console.log(`Loaded ${rawEvents.length} raw events`);
  
  // Step 2: Transform to OpenPanel format
  const openPanelEvents: OpenPanelEvent[] = rawEvents.map(raw => {
    const url = new URL(raw.page_url);
    
    // Extract query parameters from URL
    const query: Record<string, string> = {};
    for (const [key, value] of url.searchParams.entries()) {
      query[key] = value;
    }
    
    // Build properties object
    const properties: Record<string, any> = {};
    
    // Add query parameters (including UTM)
    if (Object.keys(query).length > 0) {
      properties.__query = query;
    }
    
    // Add page metadata if available
    if (raw.page_title) properties.__title = raw.page_title;
    if (raw.screen_resolution) properties.__screen = raw.screen_resolution;
    if (raw.language) properties.__language = raw.language;
    
    // Add custom properties from your data
    if (raw.custom_field1) properties.custom_field1 = raw.custom_field1;
    if (raw.custom_field2) properties.custom_field2 = raw.custom_field2;
    
    return {
      // Identity
      id: randomUUID(),
      device_id: raw.visitor_id,
      profile_id: raw.user_id || raw.visitor_id, // Use user_id if available
      session_id: '', // Will be assigned during session reconstruction
      project_id: PROJECT_ID,
      
      // Core
      name: raw.event_name === 'pageview' ? 'screen_view' : raw.event_name,
      created_at: formatClickHouseDate(new Date(raw.timestamp)),
      path: url.pathname,
      origin: url.origin,
      referrer: '', // Extract from your data if available
      referrer_name: '',
      referrer_type: 'direct',
      duration: 0,
      properties, // Properties with query params and metadata
      
      // Geo/Device
      country: raw.country || '',
      city: raw.city || '',
      region: '',
      longitude: null,
      latitude: null,
      os: raw.os || '',
      os_version: '',
      browser: raw.browser || '',
      browser_version: '',
      device: raw.device || 'desktop',
      brand: '',
      model: '',
      
      // Metadata
      imported_at: new Date().toISOString(),
      sdk_name: 'manual-import',
      sdk_version: '1.0.0',
    };
  });
  
  console.log('Transformed events to OpenPanel format');
  
  // Step 3: Reconstruct sessions
  const {
    eventsWithSessions,
    sessionStarts,
    sessionEnds,
    sessionAggregates,
  } = reconstructSessions(openPanelEvents);
  
  console.log(`Created ${sessionAggregates.length} sessions`);
  console.log(`Generated ${sessionStarts.length} session_start events`);
  console.log(`Generated ${sessionEnds.length} session_end events`);
  
  // Step 4: Combine all events
  const allEvents = [
    ...sessionStarts,
    ...eventsWithSessions,
    ...sessionEnds,
  ];
  
  console.log(`Total events to insert: ${allEvents.length}`);
  
  // Step 5: Insert into ClickHouse
  await insertEvents(allEvents);
  await insertSessions(sessionAggregates);
  
  console.log('Import complete!');
}

function formatClickHouseDate(date: Date): string {
  return date.toISOString().replace('T', ' ').replace(/\.\d{3}Z$/, '');
}

async function insertEvents(events: OpenPanelEvent[]) {
  const BATCH_SIZE = 5000;
  
  for (let i = 0; i < events.length; i += BATCH_SIZE) {
    const batch = events.slice(i, i + BATCH_SIZE);
    
    await ch.insert({
      table: 'events',
      values: batch,
      format: 'JSONEachRow',
    });
    
    console.log(`βœ“ Inserted events ${i + 1} to ${Math.min(i + BATCH_SIZE, events.length)}`);
  }
}

async function insertSessions(sessions: SessionAggregate[]) {
  if (sessions.length === 0) return;
  
  await ch.insert({
    table: 'sessions',
    values: sessions,
    format: 'JSONEachRow',
  });
  
  console.log(`βœ“ Inserted ${sessions.length} sessions`);
}

// Run the import
main().catch(console.error);

Running the Example

  1. Install dependencies:
npm install @clickhouse/client csv-parse
  1. Set environment variables:
export CLICKHOUSE_URL="http://your-clickhouse-host:8123"
  1. Update the configuration in the script:

    • Set PROJECT_ID to your OpenPanel project ID
    • Set CSV_FILE_PATH to your data file
    • Adjust the RawEvent interface to match your CSV columns
    • Update the transformation logic in the map function
  2. Run the script:

npx tsx import-script.ts

Important Considerations

Properties and Query Parameters

The __query object is an exception to the flat structure rule:

// βœ… Correct: __query can contain nested objects
properties: {
  "__query": {
    "utm_source": "google",  // Nested is OK here
    "utm_medium": "cpc"
  },
  "custom_field": "value"    // Other fields stay flat
}

// ❌ Wrong: Don't flatten __query into dot notation
properties: {
  "__query.utm_source": "google",  // DON'T do this
  "__query.utm_medium": "cpc"
}

Always extract and include UTM parameters for proper attribution tracking. OpenPanel's dashboard relies on these for traffic source analysis.

Date Format Requirements

ClickHouse expects dates in a specific format:

// βœ… Correct: YYYY-MM-DD HH:MM:SS
"2024-01-15 14:30:00"

// ❌ Wrong: ISO 8601 with T and Z
"2024-01-15T14:30:00.000Z"

// Conversion function
function formatClickHouseDate(date: Date): string {
  return date.toISOString()
    .replace('T', ' ')           // Remove T separator
    .replace(/\.\d{3}Z$/, '');  // Remove milliseconds and Z
}

Batch Size and Performance

  • Optimal batch size: 5,000-10,000 events
  • Too small: More network overhead, slower imports
  • Too large: Memory issues, risk of timeout
  • Monitor: Watch memory usage and insert time per batch

Session Boundary Edge Cases

Exactly 30 minutes gap:

// If gap === 30 minutes exactly, continue existing session
if (timeSinceLastEvent > SESSION_TIMEOUT_MS) {
  // Start new session
}

Events at exact same timestamp:

  • They belong to the same session
  • Order doesn't matter as they're simultaneous

Out-of-order events:

  • Always sort by (device_id, created_at) before processing
  • Do NOT assume your source data is already sorted

Profile Upsertion

If you're tracking identified users, you may want to create profile records:

// Optional: Upsert profiles in PostgreSQL
import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient();

async function upsertProfiles(events: OpenPanelEvent[]) {
  const uniqueProfiles = new Map<string, OpenPanelEvent>();
  
  for (const event of events) {
    // Only track identified users (where profile_id !== device_id)
    if (event.profile_id !== event.device_id) {
      if (!uniqueProfiles.has(event.profile_id)) {
        uniqueProfiles.set(event.profile_id, event);
      }
    }
  }
  
  for (const [profileId, event] of uniqueProfiles) {
    await prisma.profile.upsert({
      where: {
        projectId_id: {
          projectId: event.project_id,
          id: profileId,
        },
      },
      create: {
        id: profileId,
        projectId: event.project_id,
        firstName: '', // Extract from your data if available
        lastName: '',
        email: '',
      },
      update: {}, // Don't overwrite existing data
    });
  }
}

Referrer Logic and Priority

OpenPanel uses a specific priority order for determining traffic sources:

Priority Order (highest to lowest):

  1. UTM parameters (utm_source, ref, utm_referrer in query string)
  2. Organic referrer (HTTP referrer header, if not same domain)
  3. Direct (no referrer information)
// Step 1: Check if referrer is from same domain (internal navigation)
function isSameDomain(referrerUrl: string | undefined, currentUrl: string | undefined): boolean {
  if (!referrerUrl || !currentUrl) return false;
  
  try {
    const referrerHost = new URL(referrerUrl).hostname;
    const currentHost = new URL(currentUrl).hostname;
    return referrerHost === currentHost;
  } catch {
    return false;
  }
}

// Step 2: Parse organic referrer (if not same domain)
function parseReferrer(url: string | undefined): {
  name: string;
  type: string;
  url: string;
} | null {
  if (!url) return null;
  
  try {
    const hostname = new URL(url).hostname;
    
    // Check against known referrer database (simplified)
    // In production, use a comprehensive list like apps/worker/src/referrers.ts
    const knownReferrers: Record<string, { name: string; type: string }> = {
      'google.com': { name: 'Google', type: 'search' },
      'bing.com': { name: 'Bing', type: 'search' },
      'yahoo.com': { name: 'Yahoo', type: 'search' },
      'duckduckgo.com': { name: 'DuckDuckGo', type: 'search' },
      'facebook.com': { name: 'Facebook', type: 'social' },
      'twitter.com': { name: 'Twitter', type: 'social' },
      'linkedin.com': { name: 'LinkedIn', type: 'social' },
      'reddit.com': { name: 'Reddit', type: 'social' },
      't.co': { name: 'Twitter', type: 'social' },
    };
    
    const match = knownReferrers[hostname] || knownReferrers[hostname.replace('www.', '')];
    
    return {
      name: match?.name || '',
      type: match?.type || 'unknown',
      url: url.replace(/\/$/, ''), // Strip trailing slash
    };
  } catch {
    return null;
  }
}

// Step 3: Check for UTM referrer (HIGHEST PRIORITY)
function getReferrerFromQuery(query: Record<string, string> | undefined): {
  name: string;
  type: string;
  url: string;
} | null {
  if (!query) return null;
  
  // Check multiple query param variants
  const source = query.utm_source || query.ref || query.utm_referrer || '';
  
  if (!source) return null;
  
  // Try to match against known referrers
  const knownReferrers: Record<string, { name: string; type: string }> = {
    google: { name: 'Google', type: 'search' },
    bing: { name: 'Bing', type: 'search' },
    facebook: { name: 'Facebook', type: 'social' },
    twitter: { name: 'Twitter', type: 'social' },
    linkedin: { name: 'LinkedIn', type: 'social' },
  };
  
  const match = knownReferrers[source.toLowerCase()];
  
  return {
    name: match?.name || source, // Use source as name if not recognized
    type: match?.type || 'unknown',
    url: '', // UTM referrers don't have a URL
  };
}

// Step 4: Apply referrer logic with correct priority
function determineReferrer(
  organicReferrer: string | undefined,
  currentUrl: string | undefined,
  query: Record<string, string> | undefined
): {
  referrer: string;
  referrer_name: string;
  referrer_type: string;
} {
  // Priority 1: UTM parameters (highest priority)
  const utmReferrer = getReferrerFromQuery(query);
  if (utmReferrer) {
    return {
      referrer: utmReferrer.url,
      referrer_name: utmReferrer.name,
      referrer_type: utmReferrer.type,
    };
  }
  
  // Priority 2: Organic referrer (if not same domain)
  if (organicReferrer && !isSameDomain(organicReferrer, currentUrl)) {
    const parsed = parseReferrer(organicReferrer);
    if (parsed) {
      return {
        referrer: parsed.url,
        referrer_name: parsed.name,
        referrer_type: parsed.type,
      };
    }
  }
  
  // Priority 3: Direct traffic (no referrer)
  return {
    referrer: '',
    referrer_name: '',
    referrer_type: '',
  };
}

Important notes:

  • UTM parameters always override organic referrers
  • Same-domain referrers are treated as direct traffic (internal navigation)
  • The referrer field stores the URL, referrer_name stores human-readable name
  • Use OpenPanel's referrer database for accurate classification (see apps/worker/src/referrers.ts)
  • Support multiple UTM variants: utm_source, ref, utm_referrer

Region Code Normalization

Some platforms prefix region codes with country codes:

function normalizeRegion(region: string): string {
  // Remove "US-" or "GB-" prefix if present
  return region.replace(/^[A-Z]{2}-/, '');
}

// Example:
// "US-CA" β†’ "CA"
// "GB-ENG" β†’ "ENG"
// "CA" β†’ "CA" (unchanged)

Testing & Validation

After importing, verify your data is correct:

1. Check Event Counts

-- Total events imported
SELECT count(*) as total_events
FROM events
WHERE project_id = 'your-project-id'
  AND imported_at IS NOT NULL;

-- Events by type
SELECT name, count(*) as count
FROM events
WHERE project_id = 'your-project-id'
  AND imported_at IS NOT NULL
GROUP BY name
ORDER BY count DESC;

2. Verify Sessions Created

-- Total sessions
SELECT count(*) as total_sessions
FROM sessions
WHERE project_id = 'your-project-id';

-- Session duration distribution
SELECT 
  round(duration / 1000) as duration_seconds,
  count(*) as count
FROM sessions
WHERE project_id = 'your-project-id'
GROUP BY duration_seconds
ORDER BY duration_seconds;

-- Sessions with events
SELECT 
  s.id,
  s.event_count,
  count(e.id) as actual_events
FROM sessions s
LEFT JOIN events e ON e.session_id = s.id
WHERE s.project_id = 'your-project-id'
GROUP BY s.id, s.event_count
HAVING actual_events != s.event_count
LIMIT 10;

3. Check for Orphaned Events

-- Events without session_id (should be 0)
SELECT count(*) as orphaned_events
FROM events
WHERE project_id = 'your-project-id'
  AND session_id = ''
  AND name NOT IN ('session_start', 'session_end');

-- Events with session_id but no matching session
SELECT count(*) as events_without_session
FROM events e
LEFT JOIN sessions s ON e.session_id = s.id
WHERE e.project_id = 'your-project-id'
  AND e.session_id != ''
  AND s.id IS NULL;

4. Validate Session Boundaries

-- Check session_start timing (should be 1 second before first event)
SELECT 
  s.id,
  s.created_at as session_start,
  min(e.created_at) as first_event,
  dateDiff('second', s.created_at, min(e.created_at)) as diff_seconds
FROM sessions s
JOIN events e ON e.session_id = s.id AND e.name != 'session_start'
WHERE s.project_id = 'your-project-id'
GROUP BY s.id, s.created_at
HAVING diff_seconds != 1
LIMIT 10;

-- Check session_end timing (should be 1 second after last event)
SELECT 
  s.id,
  s.ended_at as session_end,
  max(e.created_at) as last_event,
  dateDiff('second', max(e.created_at), s.ended_at) as diff_seconds
FROM sessions s
JOIN events e ON e.session_id = s.id AND e.name != 'session_end'
WHERE s.project_id = 'your-project-id'
GROUP BY s.id, s.ended_at
HAVING diff_seconds != 1
LIMIT 10;

5. Sample Data Review

-- Look at a few complete sessions
SELECT 
  e.session_id,
  e.name,
  e.created_at,
  e.path
FROM events e
WHERE e.project_id = 'your-project-id'
  AND e.session_id IN (
    SELECT id FROM sessions 
    WHERE project_id = 'your-project-id' 
    LIMIT 5
  )
ORDER BY e.session_id, e.created_at;

Troubleshooting

Common Issues

Issue: "Events have no session_id"

  • Cause: Session reconstruction didn't run or failed
  • Fix: Ensure events are sorted by (device_id, created_at) before processing

Issue: "Too many sessions (one per event)"

  • Cause: Session timeout logic not working
  • Fix: Verify time comparison uses milliseconds, not seconds
  • Fix: Ensure device_id is consistent for same user

Issue: "Session duration is negative"

  • Cause: Events out of chronological order
  • Fix: Sort events before processing: .sort((a, b) => new Date(a.created_at) - new Date(b.created_at))

Issue: "ClickHouse insert fails with date format error"

  • Cause: Using ISO 8601 format with T and Z
  • Fix: Use YYYY-MM-DD HH:MM:SS format (no T, no Z, no milliseconds)

Issue: "Memory error with large imports"

  • Cause: Loading all events into memory at once
  • Fix: Process events in batches of 10,000-50,000
  • Fix: Use streaming CSV parser instead of loading entire file

Next Steps

Once you've successfully imported your data:

  1. Verify in OpenPanel Dashboard: Check that events and sessions appear correctly
  2. Set up retention: Configure data retention policies if needed
  3. Monitor performance: Watch query performance with imported data
  4. Document your mapping: Keep notes on how you mapped fields for future imports
  5. Automate if needed: Schedule regular imports if you have ongoing data

Additional Resources

Support

If you encounter issues with manual import:

  1. Check the validation queries above to identify the problem
  2. Review your transformation logic for correctness
  3. Verify your ClickHouse connection and permissions
  4. Open an issue on GitHub with:
    • Sample of your source data (anonymized)
    • Your transformation code
    • Error messages or unexpected results

Happy importing! πŸš€

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment