@srijancse
Created July 9, 2024 18:28
datasets.md

Datasets

Notes

  • This is not an exhaustive list of all the datasets we would need for our platform
  • The idea here is to find a minimal viable subset that is sufficient to build the context-rich APIs we need, such as PnL, holdings, etc.
  • The scope here is restricted to Wallet PnL with Uniswap LP Positions, Token Holdings enriched with farcaster details
  • The datasets are divided into 4 stages (Stage 0 to Stage 3)
    • Stage 0: Ingest
      • This is where we ingest data from various sources
      • The data is raw and not transformed
      • Current sources include Cryo+Base Node, Farcaster Hub, Coingecko API
    • Stage 1: Initial Transform
      • This is where we transform the raw data into a more usable format
      • Everything on this level is ideally enriched with pricing + Farcaster data at a minimum
      • Datasets here would include: Enriched FC user/cast, Dex Trades, Prices Dex Expanded, ERC20 Transfers, ERC20 Balances
    • Stage 2: Project Level Transform
      • This is where we create project level transformation that is required
      • For example purposes, we have included Uniswap LP positions
      • In the future we would have to do this for all the projects we are interested in
    • Stage 3: Domain Level Transform
      • This is where we create the domain level transformations required for different access patterns
      • For example purposes, we have included Wallet PNL with LP, Token details, Fid holdings and pnl.
      • Roughly representative of data required for the token page
      • In the future we would have to do this for all the domains we are interested in
flowchart TD
    subgraph Sources
        Cryo[Cryo + Node]
        FarcasterHub[Farcaster Hub]
        CoingeckoAPI[Coingecko API]
    end

    subgraph Stage0_Ingest
        A[Transaction Data]
        B[Transaction Logs Data]
        C[Transaction Traces Data]
        D[Farcaster User Data]
        F[Farcaster Cast Data]
        G[Farcaster Reactions Data]
        H[Farcaster Link Data]
        I[Farcaster Verifications Data]
        S[Prices Data]
    end

    subgraph Stage1_Transform
        E[Farcaster User Enriched]
        J[Dex Trades Data]
        K[Prices Dex Expanded]
        N[ERC20 Transfers]
        M[ERC20 Balances]
        O[Wallet PNL]
    end

    subgraph Stage2_ProjectLevelTransform
        L[Uniswap LP Positions]
        SH[Staking Holding]
    end

    subgraph Stage3_DomainLevelTransform
        R[Wallet PNL with LP]
        P[Token details]
        Q[Fid holdings and pnl]
    end

    Cryo --> A
    Cryo --> B
    Cryo --> C
    FarcasterHub --> D
    FarcasterHub --> F
    FarcasterHub --> G
    FarcasterHub --> H
    FarcasterHub --> I
    CoingeckoAPI --> S

    A --> J
    B --> J
    B --> N
    C --> J
    D --> E
    D --> L
    E --> J
    E --> L
    E --> N
    E --> Q
    F --> E
    G --> E
    H --> E
    I --> E
    J --> K
    J --> P
    K --> N
    K --> L
    K --> M
    K --> O
    K --> P
    N --> M
    N --> P
    M --> O
    O --> R
    L --> R
    R --> Q
    P --> Q
    D --> SH
    K --> SH
    E --> SH
    SH --> R
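The arrows in the flowchart imply a build order: a dataset can only be materialised once everything it depends on exists. A minimal sketch of deriving that order with Python's standard `graphlib` module follows. The `DEPENDENCIES` mapping is a hand-copied subset of the Dependencies entries in this document (the Farcaster inputs are omitted for brevity), and `build_order` is a hypothetical helper name, not part of any existing pipeline.

```python
from graphlib import TopologicalSorter

# Each dataset maps to the datasets it is built from.
# Subset only; Farcaster ingest datasets are left out (assumption for brevity).
DEPENDENCIES = {
    "Transaction Data": set(),
    "Transaction Logs Data": set(),
    "Prices Data": set(),
    "Farcaster User Enriched": set(),
    "Dex Trades Data": {
        "Transaction Data", "Transaction Logs Data",
        "Farcaster User Enriched", "Prices Data",
    },
    "Prices Dex Expanded": {"Dex Trades Data", "Prices Data"},
    "ERC20 Transfers": {
        "Transaction Data", "Transaction Logs Data",
        "Farcaster User Enriched", "Prices Dex Expanded",
    },
    "ERC20 Balances": {"ERC20 Transfers", "Prices Dex Expanded"},
    "Wallet PNL": {"ERC20 Balances", "Prices Dex Expanded", "Dex Trades Data"},
}


def build_order(deps):
    """Return dataset names so that each one appears after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = build_order(DEPENDENCIES)
```

`TopologicalSorter` raises `graphlib.CycleError` if the mapping contains a cycle, which doubles as a sanity check on the dependency lists.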

Stage 0 (Ingest)

Transaction Data

  • Name: Transaction Data
  • Description: Contains all transaction-related information plus a timestamp
  • Required Columns:
    • from_address
    • to_address
    • input
    • block_number
    • transaction_index
    • transaction_hash
    • timestamp
  • Constraints
    • unique transaction_hash
  • Sort order
    • block_number, transaction_index
  • Availability: Periodically updated (~hourly for now)
  • Dependencies: Node + Cryo

Transaction Logs Data

  • Name: Transaction Logs Data
  • Description: Contains all logs related to transactions
  • Required Columns:
    • address
    • block_number
    • transaction_index
    • log_index
    • data
    • topic0
    • topic1
    • topic2
    • topic3
    • timestamp
  • Constraints
    • unique block_number, log_index
  • Sort order
    • block_number, log_index
  • Availability: Periodically updated (~hourly for now)
  • Dependencies: Node + Cryo
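Every dataset in this document declares a uniqueness constraint and a sort order, so both can be verified with one generic check. The following is a sketch only: `check_dataset` is a hypothetical helper, and it assumes rows are available as a list of dicts keyed by column name.

```python
def check_dataset(rows, unique_key, sort_key):
    """Return a list of human-readable constraint violations (empty if none)."""
    problems = []
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in unique_key)
        if key in seen:
            problems.append(f"duplicate {unique_key}: {key}")
        seen.add(key)
    sort_values = [tuple(row[column] for column in sort_key) for row in rows]
    if sort_values != sorted(sort_values):
        problems.append(f"rows are not sorted by {sort_key}")
    return problems


# Transaction Logs Data: unique and sorted on (block_number, log_index).
logs = [
    {"block_number": 100, "log_index": 0},
    {"block_number": 100, "log_index": 1},
    {"block_number": 101, "log_index": 0},
]
violations = check_dataset(
    logs, ("block_number", "log_index"), ("block_number", "log_index")
)
```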

Transaction Traces Data

  • Name: Transaction Traces Data
  • Description: Contains all traces related to transaction
  • Required Columns:
    • action_from
    • action_to
    • action_value
    • action_gas
    • action_input
    • action_call_type
    • action_init
    • action_reward_type
    • action_type
    • result_gas_used
    • result_output
    • result_code
    • result_address
    • trace_address
    • subtraces
    • transaction_index
    • transaction_hash
    • block_number
    • block_hash
    • error
  • Constraints
    • unique transaction_hash, trace_address
  • Sort order
    • block_number, transaction_index, trace_address
  • Availability: Periodically updated (~hourly for now)
  • Dependencies: Node + Cryo

Farcaster User Data

  • Name: Farcaster User Data
  • Description: All FC user info
  • Required Columns:
    • fid
    • username
    • displayname
    • pfp
    • bio
    • last_updated
  • Constraints
    • unique fid
  • Sort order
    • fid
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Farcaster Hub

Farcaster Cast Data

  • Name: Farcaster Cast Data
  • Description: All FC cast info
  • Required Columns:
    • hash
    • fid
    • parent_url
    • url_embeds
    • cast_embeds
    • parent_cast_fid
    • parent_cast_hash
    • mention
    • mention_positions
    • text
    • timestamp
    • is_deleted
  • Constraints
    • unique hash
  • Sort order
    • timestamp
  • Partitioning Attribute
    • fid
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Farcaster Hub

Farcaster Reactions Data

  • Name: Farcaster Reactions Data
  • Description: All FC reactions info
  • Required Columns:
    • target_hash
    • target_url
    • fid
    • reaction_type
    • timestamp
    • is_deleted
  • Constraints
    • unique target_hash, fid, target_url, reaction_type
  • Sort order
    • timestamp
  • Partitioning Attribute
    • fid
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Farcaster Hub

Farcaster Link Data

  • Name: Farcaster Link Data
  • Description: All FC link info
  • Required Columns:
    • fid
    • target_fid
    • timestamp
    • title
    • description
    • image
  • Constraints
    • unique fid, target_fid
  • Sort order
    • timestamp
  • Partitioning Attribute
    • fid
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Farcaster Hub

Farcaster Verifications Data

  • Name: Farcaster Verifications Data
  • Description: All FC verifications info
  • Required Columns:
    • fid
    • verification_address
    • protocol
    • timestamp
    • is_deleted
  • Constraints
    • unique fid, verification_address
  • Sort order
    • timestamp
  • Partitioning Attribute
    • fid
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Farcaster Hub

Prices Data

  • Name: Prices Data
  • Description: Price info for the top 20 assets on Coingecko
  • Required Columns:
    • timestamp
    • asset
    • price
  • Constraints
    • unique timestamp, asset
  • Sort order
    • timestamp
  • Partitioning Attribute
    • asset
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Coingecko API

Stage 1 (Initial Transform)

Farcaster User Enriched

  • Name: Farcaster User Enriched
  • Description: Farcaster user profile enriched with cast and reaction counts, follow lists, and verified addresses
  • Required Columns:
    • fid
    • username
    • displayname
    • pfp
    • bio
    • last_updated
    • no_of_casts{1h,24h,7d,30d,90d,180d,365d,all_time}
    • no_of_likes{1h,24h,7d,30d,90d,180d,365d,all_time}
    • no_of_recasts{1h,24h,7d,30d,90d,180d,365d,all_time}
    • follow_list
    • follower_list
    • addresses
  • Constraints
    • unique fid
  • Sort order
    • fid
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Farcaster User Data, Farcaster Cast Data, Farcaster Reactions Data, Farcaster Link Data, Farcaster Verifications Data
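Columns written as `no_of_casts{1h,24h,...,all_time}` expand into one counter per trailing time window. A minimal sketch of computing such counters from raw event timestamps is shown below. `WINDOWS` and `windowed_counts` are hypothetical names, and the sketch assumes timezone-aware `datetime` values and windows that are inclusive at the boundary.

```python
from datetime import datetime, timedelta, timezone

# Window labels taken from the column suffixes used in this document.
WINDOWS = {
    "1h": timedelta(hours=1),
    "24h": timedelta(hours=24),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
    "90d": timedelta(days=90),
    "180d": timedelta(days=180),
    "365d": timedelta(days=365),
}


def windowed_counts(timestamps, now):
    """Count events per trailing window ending at `now`, plus an all_time total."""
    past = [ts for ts in timestamps if ts <= now]  # ignore future-dated events
    counts = {
        label: sum(1 for ts in past if now - ts <= span)
        for label, span in WINDOWS.items()
    }
    counts["all_time"] = len(past)
    return counts


now = datetime(2024, 7, 9, 18, 0, tzinfo=timezone.utc)
cast_times = [
    now - timedelta(minutes=30),
    now - timedelta(hours=5),
    now - timedelta(days=3),
    now - timedelta(days=400),
]
no_of_casts = windowed_counts(cast_times, now)
```

The same function would serve `no_of_likes` and `no_of_recasts` if fed reaction timestamps filtered by `reaction_type`.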

Farcaster Cast Enriched

  • Name: Farcaster Cast Enriched
  • Description: Farcaster casts enriched with author profile, channel, thread, and reaction count details
  • Required Columns:
    • hash
    • fid
    • channel_name
    • channel_url
    • channel_pfp
    • username
    • user_bio
    • user_pfp
    • user_displayname
    • url_embeds
    • cast_embeds
    • root
    • parent
    • text
    • text_normalised
    • no_of_recasts
    • no_of_likes
    • children
    • no_of_children
    • timestamp
    • is_deleted
    • annotations
  • Constraints
    • unique hash
  • Sort order
    • timestamp
  • Partitioning Attribute
    • fid
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Farcaster Cast Data, Farcaster User Enriched, Farcaster Reactions Data

Dex Trades Data

  • Name: Dex Trades Data
  • Description: All trades on DEXes
  • Required Columns:
    • timestamp
    • transaction_hash
    • project
    • pool_address
    • token0
    • token1
    • is_token0_sent
    • token_amount_sent
    • token_amount_received
    • token_amount_sent_usd
    • token_amount_received_usd
    • farcaster_user
  • Constraints
    • unique transaction_hash
  • Sort order
    • timestamp
  • Partitioning Attribute
    • ??
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Transaction Data, Transaction Logs Data, Farcaster User Enriched, Prices Data

Prices Dex Expanded

  • Name: Prices Dex Expanded
  • Description: Price info extrapolated from Uniswap swap events and Coingecko (cg) assets
  • Required Columns:
    • timestamp
    • asset
    • reference_pool
    • reference_quote
    • reference_to_usd
    • asset_in_usd
  • Constraints
    • unique timestamp, asset
  • Sort order
    • timestamp
  • Partitioning Attribute
    • asset
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Dex Trades Data, Prices Data
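One plausible reading of the columns above is that `reference_quote` is the asset's price in units of the reference token (derived from a swap in `reference_pool`), and `reference_to_usd` is the reference token's USD price from Prices Data. Under that assumption, which the document does not state explicitly, `asset_in_usd` is their product. The helper names below are hypothetical, and `Decimal` is used to avoid binary floating-point rounding.

```python
from decimal import Decimal


def quote_from_swap(asset_amount, reference_amount):
    """Price of one unit of the asset, expressed in the reference token."""
    return reference_amount / asset_amount


def asset_in_usd(reference_quote, reference_to_usd):
    """Chain the pool quote with the reference token's USD price."""
    return reference_quote * reference_to_usd


# Illustrative numbers only: 1000 units of an asset swapped for 0.5 WETH,
# with WETH priced at 3000 USD.
reference_quote = quote_from_swap(Decimal("1000"), Decimal("0.5"))
price_usd = asset_in_usd(reference_quote, Decimal("3000"))
```

Amounts are assumed to be already scaled by each token's decimals.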

ERC20 Transfers

  • Name: ERC20 Transfers
  • Description: ERC20 token transfers
  • Required Columns:
    • from_address
    • to_address
    • token_address
    • value
    • usd_value
    • from_farcaster_user
    • to_farcaster_user
    • transaction_hash
    • block_number
    • log_index
    • timestamp
  • Constraints
    • unique block_number, log_index
  • Sort order
    • block_number, log_index
  • Partitioning Attribute
    • token_address
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Transaction Data, Transaction Logs Data, Farcaster User Enriched, Prices Dex Expanded
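The core of this dataset can be derived from Transaction Logs Data by decoding the standard ERC20 `Transfer(address,address,uint256)` event. The sketch below assumes topics and data are stored as `0x`-prefixed hex strings (the actual encoding in the pipeline may differ), and `decode_transfer` is a hypothetical helper. The USD and Farcaster enrichment columns are left out.

```python
# keccak256("Transfer(address,address,uint256)"), the standard event signature hash.
TRANSFER_TOPIC0 = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic):
    """An indexed address is left-padded to 32 bytes; keep the last 20 bytes."""
    return "0x" + topic[-40:]


def decode_transfer(log):
    """Return a partial ERC20 Transfers row, or None if the log is not one."""
    if log["topic0"] != TRANSFER_TOPIC0:
        return None
    # ERC20 indexes only from/to. ERC721 shares topic0 but also indexes
    # tokenId as topic3, so a non-empty topic3 is skipped here.
    if log["topic1"] is None or log["topic2"] is None or log["topic3"] is not None:
        return None
    return {
        "from_address": topic_to_address(log["topic1"]),
        "to_address": topic_to_address(log["topic2"]),
        "token_address": log["address"],
        "value": int(log["data"], 16),  # raw units, not scaled by decimals
        "block_number": log["block_number"],
        "log_index": log["log_index"],
    }


# Made-up log for illustration.
example_log = {
    "address": "0x" + "aa" * 20,
    "block_number": 100,
    "log_index": 7,
    "topic0": TRANSFER_TOPIC0,
    "topic1": "0x" + "00" * 12 + "11" * 20,
    "topic2": "0x" + "00" * 12 + "22" * 20,
    "topic3": None,
    "data": "0x" + format(1000, "064x"),
}
transfer = decode_transfer(example_log)
```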

ERC20 Balances

  • Name: ERC20 Balances
  • Description: ERC20 token balances
  • Required Columns:
    • user_address
    • farcaster_user
    • token_address
    • balance
    • balance_usd
    • block_number
    • log_index
    • timestamp
  • Constraints
    • unique user_address, token_address, block_number, log_index
  • Sort order
    • block_number, log_index
  • Partitioning Attribute
    • token_address
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: ERC20 Transfers, Prices Dex Expanded
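Since the unique key includes `block_number` and `log_index`, this dataset reads as a balance history: one row per address and token each time a transfer changes the balance. A minimal sketch of deriving it from ERC20 Transfers rows follows. `running_balances` is a hypothetical helper, and the USD and Farcaster columns are omitted.

```python
from collections import defaultdict


def running_balances(transfers):
    """Emit one balance row per affected (address, token) for every transfer."""
    ordered = sorted(transfers, key=lambda t: (t["block_number"], t["log_index"]))
    balances = defaultdict(int)  # (user_address, token_address) -> raw units
    rows = []
    for t in ordered:
        # Net both legs first so a self-transfer yields a single row.
        deltas = {t["from_address"]: -t["value"]}
        deltas[t["to_address"]] = deltas.get(t["to_address"], 0) + t["value"]
        for address, delta in deltas.items():
            key = (address, t["token_address"])
            balances[key] += delta
            rows.append({
                "user_address": address,
                "token_address": t["token_address"],
                "balance": balances[key],
                "block_number": t["block_number"],
                "log_index": t["log_index"],
            })
    return rows


# Made-up transfers, deliberately out of order.
transfers = [
    {"from_address": "B", "to_address": "C", "token_address": "T",
     "value": 30, "block_number": 2, "log_index": 0},
    {"from_address": "A", "to_address": "B", "token_address": "T",
     "value": 100, "block_number": 1, "log_index": 0},
]
balance_rows = running_balances(transfers)
```

A sender whose earlier history is missing (or the zero address on mints) ends up with a negative balance in this sketch, so such rows would need filtering or a starting snapshot.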

Wallet PNL

  • Name: Wallet PNL
  • Description: Wallet PNL
  • Required Columns:
    • user_address
    • farcaster_user
    • timestamp
    • token_address
    • net_bought
    • net_sold
    • realised_pnl{1h,24h,7d,30d,90d,180d,365d,all_time}
    • unrealised_pnl{1h,24h,7d,30d,90d,180d,365d,all_time}
  • Constraints
    • unique user_address, token_address, timestamp
  • Sort order
    • timestamp
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: ERC20 Balances, Prices Dex Expanded, Dex Trades Data
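The document does not say which cost-basis method backs `realised_pnl` and `unrealised_pnl`. The sketch below assumes the average-cost method purely for illustration; FIFO or another method would give different numbers. `average_cost_pnl` is a hypothetical helper operating on one wallet and one token.

```python
def average_cost_pnl(trades, current_price):
    """Return (realised, unrealised) USD PnL using average cost basis.

    trades: time-ordered list of (side, amount, price_usd), side is "buy" or "sell".
    """
    held = 0.0      # tokens currently held
    cost = 0.0      # total USD cost of the tokens currently held
    realised = 0.0
    for side, amount, price in trades:
        if side == "buy":
            held += amount
            cost += amount * price
        else:
            if amount > held:
                raise ValueError("sell exceeds tracked holdings")
            average = cost / held
            realised += amount * (price - average)
            held -= amount
            cost -= amount * average
    unrealised = held * current_price - cost
    return realised, unrealised


# Illustrative numbers: buy 10 at 2 USD, sell 4 at 3 USD, token now at 2.5 USD.
realised, unrealised = average_cost_pnl(
    [("buy", 10.0, 2.0), ("sell", 4.0, 3.0)], current_price=2.5
)
```

Here the 4 sold tokens realise 1 USD each over the 2 USD average cost, and the 6 remaining tokens are worth 15 USD against a 12 USD cost.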

Stage 2 (Project Level Transform)

Uniswap LP Positions (Example DeFi App)

  • Name: Uniswap LP Positions
  • Description: Uniswap LP Positions
  • Required Columns:
    • user_address
    • farcaster_user
    • pool_address
    • token0
    • token1
    • token0_balance
    • token1_balance
    • token0_balance_usd
    • token1_balance_usd
    • token0_bought
    • tick
    • sqrt_price
    • liquidity
  • Constraints
    • unique user_address, pool_address
  • Sort order
    • user_address, pool_address
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Transaction Data, Transaction Logs Data, Prices Dex Expanded, Farcaster User Enriched

Stage 3 (Domain Level Transform)

Wallet PNL with LP

  • Name: Wallet PNL with LP
  • Description: Wallet PNL with LP
  • Required Columns:
    • user_address
    • farcaster_user
    • timestamp
    • token_address
    • net_bought
    • net_sold
    • realised_pnl{1h,24h,7d,30d,90d,180d,365d,all_time}
    • unrealised_pnl{1h,24h,7d,30d,90d,180d,365d,all_time}
    • lp_positions
  • Constraints
    • unique user_address, token_address, timestamp
  • Sort order
    • timestamp
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Wallet PNL, Uniswap LP Positions

Token details

  • Name: Token details
  • Description: Token details
  • Required Columns:
    • token_address
    • name
    • symbol
    • decimals
    • total_supply
    • circulating_supply
    • pool_liquidity
    • fdv
    • market_cap
    • volume{1h,24h,7d,30d,90d,180d,365d,all_time}
    • all_time_high
    • all_time_low
    • all_time_high_date
    • all_time_low_date
    • all_time_high_usd
    • all_time_low_usd
    • all_time_high_usd_date
    • all_time_low_usd_date
  • Constraints
    • unique token_address
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: ERC20 Transfers, Dex Trades Data, Prices Dex Expanded, Prices Data

Fid holdings and pnl

  • Name: Fid holdings and pnl
  • Description: Fid holdings and pnl
  • Required Columns:
    • fid
    • farcaster_user
    • holdings{1h,24h,7d,30d,90d,180d,365d,all_time}
    • pnl{1h,24h,7d,30d,90d,180d,365d,all_time}
  • Availability: Not available at the moment in "this" pipeline
  • Dependencies: Farcaster User Data, Wallet PNL with LP, Token details
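Going from wallet-level numbers to a per-fid view requires mapping each fid to its verified addresses. The sketch below assumes that mapping comes from the `fid`, `verification_address`, and `is_deleted` columns of Farcaster Verifications Data, and that a per-address PnL figure is already available. `pnl_by_fid` is a hypothetical helper.

```python
from collections import defaultdict


def pnl_by_fid(verifications, wallet_pnl):
    """Sum wallet-level PnL over each fid's active verified addresses.

    verifications: iterable of (fid, verification_address, is_deleted)
    wallet_pnl: dict mapping lower-cased address -> PnL in USD
    """
    totals = defaultdict(float)
    for fid, address, is_deleted in verifications:
        if is_deleted:
            continue  # revoked verifications no longer count
        totals[fid] += wallet_pnl.get(address.lower(), 0.0)
    return dict(totals)


# Made-up data for illustration.
verifications = [
    (1, "0xAAA", False),
    (1, "0xBBB", False),
    (1, "0xCCC", True),
    (2, "0xDDD", False),
]
wallet_pnl = {"0xaaa": 10.0, "0xbbb": -2.5, "0xccc": 100.0}
fid_pnl = pnl_by_fid(verifications, wallet_pnl)
```

An address verified by more than one fid is counted once per fid in this sketch.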