File Type: PY
Lines: 369
Size: 12.8 KB
Generated: 1/31/2026, 5:45:38 PM
This file implements PineconeMemory, a robust, self-contained service class designed to manage persistent, vector-based memory for an AI pipeline. It acts as a high-level abstraction layer over the Pinecone vector database and the Sentence Transformers embedding library, providing standard CRUD (Create, Read, Update, Delete) and Query functionality.
The service employs defensive programming by using try...except ImportError blocks to check for the availability of pinecone and sentence_transformers. If either dependency is missing, the service sets self.enabled = False and logs a warning, allowing the main application to run without the memory feature, preventing hard crashes.
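The dependency check described above can be sketched as follows (a minimal illustration of the pattern, not the file's exact code — logger name and message wording are assumed):

```python
import logging

logger = logging.getLogger(__name__)

class PineconeMemory:
    def __init__(self):
        self.enabled = True
        try:
            # Only probe availability here; real clients are created lazily later.
            import pinecone  # noqa: F401
            import sentence_transformers  # noqa: F401
        except ImportError as exc:
            # Degrade gracefully instead of crashing the whole pipeline.
            self.enabled = False
            logger.warning("Memory service disabled, missing dependency: %s", exc)
```

Callers can then guard every memory operation with `if not self.enabled: return`.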
The core components (Pinecone client, Index object, and Sentence Transformer model) are initialized lazily within the private method _initialize(). This ensures that network connections and model loading only occur when the first memory operation (e.g., upsert or query) is explicitly called, optimizing startup time and resource usage if the memory service is optional.
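The lazy-initialization pattern looks roughly like this (a self-contained sketch with placeholders standing in for the Pinecone client and the embedding model):

```python
class LazyService:
    """Illustrates the _initialize() pattern: expensive setup deferred to first use."""

    def __init__(self):
        self._client = None
        self._model = None
        self._initialized = False

    def _initialize(self):
        if self._initialized:
            return  # idempotent: subsequent calls are no-ops
        # In the real service this is where Pinecone(...), index checks,
        # and SentenceTransformer(...) loading would happen.
        self._client = object()  # placeholder for the Pinecone client
        self._model = object()   # placeholder for the embedding model
        self._initialized = True

    def query(self, text):
        self._initialize()  # first memory operation triggers the setup
        return []
```

Because `__init__` does no network or model work, importing and constructing the service stays cheap even when the memory feature is never used.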
Configuration is prioritized from a PipelineConfig object, with fallback mechanisms to environment variables (PINECONE_API_KEY, PINECONE_INDEX_NAME). This adheres to best practices for configurable services.
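The config-then-environment fallback can be captured in a small helper (a sketch; the default index name shown is hypothetical, not from the source):

```python
import os

def resolve_setting(config_value, env_var, default=None):
    """Prefer an explicit config value, then the environment, then a default."""
    if config_value:
        return config_value
    return os.environ.get(env_var, default)

# Example resolution order for the two settings the service reads:
api_key = resolve_setting(None, "PINECONE_API_KEY")
index_name = resolve_setting(None, "PINECONE_INDEX_NAME", "pipeline-memory")
```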
During initialization, the service automatically checks if the target Pinecone index (self._index_name) exists. If it does not, the service creates it idempotently.
- Dimension: hardcoded to 384, matching the output dimension of the chosen embedding model (all-MiniLM-L6-v2).
- Specification: uses ServerlessSpec with the aws cloud and us-east-1 region, simplifying infrastructure management.
- Metric: uses cosine similarity, the standard choice for semantic search.
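The idempotent check-then-create step can be sketched as a pure function (assumptions: the client exposes `list_indexes()` yielding name records and a `create_index(...)` call; in the real service, `spec` would be `ServerlessSpec(cloud="aws", region="us-east-1")`):

```python
def ensure_index(pc, name, spec, dimension=384, metric="cosine"):
    """Create the index only if it does not already exist (idempotent)."""
    existing = [idx["name"] for idx in pc.list_indexes()]
    if name not in existing:
        pc.create_index(name=name, dimension=dimension, metric=metric, spec=spec)
        return True   # created now
    return False      # already present, nothing to do
```

Keeping the existence check separate from creation makes the behavior easy to verify against a stub client without a live Pinecone account.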
The service relies on the SentenceTransformer('all-MiniLM-L6-v2') model for generating dense vector representations of input text. This model is chosen for its balance of speed and semantic performance, suitable for general-purpose memory tasks.
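A dimension-checked embedding step might look like this (a sketch using an injected encoder so the logic is testable; in the real service the encoder would be `SentenceTransformer("all-MiniLM-L6-v2").encode`):

```python
from typing import Callable, List

def embed(text: str, encoder: Callable[[str], List[float]]) -> List[float]:
    """Encode text and sanity-check the vector against the index dimension (384)."""
    vector = list(encoder(text))
    if len(vector) != 384:
        # A mismatch here means the model and the index were created
        # with different dimensions, which Pinecone would reject on upsert.
        raise ValueError(f"expected a 384-dim vector, got {len(vector)}")
    return vector
```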
| Method | Purpose | Technical Detail |
|---|---|---|
| __init__ | Setup and configuration | Checks dependencies, loads the API key and index name from config/env, and sets the self.enabled status. |
| _initialize | Lazy setup | Instantiates the Pinecone client, checks/creates the index, and loads the SentenceTransformer model. |
| upsert | Store memory | Encodes content into a 384-dimension vector, generates a UUID, and stores metadata including source, created_at, and a truncated text preview (first 1000 chars). |
| query | Retrieve memory | Encodes query_text, performs a vector similarity search (top_k), and supports optional Pinecone metadata filtering. Results are mapped to the structured MemoryEntry TypedDict. |
| update | Modify memory | Fetches the existing vector and metadata first. If content is provided, a new vector is generated; if only metadata is provided, the existing vector is preserved. Uses upsert internally to overwrite the existing ID. |
| delete, delete_by_filter | Removal | Provides granular deletion by specific ID, or bulk deletion using Pinecone's metadata filtering capabilities. |
| batch_upsert | Bulk loading | (Truncated in the source, but implied.) Essential for high-throughput ingestion, allowing multiple vectors to be sent in a single API call. |
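The record that upsert assembles, per the table's description, can be sketched as follows (field names like "source", "created_at", and "text" come from the description; the exact key layout in the file may differ):

```python
import uuid
from datetime import datetime, timezone

def build_upsert_record(content: str, source: str) -> dict:
    """Assemble the ID and metadata the upsert row above describes (sketch)."""
    return {
        "id": str(uuid.uuid4()),  # a fresh UUID per stored memory
        "metadata": {
            "source": source,
            "created_at": datetime.now(timezone.utc).isoformat(),
            "text": content[:1000],  # truncated preview, first 1000 chars only
        },
    }
```

The vector itself (from the embedding model) would be attached alongside `id` and `metadata` in the actual Pinecone upsert call.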
The MemoryEntry(TypedDict) defines the standardized output format for retrieved memories, ensuring consumers of the service receive predictable data:

- id: the unique Pinecone vector ID (UUID).
- content: the retrieved text (typically the truncated preview stored in metadata).
- metadata: the full dictionary of associated metadata (source, timestamps, etc.).
- score: the cosine similarity score returned by the query.
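Based on the four fields listed, the TypedDict and a mapping from a raw query match plausibly look like this (a sketch; the raw match shape is assumed, not taken from the file):

```python
from typing import Any, Dict, TypedDict

class MemoryEntry(TypedDict):
    id: str
    content: str
    metadata: Dict[str, Any]
    score: float

def to_entry(match: Dict[str, Any]) -> MemoryEntry:
    """Map one raw query match into the standardized MemoryEntry shape."""
    meta = match.get("metadata", {})
    return MemoryEntry(
        id=match["id"],
        content=meta.get("text", ""),  # the truncated preview doubles as content
        metadata=meta,
        score=match.get("score", 0.0),
    )
```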
- Hardcoded Model and Dimension: the embedding model (all-MiniLM-L6-v2) and the resulting dimension (384) are hardcoded. For a more flexible pipeline, these should be configurable parameters in PipelineConfig. If the model changes, the index creation logic must also be updated to reflect the new dimension, or the service will fail.
- Full Content Storage: the service stores only the first 1000 characters of the content in the Pinecone metadata ("text"). While this is efficient for indexing, the full original content is not stored within the vector database itself. If the full text is required upon retrieval, the service assumes the consumer can retrieve it from the original source_file, or that the 1000-character preview is sufficient.
- Error Handling Granularity: general try...except Exception blocks are used; specific handling for Pinecone API errors (e.g., rate limiting, authentication failures) could provide more informative logging and recovery mechanisms.
- Region Hardcoding: the ServerlessSpec region is hardcoded to us-east-1. This should ideally be configurable to minimize latency for users in other geographic regions.
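One way the hardcoded values could be lifted into configuration, as the limitations above suggest (a hypothetical shape for PipelineConfig additions, not code from the file):

```python
from dataclasses import dataclass

@dataclass
class MemoryConfig:
    """Hypothetical settings replacing the hardcoded model, dimension, and region."""
    model_name: str = "all-MiniLM-L6-v2"
    dimension: int = 384          # must track model_name's output dimension
    cloud: str = "aws"
    region: str = "us-east-1"     # overridable for latency-sensitive deployments
```

Keeping `dimension` next to `model_name` makes the coupling between the two explicit, so a model swap forces the operator to revisit the index dimension.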
Description generated using AI analysis