Skip to content

Instantly share code, notes, and snippets.

Show Gist options
  • Select an option

  • Save ascherj/5353263a84ce984a7c8fd13e9b497093 to your computer and use it in GitHub Desktop.

Select an option

Save ascherj/5353263a84ce984a7c8fd13e9b497093 to your computer and use it in GitHub Desktop.
πŸ§‘β€πŸ’» Interviewer Guide: Find the Index of the First Occurrence in a String (UMPIRE Method)

πŸ§‘β€πŸ’» Interviewer Guide: Find the Index of the First Occurrence in a String (UMPIRE Method)

https://leetcode.com/problems/find-the-index-of-the-first-occurrence-in-a-string/

πŸ“Œ Problem Statement

Given two strings needle and haystack.
Return the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.


🧭 U β€” Understand the Problem

Interviewer Prompts & Possible Answers:

  • "Can you restate the problem in your own words?"
    β†’ "We need to find where a substring (needle) first appears in a larger string (haystack), and return that starting position. If it doesn't appear at all, return -1."

  • "What should we return if the needle is empty?"
    β†’ "If needle is empty, we should return 0 (it's found at the beginning by convention)."

  • "Are we looking for the first occurrence or all occurrences?"
    β†’ "Just the first occurrence. Once we find it, we can stop searching."

  • "Is the search case-sensitive?"
    β†’ "Yes, 'A' and 'a' are different characters."

  • "What if the needle is longer than the haystack?"
    β†’ "Then it's impossible for needle to be in haystack, so we return -1."

  • "Can you give an example?"
    β†’ "If haystack is 'hello' and needle is 'll', we return 2 because 'll' starts at index 2."


πŸ§ͺ M β€” Match with Patterns

Interviewer Prompts & Possible Answers:

  • "What type of problem is this?"
    β†’ "This is a string searching or pattern matching problem."

  • "Have you seen similar problems before?"
    β†’ "Yes, this is the classic substring search problem. It's similar to how Ctrl+F works in a text editor."

  • "What approaches do you know for string searching?"
    β†’ "The naive approach checks every possible position. More advanced algorithms include KMP (Knuth-Morris-Pratt), Boyer-Moore, and Rabin-Karp."

  • "What would be the simplest approach?"
    β†’ "Check each position in haystack to see if needle starts there. Use a sliding window of size equal to needle's length."

  • "Does Python have a built-in for this?"
    β†’ "Yes, the str.find() method does exactly this, but we should implement it ourselves for the interview."


πŸ› οΈ P β€” Plan the Solution

Interviewer Prompts & Possible Answers:

  • "Walk me through your approach step by step."
    β†’ "Iterate through each possible starting position in haystack (up to haystack length minus needle length). At each position, check if the substring matches needle. Return the index if we find a match, or -1 if we finish without finding it."

  • "How many positions do we need to check?"
    β†’ "We need to check positions 0 through (len(haystack) - len(needle)). Beyond that, there's not enough room for needle to fit."

  • "How do we check if needle matches at a given position?"
    β†’ "We can either: (1) slice haystack and compare to needle, or (2) use a nested loop to compare character by character."

  • "Which approach would you prefer and why?"
    β†’ "Using Python's string slicing is cleaner and leverages optimized built-in code. The nested loop gives more control but is more verbose."

  • "What's the early termination condition?"
    β†’ "As soon as we find a match, we return that index immediately."


⌨️ I β€” Implement

Main Solution (String Slicing)

def strStr(haystack: str, needle: str) -> int:
    # Edge case: empty needle
    if not needle:
        return 0
    
    # Edge case: needle longer than haystack
    if len(needle) > len(haystack):
        return -1
    
    # Check each possible starting position
    for i in range(len(haystack) - len(needle) + 1):
        # Check if needle matches at position i
        if haystack[i:i + len(needle)] == needle:
            return i
    
    # No match found
    return -1

Alternative Implementation (Character-by-Character)

def strStr(haystack: str, needle: str) -> int:
    if not needle:
        return 0
    
    if len(needle) > len(haystack):
        return -1
    
    # Try each starting position
    for i in range(len(haystack) - len(needle) + 1):
        # Check if needle matches starting at position i
        match = True
        for j in range(len(needle)):
            if haystack[i + j] != needle[j]:
                match = False
                break
        
        if match:
            return i
    
    return -1

Optimized Implementation (Two Pointers)

def strStr(haystack: str, needle: str) -> int:
    if not needle:
        return 0
    
    h_len, n_len = len(haystack), len(needle)
    
    if n_len > h_len:
        return -1
    
    # Try each valid starting position
    for i in range(h_len - n_len + 1):
        # Use two pointers to compare
        h_ptr = i
        n_ptr = 0
        
        while n_ptr < n_len and haystack[h_ptr] == needle[n_ptr]:
            h_ptr += 1
            n_ptr += 1
        
        # If we matched all of needle, we found it
        if n_ptr == n_len:
            return i
    
    return -1

πŸ” R β€” Review

Interviewer Prompts & Possible Answers:

  • "What edge cases should we test?"
    β†’ "Empty needle, empty haystack, needle longer than haystack, needle not in haystack, needle at the beginning, needle at the end, single character strings, needle equals haystack."

  • "Can you trace through an example?"
    β†’ "Let's trace 'sadbutsad' searching for 'sad': Start at index 0: 'sad' == 'sad' βœ“ Return 0."

  • "Can you trace through a non-match example?"
    β†’ "Let's trace 'leetcode' searching for 'leeto': Index 0: 'leeto' vs 'leetc' βœ—, Index 1: 'eeto' (only 4 chars left, need 5) β†’ return -1."

  • "Could you write some test cases?"

def test_strStr():
    # Test case 1: Needle found at beginning
    assert strStr("sadbutsad", "sad") == 0
    
    # Test case 2: Needle found in middle
    assert strStr("hello", "ll") == 2
    
    # Test case 3: Needle not found
    assert strStr("aaaaa", "bba") == -1
    
    # Test case 4: Empty needle
    assert strStr("hello", "") == 0
    
    # Test case 5: Needle longer than haystack
    assert strStr("abc", "abcd") == -1
    
    # Test case 6: Needle equals haystack
    assert strStr("abc", "abc") == 0
    
    # Test case 7: Single character match
    assert strStr("a", "a") == 0
    
    # Test case 8: Single character no match
    assert strStr("a", "b") == -1
    
    # Test case 9: Needle at end
    assert strStr("mississippi", "issip") == 4
    
    # Test case 10: Multiple occurrences (return first)
    assert strStr("ababcabc", "abc") == 2

🧹 E β€” Evaluate & Optimize

Interviewer Prompts & Possible Answers:

  • "What's the time and space complexity of your solution?"
    β†’ "Time: O(n * m) where n is haystack length and m is needle length. In the worst case, we check each of the n-m+1 positions and compare m characters. Space: O(1) if we use the character-by-character approach, or O(m) if we use slicing (creates temporary strings)."

  • "Can this be optimized further?"
    β†’ "Yes! The KMP algorithm achieves O(n + m) time by preprocessing the needle to avoid redundant comparisons. Boyer-Moore can be even faster in practice by skipping characters."

  • "When would you use the naive approach vs. KMP?"
    β†’ "For short strings or when needle length is small, the naive approach is simpler and sufficient. KMP is worth the complexity for large texts with long patterns, especially with many partial matches."

  • "What about using built-in functions?"
    β†’ "In production code, use haystack.find(needle) which is optimized in C. In interviews, implement it yourself unless told otherwise."

  • "How does string slicing compare to character-by-character?"
    β†’ "Slicing is cleaner and often faster due to optimization, but creates temporary strings (O(m) space). Character comparison uses O(1) space but may be slower for long needles."

  • "What if we needed to find all occurrences, not just the first?"
    β†’ "Continue the loop instead of returning early, and collect all matching indices in a list. Time complexity remains O(n * m)."

Advanced: KMP Algorithm (Optional)

def strStr_KMP(haystack: str, needle: str) -> int:
    if not needle:
        return 0
    
    # Build failure function (LPS array)
    def build_lps(pattern):
        lps = [0] * len(pattern)
        length = 0
        i = 1
        
        while i < len(pattern):
            if pattern[i] == pattern[length]:
                length += 1
                lps[i] = length
                i += 1
            elif length > 0:
                length = lps[length - 1]
            else:
                lps[i] = 0
                i += 1
        return lps
    
    lps = build_lps(needle)
    i = j = 0
    
    while i < len(haystack):
        if haystack[i] == needle[j]:
            i += 1
            j += 1
            if j == len(needle):
                return i - j
        elif j > 0:
            j = lps[j - 1]
        else:
            i += 1
    
    return -1

Time Complexity: O(n + m)
Space Complexity: O(m) for LPS array

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment