ctlllll / longest_chinese_tokens_gpt4o.py
Created May 13, 2024 19:53
Longest Chinese tokens in gpt4o
import tiktoken
import langdetect
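# o200k_base is the encoding used by GPT-4o.
T = tiktoken.get_encoding("o200k_base")
# Map each token id to the character length of its decoded string.
length_dict = {}
for i in range(T.n_vocab):
    try:
        length_dict[i] = len(T.decode([i]))
    except: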