LewisGet’s gists

LewisGet / database_conver_script.py

Created December 31, 2025 08:52

	import glob
	import os
	from pydub import AudioSegment

	database_path = "./Taiwan-Tongues-ASR-CE-dataset-zhtw/train"
	database_list = []

	for i in glob.glob(os.path.join(database_path, "mp3", "*.mp3")):
	filename = os.path.basename(i).split(".")[0]
	filetext = os.path.join(database_path, "txt", filename + ".txt")

LewisGet / restic_ignore.txt

Last active October 29, 2025 13:52

LewisGet / output.md

Created October 2, 2025 11:43

LewisGet / split_audio.py

Created September 25, 2025 08:21

	from pyannote.audio import Pipeline
	import torch

	pipeline = Pipeline.from_pretrained("pyannote/speaker-diarization-3.1", use_auth_token="...")

	import os
	import glob
	import torchaudio
	from pydub import AudioSegment

LewisGet / docker-compose.yml

Last active September 10, 2025 04:47

workspace

	services:
	workspace:
	image: panjojocom/py39-cuda-12.8.1-ubuntu-24.04-audio-torch
	container_name: workspace
	volumes:
	- ./workspace:/workspace
	ports:
	- 8080:8080
	- 8000:8000
	- 5050:5050

LewisGet / youtube_subtitle_download.js

Last active September 7, 2025 08:22

	// 下載 youtube 字幕按鈕
	// 建立一個下載字幕的按鈕
	const downloadButton = document.createElement('a');


	var yt = ytInitialPlayerResponse['captions'];

	// 字幕語言列表物件，這物件需讀取過，才會將字幕 url 讀取，通過 token
	// yt.playerCaptionsTracklistRenderer.captionTracks;

LewisGet / local_server_ssh_autocomplete.bashrc

Last active August 19, 2025 12:53

	# 懶得裝 zsh 跟整套 oh my 時，塞上去 bash 版本

	_my_ssh_autocomplete() {
	local cur=${COMP_WORDS[COMP_CWORD]}

	if [[ $cur =~ ^[0-9]{1,3}$ ]]; then
	local full_ip="[email protected].${cur}"
	COMPREPLY=( $(compgen -W "$full_ip" -- "$cur") )
	fi
	}

LewisGet / llm_4bit_dpo_train.py

Created August 15, 2025 09:50

4bit 跑 dpo 沒對齊提問種類，因為資料不足。

	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig

	from datasets import Dataset
	from torch.cuda.amp import autocast

	import json
	import glob
	import os

LewisGet / llm_4bit_train.py

Last active August 13, 2025 23:43

用 4bit int8 的訓練資料來訓練 llm

	import torch
	from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, BitsAndBytesConfig
	import json
	import os

	model_path = "/workspace/Qwen3-0.6B"
	fine_tune_path = "/workspace/4bit-train"

	per_device_train_batch_size=3
	learning_rate=5e-6

LewisGet / ask_script.py

Last active August 12, 2025 05:18

私人 llm，用 ollama 批量翻譯或者整理概要，也可以刪除廣告用，目前是在做 role play rpg 的訓練資料

	from ollama import Client
	import json
	import re

	client = Client(
	host='http://ollama:11434',
	headers={'x-some-header': 'some-value'}
	)

	f = open("text.txt", "r")

Lewis Jang LewisGet