finetune finetunej

enijkamp / gist:aaacc540ebefccc1bc5fef457e54ddbe

Created July 29, 2021 13:04

reshard.py

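	import jax.numpy as jnp
	import numpy as np

	def apply_reshard(pytree_params_in, pytree_params_out, shards_in, shards_out):

	    def override_dtype(x):
	        # A 2-byte void dtype ('V2') is reinterpreted in place as bfloat16.
	        if x.dtype == np.dtype('V2'):
	            x.dtype = jnp.bfloat16
	        return x

	    def is_leaf(x):
	        # Treat plain NumPy arrays as the leaves of the parameter pytree.
	        return type(x) == np.ndarray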

kinoc / j6b_train_hf_ds.py

Last active January 1, 2026 09:46

So now you want to finetune that GPT-J-6B on a 3090/TITAN GPU ... okay, using HF and DeepSpeed too

	# So now you want to finetune that GPT-J-6B on a 3090/TITAN GPU ... okay
	# More exploratory coding. It uses the Hugging Face model port and DeepSpeed, and reads all text/md files from a target directory
	# It is a fragment of a larger system with remote editing, but that's another story
	# This is the raw training tester. Items to look out for:
	# - uses DeepSpeed and has a DS config
	# - to save memory, uses SGD instead of Adam
	# - uses gradient checkpointing
	# - freezes 25% of the layers so the model fits

	# Assumes you can already run https://gist.github.com/kinoc/2d636a68876cd3de7b6e9c9452b61089
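
The memory-saving items listed in the script header (SGD instead of Adam, freezing 25% of the layers) can be illustrated with a rough back-of-the-envelope estimate. This is only a sketch under stated assumptions, not the gist's own code: the helper `training_state_gib` is hypothetical, the parameter count is rounded to 6 billion, parameters are treated as spread evenly across layers, weights and gradients are assumed to be 16-bit, and activation memory (which is what gradient checkpointing reduces) is ignored entirely.

```python
# Rough estimate of training-state size (weights + gradients + optimizer state).
# All figures are illustrative assumptions, not measurements from the gist.

N_PARAMS = 6_000_000_000   # assumption: GPT-J-6B rounded to 6e9 parameters
ADAM_STATE_BYTES = 8       # Adam keeps two moment estimates; assumed fp32 (2 x 4 bytes)
SGD_STATE_BYTES = 0        # plain SGD without momentum keeps no per-parameter state


def training_state_gib(n_params, frozen_fraction, optimizer_bytes_per_param,
                       weight_bytes=2, grad_bytes=2):
    """Return the estimated size in GiB of weights, gradients and optimizer state.

    Frozen parameters still need their weights in memory, but they need
    neither gradients nor optimizer state.
    """
    trainable = n_params * (1.0 - frozen_fraction)
    total_bytes = (
        n_params * weight_bytes
        + trainable * grad_bytes
        + trainable * optimizer_bytes_per_param
    )
    return total_bytes / 2**30


adam_full = training_state_gib(N_PARAMS, 0.0, ADAM_STATE_BYTES)
sgd_full = training_state_gib(N_PARAMS, 0.0, SGD_STATE_BYTES)
sgd_frozen = training_state_gib(N_PARAMS, 0.25, SGD_STATE_BYTES)

print(f"Adam, nothing frozen: {adam_full:.1f} GiB")
print(f"SGD,  nothing frozen: {sgd_full:.1f} GiB")
print(f"SGD,  25% frozen:     {sgd_frozen:.1f} GiB")
```

Under these assumptions the optimizer state dominates the Adam case, which is why dropping it and freezing part of the model moves the total much closer to what a 24 GB card can hold. Real usage will differ, since DeepSpeed may offload some of this state and activations add to the total.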