Manoj Karthick (manojkarthick)
@jalkjaer
jalkjaer / run.sh
Last active November 3, 2019 19:55
Steps to automatically build Spark for Kubernetes with the latest Hadoop 2 version and push the images to ECR
export HADOOP_VERSION=2.9.1
export SPARK_VERSION=2.3.2
export AWS_ACCOUNT_ID=<your numeric AWS account id>
export ECR_REGION=us-east-1
# Fetch and extract the Spark source
curl -L "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}.tgz" | tar -xzvf -
cd "spark-${SPARK_VERSION}"
# Set MAVEN_OPTS according to https://spark.apache.org/docs/latest/building-spark.html
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
# The remaining steps are an assumed reconstruction (the gist preview is truncated here):
# build a runnable distribution against the chosen Hadoop version...
./dev/make-distribution.sh --name "hadoop-${HADOOP_VERSION}" --tgz \
  -Pkubernetes -Phadoop-2.7 -Dhadoop.version="${HADOOP_VERSION}"
# ...then log in to ECR and build/push the Spark images
$(aws ecr get-login --no-include-email --region "${ECR_REGION}")
ECR_REPO="${AWS_ACCOUNT_ID}.dkr.ecr.${ECR_REGION}.amazonaws.com"
./bin/docker-image-tool.sh -r "${ECR_REPO}" -t "${SPARK_VERSION}" build
./bin/docker-image-tool.sh -r "${ECR_REPO}" -t "${SPARK_VERSION}" push
@astamicu
astamicu / Remove videos from Youtube Watch Later playlist.md
Last active December 9, 2025 16:39
Script to remove all videos from the YouTube Watch Later playlist

UPDATED 22.11.2022

It's been two years since the last update, so here's the updated working script as per the comments below.

Thanks to BryanHaley for this.

setInterval(function () {
    // Grab the first video in the playlist and open its action menu
    var video = document.getElementsByTagName('ytd-playlist-video-renderer')[0];
    video.querySelector('#primary button[aria-label="Action menu"]').click();
    // Assumed continuation (the preview is truncated): click the "Remove from" menu entry
    document.evaluate('//span[contains(text(),"Remove from")]', document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null).singleNodeValue.click();
}, 1000);
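As with similar console one-liners, the script is meant to be pasted into the browser's developer-tools console while the Watch Later playlist page is open; each tick removes the topmost video until the playlist is empty.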
@nayato
nayato / Dockerfile
Last active May 8, 2023 13:36
Dockerfile for a Rust multi-stage build sample
FROM rustlang/rust:nightly as builder
WORKDIR /app/src
# Create a dummy binary crate so dependencies can be compiled (and cached) on their own
RUN USER=root cargo new --bin ht
COPY Cargo.toml Cargo.lock ./ht/
WORKDIR /app/src/ht
# Build only the dependencies; this layer is reused as long as the manifests are unchanged
RUN cargo build --release
# Copy the real sources over the dummy crate and build the actual binary
COPY ./ ./
RUN cargo build --release

# Runtime stage (assumed reconstruction; the preview cuts off before it)
FROM debian:stretch-slim
COPY --from=builder /app/src/ht/target/release/ht /usr/local/bin/ht
CMD ["ht"]
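A typical way to exercise this Dockerfile (image tag assumed): docker build -t ht . from the project directory, then docker run --rm ht to execute the release binary copied into the runtime stage.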

GitHub Markup Reference

GitHub supports a number of

@orenitamar
orenitamar / Dockerfile
Last active March 22, 2024 05:13
Installing numpy, scipy, pandas and matplotlib in Alpine (Docker)
# Below are the dependencies required for installing the common combination of numpy, scipy, pandas and matplotlib
# in an Alpine-based Docker image.
FROM alpine:3.4
RUN echo "http://dl-8.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories
RUN apk --no-cache --update-cache add gcc gfortran python python-dev py-pip build-base wget freetype-dev libpng-dev openblas-dev
RUN ln -s /usr/include/locale.h /usr/include/xlocale.h
RUN pip install numpy scipy pandas matplotlib
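To use it, build and tag the image from the Dockerfile's directory, e.g. docker build -t alpine-python-sci . (the tag is an assumption); the result can then serve as a base image for Python data-science containers.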
@thomasdarimont
thomasdarimont / KeycloakClientAuthExample.java
Last active June 30, 2025 01:39
Retrieve and verify an AccessToken with the Keycloak client.
package de.tdlabs.keycloak.client;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.keycloak.OAuth2Constants;
import org.keycloak.RSATokenVerifier;
import org.keycloak.admin.client.Keycloak;
import org.keycloak.admin.client.KeycloakBuilder;
import org.keycloak.common.VerificationException;
import org.keycloak.jose.jws.JWSHeader;
import org.keycloak.representations.AccessToken;
@dusenberrymw
dusenberrymw / spark_tips_and_tricks.md
Last active October 23, 2025 02:15
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by a factor of 8.
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the expected memory expansion; see the sketch after this list.
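A minimal sketch of the repartition-after-flatMap pattern from the last tip, assuming a Spark SQL session; the row counts, expansion factor, partition count, and output path are illustrative, not prescriptive:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("flatmap-repartition").getOrCreate()
import spark.implicits._

// Small rows in: one index per row, with whatever partitioning range() chose.
val indices = spark.range(0L, 1000000L).as[Long]

// flatMap expands each index into 100 values: 100x the rows,
// but still the same number of partitions, so per-partition memory grows 100x too.
val expanded = indices.flatMap(i => Seq.fill(100)(i))

// Explicitly repartition the flatMap output so partitions stay near the ~128MB guideline.
val safe = expanded.repartition(512)
safe.write.parquet("/tmp/expanded.parquet")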
@karstenmueller
karstenmueller / aws_az_list.md
Created December 1, 2016 18:26
List of AWS availability zones for each AWS region

AWS Regions

Region Code    Region Name     Availability Zones
us-east-1*     N. Virginia     us-east-1a, us-east-1b, us-east-1c, us-east-1d, us-east-1e
us-east-2      Ohio            us-east-2a, us-east-2b, us-east-2c
us-west-1*     N. California   us-west-1a, us-west-1b, us-west-1c
us-west-2      Oregon          us-west-2a, us-west-2b, us-west-2c
eu-west-1      Ireland         eu-west-1a, eu-west-1b, eu-west-1c
eu-central-1   Frankfurt       eu-central-1a, eu-central-1b
@andrearota
andrearota / example.scala
Created October 18, 2016 08:40
Creating Spark UDF with extra parameters via currying
// Problem: creating a Spark UDF that takes extra parameters at invocation time.
// Solution: use currying.
// http://stackoverflow.com/questions/35546576/how-can-i-pass-extra-parameters-to-udfs-in-sparksql
// We want to create hideTabooValues, a Spark UDF that sets to -1 any field that contains one of the given taboo values.
// E.g. forbiddenValues = [1, 2, 3]
// dataframe = [1, 2, 3, 4, 5, 6]
// dataframe.select(hideTabooValues(forbiddenValues)) :> [-1, -1, -1, 4, 5, 6]
//
// Implementing this in Spark, we find two major issues:
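The preview ends before the implementation, so here is a minimal sketch of the currying approach the comments describe; the session setup and column name are assumptions rather than the gist's actual code:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.udf

val spark = SparkSession.builder.appName("curried-udf").getOrCreate()
import spark.implicits._

// Currying: a plain function takes the taboo values and returns a UDF
// whose closure has captured them.
val hideTabooValues = (taboo: Seq[Int]) =>
  udf((x: Int) => if (taboo.contains(x)) -1 else x)

// Invocation time: supply the extra parameter, then apply the resulting UDF to a column.
val df = Seq(1, 2, 3, 4, 5, 6).toDF("value")
df.select(hideTabooValues(Seq(1, 2, 3))($"value")).show()   // -1, -1, -1, 4, 5, 6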