@davidlenz
Last active July 1, 2025 06:18
20 newsgroup dataset from sklearn to csv.
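from sklearn.datasets import fetch_20newsgroups
import pandas as pd


def twenty_newsgroup_to_csv():
    # Fetch the training split with headers, footers and quoted replies stripped.
    newsgroups_train = fetch_20newsgroups(subset='train', remove=('headers', 'footers', 'quotes'))

    # One row per post: the raw text plus its integer class label.
    df = pd.DataFrame([newsgroups_train.data, newsgroups_train.target.tolist()]).T
    df.columns = ['text', 'target']

    # Lookup table: the row index is the integer label, 'title' is the group name.
    targets = pd.DataFrame(newsgroups_train.target_names)
    targets.columns = ['title']

    # Attach the group name to each post, stamp the export time, and write the CSV.
    out = pd.merge(df, targets, left_on='target', right_index=True)
    out['date'] = pd.to_datetime('now')
    out.to_csv('20_newsgroup.csv')


twenty_newsgroup_to_csv()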
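
For anyone who wants to see what the merge step produces without downloading the dataset, here is a minimal sketch on made-up data. The three posts are invented stand-ins, the two group names are simply the first two of the dataset's twenty, and `build_frame` is a hypothetical helper name that does not appear in the gist. The sketch also passes `index=False` to `to_csv`, which is a choice made here rather than in the gist; the gist's call writes the row index as an unnamed first column.

```python
import io

import pandas as pd


def build_frame(texts, labels, label_names):
    """Join integer labels to their names, mirroring the merge in the gist."""
    df = pd.DataFrame({"text": texts, "target": labels})
    # The default 0..n-1 index of this frame doubles as the label id.
    targets = pd.DataFrame({"title": label_names})
    return pd.merge(df, targets, left_on="target", right_index=True)


# Invented stand-in posts (assumption: not real 20 Newsgroups content).
texts = ["first post", "second post", "third post"]
labels = [1, 0, 1]
label_names = ["alt.atheism", "comp.graphics"]

out = build_frame(texts, labels, label_names)

# Write to an in-memory buffer instead of a file, then read it back.
buffer = io.StringIO()
out.to_csv(buffer, index=False)
buffer.seek(0)
round_trip = pd.read_csv(buffer)
print(round_trip)
```

Each row ends up with `text`, `target`, and `title` columns, so the CSV is readable without the separate `target_names` list.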
@Adamthe1st

Thank you so much! This was helpful.

@Rishabh-creator601

This is too good. You can also visit my GitHub: github.com/rishabh-creator601

@JumpingDino

JumpingDino commented Apr 27, 2022

Hi guys! If someone needs the data itself in .csv you can download here:
https://github.com/JumpingDino/datasets/blob/master/20newsgroup/20_newsgroup.csv

Thanks a lot for the code david :D !

@sufyanafzal7

404 not found bro

@nelvintan

404 not found.

@JumpingDino

Hello @sufyanafzal7 and @nelvintan, thanks for telling me, just added the dataset here if you are interested:
https://github.com/JumpingDino/nlp-datasets/tree/main/data
