Data Cleaning

Snippet of Dataset Description
Download clean csv file
Download raw csv file

R code - Record Data


R is used to clean the labeled record data, which describes the age and gender of social workers in the USA. Unnecessary columns are dropped, the data is checked for missing (NA) values, and the column types are corrected.
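The cleaning script for this section is written in R. Purely as an illustration, the same three steps (drop columns, check for NA, fix types) might look like this in Python with pandas. The column names and sample rows are hypothetical, since the real file's layout is not shown here, and dropping rows with missing values is an assumption: the original only says the data is checked for them.

```python
import io

import pandas as pd

# Hypothetical raw extract; the real column names and values may differ.
RAW_CSV = """ID Gender,Gender,ID Age,Age,Total Population,Slug
1,Male,25,25,1200,social-workers
2,Female,25,25,5400,social-workers
1,Male,40,40,,social-workers
2,Female,40,40,6100,social-workers
"""


def clean_records(raw: pd.DataFrame) -> pd.DataFrame:
    """Drop extra columns, remove rows with NA, and fix column types."""
    # 1. Keep only the columns needed for the age-by-gender analysis.
    df = raw[["Gender", "Age", "Total Population"]].copy()
    # 2. Drop rows with missing values (assumption: imputing is another option).
    df = df.dropna()
    # 3. Fix column types: Gender as a category, counts as integers.
    df["Gender"] = df["Gender"].astype("category")
    df["Age"] = df["Age"].astype(int)
    df["Total Population"] = df["Total Population"].astype(int)
    return df.reset_index(drop=True)


raw_df = pd.read_csv(io.StringIO(RAW_CSV))
print(raw_df.isna().sum())   # missing values per column before cleaning
clean_df = clean_records(raw_df)
print(clean_df.dtypes)       # column types after cleaning
```

Printing the NA counts before cleaning makes it easy to see which columns are affected before deciding whether to drop or impute.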

R code
Download raw csv file
Download clean csv file

Python code - Text data

Python is used to clean the text data collected from Twitter. All columns except the text column are dropped. CountVectorizer and WordCloud are then applied to the cleaned text.

Python code