Data-Science

Using ESS for Data Science

RStudio is a formidable IDE that offers an environment for working seamlessly with multiple languages beyond R. It is especially convenient for tasks involving frequent visualisation of data frames and plots, and for Shiny app development. However, its text (i.e. code) editing capabilities still lag significantly behind those of Emacs and Vim. Nor does it offer a single interface integrating task management, time management and multi-language programming environments to the extent available within Org-mode via Emacs.

Nteract : An interactive computing environment

A slide deck from Netflix mentions using Nteract as their programming notebook, which prompted a mini exploration. This blog post by Safia Abdalla (a maintainer/developer of Nteract) introduces Nteract as an open source, desktop-based, interactive computing application designed to overcome a number of limitations in Jupyter Notebook's design philosophy. One key difference (among many others) is the ability to execute code in a variety of languages within a single notebook. It also appears that the Electron-based desktop app should make it easier for beginners to start coding.

Technical notes : Research paper on learning/teaching data science

Title: Navigating Diverse Data Science Learning: Critical Reflections Towards Future Practice
Author: Yehia Elkhatib
Download link

These are my notes on the above paper, which mainly details the methods explored and implemented to deliver high-quality education in data science. The paper also provides an interesting breakdown of the different roles in data science workflows, and highlights the importance of being able to work in a team.

R notes and snippets

Long <-> Wide formats: example for gathering

    library("tidyverse")

    ## Defining a sample tribble with several duplicates
    ## (missing client IDs are NA rather than "" so each column keeps a single type)
    a <- tribble(
      ~IDS, ~"client id 1", ~"client id 2", ~"client id 3", ~"client id 4", ~"old app", ~"new app",
      123, 767, 888, NA, NA, "yes", "no",
      222, 333, 455, 55, 677, "no", "yes",
      222, 333, 343, 55, 677, "no", "yes"
    )

    ## Defining vector to form column names
    vec1 <- 1:4
    vec2 <- "client id"
    vec3 <- str_glue("{vec2} {vec1}")

    ## Gathering and removing duplicates
    ## (all_of() makes explicit that vec3 is an external vector of column names)
    a %>%
      gather(
        key = "Client number",
        value = "client ID",
        all_of(as.character(vec3))
      ) %>%
      unique()

Matrix

Defining a matrix

A matrix is a collection of elements of the same data type (numeric, character, or logical) arranged into a fixed number of rows and columns.
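
As a minimal sketch of this definition in base R (the variable names `m`, `m_by_row` and `mixed` are illustrative only and not part of the original notes), the snippet below builds a matrix with a fixed number of rows and columns, and shows what happens when elements of different types are mixed:

```r
# Build a 2 x 3 matrix from six values; by default values fill column by column
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)  # number of rows, then number of columns

# byrow = TRUE fills the matrix row by row instead
m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)

# A matrix holds a single data type: mixing types coerces every element
# to a common type (here, character)
mixed <- matrix(c(1, "a", TRUE, 2), nrow = 2)
typeof(mixed)
```

Because every element must share one type, a table that mixes numbers and text is usually better kept as a data frame or tibble than as a matrix.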