This was my first time using the terminal to install Jupyter on Windows.
Another significant aspect of this setup is that Jupyter runs in a virtualenv, an isolated Python environment created with the tool of the same name.
Jupyter makes it easy to communicate ideas that combine code, equations, text, and visualizations, and to share them with others. https://try.jupyter.org/
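For anyone curious what an isolated environment looks like in practice, here is a minimal sketch. It is an assumption-laden stand-in, not the exact commands I ran: it uses Python's standard-library `venv` module instead of the virtualenv tool, and the function name `create_env` and the directory name `jupyter-env` are made up for illustration.

```python
import os
import tempfile
import venv


def create_env(env_dir: str) -> str:
    """Create an isolated environment and return the path to its interpreter."""
    # with_pip=False keeps this sketch fast and offline.
    # Pass with_pip=True if you want to install packages such as Jupyter afterwards.
    venv.create(env_dir, with_pip=False)

    # Windows places executables in "Scripts"; other platforms use "bin".
    bin_dir = "Scripts" if os.name == "nt" else "bin"
    exe = "python.exe" if os.name == "nt" else "python"
    return os.path.join(env_dir, bin_dir, exe)


if __name__ == "__main__":
    # "jupyter-env" is a hypothetical directory name inside a temporary folder.
    env_dir = os.path.join(tempfile.mkdtemp(), "jupyter-env")
    python_path = create_env(env_dir)
    print("Environment interpreter:", python_path)
```

With an environment created using `with_pip=True`, running that interpreter with `-m pip install jupyter` would install Jupyter inside the environment only, leaving the system-wide Python untouched.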
It is emerging as the standard for sharing reproducible research in the sciences.
My next step is to install R and run it in the Jupyter environment.
Drawing on work by Tukey, Chambers, Breiman and Cleveland, Stanford statistics professor David Donoho presents a vision of data science based on the activities of people who are ‘learning from data’.
- John Tukey’s The Future of Data Analysis asserts that statistics must become concerned with the handling and processing of data, its size, and its visualization.
- John Chambers’s S language, the predecessor of R, is the forerunner of the “notebook” concept, where an academic paper can be made reproducible, scripted, and shareable (e.g. Jupyter Notebook).
- Leo Breiman’s Two Cultures notes that a concern strictly with prediction accuracy is different from inference about models, and that the former is under-represented in academia but prevalent in industry, where it has turned into “machine learning.”
- William S. Cleveland’s 2001 paper Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics addressed academic statistics departments and proposed a plan to reorient their work.
Donoho’s paper reviews the recent spectacle about data science in the popular media, and asks how, or whether, data science is really different from statistics.
He also describes an academic field dedicated to improving that activity in an evidence-based manner. His premise is that this new field is a better academic enlargement of statistics and machine learning than today’s Data Science Initiatives, while being able to accommodate the same short-term goals.
He proposes calling the following collection of activities a would-be field, “Greater Data Science”:
1. Data Exploration and Preparation
2. Data Representation and Transformation
3. Computing with Data
4. Data Modeling
5. Data Visualization and Presentation
6. Science about Data Science
He contends that information technology skills are at a premium, but scientific understanding and statistical insight should be firmly in the driver’s seat.
Check out the thoughtful essay by Stanford statistics professor David Donoho, titled “50 Years of Data Science”.