Portable data science tools

I have been toying with the idea of a portable set of data science tools that I can carry around. Until now, I have been using
Anaconda on a Windows PC.

Seventy-six days of “enforced” summer break without carrying a laptop along means I am away from my own computer. If I am lucky, I have internet access on a mobile phone, or a desktop with better connectivity but without the data science tools. Because I want to keep learning and developing so that my skills do not get rusty, a portable set of data science tools seems ever more sensible.

Back in Dublin and in a bit of a slump, I struggled to get my motivation back and was not sure what to do with everything. Worst of all, I felt guilty at the same time. I then decided to revisit the DIY projects I had embarked on last year without success. The success of having an SSD, and its performance, spurred me on: https://twitter.com/mryap/status/900776007409520642

Jupyter on the Cloud

One option is to use cloud computing. Azure Notebooks looks like a viable option. Jupyter serves as an IDE for both Python and R, which removes the need for a separate RStudio and Python IDE.

Jupyter on a USB drive

I am using WinPython with success. It keeps working when I move the WinPython folder in the following situations:

  • one portable drive to another (USB to portable drive),
  • one directory to another on the same machine.

Nothing breaks. In terms of performance, it beats cloud-hosted Jupyter such as Azure Notebooks. My setup is based on installing Cygwin and running IPython from there.

  • to avoid “Error in contrib.url(repos, "source"): trying to use CRAN without setting a mirror”, you need to set a CRAN mirror first

  • you can use Notepad++ to tinker with the user interface of the Jupyter dashboard
    (https://twitter.com/mryap/status/910825822541447169)

  • Cygwin lets you run Linux commands on Windows, act on data, and set up a data science
    environment without installing Ubuntu. These commands are useful when you follow good tutorials, which tend to use Linux commands rather than their Windows equivalents.

    Verdict

    Having used RStudio for some time, I really love Jupyter because it lets me work with both R and Python in the same tool. I don’t think I will go back, but I am keeping RStudio because you can now use R Markdown to create websites. If you want the quickest, easiest way to get Jupyter Notebook up and running, Anaconda is an excellent option. If ultimate portability is all you seek, go for the cloud options. My favourite option is WinPython: https://winpython.github.io/

    The USD 15,000 Deep Learning Machine

    The Beast

    A purpose-built machine for designing and training a model that detects and identifies “Empty” and “Occupied” parking spots (a.k.a. deep learning). At USD 15,000 each, the hardware for someone embarking on a deep learning project also requires deep pockets. After much googling, I came up with a cheaper option. Even without a monitor, it is still bloody expensive for me.

    PCPartPicker part list / Price breakdown by merchant

    Type         | Item | Price
    CPU          | Intel – Core i5-6600K 3.5GHz Quad-Core Processor | $228.98 @ OutletPC
    CPU Cooler   | Cooler Master – GeminII M4 58.4 CFM Sleeve Bearing CPU Cooler | $32.89 @ OutletPC
    Motherboard  | Asus – B150I PRO GAMING/WIFI/AURA Mini ITX LGA1151 Motherboard | Purchased For $0.00
    Memory       | Kingston – FURY 16GB (2 x 8GB) DDR4-2400 Memory | $193.78 @ OutletPC
    Storage      | Samsung – 960 EVO 250GB M.2-2280 Solid State Drive | $117.60 @ Amazon
    Video Card   | Gigabyte – GeForce GTX 1070 8GB Mini ITX OC Video Card | $424.98 @ Newegg
    Case         | Cooler Master – Elite 110 Mini ITX Tower Case | $38.89 @ OutletPC
    Power Supply | Cooler Master – 550W 80+ Bronze Certified Semi-Modular ATX Power Supply | $57.98 @ Newegg

    Prices include shipping, taxes, rebates, and discounts
    Total (before mail-in rebates): $1125.10
    Mail-in rebates: -$30.00
    Total: $1095.10
    Generated by PCPartPicker 2017-09-21 15:55 EDT-0400
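
To put the part list in perspective, here is a small sketch that adds up the listed prices and compares the result with the USD 15,000 machine. The prices are copied from the list above; the variable names are my own and purely illustrative.

```python
# Prices in USD, copied from the part list above.
parts = {
    "CPU": 228.98,
    "CPU Cooler": 32.89,
    "Motherboard": 0.00,  # already purchased
    "Memory": 193.78,
    "Storage": 117.60,
    "Video Card": 424.98,
    "Case": 38.89,
    "Power Supply": 57.98,
}
reference_price = 15_000.00  # the USD 15,000 machine mentioned above

# Round to cents to hide floating-point noise from the addition.
build_total = round(sum(parts.values()), 2)
savings = round(reference_price - build_total, 2)
share = build_total / reference_price

print(f"Build total: ${build_total:,.2f}")
print(f"Saved versus the reference machine: ${savings:,.2f}")
print(f"Build cost as a share of the reference price: {share:.1%}")
```

The listed prices add up to the final total on the part list, which suggests the per-item prices already include the mail-in rebates.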

    On a side track, I learned the following:

    CPU and Cooling

    The CPU must be compatible with the selected chipset while providing sufficient PCIe support. Consumer CPUs are ideal because the application target is highly fault tolerant and the NVIDIA DIGITS DevBox is acting as a workstation instead of a server. Pairing the CPU with effective cooling is crucial for optimal performance, especially when the GPUs are under peak load. I chose the Intel Core i7-5930K CPU with a Corsair Hydro H60 cooler.

    Memory and Storage

    RAM is important for handling large DNN files and datasets. The Intel Core i7-5930K CPU can stably support up to 64 GB RAM. An Intel Xeon processor can handle more RAM and allow ECC. However, using the Intel Xeon processor will significantly increase the cost.

    Chassis, Thermal, and Acoustic Considerations

    Acoustics and heat management are major considerations, especially when deploying the NVIDIA DIGITS DevBox in a normal office environment. A chassis that separates the power supply and disks from the heat generated by the CPU and GPUs is ideal.

    Power Supply

    The power supply should provide enough power to operate the system components along with some headroom to ensure stable operation. The total dissipated power for all of the system components used in a sample build is between 1,200 and 1,300 watts.
    Our sample build uses an EVGA SuperNOVA 1600W P2 power supply that delivers approximately 90% efficiency at 100% load (1,400 watts), ensuring system stability at peak workloads.
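
The headroom idea can be checked with simple arithmetic. This is only a sketch using the figures quoted above (a 1,600 W supply and a 1,200 to 1,300 W load); the function name is my own.

```python
def headroom_watts(psu_rated_watts: float, load_watts: float) -> float:
    """Return the spare capacity left once the system load is subtracted."""
    return psu_rated_watts - load_watts


psu_rated_watts = 1600           # rating of the power supply named above
load_range_watts = (1200, 1300)  # total dissipated power quoted above

for load in load_range_watts:
    spare = headroom_watts(psu_rated_watts, load)
    utilisation = load / psu_rated_watts
    print(f"{load} W load: {spare} W headroom, {utilisation:.0%} of rated capacity")
```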

    Motherboard

    Effective deep learning requires multiple GPUs. However, suitable PCIe topology is critical to being able to use those GPUs efficiently. Synchronous Stochastic Gradient Descent (SGD) for deep learning relies on broadcast communication between the GPUs. SGD acceleration needs P2P DMAs to work between devices. This means that all GPUs must be on the same I/O hub with very fast PCIe switches. Workstation motherboards based on the Intel X99 chipset with a PLX bridge setup can support four PCIe Generation3 x16 cards at either full speed or with minimal drop-off.

    The sample build used the ASUS X99-E WS workstation motherboard that supports Intel LGA 2011-v3 CPUs while drawing only 20W.

    Jupyter in Virtualenv

    This was my first time using the terminal to install Jupyter on Windows.


    Another significant aspect of this setup is that Jupyter runs in virtualenv, a tool for creating isolated Python environments.
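
To show what an isolated environment looks like, here is a minimal sketch. It uses Python's built-in venv module as a stand-in for virtualenv (a swap on my part, since venv needs no extra install), and the folder name jupyter-env is hypothetical.

```python
import tempfile
import venv
from pathlib import Path

# Hypothetical location; any writable folder works.
env_dir = Path(tempfile.mkdtemp()) / "jupyter-env"

# with_pip=False keeps this sketch quick and offline.
# A real setup would pass with_pip=True so packages can be installed.
venv.create(env_dir, with_pip=False)

# pyvenv.cfg marks the folder as an isolated environment.
config = (env_dir / "pyvenv.cfg").read_text()
print(config)
```

With pip enabled, you would then activate the environment and run pip install jupyter inside it, so Jupyter and its packages stay separate from the system Python.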

    Jupyter makes it easy to communicate ideas that combine code, equations, text and visualizations, and to share them with others. https://try.jupyter.org/

    It is emerging as the standard for sharing reproducible research in the sciences.

    My next step is to install R and run it in the Jupyter environment.

    Measuring non-profit success

    Measuring success for non-profit and government websites is a bit more challenging. You need to figure out your goals and measure how efficient you are in achieving them. Every organization needs money to operate. In a non-commerce environment, the objective won’t be profit, but rather achieving more with less.

    For non-profits, here are just some ways to incorporate non-commerce goals into the financial statement:

    – The cost of attracting volunteers and optimizing to bring that number down

    – The cost of attracting donations (how much do you spend for every dollar collected) and optimizing to bring that number down

    – For government and advocacy sites – cost per visitor (you want to spread your message further for less) and cost per engagement.
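
The measures above all boil down to dividing what you spend by what you achieve. Here is a rough sketch; every figure is made up for illustration, and the function name is my own.

```python
def cost_per_unit(total_cost: float, units: float) -> float:
    """Return the cost of achieving one unit of a goal; lower is better."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_cost / units


# Hypothetical figures for one campaign period.
recruitment_spend, volunteers_recruited = 2_000.0, 40
fundraising_spend, donations_collected = 5_000.0, 25_000.0
site_spend, visitors, engagements = 3_000.0, 60_000, 1_500

cost_per_volunteer = cost_per_unit(recruitment_spend, volunteers_recruited)
cost_per_dollar_raised = cost_per_unit(fundraising_spend, donations_collected)
cost_per_visitor = cost_per_unit(site_spend, visitors)
cost_per_engagement = cost_per_unit(site_spend, engagements)

print(f"Cost per volunteer:     {cost_per_volunteer:.2f}")
print(f"Cost per dollar raised: {cost_per_dollar_raised:.2f}")
print(f"Cost per visitor:       {cost_per_visitor:.2f}")
print(f"Cost per engagement:    {cost_per_engagement:.2f}")
```

Tracking these numbers period over period shows whether optimization is actually bringing them down.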

    Here are some articles that will be useful:

    http://www.grokdotcom.com/2009/07/29/turning-web-analytics-into-nonprofit-success/

    I Got No Ecommerce. How Do I Measure Success?

    Web Analytics Success Measurement For Government Websites

    How to go about updating and tweaking your website?

    I got this question from a client.

    This is what I answered.

    Start with user research to find out why some visitors do not take the actions we expect of them. Run a series of tests such as those below to extract insights.

    • usability tests,
    • user interviews,
    • 5-sec tests,
    • heatmap-analyses,
    • funnel analyses,
    • segmenting,
    • cohort analyses,
    • exit-intent surveys

    Based on the gathered insights, come up with hypotheses such as “adding certain descriptive text helps to clarify things on the website for a certain group of visitors so they can take action”.

    Test hypotheses like the one above quickly and easily with an A/B test. An A/B test helps us understand what works best for a certain group of visitors by testing different layouts of the website. A/B testing tools are available that you can use without bothering the IT people.

    Think of an A/B test as a means of validating ideas and hypotheses with real people. The above methodology is a test-and-iterate process.
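
Most A/B tools do the statistics for you, but it helps to see what sits underneath. Below is a minimal sketch of one common approach, a two-proportion z-test, which I am assuming here for illustration; the visitor and conversion counts are invented.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rate between A and B."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("cannot test when no one or everyone converted")
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_a, rate_b, z, p_value


# Hypothetical results: layout A (current) versus layout B (extra descriptive text).
rate_a, rate_b, z, p_value = two_proportion_z_test(100, 2000, 130, 2000)

print(f"A converts at {rate_a:.1%}, B converts at {rate_b:.1%}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value suggests the difference between the two layouts is unlikely to be down to chance alone, which is the signal to keep the change and move on to the next hypothesis.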

    The above process will not work if people do not find your products relevant or useful; as the saying goes, you would be flogging a dead horse.

    A seismic shift in business models is already here

     

    When a platform is self-service, even improbable ideas get tried, because there’s no expert gatekeeper ready to say “That will never work!” Guess what? Many of those improbable ideas do work.
    —JEFF BEZOS, 2011 LETTER TO SHAREHOLDERS

     

    The rise of the platform as a business model

    Platform business models bring together producers and consumers in high-value exchanges. Their chief assets are information and interactions, which together with “network effects” are the source of the value they create and of their competitive advantage. Many of these business strategies have existed for a long time (malls link consumers and merchants; newspapers connect subscribers and advertisers), but the increasing capabilities of digital technologies and the internet are fueling their rapid growth.

    Apple’s App Store, combined with its handset, connects participants in a two-sided market, with app developers on one side and app users on the other, generating value for both groups. As the number of participants on each side grew, that value increased, a phenomenon called “network effects” that is central to any platform strategy.

     

    Players in the Platform Ecosystem

    New entrants like Uber, Alibaba and Airbnb are all platform businesses, and they are all disrupting their industries. The five largest companies in tech are platform businesses.

    Example in Ireland

    In Ireland, an example is PhysioLinked, where the producers are the physiotherapy providers and the consumers are the patients. In the case of Popertee, property owners with vacant space are on one side and businesses looking for a pop-up retail presence are on the other. You could say that Popertee is the Airbnb of retail space. Daft.ie makes it easy for customers to find property to rent or buy in Ireland via a single source.

    Core Interaction

    According to the book Platform Revolution by Parker, Van Alstyne and Choudary, getting users to participate in a platform is one of the major challenges. They put it down as

    Participants + Value Unit + Filter → Core Interaction

    The Value Unit is what participants on one side of the platform create for others. In the case of Daft.ie, this was the pricing and availability information about each property entered by the property owner. With PhysioLinked, it is the listing of the physiotherapist you want to make an appointment with.

    The Filter is how the other users find the Value Unit, and is usually some sort of search interface. In PhysioLinked’s case, patients can use their location to filter the Value Units (available physiotherapists) and display them quickly and easily.
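
The formula above can be made concrete with a toy model. This is only a sketch of the idea, not how PhysioLinked actually works; the class, function and listing names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ValueUnit:
    """A listing created by a producer, for example a physiotherapist's profile."""
    producer: str
    location: str
    available: bool


def filter_by_location(units, location):
    """The Filter: return the available value units in the consumer's location."""
    return [u for u in units if u.available and u.location == location]


# Hypothetical listings created by producers on one side of the platform.
listings = [
    ValueUnit("Physio A", "Dublin", True),
    ValueUnit("Physio B", "Cork", True),
    ValueUnit("Physio C", "Dublin", False),
]

# A consumer on the other side searches by location: the core interaction.
matches = filter_by_location(listings, "Dublin")
print([u.producer for u in matches])
```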

    Whether a platform survives and prospers hinges on its ability to elicit a continuing stream of repeated interactions from its participants. As the story of Monster vs LinkedIn in the book demonstrates, successful platforms have to layer new interactions on top of the core interaction. LinkedIn fostered additional reasons for participants to spend time on its platform by emphasising content creation and sharing.

    Implications

    From marketing to finance, the seismic shift in business models calls for a different way of doing things. Platform business owners also need to consider governance rules on how platform participants or members may interact, such as deciding to what extent they allow communication between participants.

    “Uber, the world’s largest taxi company, owns no vehicles. Facebook, the world’s most popular media owner, creates no content. Alibaba, the most valuable retailer, has no inventory. And Airbnb, the world’s largest accommodation provider, owns no real estate. Something interesting is happening.”

    – Tom Goodwin, EVP and Head of Innovation at Zenith, writing for TechCrunch in 2015

    A platform company no longer has to own or produce all the assets (product listings, cars, properties) to create value. Its capital expenditure is significantly lower because it uses the resources of third parties. How are accountants going to put these down on the balance sheet? As far as I can tell, Platform Revolution does not offer any insight. ACCA has published a PDF on this matter, based on findings from workshops around the world.

    Metrics that matter

    When you are running a platform business, forget about the total number of sign-ups and the number of page views generated. As the value of a platform is derived from network effects, platform metrics should seek to measure

    • the rate of interaction success and
    • the factors that contribute to it.

    Platforms are driven by usage. One person’s usage will drive the other person’s usage. The metrics that matter are those that track usage, such as the volume of transactions. Are people from both sides repeatedly and increasingly engaged in positive, value-creating interactions? Are people happy enough with the platform to continue investing their time in participating actively?

    Some important metrics to keep track of are:
    – Searches
    – Rate of conversion to Sale
    – Net Promoter Score

    All the above metrics seek to track and measure usage on a platform. Snapchat, a social platform, called daily user engagement a “critical component
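
Two of these metrics are easy to compute yourself. The sketch below uses the standard Net Promoter Score definition (share of 9 to 10 answers minus share of 0 to 6 answers on a 0 to 10 scale); the search, sale and survey numbers are invented for illustration.

```python
def conversion_rate(searches: int, sales: int) -> float:
    """Share of searches that ended in a sale, a proxy for interaction success."""
    return sales / searches if searches else 0.0


def net_promoter_score(ratings) -> float:
    """NPS from 0-10 survey answers: % promoters (9-10) minus % detractors (0-6)."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)


# Hypothetical numbers for one month on the platform.
searches, sales = 4000, 220
survey = [10, 9, 9, 8, 7, 6, 10, 3, 9, 8]

rate = conversion_rate(searches, sales)
nps = net_promoter_score(survey)

print(f"Search-to-sale conversion: {rate:.1%}")
print(f"Net Promoter Score: {nps:.0f}")
```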

    Curation and Reputation are the New Quality Control

    Platform businesses embrace and enable vetted external entities to create and capture value. Parker, Van Alstyne and Choudary describe curation as a way to vet the quality of participants joining the platform.

    On PhysioLinked, physiotherapists are required at sign-up to produce “a photo, proof of qualification, insurance & relevant professional body membership.” This builds credibility, assuring the other side of the market, the people seeking treatment, that this is a platform they can trust.
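
Curation at sign-up can be as simple as a checklist. Here is a hedged sketch built on the four requirements quoted above; the function names and the sample applicant are hypothetical, and a real platform would also verify each document.

```python
# The four items quoted in the sign-up requirement above.
REQUIRED_DOCUMENTS = {
    "photo",
    "proof of qualification",
    "insurance",
    "professional body membership",
}


def missing_documents(submitted):
    """Return the required documents an applicant has not yet provided, sorted."""
    return sorted(REQUIRED_DOCUMENTS - set(submitted))


def is_approved(submitted) -> bool:
    """An applicant passes curation only when nothing is missing."""
    return not missing_documents(submitted)


# Hypothetical applicant who has uploaded only two of the four items.
applicant = ["photo", "insurance"]

print(missing_documents(applicant))
print(is_approved(applicant))
```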

    Conclusion

    Platforms as a business model scale more efficiently and more quickly by eliminating gatekeepers; unlocking new sources of value creation, supply and demand; using data to create community feedback loops; and centering on people, resources, and functions that exist outside the operations of a platform business.

    As digital networks increase in ubiquity, businesses that do a better job of orchestrating all these competing elements will win.

    Lessons from Nike and Samsung illustrate that leaders of incumbent companies who understand the new business model can begin building tomorrow’s platforms in a way that not only leverages their existing assets but also strengthens and reinforces them.

    Have you made moves to turn your business into a platform? Tell me about it in the comments.

    Resources:
    If you do not have time to read the 336 pages of Platform Revolution, these links will suffice.
    https://hbr.org/2016/04/pipelines-platforms-and-the-new-rules-of-strategy
    http://www.marketwatch.com/story/what-twitter-knows-that-blackberry-didnt-2013-10-10

    Email capture at the end of a process

    Add an email capture form to the final step of your set-up process. Users are already sold on your products and services, so adding an email opt-in box as part of the final step rarely interrupts the flow.