Set Up a Local Env (PyCharm) for Kaggle Competitions
A reason, and quick steps, to set up a local dev environment that helps with your competition work
Overview
- Motivation
- Setup PyCharm
- Alternative (Totally Free)
- Plugins
- Debug
- Trade-off: Local vs Cloud vs Kaggle
- Summary
Motivation
Kaggle offers a great online platform with Jupyter Notebooks for interactively testing code, with access to free GPU resources. So why bother with the "troublesome" option of a local setup?
The one-word answer is "efficiency". With the help of an IDE you can implement your solution faster (for example, with Vim mode), catch static code errors instantly, debug and dig into your work, get function parameter hints, and more.
A Jupyter notebook is the best choice for arranging your whole solution into an interactive "paper", while an IDE like PyCharm is clearly the choice for speeding up the coding stage.
Vim mode, error hints, auto-completion, debugging: the reasons for an IDE
Setup PyCharm
Get a Trial or Licensed Professional Version
PyCharm comes in two editions: Professional and Community.
You may want to consider Professional, because the "Scientific Tools" related to data science work are only included in that edition. Professional is a paid edition, of course, so here are some options:
- There is normally a trial period of 30~90 days, so make use of that free window
- Check for deals, discounts, or reimbursement from your school or employer
- Use your student email to get the GitHub Student Developer Pack
- Check the Special Offers on the subscription page
If, sadly, none of these works for you, then I guess you have found a good reason to use it hard and get the value back through your data science work, right?
Alternative (Totally Free)
But you know what... we have a Plan B: the open-source IDE Spyder.
The basic features are quite similar, such as a debugger and a viewer for your datasets.
The official documentation and tutorials are well organized, so you can get it working locally within an hour or two.
Plugins
Kite - AI-Driven Auto-Completion
One of the best features an IDE offers is auto-completion, with hints for function calls and their parameters.
There are too many functions, and even more parameters, across the libraries we use frequently, like pd, np, sklearn... It's impossible to memorize them all!
Kite is the best option for these pop-up hints while typing; its suggestions are AI-driven and try to fit the context of your code.
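Completion works best when the IDE can infer what type a variable is. Inside a function with unannotated parameters, `df` could be anything, so the IDE has little to suggest. A minimal sketch, assuming pandas, of adding type hints so that PyCharm's static analysis knows `df` is a `pd.DataFrame`; the function `add_ratio` and the column names are made up for illustration.

```python
import pandas as pd


def add_ratio(df: pd.DataFrame, num: str, den: str, name: str = "ratio") -> pd.DataFrame:
    """Return a copy of `df` with a new column `name` equal to df[num] / df[den]."""
    # Because `df` is annotated as pd.DataFrame, the IDE can resolve methods such as
    # `df.assign` or `df.loc` and offer completion and parameter hints for them.
    return df.assign(**{name: df[num] / df[den]})


if __name__ == "__main__":
    # Hypothetical toy data standing in for a competition's train set.
    train = pd.DataFrame({"clicks": [2, 9], "views": [4, 10]})
    print(add_ratio(train, "clicks", "views"))
```

`assign` returns a new DataFrame rather than modifying the input, which keeps the function free of side effects while you iterate on features.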
IdeaVim - Speed Typing
Vim is a fantastic tool for speeding up your typing. Many people have only heard of it from old-timers and think it is out of date...
No! Vim is such a classic tool for "shortcutting" your typing that there is little room left to improve input speed further in this direction, which is why it has stayed in use ever since most computers had only a terminal interface.
Material UI + Atom Icons - Modern & Comfortable Interface
Well, a beautiful UI may not help you in a "technical" way, but it will definitely make navigating your projects and the IDE's countless settings more comfortable.
I use this combination of UI and icon themes across different IDEs and recommend giving it a try.
Debug
The most important reason I chose to have a local env is that I can debug line by line.
Here is the case: in a Jupyter notebook you can easily print out intermediate results, like a DataFrame, by adding a cell. But what if you want to check the steps inside your functions? For example, making sure each step actually does what you expect, especially when the dimensions and shapes of your data get confusing.
If you are not new to programming, you will likely prefer the IDE's debug mode to writing print statements line by line. On top of that, the IDE shows more organized details for the line you are debugging.
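As an illustration of the kind of function worth stepping through, here is a minimal sketch, assuming NumPy, of a hypothetical two-step transform (`build_features` is made up for this example) where the shapes change between steps. In PyCharm you would click the gutter next to the line after step 1 to set a breakpoint and run with Debug; the standard-library `breakpoint()` shown here is the IDE-free equivalent that drops into `pdb`.

```python
import numpy as np


def build_features(X: np.ndarray, window: int = 3, debug: bool = False) -> np.ndarray:
    """Hypothetical transform: stack each row with a rolling mean over the rows."""
    # Step 1: rolling mean down each column; "valid" mode shortens the row axis,
    # so the result has shape (n_rows - window + 1, n_cols).
    kernel = np.ones(window) / window
    rolled = np.apply_along_axis(lambda col: np.convolve(col, kernel, mode="valid"), 0, X)
    if debug:
        # Pauses here in pdb so you can inspect X.shape and rolled.shape interactively
        # instead of sprinkling print() calls; in PyCharm, a gutter breakpoint does the same.
        breakpoint()
    # Step 2: trim the original rows so they line up with the rolling means,
    # then concatenate side by side -> shape (n_rows - window + 1, 2 * n_cols).
    trimmed = X[window - 1:]
    return np.hstack([trimmed, rolled])


if __name__ == "__main__":
    X = np.arange(12, dtype=float).reshape(6, 2)
    print(build_features(X).shape)
```

The `debug` flag keeps the pause opt-in, so the same code runs unattended when you submit it; setting the environment variable `PYTHONBREAKPOINT=0` also disables every `breakpoint()` call.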
Trade-off: Local vs Cloud vs Kaggle
Actually, we have three options for a Kaggle competition: Local, Cloud, and the Kaggle site.
We know the Kaggle site is where we post our final work, and local with an IDE is the best fit for developing the solution. So why bother with a Cloud option as well?
An important part of working on Kaggle is tuning the model for the best performance. Both Kaggle's default kernel and our local computer are largely limited in memory and computing capacity, which means each run takes longer and we try fewer ideas: slower model-building iterations...
Andrew Ng makes the same point about iteration speed in the course Structuring Machine Learning Projects.
Cloud computing services let us scale up memory and computing capacity for as long as we need, at an affordable price that depends on how we configure it.
Personally, I used GCP's Cloud AI Notebook, which is integrated with Kaggle, so you can jump over with a single click.
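Before deciding whether to move a heavy tuning job to the cloud, it can help to check what the current environment actually offers. A minimal sketch using only the standard library; `describe_resources` is a hypothetical helper, the RAM lookup relies on `os.sysconf`, which exists only on Unix-like systems, and the presence of `nvidia-smi` on the PATH is only a rough hint that an NVIDIA GPU driver is installed.

```python
import os
import shutil


def describe_resources() -> dict:
    """Report CPU count, total RAM in GB (if detectable) and whether nvidia-smi is on PATH."""
    info = {
        "cpus": os.cpu_count() or 1,
        "ram_gb": None,
        # Rough GPU hint only: the tool can exist without a usable GPU, and vice versa.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }
    # os.sysconf is not available on Windows; on other platforms the names may be unsupported.
    if hasattr(os, "sysconf"):
        try:
            page_size = os.sysconf("SC_PAGE_SIZE")
            n_pages = os.sysconf("SC_PHYS_PAGES")
            info["ram_gb"] = round(page_size * n_pages / 1024 ** 3, 1)
        except (ValueError, OSError):
            pass  # leave ram_gb as None when the values cannot be read
    return info


if __name__ == "__main__":
    print(describe_resources())
```

Running the same script locally, in a Kaggle notebook, and on a cloud VM gives a quick side-by-side view of the capacity each option offers.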
Summary
So that is a brief case for having a local env with an IDE for Kaggle competitions. Try setting up your local env with an IDE and the plugins you like, and explore the Cloud way a bit as well.