Hello team!
I was wondering what your thoughts are on the following. I am learning / doing data science / analytics / BI in my daily work. I’ve been in analytics for about 3-4 years. Before that, I was in operations (moving industry and sports industry).
As a person who has transitioned into the data world from a non-IT background, I often find myself at a disadvantage. From not knowing the best practices for coding (learning on the fly), to not knowing how relational databases work at a DBA / engineer level, to not being able to communicate with my IT folks in the same language... I hope you get the idea. I am not a complete buffoon, but sometimes I wish I had more of an IT background...
To what extent do you think an IT background is a must? Put another way, what are some of the most important CS concepts a data person should learn (anywhere from nothing to “go get a Master's degree in CS”)? Proper coding? Git? Databases? And where to learn them?
Any data science folks with no CS background in da house?
I use PEP 8 as a standard for my coding, but I agree with you. I graduated with a BS in Statistics and had some introduction to R, but never to the extent that is required for data science/analytics in general. Most of what I've learned has been self-taught. I do wish I had more background in CS; it seems like a lot of people transition from CS to data science pretty smoothly.
One important thing is just understanding how to write code that's legible, as well as code that can be reused in other projects. I started learning Python in January and I'm barely understanding object-oriented programming, but in all honesty it has helped my scripts and my job significantly.
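To make the "legible and reusable" point a bit more concrete, here is a minimal sketch in Python, roughly following PEP 8. The function name `summarize_column` and the sample numbers are made up for illustration; the idea is only that a small, documented function with a clear input and output is easier to read and to carry over to another project than the same logic pasted inline in a script.

```python
from statistics import mean, median


def summarize_column(values):
    """Return basic summary statistics for a list of numbers.

    Missing entries (None) are skipped, so the same function can be
    reused on messy data from different projects.
    """
    clean = [v for v in values if v is not None]
    if not clean:
        raise ValueError("no non-missing values to summarize")
    return {
        "n": len(clean),
        "mean": mean(clean),
        "median": median(clean),
    }


if __name__ == "__main__":
    # Hypothetical sample data, only for demonstration.
    response_times = [12, 15, None, 9, 20]
    print(summarize_column(response_times))
```

Because the logic lives in one named function rather than inline, another script could import it and call it on a different column without copying any code.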
For databases, I used free online resources like w3, but my advice is to just implement side projects to get better at programming. I am nowhere near a programming expert, but my two side projects have helped with both R and Python. As for SQL, I was fortunate to get a job that used a Postgres database, so I just learned through trial and error.
"Data science" means different things to different people.
You could write a similar statement about having a formal Statistics background or not. It could help, but is not essential.
Substitute CS for "hacking skills" if that's appropriate, in the famous Venn diagram:
You can go a long way without having all three skill sets.
My degree (some 25 years ago) would nowadays be likely called data science or similar, but we did very little computer programming and only did a bit of Minitab (I think). Thankfully there has been progress since that Stone Age, but I would still see the programming and the databases as tools, where the emphasis would be the other way around for the developers and the architects, etc.
I recently had a meeting with the company's Data Architect who pretty much dismissed R as being something which just did some plotting after his Hadoop cluster had done all the work. He reckoned that only summaries like means, etc. were required. On the other hand I had no understanding of how his overall infrastructure hung together (in a very large organisation) and why seemingly straightforward things couldn't get done, so it was difficult to make progress.
Additional knowledge all round may have helped, but there is always the problem of being the jack of all trades and master of none.
Drawing on similar discussions elsewhere, R users don't tend to come from CS backgrounds, whereas the current growth in Python looks to be due to CS people entering data science.
Remember that IT and CS are not the same thing. I studied Math and Economics as an undergraduate, but I also worked in the IT department during college helping people fix their computers. I didn't start learning programming until about 4 years after college. I had been fixing computers for about 10 years and using Linux regularly for about 5 years before I even tried programming. Even though I didn't have a CS background, I was still probably closer to what you're thinking of. I have also since gone on to earn a master's degree in information science and have studied data structures and algorithms, filling in more of that knowledge gap.
Knowing how computers work definitely helps. Experience fixing machines helps just as much. Ultimately, I don't think the IT or computer science background matters as much as diligence in studying and practicing. Having a network that you can go to for help and bounce ideas off of can contribute tremendously to learning speed. With data science (really anything academic), there's just way too much information out there to learn it all. Try not to worry about what you don't know. There's a phrase in sports,
Train your weaknesses, race your strengths.
I think it applies to career as well. Keep learning new skills, but focus on delivering what you have mastered.
I have little to no background in CS/IT - playing with computers wasn't even an option when I was in school, and my parents didn't want me anywhere near the household computer without supervision.
I've bootstrapped most of what I know about computers and learned on the fly, and I owe so much of my knowledge to my network. The great thing about networks is that you don't have to seek out a "computer science network" to learn CS/IT - you probably know people who are great at CS/IT from other areas of your life! For example, I played video games pretty seriously for a couple years, where I made friends with people who are great at things that I'm not - those are the first people I reach out to when I run into something I don't understand.
I just want to make it clear that I'm more than willing to help fellow data scientists and data analysts become more proficient with technology.
All that I ask:
Search for an answer first, both here on the RStudio community and via your favorite search engine and/or StackOverflow.
If you can't find an answer or don't understand what your research turned up, please create a new thread here with your issue so that multiple people can reply. This increases your chance of getting a good response, as well as more timely feedback. Tag me to ask me directly.
Be as willing to learn and put in your best effort as I am to help you.
If you just want to chat or pick my brain, feel free to send me a direct message.
I have 2 degrees in anthropology, and none in computer science or IT, so I would personally not consider it a required background if one is willing to self-learn, dive in, and use that knowledge.
I think my particular background gives me advantages in understanding the way the structure of the data is a representation, and the biases in that representation. But, by extension, I think any background informs one's actions.
To quote @RStudioJoe, "there are no innocent data scientists"! Meaning, yes, everyone brings their backgrounds and baggage to bear (which is true of anything in life, really).
I don't have a formal CS background, and definitely wouldn't be able to do much (if anything) without some hearty autodidactism (plus learning from unwitting experts, courtesy of the internet).
I do find that there are some core concepts that I kind of glossed over when I first encountered them, but which are now "a-ha"-moment-inducing when I circle back to them (e.g. I re-read @noamross' "Vectorization in R: Why?" yesterday, and it makes so much more sense now than when I read it the first time around).