I work for a small/medium-sized not-for-profit academic research institute. We partner with many universities to conduct observational and experimental studies related to coastal ecology.
Many of these research programs are synergistic, in that one program uses data collected by another. We are a fairly young institute and don't currently have any formalized workflows for 'developing analyses' (credit @hspter); each program's Principal Investigator (PI) takes an ad-hoc approach to summarizing data and conducting analyses.
I'm keen to push the organization towards a coherent, systematic approach to developing analyses and communicating results (RStudio, tidyverse, GitHub, LaTeX), the kind one might see in a more tech-oriented data science business. I think there is a lot to be gained by adhering to the principles of reproducibility and open science, and by using a workflow that allows easy integration of other programs' work and results.
My question is: how can I be successful in doing this? What challenges will I face? Many of my colleagues use R, but others use Python and MATLAB. Certainly some general principles apply regardless of programming language, but has anyone experienced push-back from colleagues when they're told they have to adhere to a new way of doing things that costs them valuable time to learn?
That is a great question!
I work at a company where data initiatives are in their infancy, which is partly why I created this thread earlier:
What I've found is that any proposal you make lands more easily if you have enough career capital, in which case time and hard work are your best friends.
Now, you may not have the time or the patience to build career capital, in which case domain knowledge and expertise become even more important.
I guess I leverage both: being at the company for over five years has earned me a seat at a few tables, and demonstrating that I know what I'm talking about at least some of the time helps me push some of these ideas forward...
A vacuum also helps. At my company, we weren't doing any ML or predictive analytics before I started, so there was never a question of whether we should use R or Python. We use R because I know R.
P.S. What never worked for me is showing how things are done elsewhere. I literally read every Stitch Fix Multithreaded blog post out loud to my boss and my CIO (well, figuratively), but that hasn't helped. I send my notes from every HBR webinar and discuss the ideas, but it has yet to yield any fruit... It could just be my bad luck or poor communication, though.
Lastly, your question reminded me of one small bit in this HBR IdeaCast from a few days ago:
There was one idea that went something like: it's still cool to have a side gig even if you have a full-time job. Among other benefits, your employer may take note of skills you demonstrate outside of your regular duties.
Actually, here's the quote from the transcript:
There’s a guy, Lenny Achan, went from being a nurse at his hospital to being the head of communications at his hospital on the strength of having created some apps. And his boss found out about it, saw his initiative, and said, Hey, will you head up social media for the hospital? And he did such a good job, he ultimately became the head of communications.
I also heard about a Starbucks barista who doodled on the chalkboard daily and doodled her way into Marketing at Starbucks HQ.
I know, I know, none of this is DS-related, but I hope you get the idea: do the thing you want to push for outside of your regular duties first. In your context, that would be a data project built with the tools and practices you're advocating, so that you can then demonstrate the benefits...
@Brett-Johnson I found convincing people to use git+GitHub really challenging. It's totally worth it, and the tooling and interactive tutorials available now make the adjustment much easier. But... it's still hard sometimes. The best advice I can give: run a training, pile on resources, and have an expert available who can help people debug their issues in a timely manner.
In fact, that's probably a good idea in general. If you want people to use specific tools, have a way to provide them support. If you don't have anyone who is an expert in one of the technologies, try to hire someone. If you can't, that expert might need to be you, so be prepared to learn a lot.
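For the git+GitHub piece specifically, one way to lower the barrier for R users is to drive everything from R via the usethis package rather than the command line. A minimal sketch of what a first hands-on exercise could look like (the project name and path below are just placeholders):

```r
# First hands-on git/GitHub exercise, driven from R with the usethis package
# (install.packages("usethis") first; a GitHub token must be configured,
# see usethis::gh_token_help()).
library(usethis)

# Create a fresh RStudio project to practice in (placeholder path)
create_project("~/practice/my-first-repo")

# Put the project under git version control and make an initial commit
use_git()

# Create a matching GitHub repository and push to it
use_github()
```

The payoff is that people build commit/push/pull habits inside the tools they already use, and only graduate to the command line once the concepts have stuck.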
My own experience in doing this has taught me to demonstrate the value (time/money/effort/product) that the new tools and workflows would bring, and to show how the tools can be adapted to each user's preferred way of working.
Reproducibility and redundancy are great reasons, but if you demonstrate to someone that this workflow can save them time or make their life easier, they listen a little more intently.
You can't just 'Field of Dreams' the entire thing; people in these situations have been told before how 'this new xxx' will do so much for them, and it didn't. Don't promise magic beans; be able to show where and how adopting this approach will benefit them. (Benefiting the team is always nice, but people are selfish.)
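To make that concrete: one demo that could land well is contrasting a recurring manual spreadsheet summary with a few lines of tidyverse code that rerun unchanged on fresh data. A sketch, with made-up file names and columns:

```r
# Hypothetical demo: a monthly site summary that used to be assembled by
# hand in a spreadsheet, rebuilt as a script that reruns on new exports.
library(dplyr)
library(readr)

# Read the latest survey export (file name is a made-up example)
surveys <- read_csv("data/eelgrass_surveys.csv")

# One grouped summary replaces the copy/paste/pivot-table routine
surveys %>%
  group_by(site, month) %>%
  summarise(
    n_quadrats   = n(),
    mean_density = mean(shoot_density, na.rm = TRUE),
    sd_density   = sd(shoot_density, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  write_csv("output/monthly_site_summary.csv")
```

Five minutes of that, on someone's own data, beats an hour of slides about reproducibility.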
Software Carpentry is a great place to start for background on introducing these kinds of tools to others, and for accessible training materials and workshops. There are also a lot of people there who are a bit further down this road than you and could offer very useful advice on how to achieve your goals.
Change is a tricky thing. One of the best lessons I learned was the 4 L's (from Geoff Scott at the University of Western Australia):
Listen to the concerns and issues of others
Link people with mentors and resources to help them
Leverage experience and resources from those further down the road
Lead: remove obstacles and build capacity
When you're championing your process and start talking to people about why they should adopt it, remember these and apply them in order.
I started writing a guide to using R, built with bookdown in my spare time, to share with new R users in the organization. It started off as a basic how-to but has morphed into more of a best-practices workflow guide that speaks to many of the modern workflows now possible. I point to all the good resources for learning and support in this guide, so I hope it will, in part, help support people.
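For anyone reading along who hasn't tried it: the moving parts of a bookdown project are pleasantly small. A rough sketch of the conventional layout and build commands (nothing here is specific to my guide):

```r
# Conventional bookdown project layout:
#   index.Rmd      - YAML header (title, author, site: bookdown::bookdown_site)
#                    plus the opening chapter
#   01-intro.Rmd   - one .Rmd file per subsequent chapter
#   _bookdown.yml  - book-level options (output file name, chapter order)
#   _output.yml    - output formats (e.g. bookdown::gitbook)

# Render the whole guide to a browsable HTML book:
bookdown::render_book("index.Rmd", output_format = "bookdown::gitbook")

# Or preview live while writing:
bookdown::serve_book()
```

A nice side effect is that the guide itself becomes a working example of the workflow it describes.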
I think leaning on some of the git experts on our software development team will be key, and I suppose starting an R working group may be a good way to set aside dedicated time each week for people to work on R-related projects and ask each other questions.
I have a meeting with our CTO in a few weeks to talk about some of this. Basically, I'm going to pitch this data science workflow, and I think one of his concerns will be 'How will we support this?' Being aware of how the approach needs to be supported is great advice. I also really like the suggestion of showing people how it will benefit them directly and coming armed with examples.
I've found it to be most helpful to pilot a workflow or initiative with clear indicators of success as well as scalability and sustainability. Working with a smaller group before an org-wide roll-out helps you suss out (and address) any potential problems, demonstrate success, and get more organizational buy-in.
Ultimately you want people to want to adopt this workflow, not be told that they have to.
I agree. Not doing this sets a project up to fail, especially when resources are scarce and contested.
I work in academia, so some people will be impervious to this kind of workflow or to adopting new tools (I'm looking at you, faculty member who still uses Eudora). They ultimately fall in line due to policy or pressure, but I agree, you want to avoid that if at all possible.
Such a great point. I've been developing this workflow within my project group, which has been a great testing ground. Obviously I'm an advocate for this approach, but ultimately the goal, I suppose, is to better my team's workflow; if others see value in that, they can adopt it at their discretion.
Do you have any examples of clear indicators of success? Perhaps I need to define what success is first and then illustrate, through examples, how it is being achieved.
Exactly - start with success and backwards plan the rest.
So for one of my big projects, success might be something like "all staff are using data on a weekly basis in order to inform and drive decision-making." From there, I've got to work backwards to figure out what capacity building needs to happen (and success criteria for that), what tools need to be used (and success criteria for that)... you get the idea.
A pilot is definitely time-bound - three months tends to work well at the scope and scale I work at. Determine what success looks like, and then on Day 0 take as complete an inventory as possible of where things stand, so that you can show growth over time.
Some of my indicators are:
staff can calculate, interpret, and compare means and medians
staff can demonstrate an understanding of standard deviation
staff can access and accurately interpret state educational data without assistance
staff are active participants in weekly data meetings (still working out the specifics of this one)