This is a companion discussion topic for the original entry at https://blog.rstudio.com/2021/10/26/how-data-scientists-and-security-teams-can-work-together
Data scientists are hungry for every bit of data they can use in their work. Security teams, on the other hand, are primarily concerned with making sure that data stays put and no one ever gets access without authorization. There's a natural tension, which can result in friction and miscommunication.
Gordon Shotwell, lead data scientist at Socure, has dealt with this tension firsthand. The team at Socure builds best-in-class fraud models for top banks and credit card companies, so they're constantly working with sensitive data. During an RStudio Enterprise Meetup, he explained how his quickly growing team cooperates with Socure's security team to move fast without harming organizational security.
Become friends to achieve collective goals
Step into the mindset of security
Data scientists should show their intention of being allies to the security folks in their organizations, starting by putting themselves in the security team's shoes. Imagine the scenario: you worry all day about events that, though unlikely, could be catastrophic to your organization. You constantly anger your colleagues by saying "no" to cool, new tools because of the risk to security. You are rarely recognized when your work goes well, but everyone will know if something goes wrong.
By empathizing with security teams, data scientists can better understand where they are coming from, acknowledge the potential risks, and understand why they are important.
Advocate for security projects
Even important security projects can get buried because of more urgent tasks. Data scientists should advocate for security improvements to their work. By raising these projects to other teams, data scientists can reinforce that they're on the same side as their security-focused colleagues.
Prove that you can make security improvements
Data scientists shouldn't be all talk when it comes to security projects. They should prove that they can actually improve their security practices. This means knowing the context of what the different teams want to do, finding solutions that work for both of them, and following through on what was decided. The relationship between the teams strengthens when security knows that data scientists can fulfill their promises.
Mutually understand value and threats
Security professionals usually don't have an intuitive sense of the value that data science brings to the organization. By articulating the business value to the security organization, data scientists can get security teams on their side.
Does creating that public-facing app provide a critical new capability for customers? Does accessing that internal database allow for automation that will save staff time and money? Does having write access to the database allow for machine learning models that will impact the company's bottom line?
At the same time, security teams should describe the "threat model": the improbable but devastating event that they are trying to prevent. Are they concerned about data scientists accidentally putting proprietary data in a public app? Or are they worried that outside hackers could find a way in to steal customer payment information? Do they stay up at night worrying that a disgruntled employee could exfiltrate intellectual property? Or are there regulatory regimes in place that specify how they're allowed to provide access to data that identify customers? Very different prevention and mitigation strategies are warranted depending on the threat.
Data scientists who understand the threat model can help ensure the gravest threats are less likely, and they can also point out where security choices don't make sense given the threat model.
A common example is a database that contains a combination of sensitive and non-sensitive data. A data scientist might want to use sales order data to build a model of the customers that are the highest value for marketing purposes. But if that data is in the same database as customer names, addresses, and credit card information, security is going to be (and should be!) really restrictive on where that data goes.
Both the data science and security teams can appreciate the potential value of identifying valuable customers and the threat of exposing customer data. It might become obvious that splitting the database so the sales order data isn't merged with the sensitive customer data would serve everyone.
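As a rough illustration of that kind of split, the sketch below copies only the non-sensitive columns into a separate analytics database. It uses Python's built-in sqlite3 module with in-memory databases; the table names, column names, and sample rows are all hypothetical, and whether a bare customer ID counts as sensitive would depend on the organization's threat model.

```python
import sqlite3

# Hypothetical source table that mixes order data with customer PII.
source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (
        customer_id INTEGER, name TEXT, address TEXT,
        card_number TEXT, order_total REAL
    );
    INSERT INTO orders VALUES
        (1, 'Ada Example', '1 Main St', '4111-0000-0000-0000', 120.0),
        (1, 'Ada Example', '1 Main St', '4111-0000-0000-0000', 80.0),
        (2, 'Bo Example', '2 Side St', '4222-0000-0000-0000', 15.5);
""")

# Assumption: only these columns are cleared for analytics use.
NON_SENSITIVE_COLUMNS = ("customer_id", "order_total")

# Separate analytics database that never receives names, addresses, or cards.
analytics = sqlite3.connect(":memory:")
analytics.execute(
    "CREATE TABLE sales_orders (customer_id INTEGER, order_total REAL)"
)
rows = source.execute(
    f"SELECT {', '.join(NON_SENSITIVE_COLUMNS)} FROM orders"
).fetchall()
analytics.executemany("INSERT INTO sales_orders VALUES (?, ?)", rows)
analytics.commit()

# The customer-value question can now be answered without touching PII.
top_customers = analytics.execute("""
    SELECT customer_id, SUM(order_total) AS total
    FROM sales_orders
    GROUP BY customer_id
    ORDER BY total DESC
""").fetchall()
```

With the split in place, the data scientist works only against `sales_orders`, and the security team can keep the original table under tighter controls.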
Make it easy to follow the rules
For fast-growing organizations, there's no way security education can keep up with team growth: the resources and time needed to continuously train every new data scientist will become unsustainable. And once a company is big, security cannot audit everything that everybody is doing.
Data scientists are hired because they're smart problem-solvers. So if they are locked in a room without something they need, they will waste time trying to get to it, and they're unlikely to discover the safest or best path. A better plan would be to figure out how to create an environment for data scientists that's super secure but doesn't leave them needing more.
Let's take the example of a database that requires access authentication. Instead of having everyone develop their own connection to the database, security can write R and Python packages that include wrapper functions for access. Everybody gets the data in a secure way, and when there is a reason to update (say, to a more secure connection method), users don't have to change their code and can just upgrade to the new version of the package. The system may change over time but the users can continue working with minimal interruption.
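A minimal sketch of what such a wrapper might look like in Python follows. It is not the package described in the talk: the function names, the `ANALYTICS_DB_PATH` environment variable, and the choice of SQLite opened read-only are all assumptions made so the example is self-contained and runnable.

```python
import os
import sqlite3
from pathlib import Path

# Hypothetical setting; a real package might read from a vault or SSO instead.
DB_PATH_ENV = "ANALYTICS_DB_PATH"


def get_connection() -> sqlite3.Connection:
    """Return a read-only connection to the analytics database.

    Users never handle paths or credentials themselves, so the security
    team can change what happens in here without breaking user code.
    """
    path = os.environ.get(DB_PATH_ENV)
    if path is None:
        raise RuntimeError(
            f"{DB_PATH_ENV} is not set; ask the security team for access."
        )
    # mode=ro makes SQLite reject any write attempted through this connection.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def read_query(sql: str, params: tuple = ()) -> list:
    """Run a parameterized query and return all rows."""
    conn = get_connection()
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        conn.close()
```

Analysis code would then call something like `read_query("SELECT ... WHERE region = ?", ("west",))` and never construct a connection itself. If the connection method changes, only the body of `get_connection` changes.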
Set up child-proof [data science environments] to work efficiently and securely
> A place that has all of their stuff and is really nice, and they can't burn your house down.
Organizations can embed security directly into systems by setting up "child-proof rooms". These closed systems for data ensure that users adhere to the organization's security boundaries. Less training is needed for new users since the environment has made it impossible to do the wrong thing, allowing them to quickly and safely get started on their work.
In general, closed systems are more secure than open ones. But if those systems aren't provisioned with the things data scientists need, they're unlikely to get used. Instead, it's important to pair restrictions with power. If it's necessary to lock data scientists into a specific environment, let it be a playroom where they can do what they need while everyone else rests easy.
A server that can't connect to the internet or edit corporate data would be restrictive on its own. Security teams can provide read access, a closed analytics database, and offline access to data science tools (such as R and Python packages through RStudio Package Manager), empowering data scientists to run their models inside a safe environment.
Buy tools (don't build them) for continuous, high-quality security
Organizations sometimes create their own security systems or tools. However, most don't make money on security, so in-house security work can be under-resourced and is often first on the chopping block when times are tough. On the other hand, security is a feature for software vendors. They have a vested interest in creating and maintaining secure features and systems. Vendors also aim to make their tools user-friendly and pretty (making following the rules easy!).
By buying tools, organizations can turn their own cost center into someone else's revenue stream. Security gets prioritized appropriately and the company can instead focus on the growth of their people and capabilities.
Arrive at the good place
Tension between data science and security teams is common and even expected, but that doesn't mean they can't work together so that data scientists can get their jobs done without opening security vulnerabilities. Through continuous conversation, closed systems for data, and streamlined tools, organizations can set up the relationships and systems they need to be successful.
Watch Gordon's full talk below or on YouTube. Interested in working at the good place? Socure is hiring!