Copilot data safety

Hellohello,

I have lately been using GitHub Copilot within RStudio for some personal projects, and I would love to use it in my research projects as well. Within these projects I will be handling sensitive data (e.g. user data), mostly in CSV form and, later in the workflow, also stored in (environment) variables. As far as I understand Copilot's workflow, I can't allow it to index or scan these files.
Is there a way to exclude files that contain data, and ideally all (environment) variables, from Copilot so that it only reads the code files?

Best wishes,
Doznkekz

Hi @Doznkekz, welcome to the community!

I think the part you are looking for is from our documentation:

GitHub Copilot primarily relies on the context in the file you are actively editing. Any comments, code, or other context provided within the active document will be used as a “prompt” that Copilot will then use to provide a suggested completion.

So it's really just your code that triggers the suggestions, not the data it is running on. However:

To expand the scope of the context used by Copilot beyond just the active document, there is a setting to also index and read from other R, Python, or SQL files in the current project. This setting can be toggled on or off in the Tools > Global Options > Copilot > “Index project files with GitHub Copilot” setting.

To me, this implies that even with the setting enabled, Copilot doesn't read the data files, since only R, Python, and SQL files are indexed. If you want to be cautious, leaving indexing disabled will limit Copilot to the files you have open.

The gray area, of course, is that column names you refer to in your code could themselves be sensitive or proprietary. Because they are part of the source code, they would be sent to Copilot.
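
If that is a concern, one possible workaround is to keep the real column names out of your source entirely. Below is a minimal sketch of the idea, not an official Copilot feature. It is written in Python, and the same pattern would work in R. The function name `load_with_aliases`, the alias file, and its JSON layout are all hypothetical; the assumption is that both the CSV and the alias file live outside the project folder.

```python
import csv
import json
from pathlib import Path


def load_with_aliases(data_path, alias_path):
    """Read a CSV and rename its columns to neutral aliases.

    alias_path points to a JSON file such as {"real_column_name": "alias"}.
    Both files are assumed to live outside the project folder, so the real
    column names never appear in the source code itself.
    """
    aliases = json.loads(Path(alias_path).read_text(encoding="utf-8"))
    with open(data_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    # Columns without an alias keep their original name.
    return [
        {aliases.get(name, name): value for name, value in row.items()}
        for row in rows
    ]
```

You would call it with two paths that point outside the project, and the rest of your code would then refer only to the aliases. Keep in mind that anything you do type into a code file, including the aliases and the file paths, is still part of the source and so could still be used as context.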

Hope this helps,
Randy
