I wrote a script that analyses the metadata of all files in a (nested) main folder. It returns a data frame with the columns "full_path", "file_size", "production_date" and "author".
However, the author is only returned correctly when I run the script on a folder on my local drive. As soon as I run it on a folder on my company's server (to which I have read and write access), I get some kind of encoded author back. When I manually inspect the metadata of any of these files, I see the actual author (myself or one of my colleagues).
The returned author from a folder on the server drive looks like this: O:S-1-5-21-3331848083-1022324987-3899522693-3353 (example).
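To make the symptom easier to spot, a check along these lines could flag the rows where the owner came back as a raw SID instead of an account name. This is only a sketch: `is_unresolved_sid` is a hypothetical helper name, the pattern assumes the value has the form shown above (an optional `O:` prefix followed by `S-1-...`), and `MYDOMAIN\jane.doe` is a made-up example of a resolved name.

```r
# Hypothetical helper: TRUE where the owner looks like a raw SID
# (optionally prefixed with "O:", as in the value shown above)
# rather than a resolved account name such as "MYDOMAIN\\jane.doe".
is_unresolved_sid <- function(author) {
  !is.na(author) & grepl("^(O:)?S-1-[0-9]+(-[0-9]+)+$", author)
}

authors <- c(
  "O:S-1-5-21-3331848083-1022324987-3899522693-3353",  # value returned for a server file
  "MYDOMAIN\\jane.doe",                                  # made-up resolved account name
  NA                                                     # file that did not exist
)
print(is_unresolved_sid(authors))
```

Applied to the result of the script, `file_metadata[is_unresolved_sid(file_metadata$author), ]` would then list only the affected files.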
Any ideas on how to get the correct author returned for folders on a server drive?
Below is my code:
# Load necessary libraries
library(tidyverse)
library(lubridate)
# Function to get the file author (Windows)
get_file_author <- function(file_path) {
  if (!file.exists(file_path)) {
    return(NA)
  }
  # Normalize path to avoid issues with backslashes
  file_path <- normalizePath(file_path, winslash = "\\")
  # PowerShell command to retrieve only the resolved NTAccount name
  cmd <- sprintf(
    'powershell -Command "& {
      $acl = Get-Acl \'%s\'
      $owner = $acl.Owner
      Try {
        $sid = New-Object System.Security.Principal.SecurityIdentifier($owner)
        $ntAccount = $sid.Translate([System.Security.Principal.NTAccount]).Value
      } Catch {
        $ntAccount = $owner # Use the original value if translation fails
      }
      Write-Output $ntAccount
    }"',
    file_path
  )
  # Run the command and capture the output
  author <- tryCatch(system(cmd, intern = TRUE), error = function(e) NA)
  # Ensure we return a single cleaned-up author name
  return(trimws(author))
}
# Function to extract metadata from a file
extract_metadata <- function(file_path) {
  # Get file info
  file_info <- file.info(file_path)
  # Construct metadata list
  metadata <- list(
    full_path = file_path,
    file_size = file_info$size,
    production_date = as.character(file_info$ctime),
    author = get_file_author(file_path) # Use the function to get the author
  )
  return(metadata)
}
# Function to process all files in a folder and its subfolders
process_files_in_folder <- function(main_folder) {
  # List all files in the folder and subfolders
  all_files <- list.files(main_folder, recursive = TRUE, full.names = TRUE)
  # Extract metadata for each file
  metadata_list <- lapply(all_files, extract_metadata)
  # Convert list to dataframe
  results <- bind_rows(metadata_list)
  return(results)
}
# Set the main folder path (change this to your actual folder path)
main_folder <- "path/to/my/folder"
# Process all files in the folder and its subfolders
file_metadata <- process_files_in_folder(main_folder)
# Print the resulting data frame
print(file_metadata)