Analyzing author metadata from all files in a main folder

I wrote a script that analyses the metadata of all files in a (nested) main folder. It returns a data frame with the columns "full_path", "file_size", "production_date" and "author".

However, the author is only returned correctly when I run the script on a folder on my local drive. As soon as I run it on a folder on my company's server (to which I have read and write access), I get some kind of encoded author back. When I manually inspect the metadata of any of these files, I see the actual author (myself or one of my colleagues).

The returned author from a folder on the server drive looks like this: O:S-1-5-21-3331848083-1022324987-3899522693-3353 (example).

Any ideas on how to get the correct author returned for folders on a server drive?

Below is my code:

# Load necessary libraries
library(tidyverse)
library(lubridate)

# Function to get the file author (Windows)
get_file_author <- function(file_path) {
  if (!file.exists(file_path)) {
    return(NA)
  }
  
  # Normalize path to avoid issues with backslashes
  file_path <- normalizePath(file_path, winslash = "\\")
  
  # PowerShell command to retrieve only the resolved NTAccount name
  cmd <- sprintf(
    'powershell -Command "& {
        $acl = Get-Acl \'%s\'
        $owner = $acl.Owner
        Try {
            $sid = New-Object System.Security.Principal.SecurityIdentifier($owner)
            $ntAccount = $sid.Translate([System.Security.Principal.NTAccount]).Value
        } Catch {
            $ntAccount = $owner  # Use the original value if translation fails
        }
        Write-Output $ntAccount
    }"',
    file_path
  )
  
  # Run the command and capture the output
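  author <- tryCatch(system(cmd, intern = TRUE), error = function(e) NA_character_)
  
  # system() returns one element per output line, so drop empty lines
  # and keep only the last one to return a single cleaned-up author name
  author <- trimws(author)
  author <- author[!is.na(author) & nzchar(author)]
  if (length(author) == 0) return(NA_character_)
  return(author[length(author)])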
}



# Function to extract metadata from a file
extract_metadata <- function(file_path) {
  # Get file info
  file_info <- file.info(file_path)
  
  # Construct metadata list
  metadata <- list(
    full_path = file_path,
    file_size = file_info$size,
    production_date = as.character(file_info$ctime),
    author = get_file_author(file_path)  # Use the function to get the author
  )
  
  return(metadata)
}

# Function to process all files in a folder and its subfolders
process_files_in_folder <- function(main_folder) {
  # List all files in the folder and subfolders
  all_files <- list.files(main_folder, recursive = TRUE, full.names = TRUE)
  
  # Extract metadata for each file
  metadata_list <- lapply(all_files, extract_metadata)
  
  # Convert list to dataframe
  results <- bind_rows(metadata_list)
  
  return(results)
}

# Set the main folder path (change this to your actual folder path)
main_folder <- "path/to/my/folder"

# Process all files in the folder and its subfolders
file_metadata <- process_files_in_folder(main_folder)

# Print the resulting data frame
print(file_metadata)

Unless I'm misunderstanding, it seems like all the magic is happening in that PowerShell script. So this is not really an R question, and you might get more answers on a Windows/PowerShell forum.
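
For what it's worth, here is a rough sketch of how the PowerShell part could be assembled as a single line from R. It assumes (and this is only a guess from the output shown in the question) that the owner comes back with an "O:" prefix and that this prefix is what makes the SecurityIdentifier constructor throw, so the Catch branch returns the raw value. build_owner_script() is a hypothetical helper name, and the example path is made up.

```r
# Hypothetical helper: builds the PowerShell script text for one file.
# Assumption: stripping a leading "O:" lets the SID be parsed and translated.
build_owner_script <- function(file_path) {
  # PowerShell escapes a single quote inside a single-quoted string by doubling it
  ps_path <- gsub("'", "''", file_path, fixed = TRUE)
  paste0(
    "$owner = (Get-Acl -LiteralPath '", ps_path, "').Owner; ",
    "$sid = $owner -replace '^O:', ''; ",
    "try { ",
    "(New-Object System.Security.Principal.SecurityIdentifier($sid))",
    ".Translate([System.Security.Principal.NTAccount]).Value ",
    "} catch { $owner }"  # fall back to the untranslated owner string
  )
}

# Made-up example path, only to show the generated script
script <- build_owner_script("C:\\data\\report.docx")
cat(script, "\n")
```

On Windows the script could then be run with something like system2("powershell", c("-NoProfile", "-Command", shQuote(script, type = "cmd")), stdout = TRUE). I have not tried this against a network share, and the translation may still fail if the machine cannot resolve the SID against the domain.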

That is, unless I'm misunderstanding what you mean:

What metadata? Is the value you marked "(example)" an author name? If the author name appears somewhere in the returned text, it might be possible to extract it on the R side, but it's not really clear to me what your PowerShell call is returning.
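
If the returned value is an unresolved Windows security identifier (SID) rather than a name, which is what the "S-1-5-21-..." pattern suggests, then the R side can at least isolate the SID and flag which files were not resolved. A minimal sketch, where extract_sid() is a hypothetical helper and "MYDOMAIN\jdoe" is a made-up account name:

```r
# Hypothetical helper: pull a bare SID out of strings such as "O:S-1-5-21-...".
# Returns NA for values that do not contain a SID (e.g. already resolved names).
extract_sid <- function(x) {
  pattern <- "S-1-[0-9]+(-[0-9]+)+"
  hit <- regexpr(pattern, x)
  found <- !is.na(hit) & hit > 0
  out <- rep(NA_character_, length(x))
  out[found] <- regmatches(x, hit)
  out
}

owners <- c(
  "O:S-1-5-21-3331848083-1022324987-3899522693-3353",  # value from the question
  "MYDOMAIN\\jdoe"                                      # made-up resolved name
)
extract_sid(owners)
```

This only separates SIDs from names; turning a SID into an account name still has to happen on the Windows side.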
