How to read file that are not in table form and get value from the input file

Hi all,

I'm new to R, Shiny community. I would like to read from a txt file which is not in tabular form and get the values from the file and display it in my shiny app.

Below would be a sample template of my file,

File Name: Test
Operator: Sample
Operator1: Sample1

The Input file that I have consists of fields and values like the above in .txt format. What I would like to do is, upload the above file as an Input file and read to display the value of a field. For example read the value of the field (Operator).

I have surfed about read.table, read.delim, read.csv all of which requires the file to be in tabular form or csv form or tab-delimited format.

Can someone explain how to establish this

You can use readLines which will return a vector with each element being a line of the file. Here is an example that reads your data in and parses them with str_extract from the stringr package.

DAT <- readLines("c:/users/fjcc/Documents/R/Play/Dummy.csv")
#> [1] "File Name: Test"    "Operator: Sample"   "Operator1: Sample1"
NAMES <- str_extract(DAT, "^[^:]+") #get from beginning of string up to first colon
#> [1] "File Name" "Operator"  "Operator1"
VALUES <- str_extract(DAT, "(?<=: ).+") #get text after colon and space
names(VALUES) <- NAMES
#> File Name  Operator Operator1 
#>    "Test"  "Sample" "Sample1"
#> Operator 
#> "Sample"

Created on 2020-03-29 by the reprex package (v0.3.0)

1 Like

Thanks for your reply,

The values are actually in this arrangement,

File Name: Test
Operator: Sample
Operator1: Sample1

They are not in this arrangement,

"File Name: Test" "Operator: Sample" "Operator1: Sample1"

In the file that I read in with the first line of my code, the data are arranged with one Name-Value pair per line, like this:
File Name: Test
Operator: Sample
Operator1: Sample1

The readLines() function that I used to read the data transform it to the arrangement
"File Name: Test" "Operator: Sample" "Operator1: Sample1"

For me it just display the way it is,

Attaching my code for better understanding,


ui <- dashboardPage(skin = "yellow",
  dashboardHeader(title = "Modelling Automation"), 
      # Setting id makes input$tabs give the tabName of currently-selected tab
      id = "tabs",
      menuItem("Data Processing", tabName = "DP", icon = icon("database"),
        menuSubItem("Data", tabName = "Data"),
        menuSubItem("Plot", tabName = "Plot")),
      menuItem("Parameter Extraction", icon = icon("book-open"), tabName = "PE"),
      menuItem("Model QA", tabName = "QA", icon = icon("angellist"))
      tabItem("DP", "Dashboard tab content"),
      tabItem("PE", "Widgets tab content"),
      tabItem("Data", tableOutput("fileTable")),
      tabItem("Plot", fileInput("dataset", "Choose Input File",
                                                          accept = c(".txt"),multiple = FALSE)),
      tabItem("QA", " QA tab content")


server <- function(input, output) {
  # Define a reactive variable named `input_file`
  input_file <- reactive({
    if (is.null(input$dataset)) {
    # Read the text in the uploaded file
   #colname <- str_extract(alldata, "^[^:]+") #get from beginning of string up to first colon
   #colval <- str_extract(alldata, "(?<=: ).+") #get text after colon and space
   #names(colval) <- colname
  output$fileTable <- renderTable({



The actual file content that I'm trying to read in the input file,

File Name:	DUT_7_05-22-19_03'49'03_PM.txt	05/22/2019	03:49:03 PM
Operator:	M.Seva
Device ID/Name:	DUT_7
Wafer Manufacturer Lot:	Manufacturer Lot
Macro ID:	
Wafer ID:	
Device Type:	Device Type
Design Technology:	Design Technology
Pulse Pin:	PulsePin
Ground Pin(s):	GroundPin
Leakage Test Voltage:	1.20E+01
Leakage Current Limit:	9.00E-07
Start Pulse Voltage:	 .1
Stop Pulse Voltage:	 260
Pulse Step Voltage:	 10
Pulse Current Limit:	 12
Pulse Voltage Limit:	No Limit
Pulse Width:	 100
File location:	V:\SHARE\Data\130nm\130BCDLite-Gen2\MPW1399\7XYT35603 W#01\TLP\Selected DUT\TYPE_A_HV_NFET_11\Site_1\
Software Version:	1.28.0
6.239807E-08	3.814697E-06	8.382944E-08
4.565469E-04	1.113022E+00	8.383572E-08
9.137393E-04	2.133775E+00	8.381606E-08
1.667908E-03	3.157775E+00	8.378652E-08
1.653103E-03	4.240917E+00	8.381706E-08
2.476987E-03	5.178008E+00	8.367549E-08
2.829626E-03	6.242900E+00	8.335839E-08
3.232082E-03	7.335805E+00	8.371097E-08
3.490521E-03	8.417846E+00	8.372727E-08
4.099384E-03	9.515202E+00	8.372922E-08
4.571652E-03	1.041957E+01	8.378581E-08
5.142270E-03	1.147917E+01	8.373846E-08
5.631151E-03	1.247198E+01	8.382597E-08
6.071577E-03	1.356277E+01	8.373496E-08
6.404034E-03	1.470252E+01	8.364077E-08
6.891180E-03	1.574068E+01	8.361212E-08
7.711257E-03	1.685986E+01	8.365183E-08
7.935076E-03	1.802491E+01	8.362996E-08
8.117957E-03	1.861187E+01	8.367073E-08
1.116480E-02	1.946108E+01	8.355733E-08
1.357152E-01	1.464898E+01	8.359809E-08
1.626400E-01	1.443248E+01	8.364299E-08
1.847420E-01	1.432151E+01	8.364938E-08
2.018257E-01	1.462355E+01	8.370832E-08
2.305544E-01	1.426207E+01	8.367147E-08
2.509157E-01	1.436447E+01	8.379963E-08
2.744999E-01	1.426137E+01	8.380006E-08
2.834875E-01	1.487390E+01	8.370138E-08
3.179986E-01	1.430694E+01	8.359198E-08
3.355307E-01	1.442193E+01	8.365922E-08
3.616470E-01	1.428885E+01	8.372998E-08
3.795614E-01	1.452563E+01	8.374781E-08
4.011557E-01	1.451214E+01	8.372639E-08
4.126909E-01	1.499031E+01	8.370344E-08
4.447165E-01	1.457598E+01	8.378998E-08
4.675344E-01	1.448474E+01	8.376827E-08
4.833057E-01	1.460633E+01	8.377739E-08
5.004997E-01	1.478640E+01	8.377307E-08
5.151708E-01	1.506013E+01	8.370181E-08
5.485688E-01	1.465090E+01	8.323767E-08
5.515518E-01	1.452778E+01	8.370971E-08
5.676378E-01	1.452609E+01	8.366423E-08
5.812804E-01	1.479527E+01	8.373765E-08
6.083527E-01	1.465321E+01	8.373650E-08
6.294513E-01	1.473494E+01	8.376326E-08
6.444830E-01	1.487896E+01	8.373768E-08
6.718627E-01	1.480932E+01	8.330072E-08
6.878707E-01	1.487233E+01	8.372962E-08
7.046394E-01	1.544803E+01	8.374570E-08
7.279932E-01	1.522583E+01	8.378011E-08
7.489063E-01	1.502720E+01	8.373263E-08
7.902766E-01	1.543117E+01	8.378768E-08
8.297004E-01	1.553516E+01	8.331267E-08
8.801832E-01	1.538164E+01	8.362602E-08
9.174910E-01	1.547915E+01	8.371020E-08
9.651760E-01	1.565010E+01	8.378374E-08
9.997189E-01	1.613643E+01	8.378765E-08
1.051957E+00	1.597215E+01	8.381898E-08
1.088250E+00	1.603511E+01	8.378431E-08
1.135244E+00	1.615250E+01	8.380556E-08
1.156196E+00	1.680047E+01	8.379028E-08
1.206770E+00	1.623561E+01	8.379460E-08
1.243778E+00	1.662487E+01	8.379715E-08
1.284879E+00	1.679282E+01	8.374958E-08
1.320887E+00	1.660658E+01	8.378270E-08
1.349478E+00	1.780657E+01	8.378417E-08
1.384810E+00	1.778793E+01	8.321834E-08
1.423300E+00	1.802637E+01	8.331524E-08
1.459507E+00	1.817682E+01	8.379011E-08
1.480272E+00	1.948042E+01	8.380736E-08
1.540519E+00	1.866507E+01	8.379661E-08
1.578907E+00	1.896645E+01	8.381004E-08
1.614472E+00	1.917025E+01	8.376267E-08
1.657733E+00	1.936727E+01	8.376478E-08
1.701674E+00	1.967623E+01	8.381006E-08
1.744701E+00	1.937889E+01	8.377180E-08
1.851262E+00	1.938290E+01	8.381507E-08
1.959695E+00	1.990160E+01	8.379695E-08
2.063491E+00	2.051965E+01	8.378455E-08
2.167801E+00	2.100894E+01	8.382225E-08
2.270986E+00	2.155497E+01	8.380320E-08
2.372776E+00	2.174066E+01	8.287770E-08
2.485415E+00	2.201425E+01	8.378729E-08
2.591836E+00	2.234839E+01	8.380566E-08
2.696239E+00	2.266899E+01	8.380053E-08
2.803359E+00	2.293110E+01	8.380469E-08
2.901620E+00	2.321826E+01	8.341647E-08
3.020439E+00	2.331882E+01	8.378901E-08
3.099453E+00	2.344074E+01	8.382750E-08
3.214460E+00	2.287862E+01	8.384065E-08
3.300923E+00	2.319590E+01	8.380813E-08
3.400187E+00	2.318350E+01	8.383955E-08
3.506159E+00	2.327781E+01	8.380410E-08
3.608968E+00	2.322847E+01	8.383451E-08
3.718848E+00	2.336338E+01	8.381448E-08
3.822297E+00	2.331808E+01	8.385290E-08
3.928007E+00	2.371011E+01	8.383540E-08
4.035362E+00	2.364818E+01	8.385560E-08
4.140842E+00	2.360416E+01	8.381686E-08
4.248976E+00	2.358145E+01	8.383597E-08
4.357128E+00	2.369732E+01	8.375720E-08
4.512154E+00	2.183294E+01	8.633615E-08
4.620808E+00	2.192124E+01	1.105722E-07
4.822959E+00	1.738696E+01	1.155120E-07
4.915059E+00	1.797860E+01	1.164079E-07
5.049326E+00	1.817750E+01	1.166816E-07
5.194325E+00	1.592921E+01	1.175901E-07
5.286871E+00	1.802845E+01	1.000003E-06

The idea is to convert the initial few rows with fields and values in to columns along with the numerical data columns. Something like this,

|TLP I(AMPS)|TLP V(VOLTS)|I(LEAKAGE)|File Name   Operator
|6.239807E-08|3.814697E-06|8.382944E-08|  DUT_7... M.Seva

This does something similar to what you requested. Instead of having one column per Field/Value pair, it has one column for the Field Name and one column for the Value. The code assumes that every file will have 20 lines of meta data at the top and then the numeric data. If that is not true, more complicated data parsing would have to be written.

If you want 20 columns for the Field/Value meta data, my code can be modified to add twenty columns to DATA, take those column names from colname and set the value of the first row of each to the appropriate value from colval.

The table rendering has formatting problems with the data you posted. It displays the numeric data with only two decimal places, rounding many of the values to zero. I did not attempt to fix that.

Note that the file I read in is tab delimited. That is the result of copying the data from your post. The delimiter in your actual file may be different.


ui <- fluidPage(
  titlePanel("Doubt App"),
      fileInput("dataset", "Choose Input File",
                accept = c(".csv"),multiple = FALSE)

server <- function(input, output,session) {
  input_file <- reactive({
    if (is.null(input$dataset)) {
    # Read the text in the uploaded file
    META <- readLines(input$dataset$datapath, n = 20) #first 20 lines are meta data
    DATA <- read.csv(input$dataset$datapath, skip = 20, sep = "\t")
    colname <- str_extract(META, "^[^:]+") #get from beginning of string up to first colon
    colval <- str_extract(META, "(?<=:\t).+") #get text after colon and tab
    NameVec <- vector("character", length = nrow(DATA))
    ValueVec <- vector("character", length = nrow(DATA))
    NameVec[1:20] <- colname
    ValueVec[1:20] <- colval
    DATA$Parameter <- NameVec
    DATA$ParamVal <- ValueVec
  output$fileTable <- renderTable({

shinyApp(ui = ui, server = server)

Thanks for your time.

I'm able to replicate the result. But is there a way to get the colval by doing word search with colname.

Also, as you rightly predicted, my results file need not have only 20 rows of meta data. The format of the results file can have similar type of files combined. It can be any number of files merged.

The template would be,

File Name: 1....................
20 Lines of Meta data...

Numeric data of File name 1

File Name: 2...................
20 Lines of Meta data

Numeric data of File name 2

The Number of File Name for each file will be dynamic. And that's why I feel it will be very hard to segregate based on the line number. Rather it would be a better idea to do it by making a word search in the entire file with the colnames and match it the colvalues. So that even though the file has multiple file Names. the grouping will be done accordingly.

This is really getting complicated. I took the meta data and different subsets of the numeric data and made a file with three sets of data. I detected the position of the lines that start with File Name: and used that to pull out the first subset. The meta data are the first twenty lines of that and those can be processed as before. I made a data frame out of the numeric values using the separate function from tidyr. I don't know if you want to make a data frame for each subset or make one big data frame using rbind to stack up all of the subsets once they are extracted.

DATA <- readLines("c:/users/fxcampos/Documents/R/Play/Dummy.csv")
StartPos <- str_which(DATA, "File Name:") #at which positions is "File Name: found
#> [1]  1 46 89
#You would need to build a loop to handle all the values of StartPos
#Here is how to handle each subset of DATA from a row that starts with File Name:
#to just before the next occurence of File Name:
META <- DATA[StartPos[1]:StartPos[1]+19] #META can be processed as previously shown
VALUES <- DATA[(StartPos[1]+21):(StartPos[2]-1)]
VALUES <- separate(VALUES, col = 1, c("Amps", "Volts", "Leakage"), sep = "\t")
#>           Amps        Volts      Leakage
#> 1 6.239807E-08 3.814697E-06 8.382944E-08
#> 2 4.565469E-04 1.113022E+00 8.383572E-08
#> 3 9.137393E-04 2.133775E+00 8.381606E-08
#> 4 1.667908E-03 3.157775E+00 8.378652E-08
#> 5 1.653103E-03 4.240917E+00 8.381706E-08
#> 6 2.476987E-03 5.178008E+00 8.367549E-08

Created on 2020-03-29 by the reprex package (v0.3.0)

Thanks for the heads up.

The idea is great, I have managed to create dataframes with the Meta data values.

Although I'm finding difficulty in looping the startPos (Position of File Name: location) to find the numerical data.

Below is my code,

StartPos <- str_which(DATA, "File Name:") #at which positions is "File Name: found
    for (i in StartPos) {
      #META <- DATA[StartPos[1]:StartPos[1]+19] #META can be processed as previously shown
      VALUES <- DATA[(i+21):((i+1)-1)]
      #VALUES <-
      #VALUES <- separate(VALUES, col = 1, c("Amps", "Volts", "Leakage"), sep = "\t")
      #SGP.df <- data.frame(Macroid, DeviceName, VALUES, check.names = FALSE)

I think the looping is correct, but output data is just empty.

I am not sure why VALUES is empty but your logic for sub setting DATA is not right. Lets say one of the values of StartPos is 100. That is a vector index where File Name: appears. When i = 100 then DATA[(i+21):((i+1)-1)] means DATA[121:100] capturing the meta data section not the data.
Here is a toy example of a vector containing blocks of 21 header elements followed by data. The data are 1:4, 11:15, and 21:24. I build up the VALUES vector by appending each new result to the previously found values. This is actually an inefficient way to build up a vector but helps to highlight the fact that your code overwrote the previous data with the latest data.

Header <- c("File Name:", LETTERS[1:20])
DATA <- c(Header, 1:4, Header, 11:15, Header, 21:24)
StartPos <- str_which(DATA, "File Name:")

DataStartPos <- StartPos + 21

#The numeric segments end one position before the second and third StartPos values AND
#at the end of the entire vector
DataEndPos <- c(StartPos[2:length(StartPos)]-1, length(DATA))
#> [1] 25 51 76
VALUES <- vector(mode = "numeric", length = 0)
for (i in seq_along(DataStartPos)){
  VALUES <- c(VALUES, DATA[DataStartPos[i]:DataEndPos[i]])
#>  [1] "1"  "2"  "3"  "4"  "11" "12" "13" "14" "15" "21" "22" "23" "24"

Created on 2020-03-30 by the reprex package (v0.3.0)

Thanks, it works great in getting all sets of numeric values together with the column names.

However, as a final step to complete the entire data frame, I still have something to be done. Right now the code that I have is,

input_file <- reactive({
    if (is.null(input$dataset)) {
    #read the input file
    DATA <- readLines(input$dataset$datapath)
    #get Macro details
    Macroid <- vector(mode = "numeric", length = 0)
    Macroid <- grep ( "Macro ID:", DATA, value = TRUE)
    Macroid <- str_extract(Macroid, "(?<=:\t).+") 
    #get device details
    DeviceName <- vector(mode = "numeric", length = 0)
    DeviceName <- grep ( "Device ID/Name:", DATA, value = TRUE)
    DeviceName <- str_extract(DeviceName, "(?<=:\t).+") 
    #get numerical values
    StartPos <- str_which(DATA, "File Name:") 
    DataStartPos <- StartPos + 21
    DataEndPos <- c(StartPos[2:length(StartPos)]-1, length(DATA))
    VALUES <- vector(mode = "numeric", length = 0)
    for (i in seq_along(DataStartPos)){
      VALUES <- c(VALUES, DATA[DataStartPos[i]:DataEndPos[i]])
    VALUES <-
    VALUES <- separate(VALUES, col = 1, c("Amps", "Volts", "Leakage"), sep = "\t")
    SGP.df <- data.frame(Macroid, DeviceName)
    Final.df <- merge(VALUES, SGP.df)

The above code generates two data frames, i.e one for the META data and the other for numeric data. But when I merge, these two data frames, the output is somewhat messy compared to my expected result. In the above code I'm extracting values of the rows macroid, Device Name. Each of these header will have 3 values, as you can understand that each set of numeric values that we generate have one macro id and one Device id.

The expected output is,

First set of numeric data First macro id First Device Id
...................................... First macro id First Device Id
Last line of first set First macro id First Device Id
Secondset of numeric data Second macro id Second Device Id
...................................... Second macro id Second Device Id
Last line of first set Second macro id Second Device Id
Thirdset of numeric data Third macro id Third Device Id
...................................... Third macro id Third Device Id
Last line of first set Third macro id Third Device Id

But right now, with my above code, the data frame is generating one set of numeric data repeatedly for each device and macro headers.

I know some part of merging has to be improved than the one that I have now. Is there a better way to accomplish this.

For the part of your code where you extract the numeric values, try something like this. The idea is to make a data frame out of each set of numeric data, load that with the appropriate Macroid and DeviceName and then glue the data frames together.

StartPos <- str_which(DATA, "File Name:") 
DataStartPos <- StartPos + 21
DataEndPos <- c(StartPos[2:length(StartPos)]-1, length(DATA))
VALUES <- data.frame()
for (i in seq_along(DataStartPos)){
  tmp <-[DataStartPos[i]:DataEndPos[i]])
  separate(tmp, col = 1, c("Amps", "Volts", "Leakage"), sep = "\t")
  tmp$MacroID <- Macroid[i]
  tmp$DeviceName <- DeviceName[i]
  Values <- rbind(Values,tmp)

Yes it works Perfect and the data frame is as expected. Thanks again for your time. Appreciate it.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.