Appending a vector of smaller length to a data frame

Akash01 · August 31, 2024, 11:37am

structure(list(Year = c("1950-51", "1951-52", "1952-53", "1953-54", 
"1954-55", "1955-56", "1956-57", "1957-58", "1958-59", "1959-60"
), PFCE = c(9108.70200078724, 9994.38800854429, 9971.36747568861, 
10850.0731708826, 10098.1953260502, 10100.3466929385, 11912.8215125467, 
12083.6433097481, 13718.3797337824, 14259.893635583), GFCE = c(568.572044296763, 
596.626585956143, 618.135067895001, 652.735669274903, 680.790210934283, 
729.418083143874, 804.230194235554, 939.827145589222, 1008.09319696038, 
1062.33197750185), GFCF = c(1168.97935648308, 1255.8038625884, 
1179.84976837663, 1165.67789102323, 1363.31784916023, 1645.68623522217, 
2052.11010496047, 2052.8078549506, 2058.02252601698, 2243.98394822314
), CIS = c(370.125755337825, 447.10631183316, -56.6819869039023, 
-163.79476752208, 80.9786415544308, 305.402929854317, 457.074253364158, 
317.018444998688, 32.7379373447296, 247.30164893678), Valuables = c("na", 
"na", "na", "na", "na", "na", "na", "na", "na", "na"), `Export of goods and services` = c(736, 
846, 715, 644, 705, 757, 767, 800, 719, 779), `Import of goods and services` = c(711, 
1038, 702, 652, 750, 839, 1174, 1304, 1104, 1010), Discrepancies = c(-1019.74926304924, 
-1238.57771032067, -1062.23693465502, -889.986636174826, -1201.16059002946, 
-1524.07207277061, -1505.32674531827, -1178.85542838485, -1148.76520296551, 
-1480.76521455959), GDP = c(10221.6298938557, 10863.3470586013, 
10663.4333904013, 11606.7053274838, 10977.1214376697, 11174.7818683883, 
13313.9093197886, 13710.4413269017, 15283.468191139, 16101.7459956852
), nom_agg_GDP = c(10871.2534015671, 11654.8184570888, 11782.3523119602, 
12660.4867311807, 12097.3033861447, 12393.4510113046, 14362.1618117427, 
14572.2783102879, 16399.4954567598, 17335.209561308)), row.names = c(NA, 
-10L), class = c("tbl_df", "tbl", "data.frame"))

The data runs through Year 2023-24; only the first ten rows are shown above.
I also have a vector covering the years 2012-13 to 2022-23.

How do I append this vector to the data frame?
I thought of two ways:
First, pad the vector to the same length as the data frame's columns by filling in NA everywhere except the period 2012-13 to 2022-23.
Second, build a smaller data frame with the corresponding years and vector values, then attach it to the original with inner_join() using by = 'Year'.

How do I go about each of these?
Also, how do I filter or subset my data frame by Year when the values are in YYYY-YY format?

FJCC · August 31, 2024, 1:40pm

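I would add a new column to your original data that holds a numeric year. It is then easy to build a new data frame with a sequence of years and join it to the original data. I think you want left_join() rather than inner_join(), since inner_join() keeps only the rows that have a match in both tables and would drop every year outside your vector's range.
To make a simple vector of the correct length, you can use the rep() function to pad it with NA values.
The building blocks for both methods are shown below.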

library(tidyverse)
#> Warning: package 'ggplot2' was built under R version 4.3.3
DF <- structure(list(Year = c("1950-51", "1951-52", "1952-53", "1953-54", 
                              "1954-55", "1955-56", "1956-57", "1957-58", "1958-59", "1959-60"
), PFCE = c(9108.70200078724, 9994.38800854429, 9971.36747568861, 
            10850.0731708826, 10098.1953260502, 10100.3466929385, 11912.8215125467, 
            12083.6433097481, 13718.3797337824, 14259.893635583), 
GFCE = c(568.572044296763, 596.626585956143, 618.135067895001, 652.735669274903, 680.790210934283, 
         729.418083143874, 804.230194235554, 939.827145589222, 1008.09319696038, 
         1062.33197750185), 
GFCF = c(1168.97935648308, 1255.8038625884, 
         1179.84976837663, 1165.67789102323, 1363.31784916023, 1645.68623522217, 
         2052.11010496047, 2052.8078549506, 2058.02252601698, 2243.98394822314
), 
CIS = c(370.125755337825, 447.10631183316, -56.6819869039023, 
        -163.79476752208, 80.9786415544308, 305.402929854317, 457.074253364158, 
        317.018444998688, 32.7379373447296, 247.30164893678), 
Valuables = c("na", 
              "na", "na", "na", "na", "na", "na", "na", "na", "na"), 
`Export of goods and services` = c(736, 846, 715, 644, 705, 757, 767, 800, 719, 779), 
`Import of goods and services` = c(711, 1038, 702, 652, 750, 839, 1174, 1304, 1104, 1010), 
Discrepancies = c(-1019.74926304924, -1238.57771032067, -1062.23693465502, -889.986636174826, -1201.16059002946, -1524.07207277061, -1505.32674531827, -1178.85542838485, -1148.76520296551, 
                  -1480.76521455959), GDP = c(10221.6298938557, 10863.3470586013, 
                                              10663.4333904013, 11606.7053274838, 10977.1214376697, 11174.7818683883, 
                                              13313.9093197886, 13710.4413269017, 15283.468191139, 16101.7459956852
                  ), nom_agg_GDP = c(10871.2534015671, 11654.8184570888, 11782.3523119602, 
                                     12660.4867311807, 12097.3033861447, 12393.4510113046, 14362.1618117427, 
                                     14572.2783102879, 16399.4954567598, 17335.209561308)), 
row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

# Make a new numeric column holding the starting year of each period.
DF <- DF |> mutate(Year_Index = as.numeric(str_extract(Year, "^\\d{4}")))

# Method 1: a small data frame keyed by Year_Index, ready to be joined to DF.
NewData <- data.frame(Year_Index = seq(1957, 1959), NewCol = c("A", "B", "C"))
NewData
#>   Year_Index NewCol
#> 1       1957      A
#> 2       1958      B
#> 3       1959      C

# Method 2: pad with NA so the vector has one entry per row of DF.
NewCol <- c(rep(NA, 7), "A", "B", "C")
NewCol
#>  [1] NA  NA  NA  NA  NA  NA  NA  "A" "B" "C"
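
The snippet above builds NewData and NewCol but stops before attaching either one to DF. A minimal sketch of that last step follows, assuming a trimmed stand-in for DF that keeps only Year and GDP (values rounded from the original post). The names Joined and Padded are hypothetical and used only for illustration.

```r
library(dplyr)
library(stringr)

# Trimmed stand-in for the thread's DF: only Year and a rounded GDP are kept.
DF <- data.frame(
  Year = c("1950-51", "1951-52", "1952-53", "1953-54", "1954-55",
           "1955-56", "1956-57", "1957-58", "1958-59", "1959-60"),
  GDP  = c(10221.6, 10863.3, 10663.4, 11606.7, 10977.1,
           11174.8, 13313.9, 13710.4, 15283.5, 16101.7)
)

# Numeric starting year, extracted the same way as in the reply.
DF <- DF |> mutate(Year_Index = as.numeric(str_extract(Year, "^\\d{4}")))

# Method 1: join the small table; years without a match get NA in NewCol.
NewData <- data.frame(Year_Index = seq(1957, 1959), NewCol = c("A", "B", "C"))
Joined <- left_join(DF, NewData, by = "Year_Index")

# Method 2: pad the vector to nrow(DF) and attach it as a column.
# This assumes the new values belong to the last three rows of DF.
NewValues <- c("A", "B", "C")
Padded <- DF |>
  mutate(NewCol = c(rep(NA, nrow(DF) - length(NewValues)), NewValues))
```

Both results should carry the same NewCol here. The join is probably the safer choice in general, because it matches on the year itself rather than relying on row position.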

Created on 2024-08-31 with reprex v2.0.2
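
For the question about filtering by Year, the same numeric starting-year column can do the work. The sketch below assumes a trimmed stand-in for DF with only the Year column, and the bounds 1955 and 1958 are arbitrary illustrative choices. The name Subset is hypothetical.

```r
library(dplyr)

# Trimmed stand-in for the thread's DF: only the Year labels are kept.
DF <- data.frame(
  Year = c("1950-51", "1951-52", "1952-53", "1953-54", "1954-55",
           "1955-56", "1956-57", "1957-58", "1958-59", "1959-60")
)

# The first four characters of a YYYY-YY label are the starting year.
DF <- DF |> mutate(Year_Index = as.numeric(substr(Year, 1, 4)))

# Keep the periods that start in 1955 through 1958 (illustrative bounds).
Subset <- DF |> filter(Year_Index >= 1955, Year_Index <= 1958)
```

Here substr() is swapped in for str_extract() only to avoid the stringr dependency; both rely on the label starting with a four-digit year.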
