I have been given an oddly structured dataset that I need to prepare for visualisation in GIS. The data is from historic newspapers from different locations in China, published between 1921 and 1937. The excel table is structured as follows:
- There is a sheet for each location, 2. each sheet has a column for every year and the variables for each newspaper is organised in blocks of 7 rows and separated by a blank row. Here's a sample from one of the sheets:
Year,1921年,1922年,1923年
Title of Newspaper 1,直隷公報(Zhi Li Gong Bao),直隷公報(Zhi Li Gong Bao),直隷公報(Zhi Li Gong Bao)
Language,漢文,漢文,漢文
Ideology,直隷省公署の公布機関,直隷省公署の公布機関,直隷省公署の公布機関
Owner,直隷省,直隷省,直隷省
Editor,,,
Publication Frequency,日刊,日刊,日刊
Circulation,"2,500","約2,000","約2,000"
Others,光緒22年創刊、官報の改称,光緒22年創刊、官報の改称,光緒22年創刊、官報の改称
,,,
Title of Newspaper 2,大公報(Da Gong Bao),大公報(Da Gong Bao),大公報(Da Gong Bao)
Language,漢文,漢文,漢文
Ideology,稍親日的,中立,主義一定せず
Owner,合資組織,合資組織,合資組織
Editor,樊敏鋆,樊敏鋆,翁湛之
Publication Frequency,日刊,日刊,日刊
Circulation,,"約1,200","約1,500"
Others,光緒28年創刊、倪嗣仲の機関にて現に王祝山其の全権を握り居れり、9年夏該派の没落と共に打撃を受け少しく幹部を変更して再発行せり、但し資金は依然王より供給し居れり,光緒28年創刊、倪嗣仲の機関にして王郅隆其の全権を握り居たり民国9年夏安福派の没落と共に一時停刊せるか直ちに幹部を変更して再ひ発行せり、但し資金は依然王より供給し居れり、地方に相当勢力あり,光緒28年創刊、初め倪嗣仲の機関にして王郅隆其の全権を握り居たり、民国9年安福派の没落と共に一時停刊せるか直ちに幹部を変更して再刊今日に至れり、資金は依然王より供給し居るも其地盤は昔日の如くならず
,,,
Title of Newspaper 3,天津日々新聞(Tianjin Ri Ri Xin Wen),天津日々新聞(Tianjin Ri Ri Xin Wen),天津日々新聞(Tianjin Ri Ri Xin Wen)
Language,漢文,漢文,漢文
Ideology,日支親善,日支親善,日支親善
Owner,方若,方若,方若
Editor,郭心培,郭心培,郭心培
Publication Frequency,日刊,日刊,日刊
Circulation,"約2,000","約2,000",約900
Others,光緒27年創刊、親日主義を以て一貫す、國聞報の後身なり民国9年安直戦争中直隷派の圧迫を受けたるも遂に屈せさりし,光緒27年創刊、親日主義を以て一貫す、國聞報の後身なり民国9年安直戦争中直隷派の圧迫を受けたるも遂に屈せす、活字不鮮明にして大なる勢力なし,光緒27年創刊、國聞報の後身なり、安直戦争当時直隷派の圧迫甚しかりしも屈せざりき、大なる勢力なきも近時稍販路拡張の傾向あり、親日主義を以て一貫す
,,,
Yes, it's also in numerous non-latin languages, which makes it a little bit more challenging.
I want to create a new matrix for every year, then rotate the table to turn the 7 rows for each newspaper into columns so that I end up with each row corresponding to one newspaper. Finally, I need to generate a new column that gives me the location of the newspaper. I would also like to add a unique identifier for each newspaper and add another column that states the year, just in case I decide to merge the entire dataset into a single matrix. I did the transformation manually in Excel but the entire dataset contains data several from thousand newspapers, so I need to automate the process. Here is what I want to achieve (sans unique identifier and year column):
Title of Newspaper,Language,Ideology,Owner,Editor,Publication Frequency,Circulation,Others,Location
直隷公報(Zhi Li Gong Bao),漢文,直隷省公署の公布機関,直隷省,,日刊,2500,光緒22年創刊、官報の改称,Tientsin
大公報(Da Gong Bao),漢文,稍親日的,合資組織,樊敏鋆,日刊,,光緒28年創刊、倪嗣仲の機関にて現に王祝山其の全権を握り居れり、9年夏該派の没落と共に打撃を受け少しく幹部を変更して再発行せり、但し資金は依然王より供給し居れり,Tientsin
天津日々新聞(Tianjin Ri Ri Xin Wen),漢文,日支親善,方若,郭心培,日刊,2000,光緒27年創刊、親日主義を以て一貫す、國聞報の後身なり民国9年安直戦争中直隷派の圧迫を受けたるも遂に屈せさりし,Tientsin
Is there a way of doing this in R? How will I go about it?