GENERAL INFORMATION

1. Title of Dataset: Corpus of Political Speeches: Policy responses to the Great Recession in the United Kingdom and Spain (2008-2014)

2. Author Information
   Name: Jose Piquer
   Institution: University of Cambridge, Department of Politics and International Studies (POLIS)
   Address: Alison Richard Building, 7 West Rd, Cambridge CB3 9DT, United Kingdom
   Email: jp766@cam.ac.uk / josepiquerm@gmail.com

3. Date of data collection: 2019-2020

4. Funding sources that supported the collection of the data: This dataset has been developed with the financial support of the Innovative Training Network 'The Post-Crisis Legitimacy of the European Union' (PLATO), Marie Skłodowska-Curie Grant Agreement No. 722581, funded by the European Union’s Framework Programme for Research and Innovation Horizon 2020.

5. Language: English and Spanish


SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: The corpus as a whole is shared under a CC BY licence but the dataset contains individual publicly available political speeches that are released under their own licences and have their own copyright holders. If reusing any of the individual political speeches then please refer to the original source and licence information as outlined in the file 'PIQUER_Corpus description.csv', paying particular attention to the information in the following columns: 'Original source' and 'Licence'.

2. Links to other publicly accessible locations of the data: https://doi.org/10.17863/CAM.79047

3. Data sources
   A. Hansard: https://hansard.parliament.uk
   B. National Archives: https://webarchive.nationalarchives.gov.uk
   C. British Political Speech: http://www.britishpoliticalspeech.org
   D. Truan, Naomi (2016). Parliamentary Debates on Europe at the House of Commons (1998-2015) [Corpus]. ORTOLANG. https://hdl.handle.net/11403/uk-parl
   E. UK Government portal: https://www.gov.uk
   F. Congreso de los Diputados, Diario de Sesiones: https://www.congreso.es/en/busqueda-de-intervenciones
   G. PSOE party website: https://www.psoe.es/

4. Recommended citation for this dataset: Piquer, J. (2021). Policy responses to the Great Recession in the United Kingdom and Spain (2008-2014) [Corpus]. https://doi.org/10.17863/CAM.79047


DATA & FILE OVERVIEW

1. Folders and files List:

   A. PIQUER_CORPUS SPEECHES_Readme.txt: 'Readme' file with all the relevant information of the dataset

   B. PIQUER_Corpus description.csv: A file (in 'csv' format) with a detailed description of the full corpus, including a classification of each file along 8 relevant variables

   C. Corpus_files.zip [Folder]: Main corpus folder containing the individual files of all speeches in 'txt' format. It contains two additional folders corresponding to each sub-corpus (UK and ESP):

      C1. es_corp_clean files_txt [Folder]: ES sub-corpus containing the transcription of speeches from political leaders of the two crisis governments in Spain (in Spanish). It also includes a file with the full sub-corpus text ('esp_fullcorpus_metadata.txt')
          esp_fullcorpus_metadata.txt
          ES PSOE.ZP.PARTYCONF.2008
          ES PSOE.ZP.PARL.2008
          ES PSOE.SALGADO.PARL.2009
          ES PSOE.ZP.PARL.2009
          ES PSOE.SALGADO.PARL.2010.5
          ES PSOE.ZP.PARL.2010.7
          ES PSOE.ZP.PARL.2011.6
          ES PSOE.ALONSO.PARL.2011.8-9
          ES PSOE.SANTAMARIApp.PARL.2011.8-9
          ES PP.RAJOY.PARL.2011.12
          ES PP.GOV.FORA.2011
          ES PP.MONTORO.PARL.2012
          ES PP.DE GUINDOS.PARL.2012.2a
          ES PP.RAJOY.PARTYCONF.2012
          ES PP.DE GUINDOS.PARL.2012.2b
          ES PP.DE GUINDOS.PARL.2012.5
          ES PP.RAJOY.PARL.2012.6
          ES PP.RAJOY.PARL.2012.7
          ES PP.RAJOY.PARL.2013

      C2. uk_corp_clean files_txt [Folder]: UK sub-corpus containing the transcription of speeches from political leaders of the two crisis governments in the UK (in English).
          It also includes a file with the full sub-corpus text ('uk_fullcorpus_metadata.txt')
          uk_fullcorpus_metadata.txt
          UK LAB.BROWN.PARL.2008
          UK LAB.DARLING.PARL.2008.2
          UK LAB.DARLING.FORA.2008
          UK LAB.BROWN.PARTYCONF.2008
          UK LAB.DARLING.PARL.2008.10
          UK LAB.DARLING.PARL.2008.11
          UK LAB.DARLING.FORA.2009
          UK LAB.BROWN.PARL.2009.6
          UK LAB.BROWN.PARL.2009.7
          UK LAB.BROWN.PARTYCONF.2009
          UK LAB.BROWN.PARL.2010.3
          UK CON.OSBORNE.FORA.2010.5
          UK CON.CAMERON.FORA.2010
          UK CON.OSBORNE.FORA.2010.6
          UK CON.OSBORNE.PARL.2010
          UK CON.CAMERON.PARTYCONF.2010
          UK CON.OSBORNE.FORA.2011
          UK CON.OSBORNE.PARL.2011
          UK CON.CAMERON.PARTYCONF.2011
          UK CON.OSBORNE.FORA.2012
          UK CON.OSBORNE.PARL.2012
          UK CON.CAMERON.PARTYCONF.2012
          UK CON.CAMERON.PARL.2012
          UK CON.OSBORNE.FORA.2013
          UK CON.OSBORNE.FORA.2014


DATA-SPECIFIC INFORMATION FOR THE CORPUS CLASSIFICATION

1. Number of variables: 8

2. Variable list for classification:

   1. Date: Date of the speech or debate. Format: Year-Month-Day

   2. Venue: Setting where the speech was delivered
      Parl: Parliament
      Partyconf: Party Conference or Party Congress
      Fora: Other venues. Examples: Mansion House, Press Conference, etc.

   3. Government (majority): File classified by the name of the PM and, if the PM served more than one term, the number of the government (governing parties with the largest number of seats in the parliament). Full list: Brown (Labour), Cameron I (Tories/Lib-Dem), ZP II (PSOE), Rajoy I (PP)

   4. Speaker: Name of the person delivering the speech, or speaking in the debate

   5. Policy response: Where relevant, the type of policy response(s) covered in the speech
      Fiscal stimulus: Discretionary (public) spending, tax cuts associated with a fiscal stimulus.
      Fiscal adjustment: Discretionary spending cuts, tax increases associated with a fiscal adjustment
      Bank recap and nationalisation: Public interventions in the financial sector of various kinds (recapitalisation, nationalisation, higher capital requirements)
      Independent authorities: Creation of, or transfer of competences to, (new) independent authorities (central bank, independent fiscal authorities)

   6. Code of files: Unique corpus identifiers for each text file.
      Code format: Country Party.Speaker.Venue.Year [month]
      - If more than one file contains a speech given by the same ‘speaker’ (e.g. DE GUINDOS) of the same ‘party’ (e.g. PP) in the same ‘venue’ (e.g. PARL) in the same ‘year’ (e.g. 2012), the month is included in numerical format as an additional value for classification (January=1; February=2; March=3, and so on). For example, a file ending with ‘2012.2’ refers to a speech given in February 2012. If several speeches were delivered in the same month by the same speaker of the same party, a letter is added in ascending order (a, b, c…) at the end, as in 'ES PP.DE GUINDOS.PARL.2012.2a'. This coding scheme gives a unique classification to each file.

   7. Additional notes: Descriptive notes about the content and/or context of the speech. For example, the location of the party conference where a speech was delivered or specific information about the measures justified in the speech (nationalisation of Bankia in 2012).

   8. Source: The original source of the text for each file.


METHODOLOGICAL INFORMATION

1. Methods for processing the data:

   The corpus files are a ‘clean’ or ‘processed’ version of each speech/debate included in the corpus. It has been necessary to process the original texts because (1) the speeches come from various sources and were archived in different formats and (2) the official transcriptions of political leaders' interventions in parliamentary debates include interventions from other speakers and/or references to issues which are unrelated to the topic of research.
   The same procedure of compilation and cleaning has been followed for both the Spanish and British sub-corpora.

   First, a single document has been created for each speech/public intervention. Where necessary, only the relevant content of the relevant speaker has been included. For example, the full transcript of a debate on the state of the nation in the Spanish Congress (‘debate sobre el estado de la nación’) runs to more than 70 pages in pdf format. It includes a verbatim transcript of all interventions of all speakers from all parliamentary groups on any topic discussed during the debate. In this specific example, only the intervention of the President of the Government and his main response to the leader of the opposition have been included in the corresponding file.

   Secondly, certain texts have been ‘cleaned’ to exclude irrelevant terms (note that this is different from tokenising). Using the software Sketch Engine, terms such as "Señoras y señores", "usted", "señoría", interjections, and comments about the delivery context included in the official transcript have been removed. Special care has been taken to remove only those terms that have no substantive or linguistic value in this research context.

   Finally, the corpus has been re-compiled: the version provided includes the files with the 'clean' transcription of each speech as well as two documents with all the relevant speeches and metadata (see 'esp_fullcorpus_metadata.txt' and 'uk_fullcorpus_metadata.txt'). Each file is available in 'txt' format for reuse and further processing.
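
EXAMPLE: PARSING FILE CODES

The file codes described under 'Code of files' (Country Party.Speaker.Venue.Year [month]) can be split into their components programmatically. The following is a minimal Python sketch and is not part of the dataset itself. The names FileCode, CODE_PATTERN and parse_file_code are hypothetical, and the regular expression is an assumption inferred from the codes listed in this Readme (including month ranges such as '8-9' and letter suffixes such as '2a'). The optional '.txt' extension handling is likewise an assumption about how the files are named on disk.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred from the file codes listed in this Readme.
CODE_PATTERN = re.compile(
    r"(?P<country>ES|UK) "                       # country prefix
    r"(?P<party>[A-Z]+)\."                       # party, e.g. PSOE, PP, LAB, CON
    r"(?P<speaker>.+?)\."                        # speaker, may contain spaces
    r"(?P<venue>PARL|PARTYCONF|FORA)\."          # venue
    r"(?P<year>\d{4})"                           # four-digit year
    r"(?:\.(?P<month>\d{1,2}(?:-\d{1,2})?)"      # optional month or month range
    r"(?P<suffix>[a-z])?)?"                      # optional letter (a, b, c...)
)


class FileCode(NamedTuple):
    country: str
    party: str
    speaker: str
    venue: str
    year: int
    month: Optional[str]   # kept as text because ranges such as '8-9' occur
    suffix: Optional[str]


def parse_file_code(code: str) -> FileCode:
    """Split a corpus file code into its components."""
    name = code.strip()
    # Assumption: files on disk may carry a '.txt' extension.
    if name.lower().endswith(".txt"):
        name = name[:-4]
    match = CODE_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"Unrecognised file code: {code!r}")
    parts = match.groupdict()
    return FileCode(
        country=parts["country"],
        party=parts["party"],
        speaker=parts["speaker"],
        venue=parts["venue"],
        year=int(parts["year"]),
        month=parts["month"],
        suffix=parts["suffix"],
    )


if __name__ == "__main__":
    for example in ("ES PP.DE GUINDOS.PARL.2012.2a", "UK LAB.BROWN.PARL.2008"):
        print(parse_file_code(example))
```

The month is returned as text rather than as a number so that ranges are preserved. Codes that do not follow the scheme raise a ValueError instead of being silently skipped.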