The demography of Red Sea reef fishes since the Last Glacial Maximum Madeleine Anne Emms Darwin College, University of Cambridge This thesis is submitted for the degree of Doctor of Philosophy March 2022 DECLARATION This thesis is the result of my own work and includes nothing which is the outcome of work done in collaboration except as declared in the preface and specified in the text. I further state that no substantial part of my thesis has already been submitted, or, is being concurrently submitted for any such degree, diploma, or other qualification at the University of Cambridge or any other University or similar institution except as declared in the preface and specified in the text. It does not exceed the prescribed word limit for the School of Biology Degree Committee. Madeleine Anne Emms March 2022 ABSTRACT The demography of Red Sea reef fishes since the Last Glacial Maximum Madeleine Anne Emms Coral reefs are at increasing risk of climate-induced mass bleaching events and mass mortality, yet we do not know how coral reef fish species respond to habitat loss on temporal and spatial scales relevant to climate change. The Red Sea represents an ideal model system to address this given that many reef fish populations persisted during the Last Glacial Maximum despite a significant loss of coral reefs. I studied their demogaphic history to determine the impact of environmentally-induced habitat loss. High-throughput sequencing data combined with an Approximate Bayesian Computation framework (including machine learning techniques) provided sufficient power to estimate population parameters for five reef fish species, Dascyllus abudafur, Dascyllus trimaculatus, Dascyllus marginatus, Pomacanthus maculosus, and Carcharhinus melanopterus. The genetic bottleneck experienced during the Last Glacial Maximum (LGM) was not as small as was expected, highlighting the importance of coral reef habitat and refugia. Studying the impact of the LGM on Dascyllus marginatus, a species with a restricted range in the Indian Ocean, enabled me to use this study design to determine that the refugia was unlikely to have been outside the Red Sea, but rather in-situ. The extensiveness of an external population did not appear to affect the response to habitat loss. Lastly, studying the impact of the LGM on Carcharhinus melanopterus, a larger, more motile shark species, showed a similar pattern of response to habitat loss. I then compared the Red Sea barrier with other biogeographic barriers across the Indo-Pacific; in this case it was stronger than some but not as strong as the Indo-Pacific barrier. Overall, the demographic histories showed a similar and mild response to environmentally- induced habitat loss in the Red Sea across species, albeit with some ecological differences. Two case studies allowed me to uncover more about the unique history of the Red Sea, and provided opportunities to discuss other important questions around coral reef refugia and the biogeography of the Indo-Pacific. CONTRIBUTIONS Manuscripts arising from the first and second chapter of this thesis include contributions from my supervisor, Prof. Andrea Manica, my co-supervisors, Prof. Moshe Kiflawi (Ben-Gurion University of the Negev and Interuniversity Institute for Marine Sciences, Israel) and Dr. Pierpaolo Maisano Delser (University of Cambridge, UK), and from co-authors Dr. Vanessa Robitzch (Universidad Austral de Chile, Chile) and Dr. Tilman Alpermann (Senckenberg Research Institute and Natural History Museum, Germany). All contributors provided guidance and comments on the manuscripts. Co-author Dr. Vanessa Robitzch provided restriction site associated DNA sequencing (RAD-seq) data from coral reef fishes Dascyllus abudafur (tissue samples also collected by Dr. Tilman Alpermann and Dr. Bruno Frédérich of the University of Liège, Belgium) and Dascyllus marginatus (tissue samples also collected by Dr. Tilman Alpermann). ACKNOWLEDGEMENTS I would firstly like to thank my supervisor, Prof. Andrea Manica, for wonderful mentorship in what has been an unexpectedly bumpy journey as plans changed due to circumstances outside of our control. I am so grateful for all the support, particularly at difficult and sometimes even unprecedented times. Completing a PhD during a pandemic has not been an easy feat, and I am sure the same can be said for supervising one. I have been lucky to have had such an exemplary PhD supervisor, not only to help me through my degree, but also to learn from and to be able to take those skills forward in any future leadership roles. I would like to extend these heartfelt thanks to my co-supervisors, Prof. Moshe Kiflawi, and Dr. Pierpaolo Maisano Delser. I have learned so much during my PhD, both academically and otherwise, and would simply not have got this far without all of the help, knowledge, and support I received from all of you. I am thankful for all the times I benefitted from hours of scientific discussion, fieldwork, and help with technical skills from academics with such expert knowledge and, just as importantly, patience and understanding. Thank you for your time that you so generously shared with me. I cannot underestimate the impact it has had on both me and my work. Despite all the ups and downs, I have enjoyed this journey, am proud of my research, and am very glad and lucky that I was able to do it with your supervision. I would like to add additional thanks to Dr. Pierpaolo Maisano Delser, who continued to support me in the last year despite taking on a full-time position outside of academia. I would also like to take this opportunity to thank my advisors, Dr. Bill Amos and Dr. Jason Head, for their time and feedback at various points through my PhD. I appreciated the kind, supportive, thought-provoking, and honest discussions about my next steps. I am so grateful for the support of the Whitten Studentship in Marine Biology for funding this project. I simply would not have been able to spend these years conducting my research without it. I am so pleased to see such an amazing community of Whitten scholars growing in the Department of Zoology, and have taken much joy in being a part of it. I am very grateful for the gatherings through the years, spending time and sharing stories and laughs with like-minded people. I would like to thank Claire Barnes and Dr. Laura Fan for their kind words of support and infectious, unwavering interest in the natural world. These connections and experiences have been a wonderful additional aspect of support in my PhD that will stay with me into the future. I am grateful to the Jiggins Group who kindly agreed to host me in their laboratory in the Department of Zoology to process samples I collected in the field. I would also like to thank Dr. Vanessa Robitzch, who was very generous in providing genetic data from collected samples when I reached out at a critical period in my PhD when plans had to change due to the pandemic. Many of these samples were collected by Dr. Tilman Alpermann and some by Dr. Bruno Frédérich, both of whom kindly approved the use of the associated data in my research. Dr. Vanessa Robitzch’s support and interesting discussions around new avenues for my research was so valuable, in addition to the time both she and Dr. Tilman Alpermann spent reviewing submissions as co-authors for manuscripts arising from this work. Similarly, I am grateful to the authors of several published studies who uploaded raw data for available use. This was vital for re-planning my PhD, and I would particularly like to extend my thanks to them for being so willing and reponsive when I contacted them. Dr. Eleanor Miller supported me greatly in the development of multiple maps of sampling sites. I am very grateful to her for sharing the relevant R scripts which were the product of multiple months of work, and for sharing her knowledge and experience of the relevant packages with me when needed. I am also grateful to all my fellow lab members and office colleagues for their continued support throughout my PhD. It has been a friendly, welcoming, and inspiring environment to work in. These experiences have all made me appreciative of the scientific community, and for being able to continue working as colleagues with amazing scientists that I have had the pleasure of meeting in previous years. I have very much enjoyed being a student in the Department of Zoology, and am thankful for every member of staff that has contributed to my experience, and every student that has volunteered their time to support the postgraduate community. I extend my sincere thanks to my examiners, Dr. James Herbert-Read and Dr. Mathias Currat, for their support in the final stages of my PhD, including their taking the time to read through my thesis and provide valuable, knowledgeable, and specialist feedback which contributed greatly to the completed document and an engaging examination. Last but not least, I would like to express my gratitude to all of my family and friends, near and far, for their love and support. Thankyou for always encouraging my ambitions, keeping me going, and keeping me grounded. I dedicate this thesis to my parents, my Gran, and my late Grampy, who inspired my love of the outdoors and who I will miss greatly. TABLE OF CONTENTS Introduction…………………………………………………………..……....….………... 1 Background……………..…………………………………………………………….. 1 Justification………..………………………………………………………………….. 4 Contribution………………..…………………………………………………………. 5 Chapter 1…………………………………………………..……….……..…..…………... 9 1.1 Abstract…………………………………………………………………………… 9 1.2 Introduction………………………………………………………………………..10 1.3 Materials and methods……………………………………………………………. 11 1.4 Results……………………………………………………………………………..18 1.5 Discussion………………………………………………………………………… 22 1.6 Conclusion………………………………………………………………………... 25 Chapter 2…………………………………………………………………….……………. 27 2.1 Abstract…………………………………………………………………………… 27 2.2 Introduction………………………………………………………………………. 28 2.3 Materials and methods……………………………………………………………. 31 2.4 Results……………………………………………………………………………. 41 2.5 Discussion………………………………………………………………………… 46 2.6 Conclusion………………………………………………………………………... 49 Chapter 3……………………………………………………………………….…………. 51 3.1 Abstract…………………………………………………………………………… 51 3.2 Introduction………………………………………………………………………. 52 3.3 Materials and methods……………………………………………………………. 55 3.4 Results……………………………………………………………………………. 62 3.5 Discussion………………………………………………………………………... 68 3.6 Conclusion………………………………………………………………………... 72 Discussion……………...……………………………………………………….………… 75 Conclusion…………………………………………………………………..……………. 79 References……………………………………………………………………...…………. 81 Appendix A: Supplementary information – Chapter 1....………….…...…..…………….. 93 Appendix B: Supplementary information – Chapter 2……...…...………...………………113 Appendix C: Supplementary information – Chapter 3……...………...…………...………127 INTRODUCTION Background Climate change is the biggest current threat to coral reefs (Hughes et al. 2003; Hoegh- Guldberg et al. 2007). There has been an increase in extreme weather events such as hurricanes (Emanuel 2005; Webster et al. 2005) which can directly damage reefs, but more important is the increase in sea surface temperatures which are causing mass bleaching events (Hoegh-Guldberg 1999). “Bleaching” is the term given to the loss of colour resulting from an expulsion of the symbiotic microalgae zooxanthellae (Family Symbiodiniaceae, LaJeunesse et al. 2018) by corals (Brown 1997; Douglas 2003), which usually provide up to 90% of a coral’s nutritional requirements (Muscatine and Porter 1977). Bleaching is a natural process that, it has been hypothesised, could actually help coral colonies to survive increasing temperatures by replacing an unsuitable species of symbiont with one better adapted to the new conditions (i.e. with a different thermal tolerance) (Buddemeier and Fautin 1993; Berkelmans and van Oppen 2006; Ladner et al. 2012). Mass bleaching, where entire reefs bleach simultaneously, is a more modern phenomenon (Glynn 1984; Hughes et al. 2018). Corals can recover by recruiting zooxanthellae from the environment if more favourable conditions return (Lewis and Coffroth 2004), but if the coral remains without its symbionts and their additional energy source, bleaching can lead to coral mortality (Glynn 1984; Hoegh-Guldberg 1999). Large scale coral mortality would be expected to have a direct impact on obligate corallivores, organisms that must feed on coral polyps (Graham et al. 2009), but other reef fishes have also been shown to abandon or avoid bleached or dead reef habitat (Booth and Beretta 2002; Feary et al. 2007; Coker et al. 2009, 2012). Overall reduction in coral cover and reef fragmentaion has been shown to reduce abundance and biodiversity of coral reef fish (Graham et al. 2007b; Airoldi et al. 2008; Pratchett et al. 2008; Paddack et al. 2009), but these studies tend to be at small temporal or spatial scales, such as after specific bleaching events or local hurricane damage. What we do not know is how reef fish populations respond to habitat loss and fragmentation on scales relevant to global climate change. Though we cannot know for certain how species will respond to future climate change, we can try to determine how reef fish populations may respond to climate- induced habitat loss and fragmentation as accurately as possible by understanding how they have been impacted in the past. Such research is crucial as future climate predictions show a continued rise in temperatures, suggesting larger scale and longer term impacts on coral reefs, with further 1 habitat loss and fragmentation (Hoegh-Guldberg et al. 2007; Li and Reidenbach 2014; Descombes et al. 2015). The Red Sea represents an ideal model system to study how reef fish populations may have been impacted by climate-induced habitat loss and fragmentation. It is connected to the Indian Ocean via a single, narrow and shallow strait in the south, the Bab-el-Mandeb strait (Figure 1), which is only ~130m deep (Siddall et al. 2003). Sea levels decreased to ~120m below current levels during the Last Glacial Maximum (LGM) approximately 21,000 years ago (Fairbanks 1989), which would have left the Red Sea effectively isolated (Siddall et al. 2003). Sediment cores show most fauna and flora, even single-celled foraminifera, were absent from the majority of the Red Sea at this time (Fenton et al. 2000), proposed to be mostly due to highly elevated salinity levels (Biton et al. 2008). Certain regions, including the Gulf of Aqaba in the north and the South Red Sea, are thought to have maintained slightly lower salinity levels, and so may have acted as refugia for some species (Figure 1). An alternative refugium has also been suggested in the Gulf of Aden (Figure 1), between the Red Sea and the Indian Ocean (DiBattista et al. 2016a). Figure 1. A map of the Red Sea and surrounding areas. Locations written in bold represent potential coral reef refugia during the Last Glacial Maximum. 2 Despite the huge loss of coral reefs (Sheppard et al. 1992), it has been established that many Red Sea populations of cosmopolitan reef fish species (i.e. those that are present in both the Red Sea and the Indian Ocean) persisted during the LGM, likely in refugia (DiBattista et al. 2013). Mitochondrial DNA, a molecular clock estimate, and coalescence analyses were used to estimate the split times between populations of the same species in the Red Sea and the Indian Ocean, in an effort to understand the genetic impact on reef fishes of various environmental changes that occurred as a result of the formation of the Red Sea. Split times were estimated between ~800,000 and 21,000 years ago, spanning the time of the LGM (DiBattista et al. 2013). Such estimates provide evidence that these Red Sea populations must have persisted through a period of climate- induced habitat loss during the LGM. However, these estimates were based on data from two mitochondrial markers which limits the kind of demographic questions that can be addressed. Recent technological advancements (i.e. high-throughput sequencing) allow us to generate genetic data from thousands of loci, providing more power to address complex demographic questions. This quantity of data can be leveraged with Approximate Bayesian Computation (ABC) methods to distinguish between models and estimate unknown model parameters (Bertorelle et al. 2010; Csilléry et al. 2010) With enough simulations and computation time, randomly sampling input parameters with standard rejection sampling can get an approximate Bayesian posterior close to the actual posterior without needing the likelihood function, which is impossible to compute for complex models. A linear ABC goes further by ranking accepted input parameters by how close they are to the real estimate based on how close the output summary statistics are to the observed data. This makes it possible to fine-tune a rejection-acceptance posterior with less simulations. Though summary statistics reduce the complexity of a large dataset, a large number of summary statistics can still be problematic due to the complexity of ensuring that the observed data are covered by the simulated data, and in selecting the most informative ones. This requires a balance between the loss of information and the dimension of the parameter space, minimising the first while also keeping the number of summary statistics down (Aeschbacher et al. 2012; Blum et al. 2013). Machine learning techniques, namely random forests, have been incorporated into the Approximate Bayesian Computation framework to deal with this problem and more accurately select the most informative summary statistics for both model selection (Pudlo et al. 2016) and parameter estimation (Raynal et al. 2019). With a random forest ABC, there is no filtering of the data with a tolerance level as with standard rejection sampling and linear ABC, and you can input many summary statistics. For each parameter, a training set of randomly sampled input values and resulting summary statistics is required. The random forest will determine the informative summary 3 statistics for each parameter (those that allow inferences to be made about the input value), generating a trained forest of thousands of decision trees. This trained forest can then be used with the observed data to predict an approximate Bayesian posterior for each parameter. Thus, I can use high-throughput sequencing data coupled with such ABC framework to study the demographic history of populations of reef fish species in the Red Sea during the LGM to determine how reef fish populations were impacted by climate-induced habitat loss and fragmentation on scales relevant to global climate change. These two distinct processes (Fahrig 2003) both had the potential to impact the demography of Red Sea populations of reef fishes, the effects of which cannot be disentangled with the current methods, so I hereafter refer to the combined effects of habitat loss and fragmentation under the simplified term of ‘habitat loss’. It is important to note that by using modern genetic data to model how populations may have been impacted in the past, I am focussing on obtaining estimates for effective population size rather than census population size, when it comes to detecting changes in population size. Effective population size is the estimate of the ideal population size, which would be equivalent to the census population size that would have generated the observed genetic data assuming a panmictic population. However, it can be an underestimate when there is a bias in mating systems and not all individuals mate (as in harems for example), or an overestimate when there is structure within the populations. This supports a greater divergence, and so leads to a larger effective population size estimate if still assuming a panmictic population. The difference between effective and census population size is therefore an important consideration when making inferences about potential impacts of climate-induced habitat loss on populations in the past. Justification Coral reef fishes have many vital functional roles in the coral reef ecosystem (Bellwood et al. 2004), one of the most diverse ecosystems on our planet. As well as being of biological and ecological importance, they are also of great value to us directly. As a food source they sustain coastal populations (Teh et al. 2013; Kittinger et al. 2015) but they also have financial value (Spurgeon 1992), being very important for fisheries (White et al. 2000; Grafeld et al. 2017), live exports (Rhyne et al. 2012), and tourism (White et al. 2000; Cagua et al. 2014; Spalding et al. 2017) in many parts of the world. Given such value, coral reef fish populations need to be protected from overexploitation and from rapid environmental change. However, enforced protection is problematic and expensive (McClanahan 1999), so resources should be directed to where they can 4 be most efficient. Understanding which species will be more at risk is complicated because of their very different ecologies. Therefore, understanding how climate change affected them in the past could provide vital information about which species may be more vulnerable to climate change in the future and may require greater protection from environmental change. An important outcome from this research is being able to determine how severely the Red Sea populations were impacted during the LGM, given the fact that they were able to recover to present day populations. This provides us with a quantitative minimum threshold for a known, viable population – a valuable tool for directing conservation efforts to where they are most needed. Lastly, it is important not just to know sizes, but also the location of any impacted populations that were able to persist through times of climate-induced habitat loss. It allows us to ask the questions as to whether it is possible or preferable to maintain likely areas of refugia in threatened or damaged areas, or whether maintaining coral reefs outside these areas would better support population recovery in the future. This research is becoming more critical as climate change continues. The main research question here is how reef fish populations respond to habitat loss on scales relevant to global climate change. Additional research questions that will help our understanding of the population response, and potentially increase the efficacy of direct conservation efforts include: how response may vary across species, the minimum size of an impacted population capable of recovery, and the location of any identified refugia. Contribution I studied the demographic histories of Red Sea populations of several reef fish species to determine how they were impacted by habitat loss on a spatial and temporal scale relevant to global climate change. Previous research studying Red Sea and Indian Ocean populations used mitochondrial DNA, which lacks the power to address complex demographic scenarios. For this research, I sourced pre-generated high-throughput sequencing data, specifically restriction site associated DNA sequencing (RADseq) datasets from collaborators and from the literature, and a published Targeted Gene Capture dataset from the literature - both ideal techniques for sequencing non-model organisms. The data was from modern samples of coral reef fishes from both inside and outside the Red Sea. The latter was required for the migration aspect of the demographic models. After running simulations for the various models, the high-throughput sequencing data was compared against simualtions in an Approximate Bayesian Computation framework to distinguish 5 between proposed alternative demographic models, and estimate population parameters. Given that the data were from colleagues or from the literature, and so I was not able to choose the maximum number of individuals to sample, I conducted multiple power analyses for the different species datasets, to ensure the data and methodological framework had sufficient power to successfully run the aforementioned analyses. My thesis consists of three chapters all addressing research questions that contribute to the overall objective to determine how reef fish populations respond to habitat loss on scales relevant to global climate change, followed by a discussion chapter and conclusion. The first chapter studies how reef fish species’ populations were impacted by climate-induced habitat loss, and how the population response compared across species. I modelled the demographic history of Dascyllus abudafur, Dascyllus trimaculatus, and Pomacanthus maculosus (Figure 2) populations in the isolated Red Sea during the LGM and analysed the population parameter estimates to determine the impact of climate-induced habitat loss. The second chapter focuses on the location of refugia which likely supported species’ populations that were later able to recover. I considered alternative demographic scenarios for the Red Sea population of Dascyllus marginatus (Figure 2), designed with differing locations for refugia during the LGM. The limited range of this species outside of the Red Sea meant it was possible to test the possibility of a suggested external refugium in the Gulf of Aden. Lastly, the third chapter investigates how populations of a larger, more motile, high-level mesopredator on coral reefs were impacted by climate-induced habitat loss, and how the population response compared to the four species of smaller, teleost reef fishes. Given their ability to migrate as adults, initial analysis compared two demographic scenarios to determine if the Red Sea population of Carcharhinus melanopterus (Figure 2) was locally extirpated or persisted during the LGM, followed by an analysis of population parameter estimates to determine the impact of climate-induced habitat loss. Additional sampling sites across the Indo-Pacific also allowed me to study the relative isolation of the Red Sea in the context of the whole Indo-Pacific region. 6 Francois Libert Figure 2. Pictures of the Red Sea coral reef fish species included in this thesis: (A) Dascyllus abudafur, (B) Dascyllus trimaculatus, (C) Pomacanthus maculosus, (D) Dascyllus marginatus, and (E) Carcharhinus melanopterus. Image credit: Francois Libert (A, C, D, E) and Patrick Randall (B). 7 8 CHAPTER 1 Title Red Sea coral reef fish species persisting in refugia suffered similarly minor impacts during the Last Glacial Maximum 1.1 Abstract Despite the continued decline of coral reefs due to increased sea surface temperatures associated with global climate change, the response of coral reef fish populations to environmentally-induced habitat loss has only been studied at small temporal and spatial scales. To our knowledge, this is the first study of the effects of habitat loss on coral reef fish populations at scales more compatible with global climate change. The Red Sea provides an ideal model system, given that low sea levels left it effectively isolated during the Last Glacial Maximum (LGM), leading to increased salinity and significant coral reef habitat loss. Restriction site associated DNA sequencing (RADseq) data was obtained for three coral reef fish species: Dascyllus abudafur, Dascyllus trimaculatus, and Pomacanthus maculosus. The D. abudafur data was shared by collaborators, while the other two datasets were obtained from the literature. They were selected based on the availability of samples from inside the Red Sea and outside, in the Indian Ocean. Though there were only a few suitable datasets available, these species provide a comparison between small-bodied (Dascyllus) and larger bodied (Pomacanthus) reef fish with distinct life histories and ecologies, as well as a comparison between two congeneric species (D. abudafur and D. trimaculatus). High-throughput sequencing combined with an Approximate Bayesian Computation framework via a machine learning tool, provides more information and power for comparing demographic models and estimating population parameters than methods based on mitochondrial DNA. Overall, all three species’ populations were similarly impacted during the LGM. Persisting in coral reef refugia, they were found to have suffered genetic bottlenecks with reduced effective population sizes of ~26% - ~37% of pre-LGM levels, sufficient to enable recovery given renewed habitat availability under more favourable conditions. This suggests that coral reef refugia supported relatively large effective population sizes compared to pre-LGM estimates, though the processes behind this are not yet clear. 9 1.2 Introduction Temperature-induced coral bleaching can lead to mass mortality (Hoegh-Guldberg 1999) on coral reefs, particularly as mass coral bleaching events have dramatically increased in frequency (Hughes et al. 2018). Overall reduction in coral cover and reef fragmentation have been shown to impact reef fish populations, reducing biodiversity and abundance (Graham et al. 2007b; Airoldi et al. 2008; Paddack et al. 2009), but most studies are at small temporal and spatial scales. Our understanding of the effect of habitat loss on coral reef fishes at the kind of scales associated with global climate change is still limited. Studying the effects of historic environmental change enables us to understand potential long-term consequences of habitat loss over large temporal scales. However, a major challenge in reconstructing such effects on large spatial scales is the connectedness of marine systems, making it difficult to track immigration, emigration, or size changes of populations using population genetics tools. The Red Sea, between Africa and Asia, provides an ideal model system to disentangle local demography from the effect of migration, as its basin is enclosed with only one connection to the Indian Ocean via the narrow Bab el Mandeb strait in the south. During the Last Glacial Maximum (LGM) approximately 21,000 years ago, low sea levels reduced coastal habitat by ~90% in tropical latitudes of the Indian Ocean (Ludt and Rocha 2015) and left the Red Sea effectively isolated (Siddall et al. 2003) developing highly saline conditions (Biton et al. 2008). These exceeded the tolerance of most organisms, with the absence of planktonic foraminifera in sediment samples from much of the Red Sea at this time suggesting aplanktonic zones (Fenton et al. 2000) , and the loss of many Red Sea coral reefs (Sheppard et al. 1992). There were certain areas, however, where conditions were believed to have been slightly less severe and foraminifera seem to have persisted (Fenton et al. 2000), likely along with other organisms capable of surviving the environmental conditions. Thus, I can use this model system to study the demographic response of different reef fish populations to large scale habitat loss. Following previous work with mitochondrial DNA, it is believed that isolated populations of cosmopolitan species persisted in the Red Sea during the LGM and later remixed with Indian Ocean populations (DiBattista et al. 2013). The results suggest initial separation times ranging from 21,000 to ~800,000 years ago (DiBattista et al. 2013). Red Sea coral reef fishes likely only persisted in refugia, suggested to have been in the Gulf of Aqaba in the north, in the south, or just outside the Red Sea in the Gulf of Aden (DiBattista et al. 2016a). These cosmopolitan species therefore provide a simple model system of a fish population that underwent a bottleneck, followed by 10 growth with immigration via a single connecion to the Indian Ocean since the LGM. Reconstructing the size of those bottlenecks and the speed of recovery could then determine the degree to which coral reef fish populations can be impacted while maintaining the capacity for recovery, and how quickly they can recover following habitat loss. To reconstruct the population dynamics in the Red Sea during and after the LGM, I will use genomic data from three species from inside and outside the Red Sea. Mitochondrial DNA lacks the power for complex demographic inference, but the large number of markers provided by genomic data (in this case RADseq autosomal loci), coupled with an Approximate Bayesian Computation framework, has the potential of providing a detailed picture of the impact of climate-induced habitat loss within the last ~30,000 years. The aims of this study are 2-fold: 1. Determine a minimum effective population size in several cosmopolitan reef fish species in the Red Sea, known to have supported a viable population over extensive periods of time and 2. Identify any differences in the response to, or recovery from, habitat loss between species based on their ecology. 1.3 Materials and methods Sample collection Dascyllus abudafur samples were collected by collaborators Dr. Vanessa Robitzch (Universidad Austral de Chile, Chile), Dr. Tilman Alpermann (Senckenberg Research Institute and Natural History Museum, Germany), and Dr. Bruno Frédérich (University of Liège, Belgium) using SCUBA or snorkelling between 2006 and 2015 and prepared for sequencing (see below). Published RADseq datasets with samples from both the Red Sea and the Indian Ocean were obtained from the literature for two further species: Dascyllus trimaculatus (Salas et al. 2019), and Pomacanthus maculosus (Torquato et al. 2019). Single nucleotide polymorphism (SNP) library preparation and sequencing (D. abudafur only. Work completed by Dr. Vanessa Robitzch) Genomic DNA was extracted from fin or gill tissue preserved in 96 % ethanol using a Nucleospin-96 Tissue kit (Macherey-Nagel, Germany). Double digest restriction associated DNA (ddRAD) libraries were prepared using 500 ng of high quality DNA per sample following the protocol described by Peterson et al., (2012) with some modifications. The libraries, containing 11 single end reads, were run on a HiSeq 2000 Illumina sequencer (see Supplementary information for more details). Power analysis, de-novo loci assembly and filtering For all three species, a minimum of nine and a maximum of 15 samples (to limit computation time, if more were available) were used for each of the Red Sea and Indian Ocean populations, determined to be sufficient following a power analysis. One thousand targets were randomly selected from 10,000 simulations generated for each species (with species-specific number of individuals, number of loci, and length of loci; see parameter estimation below) to be used as pseudo-observed data (pods). The resulting population parameter estimates were cross- validated with the original parameter values used to generate the pods. Scatterplots for the cross validation of the estimated mode and median with the original simulated data show accurate estimates for all parameters (Figures S1-S3 in Supplementary information) and low error values (Tables S1-S3 in Supplementary information). This demonstrates adequate power in our analyses for parameter estimation across all three species with these sample numbers. Where possible, numbers were balanced across sample sites within the basins, and samples were randomly selected if there was an excess. Samples from the southern Red Sea were not included given demonstrated examples of population structure associated with distinct environmental conditions (Nanninga et al. 2014; Giles et al. 2015). This resulted in a total of 30 D. abudafur samples, and 24 D. trimaculatus and P. maculosus samples (Figure 1 and Table 1). Sequences were demultiplexed and filtered for quality using the 'process_radtags.pl' pipeline in STACKS v.2.53 (Catchen et al. 2011), run up to the creation of the bam files. Resulting bam files were filtered for a minimum coverage of eight using samtools v 1.9 (Danecek et al. 2021), before continuing with the ‘gstacks’ and ‘populations’ components of the ‘denovo_map.pl’ pipeline (see Supplementary information for further details on de-novo loci assembly and loci filtering). 12 Figure 1. A map of sampling locations and sample sizes in the Red Sea and Indian Ocean for Dascyllus abudafur, Dascyllus trimaculatus, and Pomacanthus maculosus. Sampling sites within the two black boxes in the plot on the left are shown in greater detail in the two subplots on the right, namely the North and Central Red Sea (above) and the Chagos Archipelago (below). 13 Table 1. Sampling site information (Region, Location, Latitude/Longitude, and Degrees, Decimal, Minutes (DDM) GPS coordinates) and sample sizes for three cosmopolitan Red Sea coral reef fish species, Dascyllus abudafur, Dascyllus trimaculatus, and Pomacanthus maculosus. Region Location Lat/ Long DDM Co- ordinates Species Dascyllus abudafur (unpublished, from Robitzch, Alpermann, and Frédérich) Dascyllus trimaculatus (Salas et al. 2019) Pomacanthus maculosus (Torquato et al. 2019) North Red Sea Eilat, Israel 29.51734.95 N 29°31′ E 34°57′ 6 Gulf of Aqaba, Saudi Arabia 28.4 34.733 N 28°24′ E 34°44′ 7 2 Jazirat Burcan, Saudi Arabia 27.9 35.05 N 27°54′ E 35°03′ 8 3 1 North- Central Red Sea Al-Fahal / Al Wusul, Saudi Arabia 22.217 38.95 N 22°13′ E 38°57′ 9 Abu Shosha, Saudi Arabia 22.3 39.033 N 22°18′ E 39°02′ 3 Indian Ocean Lamu, Kenya 2.26740.9 N 2°16′ E 40°54′ 9 Salomon Attoll, Chagos Archipelago -5.325 72.244 S 5°19′ E 72°14′ 4 Peros Banhos, Chagos Archipelago -5.349 71.851 S 5°20′ E 71°51′ 1 Great Chagos Bank, Chagos Archipelago -6.212 72.115 S 6°12′ E 72°6′ 9 Diego Garcia, Chagos Archipelago -7.322 72.424 S 7°19′ E 72°25′ 1 Madagascar -23.36743.633 S 23°22′ E 43°38′ 15 Total: 30 24 24 Summary statistics A folded 2-dimensional site frequency spectrum (2D SFS) was generated for each species using a modified series of functions from the vcf2sfs R script v 2.0 (Liu et al. 2018). The ‘populations’ component of the ‘denovo_map.pl’ pipeline (introduced in the previous paragraph) was used to select loci present in at least 85% of individuals in both the Red Sea and Indian Ocean populations. Any missing data were imputed based on frequencies observed in other individuals. 14 Monomorphic site counts were excluded. Summarising the 2D SFS, the between-population nucleotide diversity (π) was also used as an additional summary statistic. Parameter estimation There is evidence of bidirectional colonisation between the Red Sea and the Indian Ocean across different coral reef fish species (DiBattista et al. 2013), though none of the species in this study were included in the research. To keep the model standard across the three cosmopolitan species, it is assumed that all three species originated in the Indian Ocean and later colonised the Red Sea. The demographic model in Figure 2 depicts the demographic history of Red Sea coral reef fish populations since before the LGM. The model represents the colonisation of the Red Sea at time ‘tanc’, and the persistence of a Red Sea population of size ‘Nbott’ in a refugium during the LGM, having undergone a genetic bottleneck at time ‘tbott’ reducing the effective population size from pre-LGM estimates of size ‘Nanc’. Migration ‘mig’ between basins returns following the LGM, with population recovery starting at time ‘trec’, continuing for the period ‘tleng’, and with the growth rate ‘gr’. Current effective population sizes are estimated for both the Red Sea ‘Npop1’ and the Indian Ocean ‘Npop0’ (Figure 2; for parameter priors see Table 2). This model, though simplified, provides a fair representation of the likely demographic history, given it is based on a mostly enclosed system with a well studied history supported by geological and climatic data. Figure 2. A demographic model representing the demographic history of Red Sea populations of several cosmopolitan reef fish species since before the Last Glacial Maximum (LGM). It covers the colonisation of the Red Sea, persistence in the Red Sea and a genetic bottleneck during the LGM, and the following population recovery. 15 Table 2. The range of prior values and definitions for the population parameters of a demographic model (Figure 2) representing the Red Sea populations of several cosmopolitan reef fish species since before the Last Glacial Maximum. Units include: ‘kya’ thousand years ago, ‘y’ years ago, ‘k’ thousand individuals. Parameter name Definition Prior range tanc Number of years since colonisation of Red Sea 40kya – 200kya Nanc Pre-LGM effective population size in Red Sea 104.7 (~50k) – 105.5 (~300k) tbott Number of years since the start of a genetic bottleneck 24kya – 36kya Nbott Effective population size of bottleneck in Red Sea 103.5 (~3k) – Nanc resize Ratio of effective population size change Nanc/Nbott trec Number of years since start of population recovery 11.5kya – 15kya tleng Number of years of population recovery 150y – (trec – 1) gr Growth rate during population recovery (log(Nbott/Npop1))/tleng mig Proportion of individuals per year migrating between basins 0.1x10-6 – 0.35x10-3 Npop0 Current effective population size in Indian Ocean 104.7 (~50k) – 105.9 (~800k) Npop1 Current effective population size in Red Sea Nbott – 105.5 (~300k) The prior range for some of the parameters was determined based on previously described assumptions, knowledge of the LGM in the Red Sea, and limitations of the analyses. For example, as it was assumed that these species remained in the Red Sea during the LGM, the colonisation of the Red Sea ‘tanc’ was set prior to the LGM at a minimum of 40kya. The maximum, 200kya, extends beyond the previous interglacial period (~115-130kya), but not further as it is a difficult parameter to estimate past a certain point, and does not impact the rest of the model too much once a more recent bottleneck occurs. The number of years since the start of the genetic bottleneck ‘tbott’ was set with a minimum of 24kya so the bottleneck occurred by the time of the LGM, and no older than 36kya - a realistic maximum if the populations were impacted by the LGM specifically. The effective population size of the bottleneck ‘Nbott’ was set with a maximum equalling the pre-LGM effective population size in the Red Sea, so it was possible for the population to be stable but not to grow. The ratio of effective population size change ‘resize’ was computed by dividing the pre-LGM 16 effective population size in the Red Sea by the effective population size of the bottleneck, for each simulation rather than from estimates from within the other distributions. The larger the ratio, the greater the population size change. The number of years since the start of the population recovery ‘trec’ also had a well defined range, as it is unlikely that the environment started to become more favourable before 15kya, and the climate has been relatively stable since ~11.5kya. The number of years of population recovery ‘tleng’ could have been very short with a minimum of 150y in case of almost no bottleneck, or could continue up to the present day by setting the maximum as one less than the start of the population recovery ‘trec’. The growth rate ‘gr’ during population recovery was computed by dividing the effective population size at the bottleneck by the current effective population size in the Red Sea and dividing that by the length of the recovery period. Given that the simulations run backwards in time, a negative growth rate between the present day and the genetic bottleneck represents population recovery since the bottleneck. Lastly, similar to the relationship between ‘Nbott’ and ‘Nanc’, the minimum current effective population size in the Red Sea, ‘Npop1’, was equal to ‘Nbott’, so that the population could remain stable and not recover, but would not eperience any futher decrease in size. The remaining parameters were tested with larger range estimates, and finetuned based on the shape of the posteriors. Parameter estimation was performed using an ABC framework via random forests (R package ‘abcrf’ v 1.8.1), a machine learning tool which enables greater power and robustness against the choice of summary statistics used (Pudlo et al. 2016; Raynal et al. 2019). Individual simulations were run using fastsimcoal2 (fsc25 v 2.5.2.21, Excoffier et al. 2013), using appropriate sample sizes (2N due to simulating haploid individuals) and number and size of loci based on the observed data (loci length was rounded to nearest multiple of five) (for more details on generating model simulations, see Supplementary information including Figure S4). Times are presented in years but were simulated in generations, so prior ranges (Table 2) were adjusted depending on generation time estimates for the species; 2 years for D. abudafur, 3 years for D. trimaculatus, and 4 years for P. maculosus (based on size/genus as in Crane et al. (2018), and Delrieu-Trottin et al. (2020)). Though empirical estimates for the generation times for the species in this study are unknown and thus could vary from the above, this would have limited impact on the main focus of this study – the ratio between pre- and post-LGM population sizes. Most mutation rates for fishes have been estimated at 1-2% per million years, and here I settled on the most commonly used rate in the literature: 1 x 10-8 per site per generation (Delrieu-Trottin et al. 2020). The 2D minor allele SFS and between-population π were used as summary statistics. I ran 10,000 simulations for each 17 species. For parameter estimation with the ‘abcrf’ package, a forest was trained for each demographic parameter based on the summary statistics generated during the simulations; the posterior distribution of each parameter then inferred based on the observed data. The number of trees was set to 1,000 (Pudlo et al. 2016). For ease of reading, I reported the results using the median, or the measure of central tendency of the posterior distributions with the lowest error according to the power analysis. The 0.025 and 0.975 quantiles for the posterior parameters can be found in Table S4 in the Supplementary information. 1.4 Results Data summary Following sample filtering, de-novo loci assembly (STACKS v2.53), and loci filtering, the number of loci retained for the two Dascyllus species ranged from 4,595 (D. trimaculatus) to 9,375 (D. abudafur), rounded to 95 base pairs (bp), compared to P. maculosus at 41,587 loci rounded to 90bp in length. This corresponded to 14,954 SNPs identified for D. trimaculatus across both populations, 30,332 SNPs for D. abudafur, and 44,685 SNPs for P. maculosus. The number of SNPs and their general distribution across the spectra can be seen in plots of the imputed, folded 2D SFS in Figure 3. The computed between-population π was 0.0048 for D. trimaculatus, 0.0055 for D. abudafur, and 0.0027 for P. maculosus. 18 Figure 3. Heat maps of the imputed, folded 2-dimensional site frequency spectra for Red Sea and Indian Ocean chromosomes of Dascyllus abudafur, Dascyllus trimaculatus, and Pomacanthus maculosus. Each square in the plots represents a SNP count present in a particular number of Red Sea and Seychelles chromosomes, which can give insights into the demography of the populations. All three species’ spectra have a relatively similar pattern, with higher numbers of rare alleles which can indicate large populations or a recent expansion. Dascyllus abudafur and Pomacanthus maculosus appear to have a higher frequency of common private alleles (those present in many chromosomes in one population and zero chromosomes in the other) than D. trimaculatus which could suggest lower migration levels. Monomorphic SNPs were not included in the dataset. ABC with random forests Before estimating population parameter values, the observed data needed to be compared with the simulated data to confirm that the model could recover the observed summary statistics. I tested how well the model fit the data by plotting a histogram of observed and simulated between- population π (Figure 4). 19 Figure 4. Histograms of simulated data generated with the demographic model in Figure 2, for Dascyllus abudafur, Dascyllus trimaculatus, and Pomcanathus maculosus, with the observed data overlaid in red. The observed data is well within the range of simulated data for all three species. In addition to ensuring that the number of samples, number of loci, and summary statistics provided sufficient information to be able to accurately estimate population parameter values, the power analysis was also used to determine the most appropriate way to summarise the posterior distributions by identifying the measure of central tendency with the lowest error. As discussed previously, error values were low in general, but across all three species, the median had the most consistently low error values across parameters (Tables S1-S3 in Supplementary information) so was used to report the parameter estimates below. For the two Dascyllus species, posteriors were similar for most parameters. On the other hand, posteriors for P. maculosus differed for several parameters. Starting with original colonisation of the Red Sea prior to the LGM, the estimate of the time of arrival in the Red Sea ‘tanc’ was ~60 - ~95 thousand years ago (kya) in the two Dascyllus species, with no clear signal in P. maculosus. The effective population size in the Red Sea prior to the LGM ‘Nanc’ suggested large values across all species (more than 155k individuals). A small genetic bottleneck then occurred around the time of the LGM which affected the estimates of more recent population parameters, though there was no real signal to estimate the specific time of the bottleneck ‘tbott’. This bottleneck appeared to have reduced effective population sizes ‘Nbott’ to a minium of ~65k individuals, with pre- to post- bottleneck population size ratios of ~2.7 - ~3.8 across species. This was calculated for each simulation, rather than from estimates from other distributions. When considered as a percentage 20 (eg. 100/3.8), these resize ratios translated to effective population size estimates of ~26 - ~37 % of pre-LGM estimates, from which all species were able to recover (see Figure 5, and Table S4 in Supplementary information). Figure 5. Plots for the posterior distributions following population parameter estimation for the modelled demographic history of Red Sea populations since before the Last Glacial Maximum (LGM) for three cosmopolitan Red Sea reef fish species: Dascyllus abudafur, Dascyllus trimaculatus, and Pomacanthus maculosus, using an Approximate Bayesian Computation (ABC) framework coupled with random forests methods. Prior distributions are shown in grey. 21 Looking at post-LGM recovery, this may have begun ‘trec’ ~14kya in P. maculosus, though the signal was weak within the narrow prior range, and there was no clear estimate for the two Dascyllus species. The negative growth rate back in time ‘gr’ demonstrated populations were recovering towards the present day, with growth rates of ~0.3 - ~0.4 x 10-4 per year for the Dasycllus species and ~0.6 x 10-4 per year for P. maculosus. The period of recovery ‘tleng’ seemed shorter for the two Dascyllus species, ~5 - ~6k years, compared to a longer period of ~8k years for P. maculosus which could be due to a slower life history with a longer generation time (in a related species; Crane et al. 2018) and larger body size (Allen 1985, 1991). Separately, migration ‘mig’ between the Indian Ocean and the Red Sea, when sea levels allowed, was higher in D. trimaculatus at ~1.2 x 10-4 of all individuals per year, compared to lower estimates of ~0.2 - ~0.05 x 10-4 of all individuals per year for D. abudafur and P. maculosus. Lastly, estimates for current effective population sizes for the Red Sea ‘Npop1’ were similar to pre-LGM levels across species (more than ~145k individuals). Estimates for current effective population sizes for the Indian Ocean ‘Npop0’ varied, with much larger estimates than in the Red Sea for the Dascyllus species (more than ~390k individuals), but similar estimates to the Red Sea for P. maculosus (~120k individuals, see Figure 5 and Table S4 in Supplementary information for details). 1.5 Discussion This study revealed that populations of reef fish species were similarly impacted by environmentally-induced habitat loss at spatial and temporal scales relevant to ongoing global climate change, focussing specifically on the model Red Sea system during the LGM. All three study species underwent only a mild genetic bottleneck in the Red Sea during the LGM. With a couple of notable exceptions in the parameters, two Dascyllus species (D. abudafur and D. trimaculatus) were shown to have similar demographic histories and population parameter estimates, with the third species, P. maculosus, undergoing a similar population response but with more distinct population parameter estimates. The genetic bottlenecks only reduced the estimated effective population sizes to approximately ~26 - ~37 % of pre-LGM estimates, so relatively large populations persisted in the Red Sea during the LGM and all species were able to recover to the populations present today (though only D. trimaculatus has reached median pre-LGM population sizes). I compared the demographic histories of the three Red Sea coral reef fish species studied, and discuss how our understanding of their different ecologies supports our findings. 22 The differences in population parameter estimates for P. maculosus, and for the one parameter at which the Dascyllus species (D. abudafur and D. trimaculatus) differ, are likely due to significant variation in ecology and life history across the species. Notably, the much smaller estimate for current effective population size of P. maculosus in the Indian Ocean likely reflects the smaller range size for this species covering only the northwest Indian Ocean (Randall 1988), compared to ranges extending across to the Sunda shelf for D. abudafur (Borsa et al. 2014) and into the western Pacific for D. trimaculatus (Randall and Allen 1977). The higher levels of migration between the Indian Ocean and the Red Sea for D. trimaculatus (when sea levels allowed, both pre- and post-LGM) are consistent with our knowledge of their swimming capabilities. In most reef fishes, migration mainly happens at the larval stage and it is now understood that larval reef fish have significant swimming capabilities rather than behaving as passive particles. Pomacentrid larvae (eg. Dascyllus spp.) have been shown to have a higher average U-crit score – a measure of their maximum critical swimming speeds – than Pomacanthids (eg P. maculosus) (Fisher 2005), which has been associated with greater distance dispersal events and increased connectivity (Nanninga and Manica 2018). Though none of the species in this study were included in Fishers’s experiments (2005), Dascyllus aruanus - only recently described as a distinct species from D. abudafur (Borsa et al. 2014) - recorded one of the lowest average U-crit scores of all tested Pomacentrids, closer to the Pomacanthid than a second Dascyllus species. Though consistent with our results, I cannot be sure how widespread this relationship with migration may be given this study includes only two genera and three species. The estimated genetic bottleneck was not severe, in the order of ~26 % to ~37 % of pre- LGM population sizes. The similarity in the resize ratio of pre- and post-LGM population sizes suggests that, despite differences in their ecologies, their shared dependence on coral reefs was sufficient to result in similar impacts of habitat loss across species. Care must be taken before making too broad generalisations however, given this was only for three species and previous research provides evidence for species-specific responses to environmental change amongst congeneric marine fishes (Moore and Chaplin 2014). Another example includes the effective population size in king penguins during the LGM which was estimated at ~25% of pre-LGM population size, likely due to the fragmented distribution of their island breeding and foraging resources being easily impacted by climate change, while no such bottleneck was detected in congeneric emperor penguins on the mainland (Cristofari et al. 2018). Suitable habitat availability appears to be critical for species survival during extended periods of environmental change. However, despite substantial coral reef habitat loss during the LGM, Red Sea populations of reef fish species persisted at relatively large effective population sizes compared to ancestral population 23 sizes, likely in coral reef refugia. The minimum effective population size estimate of ~65k individuals was not dissimilar to a ~60k estimate that was sufficient to sustain an endemic species at an isolated atoll (Crane et al. 2018). This result generates new questions as to why Red Sea populations of reef fish species did not suffer a greater reduction in effective population size, and how they were supported by coral reef refugia. This apparent capacity to support relatively large numbers of individuals may favour theories of multiple refugia for Red Sea populations during the LGM (DiBattista et al. 2016a). Studying the location of the refugia is likely the key to understanding how they supported reef fish populations. For example, multiple, small refugia could form a metapopulation where each deme could suffer a reduction in effective population size (and therefore diversity) but when the connectivity is re-established, the overall population would present a substantially higher level of genetic diversity compared to each deme. The higher divergence associated with population structure can result in an overestimate of effective population size if still assuming a panmictic population. How does the level of population decline in this study compare to recent anthropogenically- mediated declines? Off the west of Australia, a decrease in abundance of both fishery targeted and non-targeted species alike, including small wrasses and butterflyfishes, of up to 22% per year was reported (Vanderklift et al. 2019). Significant declines in coral reef fish density are also occurring in most sub-regions of the Caribbean, similar for both fished and non-fished species, of up to 6% loss per year in three trophic groups (Paddack et al. 2009). Moreover, in Papua New Guinea, 50% of local reef fish species declined by > 50% over eight years (Jones et al. 2004). These declines are sizeable, particularly if they are to continue over longer time frames, and it is not clear from the reconstructed bottlenecks during the LGM whether populations of coral reef fishes have the potential to recover from such levels. Monitoring of species following the setting up of marine protected areas (MPAs) confirms that rapid increases in numbers are possible (Hamilton et al. 2011). However, a key element for the survival of reef fish populations and return to prior densities is a healthy coral reef. Following the elevated salinities during the LGM, the return of more favourable conditions resulted in the expansion of coral reefs throughout the Red Sea, with increasing habitat availability supporting the recovery I observe in post-LGM reef fish populations. The length of recovery period (thousands of years) suggests this is not dictated by demographic processes such as migration which has influence over shorter timescales (Booth et al. 2018), but more likely by habitat availability. In contrast, given levels of anthropogenic impact on the climate and the alternative threat of a continuing rise in water temperature, it is not clear whether areas suffering recent coral reef habitat 24 loss will recover. Previous studies identified loss in coral cover as a likely cause of reef fish population declines (Jones et al. 2004; Wilson et al. 2006; Munday et al. 2008; Pratchett et al. 2008, 2012; Paddack et al. 2009; Vanderklift et al. 2019), and in Papua New Guinea, greater dependence on living coral as juvenile recruitment sites seemed to result in greater observed declines in abundance (Jones et al. 2004). Interestingly, none found any effect of no-take (no fishing) Marine Protected Areas when observing temporal patterns in reef fish abundance. Simply designating no- take areas sufficient in size to sustain the minimum effective population sizes estimated in this study would not likely be effective. Conservation efforts need to prioritise a reduction in coral reef degradation, thus protecting the habitat of coral reef fishes. This is particularly important as the full effects of recent habitat loss may not yet have materialised. The response of fish populations to the loss of coral has been shown to have a lag of 5-10 years in Indo-Pacific (Graham et al. 2007b), possibly longer in the Caribbean (Paddack et al. 2009). 1.6 Conclusion Overall, this study confirmed the persistence of Red Sea populations of reef fish species during periods of environmentally-induced habitat loss, and highlighted a similar population response in the form of a mild genetic bottleneck across species. I also provided evidence for some species to recover from a maximum 74% population reduction over large spatial and temporal scales. This estimate is not of great value as a population recovery threshold, but it does indicate that, compared to ancestral population sizes, relatively large populations of reef fish species persisted in the Red Sea during the LGM, likely in refugia. Studying Red Sea refugia in greater detail may reveal how they supported such population sizes, and why there was not a more severe reduction in Red Sea populations of reef fish species. Although this system is limited to the Red Sea, it provides a solid example to investigate other reef fish populations across the world. More importantly, the ability of these species to recover is strictly linked to habitat availability. Therefore, efforts have to be made in order to reduce local stressors on coral reefs and to reverse current habitat loss due to global climate change. 25 26 CHAPTER 2 Title Sticking it out when the going gets tough: in-situ Red Sea coral reef refugia maintained a coral reef fish species during the Last Glacial Maximum 2.1 Abstract Given the ongoing rise in sea surface temperatures threatening coral reefs around the world, the unique environment of the Red Sea has become an important source of knowledge on how species or populations may respond to environmental change. Many Red Sea populations of reef fish species persisted during periods of habitat loss during the Last Glacial Maximum (LGM), likely in refugia. I studied the most likely location of these refugia, either inside or outside the impacted area, and how many there were – a single refugium or multiple refugia. Widespread populations in the Indian Ocean make it difficult to assess the likelihood of a refugium located outside the Red Sea, as they risk dilution of the Red Sea signal. Though often considered a Red Sea endemic, Dascyllus marginatus is also found in a limited number of other isolated populations. It represents a model species to be able to study the location of Red Sea population refugia during the LGM given its restricted range in the Indian Ocean lessening the genetic impact on any temporary refugium outside the Red Sea. Alternative demographic models were fitted with an Approximate Bayesian Computation framework via a machine learning tool, using restriction site associated DNA sequencing (RADseq) data from multiple populations. Dascyllus marginatus showed a demographic history during the LGM similar to that of two previously studied cosmopolitan congeneric species, with an effective population size estimate of ~77% of pre-LGM estimates from which the population was able to recover. Thus, for this species, the lack of a large external population did not affect population response to or recovery from environmentally-induced habitat loss. I also showed that populations of D. marginatus currently found in the southern part of the Red Sea were unlikely to have descended from a post-LGM expansion from the neighbouring Gulf of Aden; rather, Red Sea populations persisted inside the Red Sea, most likely only in the north. My results thus provide evidence that in-situ coral reef refugia (possibly even a single one) are capable of maintaining a reef fish species population during periods of habitat loss, that is capable of recovery given the return of 27 more favourable conditions. This supports suggestions that we should consider directing resources towards identifying and protecting potential climate-change refugia on coral reefs. 2.2 Introduction As climate change continues to alter local environments around the world, organisms are having to adapt, migrate, or risk extinction (Nogués-Bravo et al. 2018). However, some populations can persist in pockets of favourable habitat, known as refugia, that are buffered from the surrounding environmental change (Keppel et al. 2012). Refugia are important as they could enable the recovery of the species given the return of more favourable conditions (Hampe et al. 2013). Thus, it is not surprising that it has been suggested that we focus conservation efforts on climate- change refugia as one strategy for adapting to climate change (Morelli et al. 2020). Climate-change refugia have mostly been studied in terrestrial systems when modelling glaciations and species’ changing distributions (Taberlet and Cheddadi 2002; Provan and Bennett 2008), but they are also likely to be important in tropical areas (eg. Graham et al. 2007a) where organisms are often living close to the other end of their thermal limits. In the face of continued environmental change, there have been attempts to hypothesise the location of potential refugia on coral reefs based on different environmental characteristics including sea surface temperature, wave height, and iridescence (Ban et al. 2016). The Red Sea has become an important source of information on how species or even populations may have adapted to high temperatures and salinities. Thus, it is important to understand the history of the region. The Red Sea underwent drastic environmental changes during the LGM (Siddall et al. 2003; Biton et al. 2008), resulting in the loss of a large proportion of its fauna and flora (Fenton et al. 2000), including coral reefs (Sheppard et al. 1992). Based on previous studies and the results from my first chapter, we know that many Red Sea reef fish species’ populations persisted during the LGM (DiBattista et al. 2013), likely in refugia. In the previous chapter, I showed that these refugia supported relatively large population sizes, enabling them to survive and subsequently recover. A key question for the persistence of Red Sea fish populations is the location of these refugia. They have been purported to have been in the Gulf of Aqaba, the South Red Sea, or the Gulf of Aden based on several different lines of evidence (DiBattista et al. 2016a). However, we do not know the most likely location, either inside or outside the Red Sea basin, or how many there were – a single refugium or multiple refugia. 28 It has been suggested that the area just outside the Red Sea, the Gulf of Aden, might have been isolated from the rest of the Indian Ocean during the LGM and thus acted as a refugium where the Red Sea population could exit into (DiBattista et al. 2016a). As a relatively restricted body of water itself, it is possible that conditions in the Gulf of Aden during the LGM changed sufficiently that only Red Sea populations, which were better adapted to more extreme environmental conditions, were able to persist there and local Indian Ocean populations were locally extirpated. As sea levels began to rise after the LGM, increasing connectivity and enabling the Red Sea reefs to recover, the Red Sea populations could have re-entered the Red Sea. In other words, the populations that we now see in the Red Sea might have been located at its mouth during the LGM. A difficulty with testing this hypothesis is that, when considering cosmopolitan species such as those I investigated in the previous chapter, any remnant of this refugium would have by now mixed with other Indian Ocean populations that were able to move back into the Gulf of Aden, thus confounding any genetic signal of a previous refugium. This could represent a natural process by which beneficial adaptations that may have moved out of the Red Sea have since been lost and are restricted once again to the more extreme environment of the Red Sea. Dascyllus marginatus represents a model species to be able to study the location of Red Sea population refugia during the LGM, given its restricted geographic range (Randall and Allen 1977) (Figure 1) compared to many other cosmopolitan species. Two such wider-ranging examples include congeneric species Dascyllus abudafur and Dascyllus trimaculatus, whose Red Sea populations were shown in the previous chapter to have persisted during the LGM and suffered a minor genetic bottleneck. This pattern is likely shared by D. marginatus, given previous work providing mitochondrial DNA evidence from other cosmopolitan species of their persistence in the Red Sea during the LGM (DiBattista et al. 2013). Sometimes considered a Red Sea endemic, D. marginatus is actually found in the Red Sea, the Gulf of Oman, and the coast inbetween (Randall and Allen 1977) (Figure 1), though the evolutionary origin of the species is not known. This restricted distribution beyond the putative refugium outside the Red Sea limits possible confounding effects of additional populations in the Indian Ocean. It can be classed as a cosmopolitan species but with limited populations outside the Red Sea lessening the genetic impact on any possible external Red Sea refugium. Thus, I can use this model species to study the location of Red Sea population refugia, whether inside or outside the Red Sea, during a period of environmentally- induced habitat loss during the LGM. After this is revealed, I can further study the demographic history of the species more accurately. Reconstructing the demographic parameters of the geographic split between basins could reveal the evolutionary origin of D. marginatus. Studying the Red Sea population specifically will enable a comparison of the impact of, and recovery from, large 29 scale habitat loss for a restricted-range species, D. marginatus, and for wider-ranging species studied in the previous chapter: congeneric species Dascyllus abudafur and Dascyllus trimaculatus, and a species from a different family and with a different ecology, the angelfish Pomacanthus maculosus. Figure 1. A map showing the approximate distribution of Dascyllus marginatus from the Red Sea to the Gulf of Oman, represented by the pink line. To determine the location of Red Sea refugia during the LGM, I will reconstruct the population dynamics for several alternative demographic scenarios using genomic data from D. marginatus, from inside and outside the Red Sea. As shown in the previous chapter, the large number of markers provided by RADseq autosomal loci, coupled with an Approximate Bayesian Computation framework, were able to provide a detailed picture of the effect of habitat loss during the LGM ~20k years ago. I will use the same methods here, providing sufficient power for the complex demographic inference necessary to consider these alternative scenarios. After confirming the persistence of the Red Sea population of D. marginatus during the LGM, the main aim of this study is to determine the location of the refugia, inside or outside the Red Sea. Additionally, I also aim to determine the evolutionary origin of D. marginatus, and identify any differences in the demographic history of the Red Sea population of D. marginatus to that of wider-ranging congeneric species. 30 2.3 Materials and Methods Sample collection, single nucleotide polymorphism (SNP) library preparation, and sequencing (work completed by collaborators, Dr. Vanessa Robitzch (Universidad Austral de Chile, Chile) and Dr. Tilman Alpermann (Senckenberg Research Institute and Natural History Museum, Germany)) Dascyllus marginatus samples were collected by Dr. Vanessa Robitzch and Dr. Tilman Alpermann, using SCUBA or snorkelling between 2009 and 2015. Dr. Vanessa Robitzch prepared them for sequencing. Genomic DNA was extracted from fin or gill tissue preserved in 96 % ethanol using a Nucleospin-96 Tissue kit (Macherey-Nagel, Germany). Double digest restriction associated DNA (ddRAD) libraries were prepared using 500 ng of high-quality DNA per sample following the protocol described by Peterson et al (2012), with some modifications. The libraries, containing single end reads, were run on a HiSeq 2000 Illumina sequencer (see Supplementary information for more details). Power analysis, de-novo loci assembly and filtering For the initial test of the time of the splitting of the Red Sea population from the Gulf of Aden, Indian Ocean, and Gulf of Oman (hereafter referred to as ‘outside populations’), 15 samples (to limit computation time) were selected from each of the Red Sea and outside populations (“time of split” dataset), determined to be sufficient following a power analysis. One thousand targets were randomly selected from 10,000 simulations generated for the selected demographic model for the Red Sea population of D. marginatus (90 haploid individuals, 2,530 loci of 95 base pairs length) to be used as pseudo-observed data (pods). The resulting population parameter estimates were cross- validated with the original parameter values used to generate the pods. Scatterplots for the cross validation of the estimated mode and median with the original simulated data show accurate estimates for all parameters (Figure S1 in Supplementary information) and low error values (Table S1 in Supplementary information). This demonstrates adequate power in our analyses for parameter estimation with these sample numbers. Where possible, numbers were balanced across sample sites within the basins, and samples were randomly selected if there was an excess. Samples from the southern Red Sea were not included given demonstrated examples of population structure associated with distinct environmental conditions (Nanninga et al. 2014; Giles et al. 2015). For testing the most likely location of Red Sea refugia, and possible origin of D. marginatus, 15 samples were used from each of the North/Central Red Sea, the South Red Sea, and the Gulf of 31 Aden (”location of refugia” dataset, Figure 2 and Table 1). Sequences were demultiplexed and filtered for quality using the 'process_radtags.pl' pipeline in STACKS v.2.53 (Catchen et al. 2011), run up to the creation of the bam files. Resulting bam files were filtered for a minimum coverage of eight using samtools v 1.9 (Danecek et al. 2021), before continuing with the ‘gstacks’ and ‘populations’ components of the ‘denovo_map.pl’ pipeline(see Supplementary information for further details on de-novo loci assembly and loci filtering). Figure 2. A map of sampling locations and sample sizes in the Red Sea, Indian Ocean, and Gulf of Oman for Dascyllus marginatus. Black filled circles represent sites and sample sizes that are utilised in both datasets in this study, white circles represent sites that are only used in the “time of split” dataset, and grey circles represent sites or sample sizes that are used only in the “location of refugia” dataset. 32 Table 1. Dascyllus marginatus sampling site information and sample sizes for two different datasets: A) to determine the time of the split of Red Sea and outside populations, and B) to determine the most likely location of Red Sea refugia during the Last Glacial Maximum. Region Location lat/long DDM Co-ordinates Datasets A Time of split B Location of refugia North Red Sea Haql, Gulf of Aqaba, Saudi Arabia 29.25, 34.93 N 29°15′ E 34°56′ 4 4 North- Central Red Sea Jazirat Burcan, Saudi Arabia 27.9, 35.05 N 27°54′ E 35°03′ 7 7 Thuwal, Saudi Arabia 22.28,39.07 N 22°17′ E 39°04′ 1 1 Al-Lith, Manilla Bay, Saudi Arabia 20.13, 40.1 N 20°08′ E 40°06′ 3 3 South Red Sea Farasan, Saudi Arabia 16.62, 41.93 N 16°37′ E 41°56′ 15 Gulf of Aden Magateen, Yemen 13.4,46.37 N 13°24′ E 46°22′ 5 15 Indian Ocean Socotra, Yemen 12.6, 54.35 N 12°36′ E 54°21′ 5 Persian Gulf Musandam, Oman 26.2,56.3 N 26°13′ E 56°17′ 5 Total: 30 45 Summary statistics A folded 2-dimensional site frequency spectrum (2D SFS) was generated for the “time of split” dataset, and three separate 2D SFS were generated for the “location of refugia” dataset; three pairwise combinations for three populations) using a modified series of functions from the vcf2sfs R script v 2.0 (Liu et al. 2018). The ‘populations’ component of the ‘denovo_map.pl’ pipeline (introduced in the previous paragraph) was used to select loci present in at least 90% of individuals in both the Red Sea and Indian Ocean populations. Any missing data were imputed based on frequencies observed in other individuals. Monomorphic site counts were excluded. Summarising each 2D SFS, the between-population nucleotide diversity (π) was also used as an additional summary statistic. 33 Model selection and parameter estimation The population split time was tested first, followed by analysis to determine the location of the refugia. None of the demographic models in Figure 3 assume a place of origin for D. marginatus, though this will later be discussed following analysis of some of the posterior parameter estimations. The demographic model in Figure 3A depicts a simplified representation of the splitting of the Red Sea and outside populations of D. marginatus at time ‘tanc_A’ - the prior range of which stretches from before to after the LGM. I then considered three alternative scenarios representing the demographic history of the Red Sea population of D. marginatus assuming the persistence of a Red Sea population since before the LGM, to be confirmed with the intial test of the population split time. A power analysis was also completed to ensure that the “location of refugia” dataset is sufficient to conduct a successful model selection analysis. One thousand targets were randomly selected from 10,000 simulations generated with the “North” model (Figure 3Bi), to be used as pseudo-observed data (pods) and tested with the model selection analysis. No pods were incorrectly assigned to either of the alternative models, and all 1,000 were correctly assigned to the “North” model that generated them, within a 70% confidence threshold (Table S2 in Supplementary information). A total of 830 pods out of the 1,000 were correctly assigned to the “North” model within an 80% confidence threshold (Table S2 in Supplementary information). Thus, this dataset is sufficient for accurate model selection. The three models in Figure 3B vary in the location of refugial Red Sea populations but all have at least one refugium in the North Red Sea: i) a single population in the north, ii) a single population in the north and one outside the Red Sea in the Gulf of Aden, and iii) two populations in the Red Sea - one in the north and one in the south. All models in Figure 3B start with an initial effective population size ‘Nanc’ splitting into Gulf of Aden and Red Sea populations at time ‘tanc’, with immediate population expansions from effective population sizes ‘Nsplit0’ or ‘Nsplit1’, respectively, to ‘Npre’. The ratio between expected population sizes prior to and following these intial expansions are termed ‘resize0’ and ‘resize1’ for the Gulf of Aden and Red Sea, respectively. Migration ‘mig1’ occurs between the two basins. The models then begin to differ at the LGM. The “North” and “North/Gulf” models (Figure 3Bi and 3Bii) assume a single refugial population inside the Red Sea during the LGM. A separate population in the south splits from the north following population recovery after the LGM in the “North” model, or forms following recolonisation from a refugial population in the Gulf of Aden immediately after the LGM in the “North/Gulf” model. The 34 “North/South” model (Figure 3Biii) differs in that it assumes two refugial populations in the Red Sea during the LGM, with both suffering genetic bottlenecks and subsequently recovering. The “North” and “North/Gulf” models (Figure 3Bi and 3Bii) represent a genetic bottleneck at time ‘tbott’ during which there is no migration and which reduces the effective population size from pre-LGM estimates of size ‘Npre’ to estimates of size ‘Nbott’ in a refugium during the LGM – the resize ratio is termed ‘resizebott’. The “North” model then sees migration between the two basins ‘mig1’ return following the LGM, with population recovery in the Red Sea starting at time ‘trec’, continuing for the period ‘tleng’, and with the growth rate ‘gr’. Following this recovery period, at time ‘tstop’, the South Red Sea population splits from the north, with migration ‘mig0’ then occuring between the Gulf of Aden and the South Red Sea and between the South Red Sea and the North Red Sea. Alternatively, the “North/Gulf” model represents a recolonisation from the Gulf of Aden into the South Red Sea ‘Nbott2’ following the LGM at time ‘trec’, at the same time as the persisting North Red Sea population begins to recover, with migration ‘mig0’ returning as above. Population growth continues for the period ‘tleng’, with growth rates of ‘gr1’ and ‘gr2’ in the North Red Sea and South Red Sea populations, respectively. Meanwhile, the “North/South” model (Figure 3Biii) represents two Red Sea bottlenecks at time ‘tbott’ during which there is no migration and which reduces the effective population size from pre-LGM estimates of size ‘Npre1’ or ‘Npre2’ to estimates of size ‘Nbott1’ or ‘Nbott2’ in northern or southern refugia, respectively, during the LGM – the resize ratios are termed ‘resizebott1’ or ‘resizebott2’. Migration ‘mig0’ returns following the LGM, with population recovery starting at time ‘trec’, continuing for the period ‘tleng’, and with growth rates of ‘gr1’ or ‘gr2’ in the North Red Sea and South Red Sea populations, respectively. For all three models, current effective population sizes are estimated for the Gulf of Aden ‘Npop0’, the South Red Sea ‘Npop2’ and the North Red Sea ‘Npop1’ (Figure 3B; for parameter priors see Table 2). These models, though simplified, provide a fair representation of likely demographic histories, given they are based on a mostly enclosed system with a well studied history supported by geological and climatic data. 35 36 Figure 3. Demographic models representing the demographic history of Dascyllus marginatus since before the Last Glacial Maximum (LGM). They represent A) a simplified model to test the time of the splitting of the Red Sea and outside populations and B) three likely demographic scenarios for the Red Sea population, assuming persistence in the Red Sea and a genetic bottleneck during the LGM, with alternative theories for the location of refugial populations: i) a single population in the north, ii) a single population in the north and one outside the Red Sea in the Gulf of Aden, and iii) two populations within the Red Sea - one in the north and one in the south. Most population parameters are labelled in B) i) but where these differ in the other models, additional labels are included. Table 2. The range of prior values and definitions for the population parameters of four demographic models (Figure 3) representing the demographic history of Dascyllus marginatus since before the Last Glacial Maximum. Units include: ‘kya’ thousand years ago, ‘ya’ years ago, ‘k’ thousand individuals. Parameter name Definition Prior range “time of split" model tanc_A Number of years since splitting of Red Sea and outside populations 11.5kya – 100kya “location of refugia” models: “North” “North/Gulf” “North/ South” Nanc Effective population size prior to splitting of Red Sea and Gulf of Aden populations ~50k – ~800k tanc Number of years since colonisation of Red Sea 40kya – 200kya Nsplit0 Effective population size in Gulf of Aden after first population split ~50k – Npop0 resize0 Ratio of effective population size change in Gulf of Aden after initial epansion Nsplit0/Npop0 Nsplit1 Effective population size in Red Sea after first population split ~50k – Npre resize1 Ratio of effective population size change in Red Sea after initial expansion Nsplit1/Npre Npre Effective population size in Red Sea after initial expansion ~50k - ~300k tpop Number of years since splitting of South Red Sea and Gulf of Aden / of South Red Sea and North Red Sea trec +1 tbott +1 tbott Number of years since a genetic bottleneck at LGM 24kya – 36kya Npre1 Effective population size in North Red Sea after Red Sea population split, prior to bottleneck at LGM ~50k – Npre resizebott/1 Ratio of effective population size change in North Red Sea at bottleneck Npre / Nbott Npre1 / Nbott1 37 Nbott/1 Effective population size of bottleneck in North Red Sea 10k – Npre 10k – Npre1 Npre2 Effective population size in South Red Sea after Red Sea population split, prior to bottleneck at LGM ~50k – Npre resizebott2 Ratio of effective population size change in South Red Sea at bottleneck Npre2 / Nbott2 Nbott2 Effective population size of bottleneck in South Red Sea / of recolonizing population in the South Red Sea from the Gulf of Aden after the LGM 10k – Npop2 10k – Npre2 trec Number of years since start of population recovery 11.5kya – 15kya tleng Number of years of population recovery trec - tstop gr/1 Growth rate during population recovery in North Red Sea (log(Nbott/1 / Npop1))/tleng gr2 Growth rate during population recovery in South Red Sea (log(Nbott2 / Npop2))/tleng mig1/0 Proportion of individuals per year migrating between 1) Gulf of Aden and Red Sea or later 0) Gulf of Aden and South Red Sea, and South Red Sea and North Red Sea 0.1x10-6 – 0.35x10-3 tstop Number of years since reaching current effective population size 5kya – (trec-1) Npop0 Current effective population size in Gulf of Aden ~50k - ~300k Npop1 Current effective population size in North Red Sea Nbott1 - ~300k Npop2 Current effective population size in South Red Sea, split from the North Red Sea after the population recovered Nbott1 – Npop1 Current effective population size in South Red Sea, recolonized from the Gulf of Aden after the LGM ~50k - ~300k Current effective population size in South Red Sea, recovered from genetic bottleneck Nbott2 – ~300k The prior range for some of the parameters was determined based on previously described assumptions, knowledge of the LGM in the Red Sea, and limitations of the analyses. For example, as it was not assumed that D. marginatus persisted in the Red Sea during the LGM, ‘tanc_A’ of the initial time of split model needed to cover time periods both before and since the LGM. It was also unlikely to have been more recent than 11.5kya as the environment has been relatively stable since then. Once it was confirmed that D. marginatus likely did persist in the Red Sea during the LGM like the three species in my first chapter, the colonisation of the Red Sea ‘tanc’ was set prior to the LGM at a minimum of 40kya. The maximum, 200kya, extends beyond the previous interglacial 38 period 115-130kya, but not further as it is a difficult paramter to estimate past a certain point, and does not impact the rest of the model too much once a more recent bottleneck occurs. The initial split, expansion, and associated extra parameters, ‘Nsplit0’, ‘resize0’, ‘Nsplit1’, and ‘resize1’, mean that no assumption of the source and sink population, or the geographic origin of D. marginatus, had to be made. The number of years since the splitting ‘tpop’ of local populations in the “North/Gulf” and “South/Gulf” models was set at either ‘trec+1’ or ‘tbott+1’, to occur just prior to the ‘trec’ or ‘tbott’ events. The number of years since the start of the genetic bottleneck ‘tbott’ was set with a minimum of 24kya so the bottleneck occurred by the time of the LGM, and no older than 36kya - a realistic maximum if the populations were impacted by the LGM specifically. The effective population size of the bottleneck ‘Nbott’ was set with a maximum equalling the pre-LGM effective population size in the Red Sea ‘Npre’, so it was possible for the population to be stable but not to grow. The ratio of effective population size change ‘resizebott’ was computed by dividing the pre-LGM effective population size in the Red Sea by the effective population size of the bottleneck, for each simulation rather than from estimates from within the other distributions. The larger the ratio, the greater the population size change. The similar sets ‘Npre1’, ‘resizebott1’, and ‘Nbott1’, and ‘Npre2’, ‘resizebott2’, and ‘Nbott2’ in the “North/South” model represent similar bottlenecks but in 2 locations in the Red Sea, and so required their own maxima and minima for the effective population sizes. The number of years since the start of the population recovery ‘trec’ also had a well defined range, as it is unlikely that the environment started to become more favourable before 15kya, and the climate has been relatively stable since ~11.5kya. The number of years of population recovery ‘tleng’ was simply defined by the difference between the start and end of population growth, ‘tstop’, which was defined so that it could be as little as 1 year after it started. The growth rate ‘gr’ during population recovery was computed by dividing the effective population size at the bottleneck by the current effective population size in the North Red Sea, and dividing that by the length of the recovery period. Both the “North/Gulf” and “North/South” models required two sets of growth rates, ‘gr1’ and ‘gr2’ to model population growth in the North and South Red Sea. Given that the simulations run backwards in time, a negative growth rate between the present day and the genetic bottleneck represents population recovery since the bottleneck. Lastly, similar to the relationship between ‘Nbott’ and ‘Npre’, the minimum current effective population size in the North Red Sea, ‘Npop1’, was equal to ‘Nbott/1’, so that the population could remain stable and not recover, but would not eperience any futher decrease in size. The current effective population size in the South 39 Red Sea, ‘Npop2’ was defined differently for each model depending on the source of the population. The remaining parameters were tested with larger range estimates, and finetuned based on the shape of the posteriors. Model selection, and subsequently parameter estimation for the most supported model, was performed using an ABC framework via random forests (R package ‘abcrf’ v 1.8.1), a machine learning tool which enables greater power and robustness against the choice of summary statistics used (Pudlo et al. 2016). Individual simulations were run using fastsimcoal2 (fsc25 v 2.5.2.21, Excoffier et al. 2013), using a sample size of 30 for each population (2N due to simulating haploid individuals), loci lengths of 95 (loci length was rounded to nearest multiple of five), and number of loci from 2,530 – 4,031 based on the observed data (for more details on generating model simulations, see Supplementaty information including Figure S2). Times are presented in years but were simulated in generations, so prior ranges (Table 2) were adjusted based on a generation time estimate of 2 years for D. marginatus (based on size/genus as in Crane et al. (2018), and Delrieu- Trottin et al. (2020)). Though an empirical estimate for the generation time for the species in this study is unknown and thus could vary from the above, this should have limited impact on the main points of this study – the location of refugia inside or outside the Red Sea, and the ratio between pre- and post-LGM population sizes. Most mutation rates for fishes have been estimated at 1-2% per million years, and here I settled on the most commonly used rate in the literature: 1 x 10-8 per site per generation (Delrieu-Trottin et al. 2020). A minor allele 2D SFS and between-population π were used as summary statistics for the “time of split” model with two current populations, and three lots of minor allele 2D SFS’ and three lots of between-population π were used for the “location of refugia” models with three current populations. I ran 10,000 simulations for each species. For model selection to test the location of refugia, the forest is trained to distinguish between the three scenarios based on the summary statistics and then it is used to classify the observed data. Similarly, for parameter estimation, a forest is trained for each demographic parameter based on the summary statistics generated during the simulations; the posterior distribution of each parameter is then inferred based on the observed data. In both instances, the number of trees was set to 1,000 (Pudlo et al. 2016). For ease of reading, I reported the results using the median, or the measure of central tendency of the posterior distributions with the lowest error according to the power analysis. The 0.025 and 0.975 quantiles for the posterior parameters can be found in Table S3 in the Supplementary information. 40 2.4 Results Data summary Following sample filtering, de-novo loci assembly (STACKS v 2.53), and loci filtering, the number of loci retained for D. marginatus ranged from 4,031 for the “time of split” model to 2,530 for the “location of refugia” models, rounded to 95 base pairs (bp) in length. The number of loci varied due to the different number and choice of samples selected for these two datasets – 30 between the Red Sea and outside populations for the “time of split”, and 45 between the Gulf of Aden, North Red Sea, and South Red Sea for the “location of refugia”. Thus, a different number of loci were shared across individuals in these two datasets. The loci corresponded to 21,032 SNPs for testing the initial split time between the Red Sea and outside populations, and 15,246 SNPs for testing the “location of refugia” models. The number of SNPs and their general distribution across the spectra can be seen in plots of the imputed, folded 2-dimensional site frequency spectra in Figure 4. The computed between-population π was 0.0065 for the “time of split” dataset, between Red Sea and outside populations. For the “location of refugia” dataset, the computed between- population π was 0.0052 between the Gulf of Aden and North Red Sea, 0.0056 between the Gulf of Aden and South Red Sea, and 0.0042 between the North Red Sea and South Red Sea. 41 Figure 4. Heat maps of the imputed, folded 2-dimensional site frequency spectra for Dascyllus marginatus chromosomes from (A) the Red Sea and outside populations from the “time of split” dataset, or from left to right along the bottom: (B) the Gulf of Aden and North Red Sea, (C) the Gulf of Aden and South Red Sea, and (D) the North Red Sea and South Red Sea from the “location of refugia” dataset. Each square in the plots represents a SNP count present in a particular number of chromosomes from each of the two populations, which can give insights into the demography. All the spectra have a relatively similar pattern, with higher numbers of rare alleles which can indicate large populations or a recent expansion. The North Red Sea and South Red Sea plot on the right appears to have a lower frequency of common private alleles (those present in many chromosomes in one population and zero chromosomes in the other) than the others which could suggest higher migration levels. Monomorphic SNPs were not included in the dataset. ABC with random forests Before estimating population parameter values, the observed data needed to be compared with the simulated data to confirm that the model could recover the observed summary statistics. I tested how well the model fit the data by plotting a histogram of observed and simulated between- population π (Figure 5). 42 Figure 5. Histograms of simulated data generated with the models in Figure 3 for Dascyllus marginatus, with the observed data overlaid in red. The models include (A) the “time of split” model, (Bi) the “North” model, (Bii) the “North/Gulf” model, and (Biii) the “North/South” model. The observed data is well within the range of simulated data for all four models. In addition to ensuring that the number of samples, number of loci, and summary statistics provide sufficient information to perform a successful model selection and be able to accurately estimate population parameter values, the power analysis was also used to determine the most appropriate way to summarise the posterior distributions by identifying the measure of central tendency with the lowest error. As discussed previously, error values were low in general, but the median had the most consistently low error values across parameters (Table S1 in Supplementary information) so was used to report the parameter estimates below. Under the simplified “time of split” model, the estimate of the split time between the Red Sea and outside populations ‘tanc_A’ for D. marginatus was ~90 thousand years ago (kya) – prior to the LGM. This confirmed that D. marginatus in the Red Sea survived in refugia during the LGM, the location of which was then tested by considering three different demographic scenarios: the “North”, “North/Gulf”, and “North/South” models, with the “location of refugia” dataset. 43 The key result of the model selection suggested that a refugium in the Gulf of Aden (in addition to the Red Sea) was not likely, with only 17% trees supporting the “North/Gulf” model. Distinguishing between the other two models was more difficult, with 51% of trees supporting the “North” model and 32% trees supporting the “North/South” model (Table S4 in Supplementary information). This could have been expected given the similarities between the “North” and “North/South” models – the South Red Sea population split from the rest of the Red Sea in both cases, whether following post-LGM recovery in the “North” model or prior to the LGM in the “North/South” model. However, as the “North” model did receive the most support in the model selection (51%), I opted to recover the posterior distributions of the demographic parameters for this model. These estimates enabled us to study the history of the Red Sea population during the LGM and attempt to place the origin of D. marginatus in the Red Sea or Gulf of Aden. The original effective population size ‘Nanc’ was estimated to be ~250k prior to the population split between the Red Sea and Gulf of Aden, but there was no clear signal to narrow down the time of the split ‘tanc’ . Following the split, effective population sizes were estimated to start at ~120k for the Gulf of Aden ‘Nsplit0’, and ~110k for the Red Sea ‘Nsplit1’. Assuming immediate re-expansions to the current effective population size in the Gulf of Aden ‘Npop0’, and to the effective population size in the Red Sea prior to the LGM ‘Npre’, both were estimated to be ~275 - ~280k, similar to ‘Nanc’. The pre- to post- expansion population size ratios were ~0.5 for the Gulf of Aden ‘resize0’ and ~0.3 for the Red Sea ‘resize1’, translating to an approximate doubling of the Gulf of Aden population and tripling of the Red Sea population. A small genetic bottleneck then occurred around the time of the LGM which affected the estimates of more recent population parameters. There was no clear estimate for the exact time of the bottleneck ‘tbott’ within the relatively narrow prior range. This bottleneck appeared to have reduced effective population size in the North Red Sea ‘Nbott’ to ~235k individuals, with a pre- to post- bottleneck population size ratio of ~1.3 ‘resizebott’. When considered as a percentage (100/1.3), this translated to an effective population size estimate of ~77% of pre-LGM estimates, and the population was able to recover (see Figure 6, and Table S3 in Supplementary information). 44 Figure 6. Plots for the posterior distributions following population parameter estimation for the modelled demographic history, also presented above, of Red Sea populations since before the Last Glacial Maximum (LGM) for Dascyllus marginatus, using an Approximate Bayesian Computation framework coupled with random forests methods. The demographic model covers the persistence in a refugium in the North Red Sea and a genetic bottleneck during the LGM, the following population recovery, and later separation of the South Red Sea. 45 Looking at post-LGM recovery, our results indicated that population recovery ‘trec’ began ~13kya. With a growth rate ‘gr’ of ~0.3 x 10-4 per year, the period of recovery ‘tleng’ lasted for ~5k years. The apparent negative growth rate demonstrated population growth as it was modelled back in time from the present day. Population recovery stopped ‘tstop’ ~9kya, and the population in the South Red Sea separated. Migration between the Gulf of Aden and the Red Sea ‘mig1’ or between the Gulf of Aden and South Red Sea and between the South Red Sea and the North Red Sea ‘mig0’, when sea levels allowed, was estimated at ~0.03 x 10-4 of all individuals per year. Lastly, current effective population sizes for the North Red Sea ‘Npop1’ and South Red Sea ‘Npop2’ were similar with estimates of ~245 - ~250k to those of ‘Nanc’, ‘Npre’, and ‘Npop0’, with estimates of ~250- ~280k (see Figure 6 and Table S3 in Supplementary information for details). 2.5 Discussion After confirming the persistence of the Red Sea population of D. marginatus during the LGM, I considered three possible demographic scenarios to determine the location of the refugium/a. I found that the Gulf of Aden was unlikely to have acted as an LGM refugium for populations of D. marginatus in the Red Sea; rather this species survived inside the Red Sea during the LGM, most likely only in the north. Furthermore, population parameter estimates suggest that D. marginatus in the Red Sea only suffered a mild genetic bottleneck during the LGM. It was not possible to determine the evolutionary origin of D. marginatus based on the initial expansions after the splitting of the Red Sea and outside populations. The most important inference from my model selection is that the refugium was likely not located in the Gulf of Aden. This result argues against a situation where the Gulf of Aden was isolated from the rest of the Indian Ocean during the LGM, acting as a refugium for the Red Sea population, at least for D. marginatus. As the most likely location of the refugium during the LGM according to my model selection, the North Red Sea is clearly an important region to protect, not only based on its history as a habitat loss refugium in the high salinity conditions during the LGM, but also as a modern refugium from mass coral bleaching. Corals in the Gulf of Aqaba in the north have a high bleaching threshold and so are less susceptible to temperature stress. Average sea surface temperatures are lower in the Gulf of Aqaba in the north than in the South Red Sea, but the higher temperatures in the south may have acted as a selective filter for heat-tolerant coral larvae colonising the Red Sea and expanding northwards (Fine et al. 2013). Thus, the Gulf of Aqaba may be a reef refugium. Elsewhere in the world, there have been other examples of in-situ refugia on 46 coral reefs, including areas of high currents off the north-east of Mexico affording reefs a decrease in acute thermal stress (Chollett and Mumby 2013). South Atlantic reefs have experienced lower heat stress and have many natural features that appear to have protected them from coral bleaching including a deeper distribution, a high tolerance for turbidity and nutrient enrichment, the dominance of more robust, massive growth forms, and flexible symbiotic associations as generalists. Thus, Brazilian reefs seem resistant to mass bleaching events and may likely represent a major refugium (Mies et al. 2020). Lessons can be learned here, in that conservation efforts should focus on regions with unique environmental characteristics as potential climate-change refugia on coral reefs. It was not possible to estimate whether the evolutionary origin of D. marginatus was in the Red Sea or outside, though this was not a primary aim of this chapter. I studied the estimated population parameters from the selected “North” demographic model. Given that the three demographic models considered were specifically designed to test the location of refugium/a, the outside population only represented the Gulf of Aden, and not populations from the southern coast of Oman or the Gulf of Oman. However, on studying the estimated effective population sizes immediately following the population split between the Red Sea and the Gulf of Aden, a significantly stronger signal of population expansion in one population than the other would have perhaps suggested a colonisation event and the expansion of a new population, split from the population of origin. In fact, both expansions following the population split between the Red Sea and the Gulf of Aden were similar (I detected a doubling or tripling of the respective populations after the split), so the origin is not clear based on this evidence. However, there is other evidence which could suggest a Red Sea origin, though I do not test the theory further in this study. The Red Sea has a lot of endemics (reviewed in DiBattista et al. 2016b; Bogorodsky and Randall 2019), whereas there are not so many in Oman (refer to Fish Base: Froese and Pauly 2022). Dascyllus marginatus is also not present in the Arabian-Persian Gulf which is connected to the Gulf of Oman in the north (Randall and Allen 1977). Lastly, along the south coast of Oman in the western Arabian Sea, summer sea surface temps have increased since the 1940s (Piontkovski and Chiffings 2014). Elsewhere, increased water temperatures have led to range expansions (Sadowski et al. 2018), so it is possible that the same happened here in D. marginatus, moving out of the Red Sea and along the Omani coast as conditions allowed. This is an interesting possibility, as more Red Sea endemics may end up undergoing range expansions as temperatures continue to increase. The magnitude of the genetic bottleneck during the LGM is similar in D. marginatus as in D. abudafur and D. trimaculatus. Thus, the population response to climate-induced habitat loss did 47 not appear affected by geographic ranges of the Indian Ocean populations. Here, I will discuss where population parameters varied between the species. The time of the population split, ‘tanc’, was much earlier in D. marginatus than in D. abudafur and D. trimaculatus (similar to P. maculosus which has the smallest range of the three species studied in the previous chapter). If D. marginatus did originate in the Red Sea as discussed earlier (and the same could be true of P. maculosus), it is possible that cosmopolitan species that originated in the Red Sea moved out early on, prior to the last interglacial period ~115 - ~130kya, whereas cosmopolitan species that originated in the Indian Ocean moved into the Red Sea more recently – i.e. D. abudafur and D. trimaculatus (though still prior to the LGM). Migration levels in D. marginatus were low, similar to D. abudafur (and P. maculosus). Though none of the species studied here were directly tested in Fishers’ U-crit (maximum larval swimming speed) study (2005), I suggested in the previous chapter that D. abudafur may have an uncharacteristically low U-crit score for a Pomacentrid, as the sister species of Dascyllus aruanus which scored very low. Dascyllus marginatus, the focus of this chapter, has been considered in the same complex as Dascyllus reticulatus within the Dascyllus genus (Godwin 1995), another species which was included in the study. Of 28 Pomacentrid species tested, only 6 had a lower average U- crit score than D. reticulatus. This suggests that the related D. marginatus may too have an uncharacteristically low U-crit score for a Pomacentrid, like D. abudafur in the previous chapter. This may explain why the migration levels for D. marginatus estimated in this chapter are again more similar to P. maculosus than to congener D. trimaculatus, despite Pomacanthids scoring lower than Pomacentrids on average (Fisher 2005). It seems that outside migrants enter the Red Sea in small numbers regardless of geographic range in the Indian Ocean, likely meaning migration did not really have any major effect on recovery. Despite similar migration levels to D. abudafur, the length of the recovery period, ‘tleng’, was several hundred years longer for D. marginatus than estimated for D. abudafur (though the latter estimate may be hindered by an uncertain estimate for ‘trec’ – start time of recovery). Pomacanthus maculosus had a longer recovery period, probably due to a slower life history often associated with a larger body size (Promislow and Harvey 1990). The smaller geographic range of D. marginatus in the Indian Ocean may suggest that they are specialised in their diet, habitat or environmental preferences – all of which may have lengthened their recovery period in the Red Sea. However, this is not explicitly tested and there are other processes that may be restricting their geographic range in the Indian Ocean including constraints on migration away from Oman. Incidentally, of the species studied between this chapter and the last, only D. trimaculatus has a 48 distribution that includes both Oman and additional locations in the Indian Ocean, which may only be possible with slightly higher migration levels as estimated for this species in the previous chapter. In the Red Sea at least, the specific ecology of a species appears more important in mediating the response to, and recovery from, climate-induced habitat loss than either migration from an external population or the geographic range of an external population. 2.6 Conclusion Overall, this study highlights the capacity for in-situ coral reef refugia to maintain a reef fish species population during periods of environmentally-induced habitat loss, allowing for subsequent recovery when conditions became more favourable. I provide evidence that this refugium was unlikely to have been outside of the impacted Red Sea basin, but rather located in the North Red Sea. I then compare the demographic history of the more range-restricted D. marginatus to previous results from wider-ranging congeneric species. Dascyllus marginatus demonstrated a similar population response and population parameter estimates underpinning the genetic bottleneck to D. abudafur and D. trimaculatus, being able to recover from a mild genetic bottleneck in the Red Sea of ~77% of pre-LGM levels. Thus, the limited geographic range of the external population did not impact the population response to, or recovery from, environmentally-induced habitat loss. I discussed how different aspects of the ecology of each species may have had an impact on a few notable differences in the recovered past demographies. Though this is a case study involving a single species, the ideal study design provides evidence for the argument that coral reef refugia were located in the Red Sea during the LGM. Given the importance of these refugia in supporting reef fish species’ populations during an environmental crisis, this suggests that we should direct resources towards identifying and protecting potential climate-change refugia in other regions that have already suffered, or are soon likely to suffer, coral reef habitat loss due to global climate change. 49 50 CHAPTER 3 Title The blacktip reef shark, Carcharhinus melanopterus, had a similar population response as several teleost reef fish species in the Red Sea during the Last Glacial Maximum 3.1 Abstract Reef sharks are directly dependent on coral reef habitat and have an important functional role as high-level mesopredators on coral reefs. Thus, continued coral reef habitat loss in the face of increasing sea surface temperatures not only threatens reef shark populations, but also has the potential to drastically alter the balance of whole coral reef ecosystems. Particular elements of their natural history make sharks vulnerable to climate change, such as late age at maturity, while others could make them resilient, like having a flexible diet. The lack of a planktonic larval phase limits connectivity among populations, but also means offspring are larger and mortality is lower. I studied the demographic history of Carcharhinus melanopterus in the Red Sea during the Last Glacial Maximum (LGM) to determine if the local population persisted during this period of environmentally-induced habitat loss, and, if so, to what extent the effective population size was impacted. Additional samples from populations across the Indo-Pacific also allowed me to contextualise the relative isolation of the Red Sea with respect to the rest of the species range. Targeted Gene Capture data from more than 1,000 loci was obtained and combined with an Approximate Bayesian Computation framework via a machine learning tool, to choose among alternative demographic models and to estimate population parameters. I compared the demographic history of C. melanopterus to that of teleosts studied in the previous two chapters, Pomacanthus maculosus, Dascyllus abudafur, Dascyllus trimaculatus, and Dascyllus marginatus, and revealed a similar population response. Population parameter estimates suggested a mild genetic bottleneck in the Red Sea as in the smaller, teleost reef fish species, with an effective population size estimate of ~63% of pre-LGM estimates from which the population was able to recover. I also show that the Red Sea still represents a barrier to gene flow in C. melanopterus that is stronger than some geographic distances of thousands of kilometres. My results provide evidence that a high-level mesopredator on coral reefs was able to locally persist and recover from an extended period of environmentally-induced habitat loss and, given the similarity in population 51 response, was likely subjected to similar pressures as potential prey species of smaller coral reef fishes. 3.2 Introduction The Red Sea represents a model system to study the impact of coral reef habitat loss on populations of coral reef fishes at spatial and temporal scales relevant to global climate change. It was effectively isolated from the Indian Ocean during the Last Glacial Maximum (LGM) due to low sea levels (Fairbanks 1989) preventing ocean mixing across the narrow, shallow Bab-el- Mandeb strait in the south (Siddall et al. 2003). This represented a physical barrier to species’ migration between the two basins. Geological evidence suggests the Red Sea suffered a severe loss of coral reef habitat (Sheppard et al. 1992) due to increased salinity (Biton et al. 2008), but, despite this, the population split times between the Red Sea and Indian Ocean for lots of cosmopolitan reef fish species pre-date the LGM (DiBattista et al. 2013), suggesting they persisted during this time. In the previous chapters, I have shown that several cosmopolitan species only suffered a mild genetic bottleneck during this period, though the causes of this limited impact of the LGM are still not clear. The Red Sea has a long history of relative isolation from the Indian Ocean, as highlighted by the large number of endemic species that are found in its basin (reviewed in DiBattista et al. 2016b). The Red Sea is known to harbour a unique environment with elevated temperatures and salinity compared to many other marine habitats (Edwards 1987; Kleypas et al. 2008; Trommer et al. 2009; Cantin et al. 2010; Fine et al. 2013; Chaidez et al. 2017). The southern part of the basin, in particular, is characterised by unusually high levels of primary productivity which affect other features of the environment such as levels of turbidity (Roberts et al. 1992; Raitsos et al. 2013). Thus, the entrance to the Red Sea can act as an environmental barrier to migration between the Red Sea and the Indian Ocean, even at times of elevated sea levels. So far, I only investigated the impact of the LGM on the Red Sea populations of small, teleost reef fish species which have small home ranges and limited movement as adults, namely several damselfishes and an angelfish (Green et al. 2015). Connectivity between populations of coral reef fishes mostly happens at the pelagic larval stage (Cowen et al. 2006; Jones et al. 2009). We do not know how larger, more motile species that travel and contribute to connectivity at the adult life stage, such as reef sharks, responded to the environmental crisis during the LGM in the Red Sea, and how migration across the Red Sea barrier compared when sea level allowed. 52 Species diversity has been suggested as a key characteristic that has enabled sharks, as a group, to survive multiple mass extinction events in the 420 million years since they evolved (Compagno 1990). Modern shark diversity likely arose following the end-Cretaceous extinction ~66 million years ago (Bazzi et al. 2018, 2021), with extant reef shark species having survived for at least tens of millions of years. Several aspects of sharks’ natural history suggest likely resilience to environmental change – they can be highly migratory (Weng et al. 2008; Klimley 2013; Sequeira et al. 2013), can take refuge in deeper waters (Guinot et al. 2013), can have flexible diets (Randall 1992; Klimley 2013), can repair DNA (Marra et al. 2019), they have strong immune systems (Litman 1996; Marchalonis et al. 1998; Luer et al. 2004), and their offspring lack a pelagic larval stage (Wourms and Lombardi 1992; Momigliano et al. 2015) which is susceptible to oceanic changes. However, despite being larger and more motile than most teleost reef fish species, reef sharks can still be directly dependent on coral reef habitat (Roff et al. 2016; Heupel et al. 2019), so are also vulnerable to the effects of environmentally-induced habitat loss. Sharks generally exhibit slower life histories too, featuring characteristics such as late age at maturity, large body size, low fecundity, and long gestation periods (Compagno 1990; Reynolds et al. 2001; Field et al. 2009), alternative aspects of their natural history which make them vulnerable to climate change. The lack of a pelagic dispersal stage may also compound on this vulnerability by limiting connectivity among populations (Momigliano et al. 2015). Thus, the degree to which reef sharks may have been impacted by the LGM in the Red Sea is not clear. In addition to recent and projected coral reef habitat loss (Pandolfi et al. 2003; Descombes et al. 2015), reef sharks have suffered from overexploitation for decades, massively impacting their populations. A recent study has given a damning overview of the extent of this damage across coastal tropical oceans, which should be of particular concern given their functional role as high- level mesopredators (Frisch et al. 2016; Roff et al. 2016) in coral reef ecosystems. MacNeil et al (2020) used baited remote underwater video systems (BRUVS) footage to provide a standardised index of relative shark abundance. They observed no sharks on nearly 20% of reefs surveyed and in more than 60% of 15,000+ BRUVS sets. Nearly 60% of nations had abundance scores of less than half their regional expectation – a shocking set of evidence for the global decline of reef sharks. Given their precarious status on present day coral reefs, it would be very valuable to determine how this group of reef fishes responded to a historic environmental crisis. Additionally, comparing levels of migration to that of smaller teleost reef fish species may show variation in the degree to which individuals are impacted by the Red Sea barrier. It has been proposed that reef fish larvae are particularly vulnerable to environmental variation (McLeod et al. 2013), which may have impacted levels of migration between the Indian Ocean and the Red Sea more so than in reef sharks. 53 Given its recognisable status as a coral reef resident (Heupel et al. 2019), I plan to use the blacktip reef shark, Carcharhinus melanopterus, to study how larger, more motile species responded to climate-induced habitat loss in the Red Sea during the LGM, compared to the smaller, teleost reef fishes studied in the first two chapters. The Targeted Gene Capture dataset comes from a previous study (Maisano Delser et al. 2019), which looks at the demography and colonisation history of C. melanopterus while testing whether range expansions can be recovered by combining spatial genetic modelling with metapopulation models. It includes 14 samples from each of the Red Sea and the Seychelles. An additional provision from this dataset is the inclusion of more sampling locations across the Indo-Pacific. This means that I can also assess how isolated the Red Sea population is within the context of the whole Indo-Pacific region, and thus how the Red Sea barrier compares to other barriers across the Indo-Pacific. Large barriers include upwelling at the boundary between the Gulf of Aden and the Indian Ocean (Kemp 2000), monsoon winds in the Indian Ocean (Lighthill 1969; Schott and McCreary 2001), the Indo-Pacific land barrier which has been exposed and covered at multiple points during the Pleistocene glaciations (Briggs 1974; Randall 1998; Benzie 2000), and several stretches of large geographic distances between islands in the Pacific Ocean (Briggs 1961; Planes and Fauvelot 2002). I start by confirming the presence of C. melanopterus in the Red Sea during the LGM by reconstructing the population dynamics for two alternative demographic scenarios using genomic data from C. melanopterus from inside and outside the Red Sea. As shown in previous work (Maisano Delser et al. 2016), the large number of autosomal, single-copy, coding DNA sequence markers isolated with Targeted Gene Capture, coupled with an Approximate Bayesian Computation framework, were suitable for accurate model selection. Here, I will follow the same approach, providing sufficient power for the complex demographic inference necessary to consider these alternative scenarios and then to analyse the demographic parameters to provide a detailed picture of the effect of habitat loss during the LGM ~20k years ago. The main aims of this study are 1) to identify any differences in the demographic history of the Red Sea population of C. melanopterus to that of other smaller, teleost reef fish species, including migration across the Red Sea barrier, and 2) to compare the strength of the Red Sea barrier to other biogeographic barriers across the Indo- Pacific region. 54 3.3 Materials and Methods Samples, Targeted Gene Capture, sequencing, and variant calling (relating to published dataset from Maisano et al (2019)) A total of 130 C. melanopterus samples were included in this study (Figure 1 and Table 1) - a subset of the filtered dataset previously published in Maisano Delser et al (2019). Extracted genomic DNA was tested for species identification and 1,077 independent autosomal regions (single-copy, coding DNA sequence markers) were sequenced using Targeted Gene Capture (Li et al. 2013) (see below for number of loci used for simulations). Reads were mapped to an assembled reference genome. For details on sample preparation, bait design, target library preparation, sequencing, reference sequence assembly, quality control, variant calling and filtering, see Supplemental Materials and Methods in Maisano et al (2019). To study the impact of the LGM on the Red Sea population of C. melanopterus, the 14 available samples from each of the Red Sea and the Seychelles (representing the Indian Ocean population) were all selected (Figure 1 and Table 1), determined to be sufficient following a power analysis. One thousand targets were randomly selected from 10,000 simulations generated for the selected demographic model for the Red Sea population of C. melanopterus (56 haploid individuals, 976 loci with an average 620 base pairs length) to be used as pseudo-observed data (pods). The resulting population parameter estimates were cross-validated with the original parameter values used to generate the pods. Scatterplots for the cross validation of the estimated mode and median with the original simulated data show accurate estimates for all parameters (Figure S1 in Supplementary information) and low error values (Table S1 in Supplementary information). This demonstrates adequate power in our analyses for parameter estimation with these sample numbers. Most available samples were used across all sites, with samples being randomly selected if in excess of 14. 55 Figure 1. A map of 13 sampling locations and their sample sizes across the Indo-Pacific for Carcharhinus melanopterus. Note the two locations in very close proximity to one another at ~210ºE. Table 1. Sampling site (including Latitude and Longitude) and sample size information for 130 Carcharhinus melanopterus samples collected from 13 sites across the Indo-Pacific. Region Lat Long Sample size Red Sea 21.5 39.1 14 Seychelles -5.4 53.3 14 Western Australia -22.9 113.8 13 Northern Territory, Australia -12.7 130.3 6 Queensland, Australia -12.6 141.8 5 Great Barrier Reef -16.3 146.5 7 Chesterfield Islands -19.3 158.7 10 Noumea -22.3 166.5 8 Kiribati 1.88 202.7 9 Moorea -17.6 210.1 8 Tetiaroa -17 210.4 8 Fakahina -16 220 14 Vahaga -21.3 223.4 14 Total: 130 56 Summary statistics To model the demographic history of the Red Sea population of C. melanopterus, a folded 2-dimensional site frequency spectrum (2D SFS) was generated for SNPs in the Red Sea and Seychelles populations using a modified series of functions from the vcf2sfs R script v 2.0 (Liu et al. 2018). The vcf files had already been filtered to eliminate any missing data. Monomorphic site counts were excluded. Summarising the 2D SFS, the between-population nucleotide diversity (π) was also used as an additional summary statistic. Model selection and parameter estimation The origin of C. melanopterus was previously determined to be outside the Red Sea (in the Indo-Australian Archipelago), with the Red Sea itself representing one of the last places to be colonised by the range expansion (Maisano Delser et al. 2019). I considered two alternative scenarios for the demographic history of the Red Sea population of C. melanopterus since before the LGM. A power analysis was completed to ensure that the Red Sea and Seychelles dataset was sufficient to conduct a successful model selection analysis. One thousand targets were randomly selected from 10,000 simulations generated with the “Bottleneck” model (Figure 2), to be used as pseudo-observed data (pods) and tested with the model selection analysis. No pods were incorrectly assigned to the “Recolonisation” model, and all 1,000 were correctly assigned to the “Bottleneck” model that generated them, within a 60% confidence threshold (Table S2 in Supplementary information). A total of 977 pods out of the 1,000 were correctly assigned to the “Bottleneck” model within an 80% confidence threshold, or 733 pods within a 90% confidence threshold (Table S2 in Supplementary information). Thus, this dataset is sufficient for accurate model selection. Model “Recolonisation” represents the local extirpation of a Red Sea population when connection to the Indian Ocean was lost during the LGM, followed by recolonisation of a population of size ‘Nanc’ at time ‘tanc’, upon reconnection with the Indian Ocean. Post-LGM recovery started immediately at time ‘trec’ (Figure 2; Table 2). Model "Bottleneck" instead represents the persistence of a Red Sea population of size ‘Nbott’ in a refugium during the LGM, having undergone a genetic bottleneck at time ‘tbott’, reducing the effective population size from pre-LGM estimates of size ‘Nanc’. Following a period of time in this bottleneck, recovery later started at time ‘trec’ (Figure 2; Table 2). In both models, migration between the basins ‘mig’ returns following the LGM, with population recovery starting at time ‘trec’ and continuing for the period ‘tleng’. Current effective population sizes are estimated for both the Red Sea ‘Npop1’ and the 57 Indian Ocean ‘Npop0’. These models, though simplified, provide a fair representation of the likely demographic history, given they are based on a mostly enclosed system with a well studied history supported by geological and climatic data. Figure 2. Two demographic models representing likely demographic histories of the Red Sea population of Carcharhinus melanopterus since before the Last Glacial Maximum (LGM). Model “Recolonisation” represents a local extirpation of C. melanopterus in the Red Sea during the LGM, followed by recolonisation on the return of more favourable conditions. Model “Bottleneck” represents the persistence of C. melanopterus in the Red Sea, a genetic bottleneck during the LGM, and the following population recovery. 58 LGM Table 2. The range of prior values and definitions for the population parameters of models “Recolonisation” and “Bottleneck”, two demographic models representing likely demographic histories of the Red Sea population of Carcharhinus melanopterus since before the Last Glacial Maximum. Units include: ‘kya’ thousand years ago, ‘ya’ years ago, ‘y’ years, and ‘k’ thousand individuals. Parameter name Definition Prior range Model “Recolonisation” Model “Bottleneck” tanc Number of years since colonisation of Red Sea 11.5kya – 15kya 40kya – 600kya Nanc Pre-LGM effective population size in Red Sea 103 (1k) – 104.5 (~30k) 103 (1k) – 104.7 (~50k) tbott Number of years since the start of a genetic bottleneck 24kya – 36kya Nbott Effective population size of bottleneck in Red Sea 103 (1k) – Nanc resize Ratio of effective population size change Nanc/Nbott trec Number of years since start of population recovery tanc – 1 11.5kya – 15kya tleng Number of years of population recovery 150y – (trec – 1) gr Growth rate during population recovery (log(Nanc/Npop1))/tleng (log(Nbott/Npop1))/tleng mig Proportion of individuals per year migrating between basins 0.14x10-5 – 0.14x10-4 Npop0 Current effective population size in Indian Ocean 104 (10k) – 104.7 (~50k) Npop1 Current effective population size in Red Sea Nanc – 104.7 (~50k) Nbott – 104.7 (~50k) The prior range for some of the parameters was determined based on previously described assumptions, knowledge of the LGM in the Red Sea, and limitations of the analyses. For example, as it was not known if C. melanopterus remained in the Red Sea during the LGM, the colonisation of the Red Sea ‘tanc’ was set after the LGM for Model “Recolonisation”, but at a minimum of 40kya for Model “Bottleneck”, prior to the LGM. The number of years since the start of the genetic bottleneck ‘tbott’ for Model “Bottleneck” was set with a minimum of 24kya so the bottleneck occurred by the time of the LGM, and no older than 36kya - a realistic maximum if the populations were impacted by the LGM specifically. The effective population size of the bottleneck ‘Nbott’ was set with a maximum equalling the pre-LGM effective population size in the Red Sea, so it was possible for the population to be stable but not to grow. The ratio of effective population size 59 change ‘resize’ was computed by dividing the pre-LGM effective population size in the Red Sea by the effective population size of the bottleneck, for each simulation rather than from estimates from within the other distributions. The larger the ratio, the greater the population size change. The number of years since the start of the population recovery ‘trec’ also differed between the two models, set to start immediately after the recolonisation of the Red Sea in Model “Recolonisation”, or had a well defined range for Model “Bottleneck”, as it is unlikely that the environment started to become more favourable before 15kya, and the climate has been relatively stable since ~11.5kya. The number of years of population recovery ‘tleng’ could have been very short with a minimum of 150y in case of almost no bottleneck, or could continue up to the present day by setting the maximum as one less than the start of the population recovery ‘trec’. The growth rate ‘gr’ during population recovery was computed by dividing the effective population size at the bottleneck (Model “Bottleneck”) or the effective population size at the recolonisation of the Red Sea (Model “Recolonisation”) by the current effective population size in the Red Sea, and dividing that by the length of the recovery period. Given that the simulations run backwards in time, a negative growth rate between the present day and the genetic bottleneck represents population recovery since the bottleneck. The minimum current effective population size in the Red Sea, ‘Npop1’, was equal to ‘Nbott’ in Model “Bottleneck” or ‘Nanc’ in Model “Recolonisation”, so that the population could remain stable and not recover, but would not eperience any futher decrease in size. The remaining parameters were tested with larger range estimates, and finetuned based on the shape of the posteriors. Model selection, and subsequently parameter estimation for the most supported model, was performed using an ABC framework via random forests (R package ‘abcrf’ v 1.8.1), a machine learning tool which enables greater power and robustness against the choice of summary statistics used (Pudlo et al. 2016). Individual simulations were run using fastsimcoal2 (fsc25 v 2.5.2.21, Excoffier et al. 2013) using a sample size of 28 for each population (2N due to simulating haploid individuals), 976 loci (of 1,077) at least 100bp in length, and average loci length of 620bp based on the observed data (for more details on generating model simulations, see Supplementary information including Figure S2). Times are presented in years but were simulated in generations, so prior ranges (Table 2) were adjusted based on a generation time estimate of 7 years for C. melanopterus (based on Smith et al. 1998; as used in Maisano Delser et al. 2019). Though estimates for the generation time of C. melanopterus could vary, this would have limited impact on the main focus – the ratio between pre- and post-LGM population sizes. Mutation rate was previously estimated in Maisano Delser et al (2016) and allowed to vary - here I settled on the mean: 8.3 x 10-9 60 per site per generation. The recombination rate was set at 0.5 x 10 -8, the mean of the distribution used in Maisano Delser et al (2019). The minor allele 2D SFS and between-population π were used as summary statistics. For each of the two demographic scenarios, I ran 10,000 simulations. For model selection, the forest was trained to distinguish between the two scenarios based on the summary statistics and then it was used to classify the observed data. Similarly, for parameter estimation, the forest was trained for each demographic parameter based on the summary statistics generated during the simulations; the posterior distribution of each parameter was then inferred based on the observed values of those statistics. In both instances, the number of trees was set to 1,000 (Pudlo et al. 2016). For ease of reading, I reported the results using the median, or the measure of central tendency of the posterior distributions with the lowest error according to the power analysis. The 0.025 and 0.975 quantiles for the posterior parameters can be found in Table S3 in the Supplementary information. Population structure and EEMS Plink v 1.9 (Purcell et al. 2007; Chang et al. 2015) was used to generate a ped file from the filtered vcf file to produce a structure plot in ADMIXTURE v 1.3 (Alexander et al. 2009) for the 130 C. melanopterus samples from across the Indo-Pacific. Values of K (number of subpopulations) from 1 to 13 were tested, and the most likely plot was visualised with R package ‘pophelper’ v 2.3.1 (Francis 2017). I then ran an EEMS analysis (Estimating Effective Migration Surfaces, Petkova et al. 2016) to investigate how migration between populations contributed to the current pattern of population structure. Based on isolation by distance (IBD) theory (Wright 1943; Malécot 1955), populations tend to become more genetically distinct with increasing geographic distance. Plotting the effective migration enables us to identify potential barriers to gene flow by taking IBD into account when visualising genetic variation across an area (Petkova et al. 2016). Low effective migration means genetic similarity has decreased more quickly than expected based purely on IBD, so I can study potential barriers to gene flow across the Indo-Pacific region. Binary files (bed, bim, and fam) were generated using plink v 1.9 (Purcell et al. 2007; Chang et al. 2015) to calculate a genetic dissimilarity matrix. Combined with GPS points for the sampling locations and for the regional border, an EEMS analysis (Petkova et al. 2016) was run with 1k demes, 400k MCMC iterations, a 10k burn-in period and thinning of 99. If the longitude value was negative, it was summed with 360º for the equivalent positive value. 61 3.4 Results Data summary The Red Sea and Seychelles dataset consisted of 1,514 SNPs, the distribution of which can be seen in the folded 2D SFS in Figure 3. The computed between-population π was 0.00059. Figure 3. Heat map of the folded 2-dimensional site frequency spectrum for 28 Carcharhinus melanopterus chromosomes from each of the Red Sea and the Seychelles. Each square in the plots represents a SNP count present in a particular number of chromosomes from each of the two populations, which can give insights into the demography. The spectrum appears to have quite a high frequency of common private alleles (those present in many chromosomes in one population and zero chromosomes in the other ) which could suggest low migration levels. Monomorphic SNPs were not included in the dataset. ABC with random forests Before estimating population parameter values, the observed data needed to be compared with the simulated data to confirm that the model could recover the observed summary statistics. I tested how well the model fit the data by plotting a histogram of observed and simulated between- population π (Figure 4). 62 Figure 4. Histograms of simulated data generated with (A) Model “Recolonisation and (B) Model “Bottleneck” in Figure 2 for Carcharhinus melanopterus, with the observed data overlaid in red. The observed data is well within the range of simulated data for both models. In addition to ensuring that the number of samples, number of loci, and summary statistics provide sufficient information to perform a successful model selection and be able to accurately estimate population parameter values, the power analysis can also be used to determine the most appropriate way to summarise the posterior distributions by identifying the measure of central tendency with the lowest error. As discussed previously, error values were low in general, but the median had the most consistently low error values across parameters (Table S1 in Supplementary information) so will be used to report the parameter estimates below. The model selection supported the “Bottleneck” model, with 94% of the random forest trees being assigned to this model, compared to only 6% of trees supporting the “Recolonisation” model (Table S4, Supplementary information). Thus, I opted to recover the posterior distributions of the demographic parameters for the “Bottleneck” model, to study the demographic history of the Red Sea population of C. melanopterus during the LGM. The population split between the Red Sea and Indian Ocean at time ‘tanc’ was ~445kya – prior to multiple glacial and interglacial cycles pre-dating the LGM. Pre-LGM effective population 63 size ‘Nanc’ was estimated at ~27k individuals, before a small genetic bottleneck occurred around the time of the LGM which affected the estimates of more recent population parameters. There is no clear estimate for the exact time of the bottleneck ‘tbott’ but it appears to have reduced effective population size in the Red Sea ‘Nbott’ to ~20k individuals, with a pre- to post- bottleneck population size ratio ‘resize’ of ~1.6. When considered as a percentage (100/1.6), this translated to an effective population size estimate of ~63% of pre-LGM estimates (see Figure 5, and Table S3 in Supplementary information). 64 Figure 5. Plots for the posterior distributions following population parameter estimation for the modelled demographic history of the Red Sea population of Carcharhinus melanopterus since before the Last Glacial Maximum (LGM), using an Approximate Bayesian Computation framework with random forests methods. The demographic model covers the initial population split from the Indian Ocean, persistence in the Red Sea and a genetic bottleneck during the LGM, and population recovery. Prior distributions are shown in grey. 65 Looking at post-LGM recovery, it was not possible to obtain a clear signal for the time when recovery started ‘trec’. The negative growth rate back in time ‘gr’ demonstrates populations were recovering towards the present day, with good evidence for a growth rate ‘gr’ of ~1.2 x 10-4 per year, whilst the period of recovery ‘tleng’ lasted for ~8k years. Migration between the Indian Ocean and the Red Sea ‘mig’, when sea levels allowed, was estimated at ~0.05 x 10-4 of all individuals per year. Lastly, current effective population size for the Red Sea ‘Npop1’ is estimated to be larger than pre-LGM levels, at ~39k individuals, while an estimate for current effective population size in the Seychelles ‘Npop0’ was smaller, at ~11k individuals (see Figure 5 and Table S3 in Supplementary information for details). Population structure and EEMS Based on the sampling design of 13 sites, and cross validation error estimates in ADMIXTURE v 1.3 (Alexander et al. 2009), the most likely number of ancestral populations of C. melanopterus across the Indo-Pacific was K=5 or K=13 (Table S5, Supplementary information). Distinct populations were clustered in the Red Sea, and in the Seychelles (Figure 6). Based on the K=5 estimate, the west to the east of Australia represented a third cluster, Chesterfield Islands, Noumea, and Kiribati a fourth, and the rest of the Pacific formed the fifth cluster. The Great Barrier Reef appeared to have experienced admixture between the Australian and Pacific subpopulations, and Kiribati also showed some of the genetic signature from the more eastern Pacific subpopulations. The K=13 estimate showed a similar pattern but with further differentiation within the clusters, including additional ancestral populations from Chesterfield Islands across to the rest of the Pacific. Moorea and Tetiaroa (being only a few kilometres apart) appeared to cluster together, and the admixture at the Great Barrier Reef was also evident (Figure 6). 66 Figure 6. Population structure for 130 C. melanopterus samples from 13 sites across the Indo-Pacific: Red Sea (RS), Seychelles (Sey), Western Australia (WA), Northern Territory (NT), Queensland (QLD), Great Barrier Reef (GBR), Chesterfield Islands (Chesterf), Noumea, Kiribati, Moorea, Tetiaroa, Fakahina, and Vahaga. Cross-validation estimates in ADMIXTURE v 1.3 (Alexander et al. 2009) suggested the most likely number of subpopulations was K=5 or K=13 (Table S5, Supplementary information). How this population structure relates to effective migration across the region can be determined by looking at the EEMS plot (Figure 7). The associated MCMC chain is a process that works its way through combinations of parameter values (eg. migration over each plane on the EEMS map), plotting the absolute likelihood. Ideally there is some movement in the MCMC plot to make sure I am sampling from all the available space (Figure S3 in Supplementary information). The EEMS map is generated by sampling random points along the MCMC chain, which have a migration estimate for every plane, so generating a posterior distribution for all the parameter values from which the average is plotted (Petkova et al. 2016). Effective migration across the Red Sea barrier is difficult to plot given that there are no closer sampling locations than the Seychelles. There are some white/light orange coloured cells between the Red Sea and the Seychelles, suggesting lower migration rates. This is also the case between the Seychelles and eastern Australia – again, the next closest sampling location. Effective migration seems lower (white/light orange) over these distances than the estimated effective migration across the open ocean between New 67 Caledonia and the central Pacific islands. However, the lowest effective migration was across the Indo-Pacific barrier separating the Indian Ocean and eastern Australia from New Caledonia. Figure 7. Posterior mean migration rates for 130 sampled C. melanopterus individuals from across the Indo- Pacific, following EEMS analysis (Estimating Effective Migration Surfaces, Petkova et al. 2016) with 1,788 loci, 1,000 demes, and 400,000 MCMC iterations (with 10,000 burn-in period and thinning set at 99). The black circles represent sampling sites of various sizes between 5 and 15 individuals. 3.5 Discussion This study reveals that the Red Sea population of C. melanopterus persisted during the LGM, suffering a mild genetic bottleneck with an effective population size estimate of ~63% of pre-LGM levels, and was able to recover. I compare the demographic history of C. melanopterus to that of teleosts Pomacanthus maculosus, Dascyllus abudafur, Dascyllus trimaculatus, and Dascyllus marginatus, studied in the two previous chapters, and discuss how species’ ecology and life history support our results, and what this means for populations of C. melanopterus in the future. Our results provide evidence that populations of high-level mesopredators on coral reefs can survive periods of environmentally-induced habitat loss and recover. Addressing the relative strength of the 68 Red Sea barrier in the Indo-Pacific, I found that it was greater than several barriers across the Pacific for this species, but not as strong as the Indo-Pacific barrier. The relative size of the genetic bottleneck during the LGM is similar in C. melanopterus as in other smaller, teleost reef fishes studied in the two previous chapters. Thus, for this region at least, population response to habitat loss did not appear to be affected by body size or the ability to migrate as adults. It could be assumed that in difficult conditions, individual sharks might leverage this ability and leave the Red Sea. However, C. melanopterus have small home ranges and exhibit strong site fidelity, their limited movement matching the distribution of coral reef habitat (Stevens 1984; Papastamatiou et al. 2009b, 2010). Their dependence on coral reefs means they were likely subjected to the same pressures as the smaller reef fishes, potentially leading to a similar response. Alternatively, the changes in their populations may mirror prey species patterns. Predator and prey population dynamics are linked (Lack 1954), with larger prey populations able to support larger predator populations to a degree (Holling 1959). Though C. melanopterus are opportunistic predators (Stevens 1984; Lyle and Timms 1987; Salini et al. 1992; Papastamatiou et al. 2009a), fish make up the majority of their diet (Stevens 1984). If all coral reef fishes were similarly impacted by environmentally-induced habitat loss, declines in prey species would have a bottom-up effect and lead to a similar response in C. melanopterus. Both these drivers (direct habitat loss and a bottom up effect of the population dynamics of prey species) are plausible and not mutually exclusive. Given that C. melenopterus was able to survive this genetic bottleneck in the LGM and subsequently recover, it would be useful to compare this reduction in effective population size to more recent population declines in reef shark species. Population viability analyses projected ongoing declines in Carcharhinus amblyrhynchos and Triaenodon obesus on the Great Barrier Reef, which, within 20 years, would reduce their abundance on legally fished reefs to 0.1% and 5% of even their highest current abundance levels on no-entry reefs (Robbins et al. 2006). Graham et al (2010) reported a decline of ~90% in the number of reef sharks observed on scientific dives in the Chagos Archipelago between 1975 and 2006, with C. melanopterus specifically accounting for 8.2% of sharks seen per dive in 1996, 5.6% in 2001, and 0% (was not sighted) in 2006. Though not directly comparable (as they are based on census data rather than estimates of effective population size which assumes a panmictic population and so can be affected by population structure or nonrandom mating), these declines in reef shark species are likely to be much greater than the estimated effective population size of the genetic bottleneck in the Red Sea of ~63% of pre-LGM estimates, reported in this study. Thus, it is not clear whether current populations of C. melanopterus will be able to recover as they did in the Red Sea following the LGM, particularly if 69 population declines continue at the current rate and if Allee effects (Allee et al. 1949; Stephens et al. 1999) come into play at a critical population threshold. There were also some differences in the demographic history of C. melanopterus compared to that of the smaller, teleost reef fish species. The time of colonisation of the Red Sea, or the split between the Red Sea and Indian Ocean populations ‘tanc’ was much earlier than the ~60 - ~95kya estimated for Dascyllus abudafur and Dascyllus trimaculatus. This means that C. melanopterus must have persisted in the Red Sea through multiple glacial and interglacial cycles. Some of the characteristics that enabled sharks to survive multiple mass extinction events in the 420 million years since they evolved likely also helped C. melanopterus to survive repeated glaciations in the Pleistocene period – they have flexible diets (Stevens 1984; Lyle and Timms 1987; Salini et al. 1992; Papastamatiou et al. 2009a) and strong healing/immune systems (Chin et al. 2015). However, despite younger estimates for ‘tanc’ in the smaller, teleost reef fishes, this does not mean it was the first time these species entered the Red Sea. Species divergence times for reef fishes tend to be much older than this, as evidenced for example by a coalescence estimate of ~560 - ~630kya for the D. trimaculatus species complex (D. trimaculatus, Dascyllus auripinnis, Dascyllus albisella, and Dascyllus strasburgi) across the Indo-Pacific (Leray et al. 2010). So, it is possible that there have been multiple colonisation and re-colonisation events for these reef fish species in the Red Sea that are not detected by my analysis, as the ‘tanc’ estimates in this study are only representative of the current population. Local extirpation between these recolonisation events may have been more likely for the smaller, teleost reef fishes than for reef sharks in times of environmental change due to different modes of reproduction and other life history traits. Reef fish larvae (whether from pelagic or benthic spawned eggs) are very small and vulnerable to environmental change (McLeod et al. 2013). Combined with short lifespans, even a short period of difficult conditions could dramatically affect recruitment and a species’ local persistence far before any noticeable impact on shark populations, or the coral reef habitat itself. On the other hand, though heavily impacted by fishing pressure, the larger, longer lived, viviparous C. melanopterus are likely more robust to varying environmental conditions (for other examples see: Graham et al. 2011; Lynam et al. 2017; McLean et al. 2019). Smaller reef fishes are also more speciose, so communities have likely been less stable over time (Talbot et al. 1978), with extinctions or extirpations leaving niches that were quicky filled by another species. However, aside from these possibilities, there is also the fundamental problem of a potentially fictitious ‘tanc’. Though the genetic bottleneck during the LGM was not so severe that it likely eliminated the genetic signature of the original colonisation of the Red Sea, it is possible that 70 more recent waves of reef fish larvae from the Indian Ocean could have expanded and populated the Red Sea since the original colonisation, altering the genetic signature of the population and reducing ‘tanc’ to a younger estimate (since my model only allowed for a single colonisation followed by steady migration). This scenario is far more likely to have occurred in the teleost reef fishes than the reef sharks due to the former laying or releasing several thousand eggs a day during the spawning season (Danilowicz 1995; Asoh 2003), which could result in the arrival of a large wave of larvae if conditions were right. Given the potential variation in vulnerability to environmental change between C. melanopterus and the smaller, teleost reef fish species as discussed above, it was surprising to find that the levels of migration between the Indian Ocean and the Red Sea for C. melanopterus when sea levels permitted, both before and since the LGM, were of the same order of magnitude as those obtained for two of the teleost reef fish species with pelagic larvae (P. maculosus and D. marginatus). It could have been expected that the distinct environment in the Red Sea compared to the Indian Ocean, particularly in the south, would have posed a greater barrier to the larvae of smaller, teleost reef fish species than to the migration of adult C. melanopterus. Yet migration in C. melanopterus was in the same range of estimates as reported in the previous chapters for Pomacanthus maculosus, Dascyllus abudafur, and Dascyllus marginatus. Only Dascyllus trimaculatus was estimated to have higher levels of migration, fitting with current knowledge of larval swimming capabilities. It is possible that, despite it being physically feasible, C. melanopterus actively avoids migration between the two basins. Contrary to some other shark species, C. melanopterus is not a migratory species. Their average daily activity space is only ~10km2 (Mourier et al. 2012), and ~70% of their time is spent within an area of ~0.3km2 over a year (Mourier and Planes 2013). If migrations do occur, it is around a single island or between neighbouring islands (Mourier and Planes 2013). Thus, though connectivity with the Indian Ocean has been restored, the entrance to the Red Sea still represents a biogeographic barrier to gene flow between the two basins for a broad range of species. The population structure of C. melanopterus across the Indo-Pacific also reflects the Red Sea barrier, which is corroborated by our EEMS (Petkova et al. 2016) results visualising estimated effective migration in the region. As a non-migratory species, long distance movements are uncommon in C. melanopterus. They have been observed to travel distances of up to 275km (Speed et al. 2016), but distances between islands across the Pacific can be thousands of kilometres. These long distances have had an impact on the population genetics of the species. Population structure is evident, and effective migration is not high, implying some interruption to gene flow beyond scaled 71 IBD. However, the lower effective migration estimated between the Red Sea and the Seychelles suggests a greater barrier to gene flow relative to the distance involved – likely the Red Sea barrier. Incidentally, the Seychelles also seems isolated from the eastern Indian Ocean, or the west coast of Australia. The isolation of this small group of islands with only a few hundred kilometres of coastline likely explains the small current effective population size estimated for the Indian Ocean (represented by the Seychelles), ‘Npop0’. The Red Sea barrier is not the strongest in the region though. The lowest effective migration was found at the Indo-Pacific barrier, specifically between Australia and the Pacific. Many species populations are genetically distinct between the Indian and Pacific oceans, with western Australia often grouping with eastern Australia (Benzie 2000; Bay et al. 2004; for more examples see Gaither et al. 2010) – a pattern that also seems to be evident in C. melanopterus. This is despite the fact that there were more connections between the Indian Ocean and the Pacific Ocean when sea levels were high than the single Bab-el-Mandeb strait connecting the Red Sea to the Indian Ocean. It is possible that, despite the Red Sea environment being particularly hot and saline (Edwards 1987; Kleypas et al. 2008; Trommer et al. 2009; Cantin et al. 2010; Fine et al. 2013; Chaidez et al. 2017), there is a stronger environmental gradient between the Indian Ocean and the Pacific Ocean than between the Red Sea and the Indian Ocean. Low effective migration at various locations across their range, and their general tendency for site fidelity (Stevens 1984; Papastamatiou et al. 2009b, 2010), has implications for conservation efforts for C. melanopterus. Due to the barriers to gene flow, efforts at the local population level will not be hindered by negative impacts in other areas, but any resulting successes will not extend to other populations either. Independent plans in multiple locations would be required to support the species across its range. 3.6 Conclusion Overall, my study highlighted the similarity in population response of large reef sharks to that of smaller, teleost reef fishes during periods of environmentally-induced habitat loss in the Red Sea. In parallel with the smaller species, I provide evidence that the Red Sea population of C. melanopterus suffered a mild genetic bottleneck and was able to fully recover, but there were a few notable differences in other aspects of its demographic history. The Red Sea was also shown to have remained a barrier to gene flow for current populations. Though this study is limited to the Red Sea and C. melanopterus specifically, it provides an example of the importance of coral reef habitat to all reef-associated fish species and the interconnectedness between trophic levels in an ecosystem. The persistence of coral reef refugia and limited population reductions in other reef fish species 72 likely supported population recovery of C. melanopterus. It is not clear whether this capacity would persist given potentially greater impacts on all species of coral reef fishes if current rates of coral reef habitat loss are allowed to continue. Coral reef habitat loss and continued overexploitation need to be addressed for populations to retain the ability to recover. 73 74 DISCUSSION Over the three previous chapters, my research aims to address the main objective of this thesis, namely: to determine how reef fish populations respond to habitat loss on scales relevant to global climate change. Here, I will show how this work successfully achieves this objective, while also addressing several other, related research questions that further help our understanding of the population response: how response varies across species, the minimum size of impacted populations that are capable of recovery, and the location of any identified refugia. The first chapter revealed that reef fish species’ populations were negatively impacted by climate-induced habitat loss, but effective population sizes remained relatively large compared to ancestral estimates. I modelled the demographic history of several reef fish species’ populations in the isolated Red Sea during the Last Glacial Maximum (LGM) to study the impact of climate- induced habitat loss and fragmentation. These two distinct processes both had the potential to impact the demography of Red Sea populations of reef fishes, the effects of which cannot be disentangled with the current methods, so I have referred to the combined effects under the simplified term of ‘habitat loss’. Population parameter estimates of the minimum effective population size during the LGM indicated that all three study species underwent only a mild genetic bottleneck, while current effective population size estimates demonstrated population recovery. The second chapter focussed on the location of coral reef refugia, and revealed that they were unlikely to have been located outside the Red Sea. I considered alternative demographic scenarios for the Red Sea population of Dascyllus marginatus, with differing locations for refugia during the LGM. The limited range of this near-endemic species outside the Red Sea meant it was possible to test the possibility of an external refugium in the Gulf of Aden, but there was low support for this model compared to models with in-situ Red Sea refugia, particularly a single refugium in the north. Lastly, the third chapter revealed that populations of a larger, more motile, high-level mesopredator on coral reefs, Carcharhinus melanopterus, could also survive periods of climate-induced habitat loss, and were also negatively impacted in the form of a mild genetic bottleneck. Given their ability to migrate as adults, initial analysis compared two demographic scenarios to determine if the Red Sea population of C. melanopterus was locally extirpated or persisted during the LGM. Analysis of the population parameter estimates for the latter, most supported model detailed a similar response to the smaller, teleost reef fish species, as summarised above. When considering the population parameter estimates for the demographic histories of D. marginatus and C. melanopterus alongside the three species in the first chapter, populations across all study species appeared to respond 75 similarly to habitat loss on scales relevant to global climate change. The minimum size of genetic bottlenecks at the LGM were not severe – relatively large effective population sizes survived. Estimates for these mild genetic bottlenecks suggest effective population sizes between ~26 and ~77% of pre-LGM levels across all species. Perhaps unsurprisingly, these reef fish populations were shown to have successfully recovered, but it remains unclear how these effective population sizes were sustained considering the extensive habitat loss in the Red Sea during the LGM. Relating the main findings of this work back to some of the existing literature introduced previously, there are some results that fall in line with expectations. Firstly, mitochondrial DNA was previously used to estimate population split times between the Red Sea and the Indian Ocean in reef fish species, reporting estimates ranging from hundreds of thousands of years ago to roughly around the time of the LGM (DiBattista et al. 2013). This suggests that these species persisted in the Red Sea during the LGM, prior to the return of more favourable conditions. I assumed such persistence of the Red Sea populations when I designed the demographic models for the smaller, teleost reef fishes, and this was corroborated by the estimate of a mild genetic bottleneck at the LGM. Estimates suggesting an extreme bottleneck would have required additional testing of alternative demographic models, as they may have more likely represented local extirpation from the Red Sea during the LGM, followed by a recolonisation event. Persistence of the Red Sea population during the LGM was directly tested for C. melanopterus via demographic model selection, and this model received far greater support than the alternative model which represented local extirpation and recolonisation. Secondly, it could be argued that the mild genetic bottlenecks detected across all study species were not particularly surprising. The fact that none of the cosmopolitan study species (i.e. present in the Red Sea and the Indian Ocean), in DiBattista et al (2013) or in this research, went extinct following the LGM disturbance provides evidence that they were all able to recover from persisting populations. This would suggest that the bottlenecked populations were unlikely to have been extremely small during the LGM if all populations were able to recover. On the other hand, if we consider the extent of the environmental crisis in the Red Sea during the LGM (Fenton et al. 2000; Siddall et al. 2003; Biton et al. 2008), it seems like a remarkably minimal impact on the effective population sizes of coral reef fishes. There was a huge loss of coral reef habitat throughout much of the Red Sea during the LGM (Sheppard et al. 1992), yet it did not seem to translate into drastic reductions in reef fish species’ populations. One possible explanation could be the presence of multiple small refugia dotted throughout the Red Sea during the LGM which were able to maintain a lot of the genetic diversity even if census population sizes declined. When coral reef habitat returned, these populations would have later mixed, with the genetic signature translating to elevated estimates of effective population size during these times. However, given that it was not 76 even possible to distinguish between one or two major refugia inside the Red Sea basin for D. marginatus, I do not have the data or the resolution in the methodology to directly test this scenario. As with all projects, there were of course some limitations to this body of work. My research has strongly focused on the Red Sea basin. Its unique geography, history, and environment (Smeed 1997; Fenton et al. 2000; Trommer et al. 2009; Cantin et al. 2010; Raitsos et al. 2013) represent an ideal, unusually isolated model marine system to ask the required questions about the impact of climate-induced habitat loss. However, the results represent a single geographic example of how reef fish species’ populations responded to climate-induced habitat loss. Given the advantages of the system however, and the fact that it enables us to use information from the past to inform potential future responses, these results from the Red Sea represent an important tool for identifying at-risk coral reef fish populations and for directing future conservation efforts. Similarly, I was only able to obtain data for three Dascyllus species, one other teleost coral reef fish species, and one species of reef shark. However, these were the datasets available, and the results open a valid discussion. Are all groups equally vulnerable to climate change, or are there some winners and some losers which would affect community composition? It would be beneficial to study additional groups including invertebrates such as molluscs and crustaceans to compare population response between less related taxa. Depending on the response, it could also be interesting to model the demography of the tawny nurse shark, Nebrius ferrugineus, in the Red Sea during the LGM. It is also non-migratory with high site fidelity (Ferreira et al. 2013) and dependence on coral reef habitat (Rizzari et al. 2014) like C. melanopterus, but has a more benthic diet including molluscs (Stevens 1984; Cortés 1999). If the population response to habitat loss of these taxonomic groups was distinct from that of the reef fishes presented herein, I could potentially detangle the effects of population reduction in prey vs. the effects of habitat loss on reef sharks. Additional work in the third chapter further investigates the Red Sea region in the context of the whole Indo-Pacific. High rates of endemism (reviewed in DiBattista et al. 2016b) and distinct populations of cosmopolitan species in the Red Sea (DiBattista et al. 2013), coupled with the unusual environmental conditions in this basin (Edwards 1987; Kleypas et al. 2008; Trommer et al. 2009; Cantin et al. 2010; Fine et al. 2013; Chaidez et al. 2017), imply a likely role for local adaptation but I was not able to study this process specifically. If it had been possible to directly choose the study species rather than sourcing published or collected datasets, further suggested work in this area could focus on identifying possible loci under selection in the Red Sea. With sufficient data, selection scans can help identify these loci. Sampling additional Red Sea endemic sister species (to the cosmopolitan species studied here) would highlight possible shared selection 77 targets in the Red Sea populations while controlling for natural species divergences. Another source of variation in this research is the estimated mutation rate and generation times used for simulating populations of the smaller, teleost reef fishes. The potential for error in these parameters means that estimates for the timing of specific demographic events, such as the time of the genetic bottleneck, may not be accurate. However, it is the ratio between the effective population size of pre- and post- LGM populations that is the most important value when discussing the impact of climate-induced habitat loss on the reef fish species included in this research. To document variation from the reported median of the posterior distributions, I made sure to include the 0.025 and 0.975 quantiles for all population parameter estimates in the Supplementary information for each chapter. Most importantly, this research provides the evidence required to show that reef fish species’ populations are negatively impacted by habitat loss on a spatial and temporal scale that is relevant to climate change. Red Sea populations were confirmed to have persisted during the LGM, most likely remaining within the Red Sea basin rather than in an external refugium, and were shown to have suffered a mild genetic bottleneck. Population parameter estimates highlight a maximum population reduction that we know was capable of recovery, but it raises the question as to why there was not a greater reduction in effective population size at the genetic bottleneck compared to the ancestral population size. Had there been a greater reduction, this quantitative estimate could have been used to guide conservation resources towards reef fish populations that may have reached, or be likely to reach this threshold, in an effort to ensure that a population is maintained that still has the capacity for recovery. Instead, the key question is now how these relatively substantial effective population sizes were sustained in refugia during the LGM. Was a single refugium or a limited number of refugia larger than has been predicted? Perhaps multiple, small refugia formed a metapopulation that maintained a relatively high level of genetic diversity despite substantial habitat loss and a likely reduction in census population sizes. Once connectivity between the Red Sea and the Indian Ocean was restored, conditions would have started to become more favourable, enabling the return of coral reef habitat throughout the Red Sea, reconnecting the refugia, and resulting in a minimal reduction in effective population size estimated for the bottleneck. Mapping these refugia (discussed below) will likely help to determine the processes that led to this observation. Efforts should be directed towards reducing the current rate of coral reef habitat loss as a priority, but this information in particular might reveal how to best mitigate the impact of climate change on coral reef fish populations in the future, by identifying and protecting certain refugia. 78 CONCLUSION Using population genetics and demographic modelling, I was able to reveal that reef fish species’ populations suffered population reductions in the model Red Sea system during the Last Glacial Maximum (LGM), but were also able to recover. Importantly, in a development from prior, small-scale work on the impact of habitat loss on reef fish populations, my research provides direct evidence that reef fish species’ populations are negatively impacted during periods of environmentally-induced habitat loss over spatial and temporal scales relevant to global climate change. I present a quantitative minimum effective population size of ~26% of pre-LGM estimates, which was able to support population recovery. Despite the fact that the genetic bottlenecks were not as small as I might have expected, it still provides a significant threshold to be avoided in at-risk populations and provides good news in evidencing the ability of reef fish species’ populations to recover following a long-term, large-scale climatic disturbance, highlighting the potential for resilience in coral reef fishes. Moving forwards, this research should lead us to try to understand how the populations persisted at such large effective population sizes and did not suffer greater reductions considering the extent of environmental change and habitat loss that occurred in the Red Sea during the LGM. One possibility introduced in the first and discussed in the second chapter, is the potential for genetic diversity to have been maintained in multiple, small patches of refugia (that I do not have the resolution to test for), or that the proposed refugia were simply larger than expected. Our knowledge of the substantial climate-induced habitat loss in the Red Sea during the LGM helped inform the demographic models used in this research, but they actually only take into account the population genetics of the reef fish populations. One important feature of the impact of environmental change during the LGM that is not considered in detail in this research, is how the distribution of coral reefs changed over time with the changing availability of coastal habitat (see Ludt and Rocha 2015). These patterns could help map out the locations of refugial populations and explain why the Red Sea populations of reef fish species did not suffer greater reductions in effective population sizes. Including a spatial distribution aspect by mapping the potential distribution of coral reefs over time, and comparing this with the modelled demographic history of the Red Sea reef fish populations, could not only improve our overall picture of the response of Red Sea reef fish populations to habitat loss on scales relative to global climate change, but could also be used to estimate the relative importance of habitat loss vs. habitat fragmentation. 79 Given the time-scale of this environmental crisis, it is also possible that both the corals and reef fish populations were able to successfully adapt to the changing environment, enabling them to persist in sizeable populations. New research streams are focussing on adaptation and acclimation in coral reef fishes in the face of environmental change, particularly looking at generational transmission, but again this is at small spatial and temporal scales, often in controlled laboratory settings (Bernal et al. 2022). There are some opportunities to study these processes in natural systems, such as in reef fish populations living among CO2 streams emerging from under the seabed on the reefs off Papua New Guinea (Kang et al. 2022), but we need to scale up this work to determine if the benefits of adaptation and acclimation could have emerged within the timeframe of the LGM, maintaining reef fish populations in the Red Sea despite substantial climate-induced habitat loss. 80 REFERENCES Aeschbacher S, Beaumont MA, Futschik A (2012) A novel approach for choosing summary statistics in Approximate Bayesian Computation. Genetics 192:1027–1047 Airoldi L, Balata D, Beck MW (2008) The Gray Zone: Relationships between habitat loss and marine diversity and their applications in conservation. J Exp Mar Biol Ecol 366:8–15 Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19:1655–1664 Allee WC, Park O, Emerson AE, Park T, Schmidt KP (1949) Principles of animal ecology. W. B. Saunders Co., Philadelphia Allen GR (1985) Butterfly and angelfishes of the world. Mergus Publishers, Melle Allen GR (1991) Damselfishes of the world. Mergus Publishers, Melle Asoh K (2003) Reproductive parameters of female Hawaiian damselfish Dascyllus albisella with comparison to other tropical and subtropical damselfishes. Mar Biol 143:803–810 Ban SS, Alidina HM, Okey TA, Gregg RM, Ban NC (2016) Identifying potential marine climate change refugia: A case study in Canada’s Pacific marine ecosystems. Glob Ecol Conserv 8:41–54 Bay LK, Choat JH, van Herwerden L, Robertson DR (2004) High genetic diversities and complex genetic structure in an Indo-Pacific tropical reef fish (Chlorurus sordidus): evidence of an unstable evolutionary past? Mar Biol 144:757–767 Bazzi M, Campione NE, Ahlberg PE, Blom H, Kear BP (2021) Tooth morphology elucidates shark evolution across the end-Cretaceous mass extinction. PloS Biol 19:e3001108 Bazzi M, Kear BP, Blom H, Ahlberg PE, Campione NE (2018) Static Dental disparity and morphological turnover in sharks across the end-Cretaceous mass extinction. Curr Biol 28:2607–2615 Bellwood DR, Hughes TP, Folke C, Nyström M (2004) Confronting the coral reef crisis. Nature 429:827–833 Benzie JAH (2000) The detection of spatial variation in widespread marine species: methods and bias in the analysis of population structure in the crown of thorns starfish (Echinodermata: Asteroidea). Hydrobiologia 420:1–14 Berkelmans R, van Oppen MJH (2006) The role of zooxanthellae in the thermal tolerance of corals: a ‘nugget of hope’ for coral reefs in an era of climate change. Proc R Soc B Biol Sci 273:2305–2312 Bernal MA, Ravasi T, Rodgers GG, Munday PL, Donelson JM (2022) Plasticity to ocean warming is influenced by transgenerational, reproductive, and developmental exposure in a coral reef fish. Evol Appl 15:249–261 81 Bertorelle G, Benazzo A, Mona S (2010) ABC as a flexible framework to estimate demography over space and time: some cons, many pros. Mol Ecol 19:2609–2625 Biton E, Gildor H, Peltier WR (2008) Red Sea during the Last Glacial Maximum: implications for sea level reconstruction. Paleoceanography 23:1–12 Blum MGB, Nunes MA, Prangle D, Sisson SA (2013) A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation. Stat Sci 28:189–208 Bogorodsky SV, Randall JE (2019) Endemic fishes of the Red Sea. In: Rasul N.M.A., Stewart I.C.F. (eds) Oceanographic and biological aspects of the Red Sea. Springer International Publishing, Cham, pp 239–265 Booth DJ, Beretta GA (2002) Changes in a fish assemblage after a coral bleaching event. Mar Ecol Prog Ser 245:205–212 Booth DJ, Beretta GA, Brown L, Figueira WF (2018) Predicting success of range-expanding coral reef fish in temperate habitats using temperature-abundance relationships. Front Mar Sci 5:31 Borsa P, Sembiring A, Fauvelot C, Chen W-J (2014) Resurrection of Indian Ocean humbug damselfish, Dascyllus abudafur (Forsskål) from synonymy with its Pacific Ocean sibling, Dascyllus aruanus (L.). C R Biol 337:709–716 Briggs JC (1961) The East Pacific Barrier and the distribution of marine shore fishes. Evolution 15:545–554 Briggs JC (1974) Operation of Zoogeographic Barriers. Syst Biol 23:248–256 Brown BE (1997) Coral bleaching: causes and consequences. Coral Reefs 16:S129–S138 Buddemeier RW, Fautin DG (1993) Coral bleaching as an adaptive mechanism. BioScience 43:320–326 Cagua EF, Collins N, Hancock J, Rees R (2014) Whale shark economics: a valuation of wildlife tourism in South Ari Atoll, Maldives. PeerJ 2:e515 Cantin NE, Cohen AL, Karnauskas KB, Tarrant AM, McCorkle DC (2010) Ocean warming slows coral growth in the central Red Sea. Science 329:322–325 Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH (2011) Stacks: building and genotyping loci de novo from short-read sequences. G3 Genes Genomes Genet 1:171–182 Chaidez V, Dreano D, Agusti S, Duarte CM, Hoteit I (2017) Decadal trends in Red Sea maximum surface temperature. Sci Rep 7:8144 Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ (2015) Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4:s13742-015 Chin A, Mourier J, Rummer JL (2015) Blacktip reef sharks (Carcharhinus melanopterus) show high capacity for wound healing and recovery following injury. Conserv Physiol 3:1–9 Chollett I, Mumby PJ (2013) Reefs of last resort: Locating and assessing thermal refugia in the wider Caribbean. Biol Conserv 167:179–186 82 Coker DJ, Pratchett MS, Munday PL (2009) Coral bleaching and habitat degradation increase susceptibility to predation for coral-dwelling fishes. Behav Ecol 20:1204–1210 Coker DJ, Pratchett MS, Munday PL (2012) Influence of coral bleaching, coral mortality and conspecific aggression on movement and distribution of coral-dwelling fish. J Exp Mar Biol Ecol 414:62–68 Compagno LJV (1990) Alternative life-history styles of cartilaginous fishes in time and space. Environ Biol Fishes 28:33–75 Cortés E (1999) Standardized diet compositions and trophic levels of sharks. ICES J Mar Sci 56:707–717 Cowen RK, Paris CB, Srinivasan A (2006) Scaling of Connectivity in Marine Populations. Science 311:522–527 Crane NL, Tariel J, Caselle JE, Friedlander AM, Robertson DR, Bernardi G (2018) Clipperton Atoll as a model to study small marine populations: endemism and the genomic consequences of small population size. PloS One 13:e0198901 Cristofari R, Liu X, Bonadonna F, Cherel Y, Pistorius P, Le Maho Y, Raybaud V, Stenseth NC, Le Bohec C, Trucchi E (2018) Climate-driven range shifts of the king penguin in a fragmented ecosystem. Nat Clim Change 8:245–251 Csilléry K, Blum MGB, Gaggiotti OE, François O (2010) Approximate Bayesian Computation (ABC) in practice. Trends Ecol Evol 25:410–418 Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H (2021) Twelve years of SAMtools and BCFtools. GigaScience 10:giab008 Danilowicz BS (1995) Spatial patterns of spawning in the coral-reef damselfish Dascyllus albisella. Mar Biol 122:145–155 Delrieu-Trottin E, Hubert N, Giles EC, Chifflet-Belle P, Suwalski A, Neglia V, Rapu-Edmunds C, Mona S, Saenz-Agudelo P (2020) Coping with Pleistocene climatic fluctuations: demographic responses in remote endemic reef fishes. Mol Ecol 29:2218–2233 Descombes P, Wisz MS, Leprieur F, Parravicini V, Heine C, Olsen SM, Swingedouw D, Kulbicki M, Mouillot D, Pellissier L (2015) Forecasted coral reef decline in marine biodiversity hotspots under climate change. Glob Change Biol 21:2479–2487 DiBattista JD, Berumen ML, Gaither MR, Rocha LA, Eble JA, Choat JH, Craig MT, Skillings DJ, Bowen BW (2013) After continents divide: comparative phylogeography of reef fishes from the Red Sea and Indian Ocean. J Biogeogr 40:1170–1181 DiBattista JD, Choat JH, Gaither MR, Hobbs J-PA, Lozano-Cortés DF, Myers RF, Paulay G, Rocha LA, Toonen RJ, Westneat MW, Berumen ML (2016a) On the origin of endemic species in the Red Sea. J Biogeogr 43:13–30 DiBattista JD, Roberts MB, Bouwmeester J, Bowen BW, Coker DJ, Lozano‐Cortés DF, Choat JH, Gaither MR, Hobbs J-PA, Khalil MT, Kochzius M, Myers RF, Paulay G, Robitzch VSN, Saenz‐Agudelo P, Salas E, Sinclair‐Taylor TH, Toonen RJ, Westneat MW, Williams ST, 83 Berumen ML (2016b) A review of contemporary patterns of endemism for shallow water reef fauna in the Red Sea. J Biogeogr 43:423–439 Douglas AE (2003) Coral bleaching––how and why? Mar Pollut Bull 46:385–392 Edwards FJ (1987) Climate and Oceanography. In: Edwards A.J., Head S.M. (eds) Red Sea. Pergamon, Amsterdam, pp 45–69 Emanuel K (2005) Increasing destructiveness of tropical cyclones over the past 30 years. Nature 436:686–688 Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M (2013) Robust demographic inference from genomic and SNP data. PloS Genet 9:e1003905 Fahrig L (2003) Effects of habitat fragmentation on biodiversity. Annu Rev Ecol Evol Syst 34:487– 515 Fairbanks RG (1989) A 17,000-year glacio-eustatic sea level record: influence of glacial melting rates on the Younger Dryas event and deep-ocean circulation. Nature 342:637–642 Feary DA, Almany GR, McCormick MI, Jones GP (2007) Habitat choice, recruitment and the response of coral reef fishes to coral degradation. Oecologia 153:727–737 Fenton M, Geiselhart S, Rohling EJ, Hemleben Ch (2000) Aplanktonic zones in the Red Sea. Mar Micropaleontol 40:277–294 Ferreira LC, Afonso AS, Castilho PC, Hazin FHV (2013) Habitat use of the nurse shark, Ginglymostoma cirratum, off Recife, northeast Brazil: a combined survey with longline and acoustic telemetry. Environ Biol Fishes 96:735–745 Field IC, Meekan MG, Buckworth RC, Bradshaw CJA (2009) Susceptibility of sharks, rays and chimaeras to global extinction. Adv Mar Biol 56:275–363 Fine M, Gildor H, Genin A (2013) A coral reef refuge in the Red Sea. Glob Change Biol 19:3640– 3647 Fisher R (2005) Swimming speeds of larval coral reef fishes: impacts on self-recruitment and dispersal. Mar Ecol Prog Ser 285:223–232 Francis RM (2017) pophelper: an R package and web app to analyse and visualize population structure. Mol Ecol Resour 17:27–32 Frisch AJ, Ireland M, Rizzari JR, Lönnstedt OM, Magnenat KA, Mirbach CE, Hobbs J-PA (2016) Reassessing the trophic role of reef sharks as apex predators on coral reefs. Coral Reefs 35:459–472 Froese R, Pauly D (2022) FishBase. World Wide Web electronic publication. www.fishbase.org, version (06/2022) Gaither MR, Toonen RJ, Robertson DR, Planes S, Bowen BW (2010) Genetic evaluation of marine biogeographical barriers: perspectives from two widespread Indo-Pacific snappers (Lutjanus kasmira and Lutjanus fulvus). J Biogeogr 37:133–147 84 Giles EC, Saenz‐Agudelo P, Hussey NE, Ravasi T, Berumen ML (2015) Exploring seascape genetics and kinship in the reef sponge Stylissa carteri in the Red Sea. Ecol Evol 5:2487– 2502 Glynn PW (1984) Widespread coral mortality and the 1982–83 El Niño warming event. Environ Conserv 11:133–146 Godwin J (1995) Phylogenetic and habitat influences on mating system structure in the humbug damselfishes (Dascyllus, Pomacentridae). Bull Mar Sci 57:637–652 Grafeld S, Oleson KLL, Teneva L, Kittinger JN (2017) Follow that fish: uncovering the hidden blue economy in coral reef fisheries. PloS One 12:e0182104 Graham MH, Kinlan BP, Druehl LD, Garske LE, Banks S (2007a) Deep-water kelp refugia as potential hotspots of tropical marine diversity and productivity. Proc Natl Acad Sci 104:16576–16580 Graham NAJ, Chabanet P, Evans RD, Jennings S, Letourneur Y, Aaron MacNeil M, McClanahan TR, Öhman MC, Polunin NVC, Wilson SK (2011) Extinction vulnerability of coral reef fishes. Ecol Lett 14:341–348 Graham NAJ, Spalding MD, Sheppard CRC (2010) Reef shark declines in remote atolls highlight the need for multi-faceted conservation action. Aquat Conserv Mar Freshw Ecosyst 20:543– 548 Graham NAJ, Wilson SK, Jennings S, Polunin NVC, Robinson J, Bijoux JP, Daw TM (2007b) Lag effects in the impacts of mass coral bleaching on coral reef fish, fisheries, and ecosystems. Conserv Biol 21:1291–1300 Graham NAJ, Wilson SK, Pratchett MS, Polunin NVC, Spalding MD (2009) Coral mortality versus structural collapse as drivers of corallivorous butterflyfish decline. Biodivers Conserv 18:3325–3336 Green AL, Maypa AP, Almany GR, Rhodes KL, Weeks R, Abesamis RA, Gleason MG, Mumby PJ, White AT (2015) Larval dispersal and movement patterns of coral reef fishes, and implications for marine reserve network design. Biol Rev 90:1215–1247 Guinot G, Adnet S, Cavin L, Cappetta H (2013) Cretaceous stem chondrichthyans survived the end- Permian mass extinction. Nat Commun 4:2669 Hamilton RJ, Potuku T, Montambault JR (2011) Community-based conservation results in the recovery of reef fish spawning aggregations in the Coral Triangle. Biol Conserv 144:1850– 1858 Hampe A, Rodríguez-Sánchez F, Dobrowski S, Hu FS, Gavin DG (2013) Climate refugia: from the Last Glacial Maximum to the twenty-first century. New Phytol 197:16–18 Heupel MR, Papastamatiou YP, Espinoza M, Green ME, Simpfendorfer CA (2019) Reef shark science – key questions and future directions. Front Mar Sci 6:1–14 Hoegh-Guldberg O (1999) Climate change, coral bleaching and the future of the world’s coral reefs. Mar Freshw Res 50:839–866 85 Hoegh-Guldberg O, Mumby PJ, Hooten AJ, Steneck RS, Greenfield P, Gomez E, Harvell CD, Sale PF, Edwards AJ, Caldeira K, Knowlton N, Eakin CM, Iglesias-Prieto R, Muthiga N, Bradbury RH, Dubi A, Hatziolos ME (2007) Coral reefs under rapid climate change and ocean acidification. Science 318:1737–1742 Holling CS (1959) The Components of Predation as Revealed by a Study of Small-Mammal Predation of the European Pine Sawfly. Can Entomol 91:293–320 Hughes TP, Anderson KD, Connolly SR, Heron SF, Kerry JT, Lough JM, Baird AH, Baum JK, Berumen ML, Bridge TC, Claar DC, Eakin CM, Gilmour JP, Graham NAJ, Harrison H, Hobbs J-PA, Hoey AS, Hoogenboom M, Lowe RJ, McCulloch MT, Pandolfi JM, Pratchett M, Schoepf V, Torda G, Wilson SK (2018) Spatial and temporal patterns of mass bleaching of corals in the Anthropocene. Science 359:80–83 Hughes TP, Baird AH, Bellwood DR, Card M, Connolly SR, Folke C, Grosberg RK, Hoegh- Guldberg O, Jackson JBC, Kleypas J, Lough JM, Marshall P, Nyström M, Palumbi SR, Pandolfi JM, Rosen B, Roughgarden J (2003) Climate change, human impacts, and the resilience of coral reefs. Science 301:929–933 Jones GP, Almany GR, Russ GR, Sale PF, Steneck RS, van Oppen MJH, Willis BL (2009) Larval retention and connectivity among populations of corals and reef fishes: history, advances and challenges. Coral Reefs 28:307–325 Jones GP, McCormick MI, Srinivasan M, Eagle JV (2004) Coral decline threatens fish biodiversity in marine reserves. Proc Natl Acad Sci 101:8251–8253 Kang J, Nagelkerken I, Rummer JL, Rodolfo-Metalpa R, Munday PL, Ravasi T, Schunter C (2022) Rapid evolution fuels transcriptional plasticity to ocean acidification. Glob Change Biol 28:3007–3022 Kemp JM (2000) Zoogeography of the coral reef fishes of the north-eastern Gulf of Aden, with eight new records of coral reef fishes from Arabia. Fauna Arab 18:293–321 Keppel G, Van Niel KP, Wardell-Johnson GW, Yates CJ, Byrne M, Mucina L, Schut AGT, Hopper SD, Franklin SE (2012) Refugia: identifying and understanding safe havens for biodiversity under climate change. Glob Ecol Biogeogr 21:393–404 Kittinger JN, Teneva LT, Koike H, Stamoulis KA, Kittinger DS, Oleson KLL, Conklin E, Gomes M, Wilcox B, Friedlander AM (2015) From reef to table: social and ecological factors affecting coral reef fisheries, artisanal seafood supply chains, and seafood security. PloS One 10:e0123856 Kleypas JA, Danabasoglu G, Lough JM (2008) Potential role of the ocean thermostat in determining regional differences in coral reef bleaching events. Geophys Res Lett 35:1–6 Klimley AP (2013) The biology of sharks and rays. University of Chicago Press, Chicago Lack D (1954) The natural regulation of animal numbers. Oxford Clarendon Press, London Ladner JT, Barshis DJ, Palumbi SR (2012) Protein evolution in two co-occurring types of Symbiodinium: an exploration into the genetic basis of thermal tolerance in Symbiodinium clade D. BMC Evol Biol 12:217 86 LaJeunesse TC, Parkinson JE, Gabrielson PW, Jeong HJ, Reimer JD, Voolstra CR, Santos SR (2018) Systematic revision of Symbiodiniaceae highlights the antiquity and diversity of coral endosymbionts. Curr Biol 28:2570–2580 Leray M, Beldade R, Holbrook SJ, Schmitt RJ, Planes S, Bernardi G (2010) Allopatric divergence and speciation in coral reef fish: the three-spot dascyllus, Dascyllus trimaculatus, species complex. Evolution 64:1218–1230 Lewis CL, Coffroth MA (2004) The acquisition of exogenous algal symbionts by an octocoral after bleaching. Science 304:1490–1492 Li A, Reidenbach MA (2014) Forecasting decadal changes in sea surface temperatures and coral bleaching within a Caribbean coral reef. Coral Reefs 33:847–861 Li C, Hofreiter M, Straube N, Corrigan S, Naylor GJP (2013) Capturing protein-coding genes across highly divergent species. Biotechniques 54:321–326 Lighthill MJ (1969) Dynamic response of the Indian Ocean to onset of the Southwest Monsoon. Philos Trans R Soc Lond Ser Math Phys Sci 265:45–92 Litman GW (1996) Sharks and the origins of vertebrate immunity. Sci Am 275:67–71 Liu S, Ferchaud A-L, Grønkjær P, Nygaard R, Hansen MM (2018) Genomic parallelism and lack thereof in contrasting systems of three-spined sticklebacks. Mol Ecol 27:4725–4743 Ludt WB, Rocha LA (2015) Shifting seas: the impacts of Pleistocene sea-level fluctuations on the evolution of tropical marine taxa. J Biogeogr 42:25–38 Luer CA, Walsh CJ, Bodine AB (2004) The immune system of sharks, skates, and rays. In: Carrier J.C., Musick J.A., Heithaus M.R. (eds) Biology of sharks and their relatives. CRC Press, pp 369–395 Lyle JM, Timms GJ (1987) Predation on aquatic snakes by sharks from northern Australia. Copeia 1987:802–803 Lynam CP, Llope M, Mollmann C, Helaouet P, Bayliss-Brown GA, Stenseth NC (2017) Interaction between top-down and bottom-up control in marine food webs. Proc Natl Acad Sci 114:1952–1957 MacNeil MA, Chapman DD, Heupel M, Simpfendorfer CA, Heithaus M, Meekan M, Harvey E, Goetze J, Kiszka J, Bond ME, Currey-Randall LM, Speed CW, Sherman CS, Rees MJ, Udyawer V, Flowers KI, Clementi G, Valentin-Albanese J, Gorham T, Adam MS, Ali K, Pina-Amargós F, Angulo-Valdés JA, Asher J, Barcia LG, Beaufort O, Benjamin C, Bernard ATF, Berumen ML, Bierwagen S, Bonnema E, Bown RMK, Bradley D, Brooks E, Brown JJ, Buddo D, Burke P, Cáceres C, Cardeñosa D, Carrier JC, Caselle JE, Charloo V, Claverie T, Clua E, Cochran JEM, Cook N, Cramp J, D’Alberto B, de Graaf M, Dornhege M, Estep A, Fanovich L, Farabaugh NF, Fernando D, Flam AL, Floros C, Fourqurean V, Garla R, Gastrich K, George L, Graham R, Guttridge T, Hardenstine RS, Heck S, Henderson AC, Hertler H, Hueter R, Johnson M, Jupiter S, Kasana D, Kessel ST, Kiilu B, Kirata T, Kuguru B, Kyne F, Langlois T, Lédée EJI, Lindfield S, Luna-Acosta A, Maggs J, Manjaji- Matsumoto BM, Marshall A, Matich P, McCombs E, McLean D, Meggs L, Moore S, Mukherji S, Murray R, Kaimuddin M, Newman SJ, Nogués J, Obota C, O’Shea O, Osuka K, Papastamatiou YP, Perera N, Peterson B, Ponzo A, Prasetyo A, Quamar LMS, Quinlan J, Ruiz-Abierno A, Sala E, Samoilys M, Schärer-Umpierre M, Schlaff A, Simpson N, Smith 87 ANH, Sparks L, Tanna A, Torres R, Travers MJ, van Zinnicq Bergmann M, Vigliola L, Ward J, Watts AM, Wen C, Whitman E, Wirsing AJ, Wothke A, Zarza-Gonzâlez E, Cinner JE (2020) Global status and conservation potential of reef sharks. Nature 583:801–806 Maisano Delser P, Corrigan S, Duckett D, Suwalski A, Veuille M, Planes S, Naylor GJP, Mona S (2019) Demographic inferences after a range expansion can be biased: the test case of the blacktip reef shark (Carcharhinus melanopterus). Heredity 122:759–769 Maisano Delser P, Corrigan S, Hale M, Li C, Veuille M, Planes S, Naylor GJP, Mona S (2016) Population genomics of C. melanopterus using target gene capture data: demographic inferences and conservation perspectives. Sci Rep 6:33753 Malécot G (1955) The decrease of relationship with distance. Cold Springer Harb Symp Quant Biol 20:52–53 Marchalonis JJ, Schluter SF, Bernstein RM, Hohman VS (1998) Antibodies of sharks: revolution and evolution. Immunol Rev 166:103–122 Marra NJ, Stanhope MJ, Jue NK, Wang M, Sun Q, Pavinski Bitar P, Richards VP, Komissarov A, Rayko M, Kliver S, Stanhope BJ, Winkler C, O’Brien SJ, Antunes A, Jorgensen S, Shivji MS (2019) White shark genome reveals ancient elasmobranch adaptations associated with wound healing and the maintenance of genome stability. Proc Natl Acad Sci 116:4446–4455 McClanahan TR (1999) Is there a future for coral reef parks in poor tropical countries? Coral Reefs 18:321–325 McLean MJ, Mouillot D, Goascoz N, Schlaich I, Auber A (2019) Functional reorganization of marine fish nurseries under climate warming. Glob Change Biol 25:660–674 McLeod IM, Rummer JL, Clark TD, Jones GP, McCormick MI, Wenger AS, Munday PL (2013) Climate change and the performance of larval coral reef fishes: the interaction between temperature and food availability. Conserv Physiol 1:1–12 Mies M, Francini-Filho RB, Zilberberg C, Garrido AG, Longo GO, Laurentino E, Güth AZ, Sumida PYG, Banha TNS (2020) South Atlantic coral reefs are major global warming refugia and less susceptible to bleaching. Front Mar Sci 7:1–13 Momigliano P, Harcourt R, Stow A (2015) Conserving coral reef organisms that lack larval dispersal: are networks of Marine Protected Areas good enough? Front Mar Sci 2:1–5 Moore GI, Chaplin JA (2014) Contrasting demographic histories in a pair of allopatric, sibling species of fish (Arripidae) from environments with contrasting glacial histories. Mar Biol 161:1543–1555 Morelli TL, Barrows CW, Ramirez AR, Cartwright JM, Ackerly DD, Eaves TD, Ebersole JL, Krawchuk MA, Letcher BH, Mahalovich MF, Meigs GW, Michalak JL, Millar CI, Quiñones RM, Stralberg D, Thorne JH (2020) Climate-change refugia: biodiversity in the slow lane. Front Ecol Environ 18:228–234 Mourier J, Planes S (2013) Direct genetic evidence for reproductive philopatry and associated fine- scale migrations in female blacktip reef sharks (Carcharhinus melanopterus) in French Polynesia. Mol Ecol 22:201–214 88 Mourier J, Vercelloni J, Planes S (2012) Evidence of social communities in a spatially structured network of a free-ranging shark species. Anim Behav 83:389–401 Munday PL, Jones GP, Pratchett MS, Williams AJ (2008) Climate change and the future for coral reef fishes. Fish Fish 9:261–285 Muscatine L, Porter JW (1977) Reef corals: mutualistic symbioses adapted to nutrient-poor environments. BioScience 27:454–460 Nanninga GB, Manica A (2018) Larval swimming capacities affect genetic differentiation and range size in demersal marine fishes. Mar Ecol Prog Ser 589:1–12 Nanninga GB, Saenz‐Agudelo P, Manica A, Berumen ML (2014) Environmental gradients predict the genetic population structure of a coral reef fish in the Red Sea. Mol Ecol 23:591–602 Nogués-Bravo D, Rodríguez-Sánchez F, Orsini L, de Boer E, Jansson H, Morlon H, Fordham DA, Jackson ST (2018) Cracking the code of biodiversity responses to past climate change. Trends Ecol Evol 33:765–776 Paddack MJ, Reynolds JD, Aguilar C, Appeldoorn RS, Beets J, Burkett EW, Chittaro PM, Clarke K, Esteves R, Fonseca AC, Forrester GE, Friedlander AM, García-Sais J, González-Sansón G, Jordan LKB, McClellan DB, Miller MW, Molloy PP, Mumby PJ, Nagelkerken I, Nemeth M, Navas-Camacho R, Pitt J, Polunin NVC, Reyes-Nivia MC, Robertson DR, Rodríguez- Ramírez A, Salas E, Smith SR, Spieler RE, Steele MA, Williams ID, Wormald CL, Watkinson AR, Côté IM (2009) Recent region-wide declines in Caribbean reef fish abundance. Curr Biol 19:590–595 Pandolfi JM, Bradbury RH, Sala E, Hughes TP, Bjorndal KA, Cooke RG, McArdle D, McClenachan L, Newman MJH, Paredes G, Warner RR, Jackson JBC (2003) Global trajectories of the long-term decline of coral reef ecosystems. Science 301:955–958 Papastamatiou Y p., Caselle J e., Friedlander A m., Lowe C g. (2009a) Distribution, size frequency, and sex ratios of blacktip reef sharks Carcharhinus melanopterus at Palmyra Atoll: a predator-dominated ecosystem. J Fish Biol 75:647–654 Papastamatiou YP, Friedlander AM, Caselle JE, Lowe CG (2010) Long-term movement patterns and trophic ecology of blacktip reef sharks (Carcharhinus melanopterus) at Palmyra Atoll. J Exp Mar Biol Ecol 386:94–102 Papastamatiou YP, Lowe CG, Caselle JE, Friedlander AM (2009b) Scale-dependent effects of habitat on movements and path structure of reef sharks at a predator-dominated atoll. Ecology 90:996–1008 Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PloS One 7:e37135 Petkova D, Novembre J, Stephens M (2016) Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet 48:94–100 Piontkovski SA, Chiffings T (2014) Long-term changes of temperature in the Sea of Oman and the Western Arabian Sea. Int J Oceans Oceanogr 8:53–72 89 Planes S, Fauvelot C (2002) Isolation by distance and vicariance drive genetic structure of a coral reef fish in the Pacific Ocean. Evolution 56:378–399 Pratchett MS, Coker DJ, Jones GP, Munday PL (2012) Specialization in habitat use by coral reef damselfishes and their susceptibility to habitat loss. Ecol Evol 2:2168–2180 Pratchett MS, Munday PL, Wilson SK, Graham NAJ, Cinner JE, Bellwood DR, Jones GP, Polunin NVC, McClanahan TR (2008) Effects of climate-induced coral bleaching on coral-reef fishes - ecological and economic consequences. In: Gibson R.N., Atkinson R.J.A., Gordon J.D.M. (eds) Oceanography and marine biology: an annual review. CRC Press, pp 251–296 Promislow DEL, Harvey PH (1990) Living fast and dying young: A comparative analysis of life- history variation among mammals. J Zool 220:417–437 Provan J, Bennett KD (2008) Phylogeographic insights into cryptic glacial refugia. Trends Ecol Evol 23:564–571 Pudlo P, Marin J-M, Estoup A, Cornuet J-M, Gautier M, Robert CP (2016) Reliable ABC model choice via random forests. Bioinforma Oxf Engl 32:859–866 Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ, Sham PC (2007) PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81:559–575 Raitsos DE, Pradhan Y, Brewin RJW, Stenchikov G, Hoteit I (2013) Remote sensing the phytoplankton seasonal succession of the Red Sea. PloS One 8:e64909 Randall HA, Allen GR (1977) A revision of the damselfish genus Dascyllus (Pomacentridae) with the description of a new species. Rec Aust Mus 31:349–385 Randall JE (1988) Pomacanthus rhomboides (Gilchrist and Thompson), the valid name for the South African angelfish previously known as Pomacanthus striatus. J.L.B. Smith Institute of Ichthyology, Grahamstown Randall JE (1992) Review of the biology of the tiger shark (Galeocerdo cuvier). Mar Freshw Res 43:21–31 Randall JE (1998) Zoogeography of shore fishes of the Indo-Pacific region. Zool Stud 37:227–268 Raynal L, Marin J-M, Pudlo P, Ribatet M, Robert CP, Estoup A (2019) ABC random forests for Bayesian parameter inference. Bioinformatics 35:1720–1728 Reynolds JD, Mace GM, Redford KH, Robinson JG (2001) Conservation of exploited species. Cambridge University Press, Cambridge Rhyne AL, Tlusty MF, Schofield PJ, Kaufman L, Jr JAM, Bruckner AW (2012) Revealing the appetite of the marine aquarium fish trade: the volume and biodiversity of fish imported into the United States. PloS One 7:e35808 Rizzari JR, Frisch AJ, Magnenat KA (2014) Diversity, abundance, and distribution of reef sharks on outer-shelf reefs of the Great Barrier Reef, Australia. Mar Biol 161:2847–2855 Robbins WD, Hisano M, Connolly SR, Choat JH (2006) Ongoing collapse of coral reef shark populations. Curr Biol 16:2314–2319 90 Roberts CM, Shepherd ARD, Ormond RFG (1992) Large-scale variation in assemblage structure of Red Sea butterflyfishes and angelfishes. J Biogeogr 19:239–250 Roff G, Doropoulos C, Rogers A, Bozec Y-M, Krueck NC, Aurellado E, Priest M, Birrell C, Mumby PJ (2016) The ecological role of sharks on coral reefs. Trends Ecol Evol 31:395– 407 Sadowski JS, Gonzalez JA, Lonhart SI, Jeppesen R, Grimes TM, Grosholz ED (2018) Temperature- induced range expansion of a subtropical crab along the California coast. Mar Ecol 39:1–7 Salas EM, Bernardi G, Berumen ML, Gaither MR, Rocha LA (2019) RADseq analyses reveal concordant Indian Ocean biogeographic and phylogeographic boundaries in the reef fish Dascyllus trimaculatus. R Soc Open Sci 6:172413 Salini JP, Blaber SJM, Brewer DT (1992) Diets of sharks from estuaries and adjacent waters of the north-eastern Gulf of Carpentaria, Australia. Mar Freshw Res 43:87–96 Schott FA, McCreary JP (2001) The monsoon circulation of the Indian Ocean. Prog Oceanogr 51:1– 123 Sequeira AMM, Mellin C, Meekan MG, Sims DW, Bradshaw CJA (2013) Inferred global connectivity of whale shark Rhincodon typus populations. J Fish Biol 82:367–389 Sheppard C, Price A, Roberts C (1992) Marine ecology of the Arabian region: patterns and processes in extreme tropical environments. Academic Press, London Siddall M, Rohling EJ, Almogi-Labin A, Hemleben C, Meischner D, Schmelzer I, Smeed DA (2003) Sea-level fluctuations during the last glacial cycle. Nature 423:853–858 Smeed D (1997) Seasonal variation of the flow in the strait of Bah al Mandab. Ocean Acta 20:773– 781 Smith SE, Au DW, Show C (1998) Intrinsic rebound potentials of 26 species of Pacific sharks. Mar Freshw Res 49:663–678 Spalding M, Burke L, Wood SA, Ashpole J, Hutchison J, zu Ermgassen P (2017) Mapping the global value and distribution of coral reef tourism. Mar Policy 82:104–113 Speed CW, Meekan MG, Field IC, McMahon CR, Harcourt RG, Stevens JD, Babcock RC, Pillans RD, Bradshaw CJA (2016) Reef shark movements relative to a coastal marine protected area. Reg Stud Mar Sci 3:58–66 Spurgeon JPG (1992) The economic valuation of coral reefs. Mar Pollut Bull 24:529–536 Stephens PA, Sutherland WJ, Freckleton RP (1999) What Is the Allee Effect? Oikos 87:185–190 Stevens JD (1984) Life-history and ecology of sharks at Aldabra Atoll, Indian Ocean. Proc R Soc Lond B Biol Sci 222:79–106 Taberlet P, Cheddadi R (2002) Quaternary refugia and persistence of biodiversity. Science 297:2009–2010 Talbot FH, Russell BC, Anderson GRV (1978) Coral reef fish communities: unstable, high‐diversity systems? Ecol Monogr 48:425–440 91 Teh LSL, Teh LCL, Sumaila UR (2013) A global estimate of the number of coral reef fishers. PloS One 8:e65397 Torquato F, Range P, Ben‐Hamadou R, Sigsgaard EE, Thomsen PF, Riera R, Berumen ML, Burt JA, Feary DA, Marshell A, D’Agostino D, DiBattista JD, Møller PR (2019) Consequences of marine barriers for genetic diversity of the coral-specialist yellowbar angelfish from the northwestern Indian Ocean. Ecol Evol 9:11215–11226 Trommer G, Siccha M, van der Meer MTJ, Schouten S, Sinninghe Damsté JS, Schulz H, Hemleben C, Kucera M (2009) Distribution of Crenarchaeota tetraether membrane lipids in surface sediments from the Red Sea. Org Geochem 40:724–731 Vanderklift MA, Babcock RC, Boschetti F, Haywood MDE, Pillans RD, Thomson DP (2019) Declining abundance of coral reef fish in a World-Heritage-listed marine park. Sci Rep 9:1– 8 Webster PJ, Holland GJ, Curry JA, Chang H-R (2005) Changes in tropical cyclone number, duration, and intensity in a warming environment. Science 309:1844–1846 Weng KC, Foley DG, Ganong JE, Perle C, Shillinger GL, Block BA (2008) Migration of an upper trophic level predator, the salmon shark Lamna ditropis, between distant ecoregions. Mar Ecol Prog Ser 372:253–264 White AT, Vogt HP, Arin T (2000) Philippine coral reefs under threat: the economic losses caused by reef destruction. Mar Pollut Bull 40:598–605 Wilson SK, Graham NAJ, Pratchett MS, Jones GP, Polunin NVC (2006) Multiple disturbances and the global degradation of coral reefs: are reef fishes at risk or resilient? Glob Change Biol 12:2220–2234 Wourms JP, Lombardi J (1992) Reflections on the evolution of piscine viviparity. Am Zool 32:276– 293 Wright S (1943) Isolation by Distance. Genetics 28:114–138 92 APPENDIX A SUPPLEMENTARY INFORMATION CHAPTER 1 Single nucleotide polymorphism (SNP) library preparation and sequencing (Dascyllus abudafur only. Work completed by Dr. Vanessa Robitzch (Universidad Austral de Chile, Chile)) Genomic DNA was extracted from fin or gill tissue preserved in 96 % ethanol using a Nucleospin-96 Tissue kit (Macherey-Nagel, Germany). Double digest restriction associated DNA (ddRAD) libraries were prepared using 500 ng of high quality DNA per sample following the protocol described by Peterson et al., (2012) with some modifications. In brief, the genomic DNA was digested at 37 °C for three hours using the restriction enzymes SphI and MluCI (NEB) followed by a ligation step, where each sample was assigned to one of 16 unique adaptors. Pools of 16 individually labelled samples were combined and run on a 1.5 % agarose gel, from which fragments of approx. 400 base-pairs (bp) were manually excised and purified using a Zymoclean Gel DNA recovery kit. Each pool was then amplified adding a unique indexing primer for each pool according to the standard Illumina multiplexed sequencing protocol in a 50 μl PCR reaction containing 25 μl Kapa Hifi Hotstart Ready Mix Taq, 20 μl of pooled library DNA, 2.5 μl of the universal Illumina PCR primer, and an additional 2.5 μl of one of 12 unique indexing primers for each pool. Amplifications were carried out in an Eppendorf 94-well vapo.protect Mastercycler Pro System (Fisher Scientific) thermocycler using the following protocol: initial denaturation step at 95 °C for 3 min, followed by ten amplification cycles of 98 °C for 20 sec, 60 °C for 30 sec, and 72 °C for 30 sec, and a final step at 72 °C for 5 min. DNA libraries were then quantified using the high sensitivity DNA analysis kit in a 2100 Bioanalyzer (Agilent Technologies) and by running a qPCR in an ABI 7900HT fast real-time PCR system (Thermo Fisher Scientific) using the KAPA Library Quantification Kits (Kapa Biosystems). One last time, the length (in bp) and quality of the library fragments was measured in the 2200 TapeStation (Agilent Technologies) using the High Sensitivity D1000 ScreenTape kit. Pools were subsequently combined in equimolar concentration to form a single genomic library. In total, two libraries were created and run on two separate lanes of a HiSeq 2000 Illumina sequencer (each library and lane contained maximum 80 individuals; single end reads, 1 x 101 bp; v3 reagents). 93 94 95 96 Figure S1. Scatterplots for the cross validation of the estimated vs simulated median and mode for population parameters of a demographic model (Table 2 and Figure 2 in main text) representing the Red Sea population of Dascyllus abudafur since before the Last Glacial Maximum. 97 Table S1. Error values for the cross validation of estimated vs simulated data for population parameters of a demographic model (Table 2 and Figure 2 in main text) representing the Red Sea population of Dascyllus abudafur since before the Last Glacial Maximum. Error values include: ‘95% coverage’ – the normalised percentage of times (for 1,000 power analysis targets) that the simulated input is within 95% confidence interval of the posterior estimate; ‘R2’ – the proportion of the variation in the raw estimated values that can be explained by the raw simulated values, comparable across the different parameters. Error value Population parameters tanc Nanc tbott Nbott resize trec tleng gr mig Npop0 Npop1 95% coverage 0.994 0.996 0.999 1.000 0.999 0.999 0.998 0.998 0.998 0.997 0.998 R2 median 0.999 0.999 0.999 0.999 0.994 1.000 0.999 0.998 1.000 1.000 1.000 R2 mode 0.999 0.998 0.999 0.992 0.994 0.999 0.998 1.000 0.998 0.995 0.990 R2 mean 0.666 0.718 0.638 0.939 0.772 0.610 0.899 0.813 0.968 0.981 0.970 98 99 100 101 Figure S2. Scatterplots for the cross validation of the estimated vs simulated median and mode for population parameters of a demographic model (Table 2 and Figure 2 in main text) representing the Red Sea population of Dascyllus trimaculatus since before the Last Glacial Maximum. 102 Table S2. Error values for the cross validation of estimated vs simulated data for population parameters of a demographic model (Table 2 and Figure 2 in main text) representing the Red Sea population of Dascyllus trimaculatus since before the Last Glacial Maximum. Error values include: ‘95% coverage’ – the normalised percentage of times (for 1,000 power analysis targets) that the simulated input is within 95% confidence interval of the posterior estimate; ‘R2’ – the proportion of the variation in the raw estimated values that can be explained by the raw simulated values, comparable across the different parameters. Error value Population parameters tanc Nanc tbott Nbott resize trec tleng gr mig Npop0 Npop1 95% coverage 0.997 0.996 0.999 0.999 0.996 0.999 0.999 0.998 0.998 0.999 0.997 R2 median 0.999 0.999 0.999 0.998 0.993 0.999 0.999 0.997 1.000 1.000 1.000 R2 mode 0.999 0.997 0.999 0.992 0.993 0.999 0.998 1.000 0.998 0.995 0.989 R2 mean 0.673 0.707 0.623 0.939 0.753 0.588 0.886 0.824 0.941 0.978 0.929 103 104 105 106 Figure S3. Scatterplots for the cross validation of the estimated vs simulated median and mode for population parameters of a demographic model (Table 2 and Figure 2 in main text) representing the Red Sea population of Pomacanthus maculosus since before the Last Glacial Maximum. 107 Table S3. Error values for the cross validation of estimated vs simulated data for population parameters of a demographic model (Table 2 and Figure 2 in main text) representing the Red Sea population of Pomacanthus maculosus since before the Last Glacial Maximum. Error values include: ‘95% coverage’ – the normalised percentage of times (for 1,000 power analysis targets) that the simulated input is within 95% confidence interval of the posterior estimate; ‘R2’ – the proportion of the variation in the raw estimated values that can be explained by the raw simulated values, comparable across the different parameters. Error value Population parameters tanc Nanc tbott Nbott resize trec tleng gr mig Npop0 Npop1 95% coverage 0.997 0.995 0.999 0.996 0.994 0.999 0.997 0.996 0.995 0.998 0.998 R2 median 0.999 0.999 0.999 0.998 0.993 0.999 0.998 0.998 1.000 1.000 1.000 R2 mode 0.999 0.997 0.999 0.993 0.994 0.999 0.998 1.000 0.998 0.996 0.988 R2 mean 0.707 0.708 0.646 0.923 0.754 0.603 0.898 0.880 0.960 0.983 0.940 De-novo loci assembly Sequences were demultiplexed and filtered for quality using the 'process_radtags.pl' pipeline in STACKS v 2.53 (Catchen et al. 2011). Individual reads with uncalled bases (- c) and with low quality scores (- q) were discarded. Reads with phred-scores below an average of 10 (within a sliding window 0.15x the read length) were discarded by default. RAD-tags and barcodes were rescued (- r). After demultiplexing, at least 300,000 reads were recovered for all individuals. In the absence of any reference genomes, loci were assembled de-novo using the 'denovo_map.pl' pipeline (STACKS v 2.53), run manually up to the creation of the bam files. Options for building stacks were as described in Salas et al. (2019) where the maximum number of mismatches allowed between stacks within individuals was set to three (ustacks: - M 3), with default values for minimum depth of coverage to create a stack and maximum number of mismatches allowed to align secondary reads to primary stacks (ustacks: - m 3 and – N: M + 2 = 5, respectively). The maximum number of mismatches allowed between stacks between individuals was set to two (cstacks: - n 2). 108 Loci filtering Resulting bam files were first filtered for a minimum coverage of eight using samtools v 1.9 (Danecek et al. 2021), before continuing with the ‘gstacks’ and ‘populations’ components of the ‘denovo_map.pl’ pipeline. Following genotype calling, the ‘populations’ component was used to select loci present in both populations (Red Sea and Indian Ocean) and in at least 85% of individuals in each population (p 2 r 0.85). Total number of loci retained ranged from ~4,600 - ~41,600 across the three species, with a length of 90 - 95 base pairs. Genotypes in the exported SNPs-only vcf file (--vcf) were extracted to explore diversity in the form of between-population π, assuming alleles as haploid individuals to match the haploid simulations for the demographic models described below. The total number of sites varied slightly for each pairwise calculation between samples, as the percentage of sites with missing values was calculated and those sites removed. The same percentage of missing data was assumed and discarded for the monomorphic sites not included in the vcf (total sites minus SNPs). Generating model simulations We used fastsimcoal2 (fsc25 v 2.5.2.21, Excoffier et al. 2013) to simulate the demographic model, using appropriate sample sizes (2n - haploid individuals) and number and size of loci (loci length was rounded to nearest multiple of five) based on the observed data. For each run, the scenario was defined in a template file (-t .tpl, Figure S4) and the parameter values found in a definition file (-f .def), with one simulation run for each set of predefined parameter values (-n 1). All SNPs in a sequence were output (-s 0) and mutations generated according to the infinite site mutation model (-I). The fastsimcoal2 output included most of the associated summary statistics - a 2D minor allele SFS (-m) in .dadi format (-D). Additionally, between-population π was generated using Arlequin (arlsumstat v 3.5.2.2, Excoffier and Lischer 2010) for each simulation, and appended, representing a summary of the 2D SFS. Example fastsimcoal2 run command: ./fsc25221 -t input_autosome.tpl -n1 -f input_autosome.def -s0 -I -m -D > fsc_output 109 //Parameters for the coalescence simulation program : simcoal.exe 2 samples to simulate : Exponential growth //Population effective sizes (number of genes 2*diploids) Npop0 Npop1 //Samples sizes (number of genes 2*diploids) 30 30 //Growth rates : negative growth implies population expansion 0 0 //Number of migration matrices : 0 implies no migration between demes 2 //migration matrix 0 0 mig mig 0 //migration matrix 1 0 0 0 0 //historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix index 4 events tstop 1 1 1 1 gr 0 trec 1 1 1 1 0 1 tbott 1 1 1 resize 0 0 tanc 1 0 1 1 0 1 //Number of independent loci [chromosome] 9375 0 //Per chromosome: Number of linkage blocks 1 //per Block: data type, num loci, rec. rate and mut rate + optional parameters DNA 95 0 0.00000001 Figure S4. Example of a .tpl input file required by fastsimcoal2 to describe the model. The population parameters in blue would be defined in the required .def input file (also defined alongside their prior ranges in the main text). This specific example describes the sample sizes, number of loci, length of loci, recombination rate, and mutation rate specific to the Dascyllus abudafur dataset used in this chapter. 110 Posterior parameters Table S4. Results of population parameter estimation for a demographic model representing the Red Sea populations for three cosmopolitan reef fish species: Dascyllus abudafur, Dascyllus trimaculatus, and Pomacanthus maculosus, since before the Last Glacial Maximum (LGM), using Approximate Bayesian Computation random forest methods with 1,000 trees and 10,000 simulations. Population parameters and their prior ranges are defined (as in Table 1 in the main text). The median is highlighted as the descriptor with the lowest error according to the power analysis, and is referred to throughout the study. It should be noted that the ‘resize’ parameter is calculated for each simulation, rather than from estimates from within the distributions. The value of ‘gr’ – the growth rate of the recovering population, appears negative as it was modelled back in time from the present day. 111 References Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH (2011) Stacks: building and genotyping loci de novo from short-read sequences. G3 Genes Genomes Genet 1:171–182 Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H (2021) Twelve years of SAMtools and BCFtools. GigaScience 10:giab008 Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M (2013) Robust demographic inference from genomic and SNP data. PloS Genet 9:e1003905 Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567 Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PloS One 7:e37135 Salas EM, Bernardi G, Berumen ML, Gaither MR, Rocha LA (2019) RADseq analyses reveal concordant Indian Ocean biogeographic and phylogeographic boundaries in the reef fish Dascyllus trimaculatus. R Soc Open Sci 6:172413 112 APPENDIX B SUPPLEMENTARY INFORMATION CHAPTER 2 Single nucleotide polymorphism (SNP) library preparation and sequencing (work completed by Dr. Vanessa Robitzch (Universidad Austral de Chile, Chile)) Genomic DNA was extracted from fin or gill tissue preserved in 96 % ethanol using a Nucleospin-96 Tissue kit (Macherey-Nagel, Germany). Double digest restriction associated DNA (ddRAD) libraries were prepared using 500 ng of high quality DNA per sample following the protocol described by Peterson et al (2012), with some modifications. In brief, the genomic DNA was digested at 37 °C for three hours using the restriction enzymes SphI and MluCI (NEB) followed by a ligation step, where each sample was assigned to one of 16 unique adaptors. Pools of 16 individually labelled samples were combined and run on a 1.5 % agarose gel, from which fragments of approx. 400 base-pairs (bp) were manually excised and purified using a Zymoclean Gel DNA recovery kit. Each pool was then amplified adding a unique indexing primer for each pool according to the standard Illumina multiplexed sequencing protocol in a 50 μl PCR reaction containing 25 μl Kapa Hifi Hotstart Ready Mix Taq, 20 μl of pooled library DNA, 2.5 μl of the universal Illumina PCR primer, and an additional 2.5 μl of one of 12 unique indexing primers for each pool. Amplifications were carried out in an Eppendorf 94-well vapo.protect Mastercycler Pro System (Fisher Scientific) thermocycler using the following protocol: initial denaturation step at 95 °C for 3 min, followed by ten amplification cycles of 98 °C for 20 sec, 60 °C for 30 sec, and 72 °C for 30 sec, and a final step at 72 °C for 5 min. DNA libraries were then quantified using the high sensitivity DNA analysis kit in a 2100 Bioanalyzer (Agilent Technologies) and by running a qPCR in an ABI 7900HT fast real-time PCR system (Thermo Fisher Scientific) using the KAPA Library Quantification Kits (Kapa Biosystems). One last time, the length (in bp) and quality of the library fragments was measured in the 2200 TapeStation (Agilent Technologies) using the High Sensitivity D1000 ScreenTape kit. Pools were subsequently combined in equimolar concentration to form a single genomic library. In total, two libraries were created and run on two separate lanes of a HiSeq 2000 Illumina sequencer (each library and lane contained maximum 80 individuals; single end reads, 1 x 101 bp; v3 reagents). 113 114 115 116 117 118 119 Figure S1. Scatterplots for the cross validation of the estimated vs simulated median and mode for population parameters for the “North” model (Table 2 and Figure 3Bi in main text), a demographic model representing the most likely demographic history of the Red Sea population of Dascyllus marginatus since before the Last Glacial Maximum, suggesting a refugium in the north of the Red Sea. Table S1. Error values for the cross validation of estimated vs simulated data for population parameters for the “North” model (Table 2 and Figure 3Bi in main text), a demographic model representing the most likely demographic history of the Red Sea population of Dascyllus marginatus since before the Last Glacial Maximum, suggesting a refugium in the north of the Red Sea. Error values include: ‘95% coverage’ – the normalised percentage of times (for 1,000 power analysis targets) that the simulated input is within 95% confidence interval of the posterior estimate; ‘R2’ – the proportion of the variation in the raw estimated values that can be explained by the raw simulated values, comparable across the different parameters. Error value Population parameters Nanc tanc Nsplit0 resize0 Nsplit1 resize1 Npre Nbott tbott resizebott trec 95% coverage 0.996 0.995 0.999 0.998 0.994 0.995 0.994 1.000 0.999 0.997 1.000 R2 median 0.999 0.999 0.998 0.999 0.993 0.999 0.998 1.000 0.999 0.992 1.000 R2 mode 0.996 0.999 0.999 0.996 0.999 0.997 0.997 0.995 0.999 0.996 0.999 R2 mean 0.775 0.636 0.816 0.851 0.633 0.678 0.745 0.969 0.647 0.787 0.662 tleng gr mig tstop Npop0 Npop1 Npop2 95% coverage 0.999 0.997 0.997 0.999 0.999 0.999 0.998 R2 median 0.999 0.978 1.000 0.999 1.000 1.000 0.999 R2 mode 0.998 0.984 0.998 0.998 0.996 0.992 0.993 R2 mean 0.910 0.621 0.973 0.921 0.974 0.970 0.972 120 De-novo loci assembly Sequences were demultiplexed and filtered for quality using the 'process_radtags.pl' pipeline in STACKS v.2.53 (Catchen et al. 2011). Individual reads with uncalled bases (- c) and with low quality scores (- q) were discarded. Reads with phred-scores below an average of 10 (within a sliding window 0.15x the read length) were discarded by default. RAD-tags and barcodes were rescued (- r). After demultiplexing, at least 500,000 reads were recovered for all individuals. In the absence of any reference genomes, loci were assembled de-novo using the 'denovo_map.pl' pipeline (STACKS v.2.53), run up to the creation of the bam files. Options for building stacks were as described in Salas et al. (2019) where the maximum number of mismatches allowed between stacks within individuals was set to three (ustacks: - M 3), with default values for minimum depth of coverage to create a stack and maximum number of mismatches allowed to align secondary reads to primary stacks (ustacks: - m 3 and – N: M + 2 = 5, respectively). The maximum number of mismatches allowed between stacks between individuals was set to two (cstacks: - n 2). Loci filtering Resulting bam files were first filtered for a minimum coverage of eight using samtools v 1.9 (Danecek et al. 2021), before continuing with the ‘gstacks’ and ‘populations’ components of the ‘denovo_map.pl’ pipeline. Following genotype calling, the ‘populations’ component was used to select loci present in two populations (Red Sea and outside) for the time of split model or three populations (North/Central Red Sea, South Red Sea, and Gulf of Aden) for the location of refugia model, and in at least 90% of individuals in each population (p 2 or p 3, r 0.9). Total number of loci retained ranged from ~2,500 - ~4,000 between the two datasets, with a length of 95 base pairs. Genotypes in the exported SNPs-only vcf file (--vcf) were extracted to explore diversity in the form of between-population π, assuming alleles as haploid individuals to match the haploid simulations for the demographic models described below. The total number of sites varied slightly for each pairwise calculation between samples, as the percentage of sites with missing values was calculated and those sites removed. The same percentage of missing data was assumed and discarded for the monomorphic sites not included in the vcf (total sites minus SNPs). 121 Power analysis – model selection Table S2. Model assignation estimates between three demographic models (Figure 3 in main text) representing likely demographic histories of Red Sea populations of Dascyllus marginatus since before the Last Glacial Maximum. One thousand targets were randomly selected from 10,000 simulations generated with the “North” model, to be used as pseudo-observed data (pods) and tested with the model selection analysis. The bottom row dictates how many of the 1,000 pods were successfully assigned to the “North” model that originally generated them, at varying confidence thresholds (top row) within 1,000 trees. Model: “North” “North/Gulf” “North/South” % Confidence threshold: 95 90 80 70 60 50 95 90 80 70 60 50 95 90 80 70 60 50 No. of pods / 1000 158 347 830 1,000 1,000 1,000 0 0 0 0 0 0 0 0 0 0 0 0 Generating model simulations We used fastsimcoal2 (fsc25 v 2.5.2.21, Excoffier et al. 2013) to simulate the demographic models, using appropriate sample sizes of 30 for each population (2n due to simulating haploid individuals), loci lengths of 95, and between ~2,500 - ~4,000 loci based on the observed data (loci length was rounded to nearest multiple of five). For each run, the scenario was defined in a template file (-t .tpl, Figure S2) and the parameter values found in a definition file (-f .def), with one simulation run for each set of predefined parameter values (-n 1). All SNPs in a sequence were output (-s 0) and mutations generated according to the infinite site mutation model (-I). The fastsimcoal2 output included most of the associated summary statistics - 2D minor allele SFS (-m) in .dadi format (-D). For the models with three current populations, the three minor allele 2D SFS were appended one after the other. Additionally, between-population π (one pairing for two current populations, or three pairings for three current populations) was generated using Arlequin (arlsumstat v 3.5.2.2, Excoffier and Lischer 2010) for each simulation, and appended, representing a summary of the 2D SFS. Example fastsimcoal2 run command: ./fsc25221 -t input_autosome.tpl -n1 -f input_autosome.def -s0 -I -m -D > fsc_output 122 //Parameters for the coalescence simulation program : simcoal.exe 3 samples to simulate : Exponential growth //Population effective sizes (number of genes 2*diploids) Npop0 Npop1 Npop2 //Samples sizes (number of genes 2*diploids) 30 30 30 //Growth rates : negative growth implies population expansion 0 0 0 //Number of migration matrices : 0 implies no migration between demes 3 //migration matrix 0 0 0 mig 0 0 mig mig mig 0 //migration matrix 1 0 mig 0 mig 0 0 0 0 0 //migration matrix 2 0 0 0 0 0 0 0 0 0 //historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix index 7 historical event tstop 2 1 1 1 gr 1 trec 1 1 1 1 0 2 tbott 1 1 1 resizebott 0 1 tnew 1 1 1 resize1 0 1 tnew 0 0 1 resize0 0 1 tanc 0 1 1 1 0 2 texp 1 1 1 resizeexp 0 2 //Number of independent loci [chromosome] 2530 0 //Per chromosome: Number of linkage blocks 1 //per Block: data type, num loci, rec. rate and mut rate + optional parameters DNA 95 0 0.00000001 Figure S2. Example of a .tpl input file required by fastsimcoal2 to describe the model. The population parameters in blue would be defined in the required .def input file (also defined alongside their prior ranges in the main text. Note that tnew (tanc-1) is not presented in the main text but is required to set up the model). This specific example describes the sample sizes, number of loci, length of loci, recombination rate, and mutation rate specific to the “location of refugia” dataset for Dascyllus marginatus used in this chapter. 123 Posterior parameters Table S3. Results of population parameter estimation for the selected demographic model most likely representing the Red Sea population of Dascyllus marginatus since before the Last Glacial Maximum (LGM), using Approximate Bayesian Computation random forest methods with 1,000 trees and 10,000 simulations. Population parameters and their prior ranges are defined (as in Table 1 in the main text). The median is highlighted as the descriptor with the lowest error according to the power analysis, and is referred to throughout the study. It should be noted that ‘resize’ parameters are estimated based on each simulation, rather than on an estimation from another distribution. Growth rates ‘gr’ appear negative as they were simulated back in time from the present day. Parameter name Definition Prior range Posterior parameter estimation Expectation (mean) Median 0.025 quantile 0.975 quantile Mode “Time of split” model tanc_A Number of years since splitting of Red Sea and outside populations 11.5kya – 100kya 80,779 86,425 25,713 99,673 89,431 “location of refugia” models Nanc Effective population size prior to splitting of Red Sea and Gulf of Aden populations ~50k – ~800k 303,528 253,400 129,565 755,789 245,466 tanc Number of years since colonisation of Red Sea 40kya – 200kya 131,141 135,533 46,214 197,597 164,921 Nsplit0 Effective population size in Gulf of Aden after first population split ~50k – Npop0 135,519 122,158 53,228 300,409 142,146 resize0* Ratio of effective population size change in Gulf of Aden after initial epansion Nsplit0/Npop0 0.5 0.5 0.2 0.9 0.5 Nsplit1 Effective population size in Red Sea after first population split ~50k – Npre 117,066 107,982 52,593 243,164 75,270 resize1* Ratio of effective population size change in Red Sea after initial expansion Nsplit1/Npre 0.4 0.3 0.2 0.8 0.3 Npre Effective population size in Red Sea after initial expansion ~50k - ~316k 270,505 275,020 175,711 315,153 275,736 Nbott Effective population size of bottleneck in North Red Sea 10k – Npre 236,092 236,138 155,108 310,161 234,844 tbott Number of years since a genetic bottleneck at LGM 24kya – 36kya 30,180 30,034 24,334 35,776 27,875 124 resizebott* Ratio of effective population size change in North Red Sea at bottleneck Npre / Nbott 1.5 1.3 1.0 3.1 1.3 trec Number of years since start of population recovery 11.5kya – 15kya 13,211 13,142 11,632 14,906 12,931 tleng Number of years of population recovery trec - tstop 4,990 5,222 617 9,201 5,753 gr* Growth rate during population recovery in North Red Sea (log(Nbott / Npop1))/tleng -0.4x10 -4 -0.3x10-4 -0.2x10-5 -0.1x10-3 -0.3x10-4 mig0/1 Proportion of individuals per year migrating between 1) Gulf of Aden and Red Sea or later 0) Gulf of Aden and South Red Sea, and South Red Sea and North Red Sea 0.1x10-6 – 0.35x10-3 0.04x10 -4 0.03x10-4 0.07x10-5 0.01x10-3 0.03x10-4 tstop Number of years since reaching current effective population size 5kya – (trec-1) 8,602 8,614 5,258 13,064 8,318 Npop0 Current effective population size in Gulf of Aden ~50k - ~316k 272,614 279,621 192,457 315,552 276,551 Npop1 Current effective population size in North Red Sea Nbott - ~316k 242,225 248,283 143,748 312,749 243,489 Npop2 Current effective population size in South Red Sea, split from the North Red Sea after the population recovered Nbott – Npop1 242,349 246,703 163,799 310,650 242,259 125 Model selection Table S4 Results of a demographic model selection using Approximate Bayesian Computation random forests methods with 1,000 trees and 10,000 simulations. The models represent three likely demographic histories for the Red Sea population of Dascyllus marginatus assuming persistence in the Red Sea and a genetic bottleneck during the Last Glacial Maximum, with alternative theories for the location of refugial populations: i) a single population in the north, ii) a single population in the north and one outside the Red Sea in the Gulf of Aden, and iii) two populations in the Red Sea - one in the north and one in the south. Post.proba represents an approximation of the posterior probability of the selected model after first predicting the model that best fits the data. No. ind. in each pop (North Red Sea, South Red Sea, Gulf of Aden) No. loci Length loci Number of trees (/1,000) assigning observed data to: Post.proba i) “North” model ii) “North/Gulf” model iii) “North/South” model 15 2,530 95 507 170 323 0.640 References Catchen JM, Amores A, Hohenlohe P, Cresko W, Postlethwait JH (2011) Stacks: building and genotyping loci de novo from short-read sequences. G3 Genes Genomes Genet 1:171–182 Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H (2021) Twelve years of SAMtools and BCFtools. GigaScience 10:giab008 Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M (2013) Robust demographic inference from genomic and SNP data. PloS Genet 9:e1003905 Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567 Peterson BK, Weber JN, Kay EH, Fisher HS, Hoekstra HE (2012) Double digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PloS One 7:e37135 Salas EM, Bernardi G, Berumen ML, Gaither MR, Rocha LA (2019) RADseq analyses reveal concordant Indian Ocean biogeographic and phylogeographic boundaries in the reef fish Dascyllus trimaculatus. R Soc Open Sci 6:172413 126 APPENDIX C SUPPLEMENTARY INFORMATION CHAPTER 3 127 128 129 Figure S1. Scatterplots for the cross validation of the estimated vs simulated median and mode for population parameters for Model “Bottleneck” (Table 2 and Figure 2 in main text), a demographic model representing the most likely demographic history of the Red Sea population of Carcharhinus melanopterus since before the Last Glacial Maximum (LGM): persistence in the Red Sea, a genetic bottleneck during the LGM, and the following population recovery. 130 Table S1. Error values for the cross validation of estimated vs simulated data for population parameters for Model “Bottleneck” (Table 2 and Figure 2 in main text), a demographic model representing the most likely demographic history of the Red Sea population of Carcharhinus melanopterus since before the Last Glacial Maximum (LGM): persistence in the Red Sea, a genetic bottleneck during the LGM, and the following population recovery. Error values include: ‘95% coverage’ – the normalised percentage of times (for 1,000 power analysis targets) that the simulated input is within 95% confidence interval of the posterior estimate; ‘R2’ – the proportion of the variation in the raw estimated values that can be explained by the raw simulated values, comparable across the different parameters. Error value Population parameters tanc Nanc tbott Nbott resize trec tleng gr Npop0 Npop1 mig 95% coverage 0.995 0.998 1.000 0.999 0.996 1.000 1.000 1.000 0.997 1.000 0.998 R2 median 0.999 0.999 1.000 0.999 0.993 0.999 0.999 0.986 1.000 1.000 1.000 R2 mode 0.999 0.998 0.999 0.995 0.995 0.999 0.998 1.000 0.997 0.998 0.991 R2 mean 0.770 0.729 0.618 0.968 0.843 0.617 0.883 0.684 0.977 0.997 0.974 Power analysis – model selection Table S2. Model assignation estimates between models “Recolonisation” and “Bottleneck”, two demographic models (Figure 2 in main text) representing likely demographic histories of Red Sea populations of Carcharhinus melanopterus since before the Last Glacial Maximum (LGM). One thousand targets were randomly selected from 10,000 simulations generated with the “Bottleneck” model, to be used as pseudo-observed data (pods) and tested with the model seleciton analysis. The bottom row dictates how many of the 1,000 pods were successfully assigned to the “Bottleneck” model that originally generated them, at varying confidence thresholds (top row) within 1,000 trees. Model: “Recolonisation” “Bottleneck” % Confidence threshold: 95 90 80 70 60 50 95 90 80 70 60 50 Number of pods / 1000: 0 0 0 0 0 0 499 733 977 999 1,000 1,000 131 Generating model simulations We used fastsimcoal2 (fsc25 v 2.5.2.21, Excoffier et al. 2013) to simulate the demographic models, using appropriate sample sizes of 28 for each population (2n due to simulating haploid individuals), and 976 loci of 620bp in length, based on the observed data (loci at least 100bp in length, loci length was averaged). For each run, the scenario was defined in a template file (-t .tpl, Figure S2) and the parameter values found in a definition file (-f .def), with one simulation run for each set of predefined parameter values (-n 1). All SNPs in a sequence were output (-s 0) and mutations generated according to the infinite site mutation model (-I). The fastsimcoal2 output included most of the associated summary statistics - 2D minor allele SFS (-m) in .dadi format (-D). Additionally, between-population π was generated using Arlequin (arlsumstat v 3.5.2.2, Excoffier and Lischer 2010) for each simulation, and appended, representing a summary of the 2D SFS. Example fastsimcoal2 run command: ./fsc25221 -t input_autosome.tpl -n1 -f input_autosome.def -s0 -I -m -D > fsc_output 132 //Parameters for the coalescence simulation program : simcoal.exe 2 samples to simulate : Exponential growth //Population effective sizes (number of genes 2*diploids) Npop0 Npop1 //Samples sizes (number of genes 2*diploids) 28 28 //Growth rates : negative growth implies population expansion 0 0 //Number of migration matrices : 0 implies no migration between demes 2 //migration matrix 0 0 mig mig 0 //migration matrix 1 0 0 0 0 //historical event: time, source, sink, migrants, new deme size, new growth rate, migration matrix index 4 events tstop 1 1 1 1 gr 0 trec 1 1 1 1 0 1 tbott 1 1 1 resize 0 0 tanc 1 0 1 1 0 1 //Number of independent loci [chromosome] 976 0 //Per chromosome: Number of linkage blocks 1 //per Block: data type, num loci, rec. rate and mut rate + optional parameters DNA 620 0.000000005 0.0000000083 Figure S2. Example of a .tpl input file required by fastsimcoal2 to describe the model. The population parameters in blue would be defined in the required .def input file (also defined alongside their prior ranges in the main text). This specific example describes the sample sizes, number of loci, length of loci, recombination rate, and mutation rate specific to the Red Sea and Seychelles dataset for Carcharhinus melanopterus used in this chapter. 133 Posterior parameters Table S3. Results of population parameter estimation for Model “Bottleneck”, the demographic model that most likely represents the Red Sea population of Carcharhinus melanopterus since before the Last Glacial Maximum (LGM), using an Approximate Bayesian Computation framework with random forest methods with 1,000 trees and 10,000 simulations. Population parameters and their prior ranges are defined (as in Table 1 in the main text). The median is highlighted as the descriptor with the lowest error according to the power analysis, and is referred to throughout the study. It should be noted that the ‘resize’ parameter is calculated for each simulation, rather than from estimates from within the distributions. The value of ‘gr’, the growth rate of the recovering population, appears negative as it was modelled back in time from the present day. Parameter name Definition Prior range Posterior parameter estimation Expectatio n (mean) Median 0.025 quantile 0.975 quantile Mode tanc Number of years since colonisation of Red Sea ~40,000 – ~600,000 415,642 444,863 100,489 596,662 504,555 Nanc Pre-LGM effective population size in Red Sea 1,000 – ~50,000 27,793 26,770 11,077 47,956 33,679 tbott Number of years since agenetic bottleneck ~24,000 – ~36,000 30,104 30,054 24,423 35,861 32,836 Nbott Effective population size of bottleneck in Red Sea 1,000 – Nanc 21,202 19,974 5,807 37,065 21,443 resize* Ratio of effective population size change Nanc/Nbott 2.2 1.6 1.0 7.2 1.5 trec Number of years since start of population recovery ~11,500 – ~15,000 13,396 13,405 11,620 14,980 14,297 tleng Number of years of population recovery 150 – (trec – 1) 8,330 8,461 2,072 13,940 9,610 gr* Growth rate during population recovery (log(Nbott/ Npop1))/tleng -1.2x10 -4 -1.2x10-4 -2.7x10-5 -2.1x10-4 -1.3x10-4 mig Proportion of individuals per year migrating between basins 0.14x10-5 – 0.14x10-4 0.6x10 -5 0.05x10-4 0.2x10-5 0.1x10-4 0.05x10-4 Npop0 Current effective population size in Indian Ocean 10,000 – ~50,000 11,866 11,287 10,036 17,928 11,346 Npop1 Current effective population size in Red Sea Nbott – ~50,000 37,465 39,224 17,902 49,682 38,422 134 Model selection Table S4. Results of a demographic model selection using Approximate Bayseian Computation ranfom forests methods with 1,000 trees and 10,000 simulations. Models “Recolonisation” and “Bottleneck” are two demographic models (Figure 2 in main text) representing likely demographic histories of Red Sea populations of Carcharhinus melanopterus since before the Last Glacial Maximum (LGM). Post.proba represents an approximation of the posterior probability of the selected model after first predicting the model that best fits the data. No. ind. in each pop (Red Sea and Seychelles) No. loci Length loci Number of trees (/1,000) assigning observed data to: Post.proba “Recolonisation” model “Bottleneck” model 15 976 620 58 942 0.966 Population structure and EEMS Table S5. Cross validation error (CV error) estimates in ADMIXTURE v 1.3 (Alexander et al. 2009) when determining the most likely number of ancestral populations (K) of Carcharhinus melanopterus across the Indo-Pacific. Value of K CV error 1 0.30140 2 0.20898 3 0.16256 4 0.14966 5 0.14226 6 0.14865 7 0.14774 8 0.15655 9 0.16450 10 0.16759 11 0.16036 12 0.17069 13 0.14116 135 Figure S3. Plot of 400,000 MCMC iterations, after a burn-in period of 10,000 and with thinning set at 99, following the Estimating Effective Migration Surfaces (EEMS) analysis (Petkova et al. 2016) with 1,788 loci and 1,000 demes, for 130 sampled Carcharhinus melanopterus individuals from across the Indo-Pacific. The MCMC chain is a process that works its way through combinations of parameter values (eg. migration over each plane on the EEMS map), plotting the absolute likelihood. The movement in the plot ensures I sampled from all the available space. References Alexander DH, Novembre J, Lange K (2009) Fast model-based estimation of ancestry in unrelated individuals. Genome Res 19:1655–1664 Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M (2013) Robust demographic inference from genomic and SNP data. PloS Genet 9:e1003905 Excoffier L, Lischer HEL (2010) Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour 10:564–567 Petkova D, Novembre J, Stephens M (2016) Visualizing spatial population structure with estimated effective migration surfaces. Nat Genet 48:94–100 136