iee 12 (2019) 9
Ideas in Ecology and Evolution 12: 9–21, 2019
doi:10.24908/iee.2019.12.2.e
© 2019 The Author
Received 6 April 2019; Accepted 18 June 2019

Guest Editorial

Data-based, synthesis driven: Setting the agenda for computational ecology

Timothée Poisot, Richard LaBrie, Erin Larson, Anastasia Rahlin, and Benno I. Simmons

Timothée Poisot (timothee.poisot@umontreal.ca), Département de Sciences Biologiques and Groupe de Recherche Interuniversitaire en Limnologie et environnement aquatique, Université de Montréal, Pavillon Marie-Victorin C.P. 6128, succ. Centre-ville Montréal, Québec H3C 3J7 Canada, and Québec Centre for Biodiversity Sciences, McGill University, Montréal, Canada

Richard LaBrie, Département de Sciences Biologiques and Groupe de Recherche Interuniversitaire en Limnologie et environnement aquatique, Université de Montréal, Pavillon Marie-Victorin C.P. 6128, succ. Centre-ville Montréal, Québec H3C 3J7 Canada

Erin Larson, Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York 14853 USA

Anastasia Rahlin, Illinois Natural History Survey, Champaign, Illinois 61820 USA

Benno I. Simmons, Conservation Science Group, Department of Zoology, University of Cambridge, Cambridge, UK and Department of Animal and Plant Sciences, University of Sheffield, Sheffield, UK

Computational science happens when algorithms, software, data management practices, and advanced research computing are put in interaction with the explicit goal of solving “complex” problems. Typically, problems are considered complex when they cannot be solved appropriately with mathematical modelling (defined here as the application of mathematical models that are not explicitly grounded in empirical data) or data collection alone (Dörner and Funke 2017). Computational science is the application of computational thinking to research questions (Papert 1996), i.e.
the feedback loop of abstracting a problem to its core mechanisms, expressing a solution in a way that can be automated, and using interactions between simulations and data to refine the original problem or suggest new knowledge. Computational approaches are commonplace in most areas of biology, to the point where one would almost be confident that they represent a viable career path (Bourne 2011).

Collecting ecological data is time-consuming, costly, and demanding; in addition, the variability of these data is high (both in terms of variance and in terms of quantity and completeness). In parallel, many ecological problems lack appropriate formal mathematical formulations, which we need in order to construct strong, testable hypotheses. For these reasons, computational approaches hold great possibilities, notably to further ecological synthesis and assist decision-making (Petrovskii and Petrovskaya 2012).

This work is licensed under a Creative Commons Attribution 3.0 License.

Levin (2012) suggested that ecology (and evolutionary biology) should continue their move towards a marriage of theory and data. In addition to the lack of adequately expressed models, this effort is hampered by the fact that data and models are often developed by different groups of scientists, and reconciling both can be difficult. This has been suggested as one of the reasons why theoretical papers (defined as papers with at least one equation in the main text) receive fewer citations (Fawcett and Higginson 2012); this is the tragic sign that empirical scientists either do not see the value of theoretical work, or have not received the training to usefully rely on math-heavy theoretical papers, which of course can be blamed on both parties. One of the leading textbooks on mathematical models in ecology and evolution (Otto and Day 2007) is more focused on algebra and calculus than on the integration of models with data.
Other manuals that cover the integration of models and data tend to lean more towards statistical models (Bolker 2008, Soetaert and Herman 2008). This paints a picture of ecology as a field in which dynamical models and empirical data do not interact much, and instead the literature develops in silos.

Computational ecology is the application of computational science to ecological problems. This defines three core characteristics of computational ecology. First, it recognizes ecological systems as complex and adaptive; this places a great emphasis on mathematical tools that can handle, or even require, a certain degree of stochasticity to accommodate or emulate what is found in nature (Zhang 2010, 2012). Second, it understands that data are the final arbiter of any simulation or model (Petrovskii and Petrovskaya 2012); this favours the use of data-driven approaches and analyses (Beaumont 2010). On this point, computational approaches differ greatly from the production of theoretical models able to stand on their own with no data input. Finally, it accepts that some ecological systems are too complex to be formulated in mathematical or programmatic terms (Pascual 2005); the use of conceptual, or “toy”, models, as long as they can be confronted with empirical data, is preferable to “abusing” mathematics by describing the wrong mechanism well (May 2004). By contrast, modelling approaches are, by construction, limited to problems that can be expressed in mathematical terms.

To summarize, we define computational ecology as the sub-field tasked with integrating real-world data with mathematical, conceptual, and numerical models (if possible by deeply coupling them), in order to assist with the most-needed goal of improving the predictive accuracy of ecological research and synthesising ecological knowledge (Houlahan et al. 2017, Maris et al. 2017).
Jørgensen (2008) identified that one of the current challenges is to facilitate the integration of existing data in the explosion of modelling techniques (most of which were designed to answer long-standing questions in ecological research). Ecology as a whole (and community ecology in particular) circumvented the problem of model and data mismatch by investing in the development and refinement of statistical models (see Warton et al. 2014 for an excellent overview) and “numerical” approaches (Legendre and Legendre 1998) based on multivariate statistics. These models are able to explain data, but very rarely do they give rise to new predictions, although prediction is a very clear priority even if we “simply” seek to further our understanding (Houlahan et al. 2017). Computational ecology can fill this niche; at the cost of a higher degree of abstraction, its integration of data and generative models (i.e. models that, given rules, will generate new data) can help initiate the investigation of questions that have not received (or perhaps cannot receive) extensive empirical treatment, or for which usual statistical approaches fall short.

In particular, we argue that computational approaches can serve a dual purpose. First, they can deliver a more predictive science, because they are explicitly data-driven. Second, they can guide the attention of researchers towards mechanisms of interest; in a context where time and resources are finite, and the urgency to understand ecological systems is high, this may be the main selling point of computational techniques.

In a thought-provoking essay, Markowetz (2017) suggests that all biology is computational biology; the rationale behind this bold statement is that integrating computational advances, novel mathematical tools, and the usual data from one field has a high potential to deliver synthesis.
A more reasonable statement would be that all ecology can benefit from computational ecology, as long as we can understand how it interacts with other approaches; in this paper, we attempt to situate the practice of computational ecology within the broader scope of ecological research. Recent years have given us an explosion of new tools, training opportunities, and mechanisms for data access. One can assume that computational approaches will become more tempting, and more broadly adopted. This requires us to address the questions of the usefulness and promises of this line of research, as well as the caveats associated with it. In particular, we highlight the ways in which computational ecology differs from, and complements, ecological modelling that does not involve data directly. We finally move on to the currency of collaborations between different sub-disciplines of ecologists, and discuss the need to add more quantitative skills in ecological training and to develop a culture where specialising in computational research is accepted.

Advancing ecology through computational techniques is ongoing work and has already delivered many results (some of which we discuss in the text). To elevate this approach, the community of practising ecologists needs to establish baselines of appropriate practices for the sharing and re-use of existing data, especially when they are massively aggregated and re-purposed, and to reach a consensus on a common core of training which enables students to explore computational approaches in addition to more usual approaches. Ultimately, a better integration of computational techniques in the practice of ecological research has the potential to improve transparency and reproducibility, and facilitate the synthesis of ecological knowledge.
A success story: Species Distribution Models

The practice known as “species distribution modelling” (and the species distribution models, henceforth SDMs, it generates) is a good example of computational practices generating novel ecological insights. At their core, SDMs seek to model the presences or absences of a species based on previous observations of its presences or absences, and knowledge of the environment in which the observations were made. More formally, SDMs can be interpreted as having the form P(𝑆|𝐸) (or P(𝑆 = 1|𝐸) for presence-only models), where 𝑆 denotes the presence of a species, and 𝐸 is an array of variables representing the local state of the environment at the point where the prediction is made (the location is represented, not by its spatial position, but by a suite of environmental variables).

As Franklin (2010) highlights, SDMs emerged at a time when access to computers and the ability to effectively program them became easier. Although ecological insights, statistical methods, and data already existed, the ability to turn these ingredients into something predictive required what is now called “computational literacy”: the ability to abstract and automate a system to generate predictions through computer simulations and their validation. One of the strengths of SDMs is that they can be used either for predictions or explanations of where a given species occurs (Elith and Leathwick 2009) and can be corroborated with empirical data. To calculate P(𝑆|𝐸) is to make a prediction (what is the likelihood of observing species 𝑆 at a given location) that can be refined, validated, or rejected based on cross-validation (Hijmans 2012) or de novo field sampling (West et al. 2016). To understand 𝐸, i.e. the environmental aspects that determine species presence, is to form an explanation of a distribution that relates to the natural history of a species.
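
The form P(𝑆 = 1|𝐸) can be illustrated with a minimal sketch. It assumes a purely correlative SDM fitted by logistic regression, which is only one simple choice among many and is not the method of any study cited here. The environmental variables (standardized temperature and precipitation), the simulated sites, and all function names are hypothetical, and a single hold-out set stands in for a full cross-validation.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(weights, bias, env):
    """Estimate P(S = 1 | E) for one vector E of environmental variables."""
    return sigmoid(bias + sum(w * e for w, e in zip(weights, env)))


def fit_sdm(envs, presences, lr=0.5, epochs=500):
    """Fit a logistic-regression SDM by batch gradient descent."""
    n, n_features = len(envs), len(envs[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for env, s in zip(envs, presences):
            err = predict(weights, bias, env) - s  # gradient of the log-loss
            grad_b += err
            for j, e in enumerate(env):
                grad_w[j] += err * e
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


def simulate_sites(n, rng):
    """Hypothetical sites: presence is favoured by warm, dry conditions."""
    envs, presences = [], []
    for _ in range(n):
        temp, precip = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        p_true = sigmoid(2.0 * temp - 1.0 * precip)  # assumed "true" response
        envs.append([temp, precip])
        presences.append(1 if rng.random() < p_true else 0)
    return envs, presences


rng = random.Random(42)
train_env, train_s = simulate_sites(400, rng)
test_env, test_s = simulate_sites(200, rng)

weights, bias = fit_sdm(train_env, train_s)

# Validation on sites that were not used for fitting.
correct = sum(
    (predict(weights, bias, env) >= 0.5) == bool(s)
    for env, s in zip(test_env, test_s)
)
accuracy = correct / len(test_s)
print("fitted weights (temperature, precipitation):", weights)
print("hold-out accuracy:", accuracy)
```

In this sketch the fitted weights play the explanatory role (which aspects of 𝐸 favour presence), while the hold-out accuracy plays the predictive role; replacing the simulated test sites with newly sampled field data would correspond to de novo validation.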
SDMs originated as statistical and correlative models, and are now incorporating more ecological theory (Austin 2002); being able to integrate (abstract) ideas and knowledge with (formal) statistical and numerical tools is a key feature of computational thinking. In fact, one of the most recent and most stimulating developments in the field of SDMs is to refine their predictions not through the addition of more data, but through the addition of more processes (Franklin 2010). These SDMs rely on the usual statistical models, but also on dynamical models (for example simulations; e.g. Wisz et al. (2012) or Pellissier et al. (2013) for biotic interactions, and Miller and Holloway (2015) for movement and dispersal). What they lack in mathematical expressiveness (i.e. having a closed-form solution (Borwein and Crandall 2013), which is often ruled out by the use of stochastic models or agent-based simulations), they are assumed to gain in predictive ability through the explicit consideration of more realistic ecological mechanisms (D’Amen et al. 2017, Staniczenko et al. 2017).

SDMs have been a success, but there are many other areas of ecology that could be improved by a marriage of computational ecology and empirical data. The novel use of genomic RNA-seq data and existing WorldClim climate data allowed the creation of random forest models to predict where populations of the yellow warbler, a species of conservation concern, are most vulnerable to climate change (Bay et al. 2018). Environmental DNA metabarcoding data coupled with machine learning approaches and linear models were used to create, test, and predict biodiversity indices for benthic foraminifera, which can be applied to monitoring the health of fish farm ecosystems (Cordier et al. 2017). The increase in data volume, coupled with access to computing techniques and power, will result in a multiplication of these boundary-pushing studies in the coming years.
Outlining computational ecology

Most research approaches exist on a gradient. In this section, we will outline research practices which differ enough in their approaches to fall under the umbrella of computational science, and specifically discuss how they can provide novel information. We will first show how computational ecology complements other research approaches, then discuss how it can be used in the current context to facilitate interactions between theoretical and empirical research.

Computational ecology in focus

The specific example of predator-prey interactions should be a familiar illustration of how the same problem can be addressed through a variety of research approaches (Figure 1). The classic predator-prey equations of Lotka and Volterra are an instance of a “modelling”-based perspective, wherein mathematical analysis reveals how selected parameters (rates of interactions and growth) affect an ecologically relevant quantity (population stability and coexistence). These models, although they have been formulated to explain data generated through empirical observations, are disconnected from the data themselves. In fact, this family of models lies at the basis of a branch of ecological modelling that now exists entirely outside of data (Ackland and Gallagher 2004, Gyllenberg et al. 2006, Coville and Frederic 2013). These purely mathematical models are often used to describe trends in time series, but not all of them hold up to scrutiny when explicitly compared to empirical data. Gilpin (1973) famously reports that, based on the predictions of the Lotka-Volterra model, hares in the Hudson Bay are feeding on lynx; this example goes to show that applying models that have not been validated could be dangerous, and that their output should be framed in the context of external data. By contrast Sallan et al.
(2011) study the same issue (sustained persistence and fluctuations of predator–prey couples through time) using a paleo-ecological time series, and interpret their data in the context of predictions from the Lotka-Volterra family of models (namely, they find support for Lotka-Volterra-like oscillations in time). Although dynamical models and empirical data interact in this example, they do not do so directly; that is, the analysis of empirical data is done within the context of a broad family of models, but is not coupled to e.g. additional simulations. The two are done in parallel, and not so much in interaction.

Figure 1. An overview of how computational approaches can complement other research approaches. On the top line, we have represented empirical studies (center) as well as modelling (left) and meta-analysis (right; represented as a funnel plot) approaches. In the bottom line, we have represented three possible approaches to study predator-prey relationships: knowledge graphs can represent interactions between the concepts; agent-based modelling (ABM) can provide some predictions about the future of the system; methods from machine learning (ML/AI) can assist both in understanding and prediction. Importantly, the goal of these approaches should always be to return to empirical data.

A number of other models have been shown to generate predictions that quantitatively match empirical data (Nicholson and Bailey 1935, Beverton and Holt 1957); this represents, in our opinion, the sole test of whether a mathematical model is adapted to a particular problem and system. While models are undeniably useful to make mechanisms interact in a low-complexity setting, it is a grave mistake to assume they will, in and of themselves, be relevant to empirical systems.

Meta-analyses, such as the one by Bolnick and Preisser (2005), are instead interested in collecting the outcome of observational and manipulative studies, and synthesizing the effects they report.
These are often purely statistical, in that they aggregate significance and effect size to measure how robust a result is across different systems. Meta-analyses most often require a critical mass of pre-existing papers (Lortie et al. 2013). Although they are irreplaceable as a tool to measure the strength of results, they are limited by their need for primary literature with experimental designs that are sufficiently similar to be comparable.

Predator-prey (and other biotic) interactions have been studied with a few computational approaches to date. Colon et al. (2015) show how an agent-based model can guide the interpretation of the same system represented as ordinary differential equations. This is an important result, as it offers suggestions to bridge families of models: not only can agent-based approaches provide answers about the biological systems of interest, they can also provide information about the behaviour of other families of models. Although this example is primarily model-driven, there are a number of data-driven approaches that rely on computational techniques. One example is the prediction of species interactions. Stock et al. (2017) suggested linear filtering to identify false negatives (i.e. interactions that exist, but may have been missed) in an empirical dataset. This can guide sampling in the field, and is to an extent a predictive task, but cannot inform our understanding of the system. Similarly, Desjardins-Proulx et al. (2017a) used various recommender systems to infer the prey items of predators based on knowledge of (i) diet and (ii) functional traits. This results in testable predictions, but does not necessarily increase our understanding of the rules involved in the system.

Chen et al. (2016) used symbolic regression to infer a differential equations model from data about predator-prey interactions.
This is a fascinating result, as it shows just how much signal is contained in data: enough to describe a mathematical model explaining their behaviour. And while understanding mechanisms by looking at a time series may be difficult, understanding the mechanisms when studying equations dictated by the data themselves is feasible. In a similar vein, Desjardins-Proulx et al. (2017b) suggest that logic networks, which describe the relationships between concepts, can be inferred by optimizing a knowledge bank on the data. This category of approaches offers the opportunity to increase our understanding of empirical data, not by thinking deeply about the rules, but by extracting the rules from the data.

Computational ecology in context

In Life on the Mississippi, Mark Twain wrote that “There is something fascinating about science. One gets such wholesale returns of conjecture out of such a trifling investment of fact”. This is a good description of the purpose of computational ecology: in a data-limited context, merging phenomenological models with pre-existing datasets is a way to efficiently develop conjectures, or more appropriately, build on our knowledge of models and data to put forward testable, quantified hypotheses. Perretti et al. (2013) intriguingly report that model-free inference based on data always outperforms the best model: in other words, we do not understand ecological systems as well as we think, and approaches putting the data first might always outperform those relying on expert knowledge. Pascual (2005) outlined that computational ecology has a unique ability to go from the complex (natural systems) to the simple (representations and conceptual models), and back (testable predictions). Although the natural world is immensely complex, it is paradoxically the high degree of model abstraction in computational approaches that gives them generality across several systems.
In the years since this article was published, the explosion in machine learning tools and their predictive ability, and their adoption by ecologists (Thessen 2016), should have changed the situation quite significantly. Yet, with the exception of a still-narrow family of problems that can be addressed by remote-sensing or meta-genomics, there has been no regime shift in the rate at which ecological data are collected. Observations from citizen science accumulate but are highly biased by societal preferences rather than conservation priority (Donaldson et al. 2016, Troudet et al. 2017), by proximity to urban centers and infrastructure (Geldmann et al. 2016), as well as by the interaction between these factors (Tiago et al. 2017). In addition, Lindenmayer and Likens (2018) raise the significant concern that the “culture” of ecology must be maintained: even in the context of a sudden (though debatable) avalanche of data, ecology as a field should always put robust hypotheses first. This is especially true since our needs for testable and actionable predictions have increased dramatically. This provides a clear mission statement for computational ecology: refining the models and further integrating them with data is necessary, and using methods that work well on reduced amounts of heterogeneous data must be part of this effort.

Enthusiastic reports about the big data revolution coming to ecology (Hampton et al. 2013, Soranno and Schimel 2014) have been premature at best, and the challenge associated with most of our datasets being decidedly tiny cannot be easily dismissed. Yet data, even small, are “unreasonably effective” (Halevy et al. 2009): they can reveal trends and signal that may not be immediately apparent from causal modelling alone, for example. Ecological models make, by definition, high-accuracy predictions, but they tend to be difficult to test (Rykiel 1996): models relying on precise mathematical expressions can be difficult to calibrate or parameterize.
Observations (field sampling) or manipulative approaches (micro/meso/macro-cosms, field experiments) are highly accurate, but also have immense human and monetary costs that limit the scale at which they can be applied. There is simply too much nature around for us to observe, monitor, and manipulate it all. Computational approaches able to generalize some rules from the data (Desjardins-Proulx et al. 2017a, 2017b) may help guide the attention of researchers onto mechanisms that are worthy of a deeper investigation.

Computational approaches will more likely shine in support of more established areas of research. Recent advances in computational epidemiology (reviewed in Marathe and Ramakrishnan 2013) provide an interesting roadmap for computational ecology: there have been parallel advances in (i) adapting data acquisition to maximize the usefulness of novel data analysis methods, (ii) integration of novel analytical methods from applied mathematics and social sciences, mostly related to computations on large graphs, to work on pre-existing data, and (iii) a tighter integration of models to data fluxes to allow near real-time monitoring and prediction. All of these things are possible in ecological research. In fact, recent examples (Bush et al. 2017, Harris et al. 2017, Dietze et al. 2018, White et al. 2018) suggest that near real-time forecasting of biodiversity is becoming feasible, and is identified by computational ecologists as a key priority.

En route towards synthesis

Ecological synthesis, usually defined as the integration of data and knowledge to increase scope, relevance, or usability of results both across and within sub-fields (Carpenter et al. 2009), is an essential first step to achieve policy relevance (Baron et al. 2017).
Most of the global policy challenges have an ecological or environmental component, and outside of the socio-ecological, socio-economic, and socio-cultural aspects, ecologists can contribute to the mitigation or resolution of these challenges by i) assessing our knowledge of natural systems, ii) developing methods to produce scenarios using state-of-the-art models and tools, and iii) communicating the output of these scenarios to impact policy-making. White et al. (2015) propose that this falls under the umbrella of action ecology, i.e. using fundamental knowledge and ecological theory to address pressing, real-world questions.

Raghavan et al. (2016) suggest that this approach can also accommodate stakeholder knowledge and engagement. By building models that rely on ecological concepts, empirical data, and stakeholder feedback, they propose a computational agroecology program, to use computational tools in the optimization of sustainable agricultural practices. This example suggests that not only can computational approaches yield fundamental research results in a short time frame, they can also be leveraged as a tool for applied research and knowledge transfer now. The definition of “a short time” is highly sensitive to the context: some predictions can be generated using routine tools (in a matter of weeks), whereas some require the development of novel methodologies, and may require years. Accelerating the time to prediction will, in large part, require the development of software that can be deployed and run more rapidly. Computational ecology is nevertheless nimble enough that it can be used to iterate rapidly over a range of scenarios, to inform interactions with policy makers or stakeholders in near real time. We need to mention that there is a lower bound on time to prediction: some applications require different degrees of accuracy.
While an approximate result is good enough for fundamental research, outputs used to enact policy making that can affect thousands of citizens (and change the dynamics of a region or an ecosystem) require better accuracy. The variety of computational techniques allows moving across these scales, while the advances in programming practices and computing power decrease the severity of the accuracy/runtime tradeoff.

Mapping the domains of collaboration

Understanding how computational ecology will fit within the broader research practices requires answering three questions: what can computational ecology bring to the table, what are the needs of computational ecologists, and what are the current limitations of computational approaches that could limit their immediate applicability? It seems, at this point, important to minimize neither the importance nor the efficiency of sampling and collection of additional data. Sampling is important because ecological questions, no matter how fundamental, ought to be grounded in phenomena happening in nature, and these are revealed by observation or manipulation of natural systems. Sampling is efficient because it is the final arbiter: how good any prediction is at explaining aspects of a particular empirical system is determined by observations of this system, compared to the predictions.

Relying heavily on external information implies that computational research is dependent on standards for data representation. The Ecological Metadata Language (Fegraus et al. 2005) is an attempt at standardizing the way meta-data are represented for ecological data; adherence to this standard, although it has been shown to improve the ease of assembling large datasets from single studies (Gil et al. 2011), is done on a voluntary basis (and its uptake is therefore abysmal).
An alternative approach is to rely on community efforts to pre-curate and pre-catalog ecological data, such as with the flagship effort EcoDataRetriever (Morris and White 2013). Yet even this approach is ultimately limited because of the human factor involved: when the upstream data change, they have to be re-worked into the software. A community consensus on data representation, although unlikely, would actually solve several problems at once. First, it would make the integration of multiple data sources trivial. Second, it would provide clear guidelines about the input and storage of data, thus maybe improving their currently limited longevity (Vines et al. 2014). Finally, it would facilitate the integration of data and models with minimum effort and risk of miscommunication, since the format would be the same for all. To this extent, a recent proposal by Ovaskainen et al. (2017) is particularly interesting: rather than deciding on formats based on knowledge of eco-informatics or data management best practices, why not start from the ecological concepts, and translate them into digital representations? The current way to represent, for example, biodiversity data has largely been designed based on the needs of collection managers, and bears little to no relevance to most extant research needs. Re-designing the way we store and manipulate data based on research practices is an important step forward, and will ultimately benefit researchers. To be generalized, this task requires a strong collaboration between ecologists with topic expertise, ecologists with field expertise, and those of us leaning closest to the computational part of the field.

With or without a common data format, the problem remains that we have very limited insights into how error in predictions made on synthetic datasets will propagate from one analysis to another (Poisot et al. 2016); in a succession of predictive steps, do errors at each step amplify, or cancel one another?
Biases exist in the underlying data and in the models used to generate the predictions, and these biases can manifest in three possible outcomes. First, predictions from these datasets accumulate bias and cannot be used. Second, because the scale at which these predictions are expressed is large, errors are (quantitatively) small enough to be over-ridden by the magnitude of actual variation. Finally, in the best-case but low-realism scenario, errors end up cancelling each other out. The best possible way to understand how errors propagate is to validate predictions de novo, through sampling. Model-validation methods can be used, as they are with SDMs (Hijmans 2012), but de novo sampling carries the additional weight of being an independent attempt at testing the prediction. Improved collaborations on this aspect will provide estimates of the robustness of the predictions, in addition to highlighting the steps of the process in which uncertainty is high; these steps are natural candidates for additional methodological development.

Finally, there is a need to assess how the predictions made by purely computational approaches will be fed back into other types of research. This is notably true when presenting these approaches to stakeholders. One possible way to make this knowledge transfer process easier is to be transparent about the way predictions were derived: which data were used (with citations for credits and unique identifiers for reproducibility), which software was used (with version numbers and code), and what the model/simulations do (White et al. 2013). In short, the onus is on practitioners of computational research to make sure we provide all the information needed to communicate how predictions came to be.

Establishing the currencies of collaboration

An important step in furthering the integration of computational approaches into the workflow of ecological research is to establish currencies for collaborations.
At the scale of individual researchers, research groups, and larger research communities alike, it is important to understand what each can contribute to the research effort. As ecological research is expected to be increasingly predictive and policy-relevant, and as fundamental research tends to tackle increasingly refined and complex questions, it is expected that research problems will become more difficult to resolve. This is an incentive for collaborations that build on the skills that are specific to different approaches. In an editorial in the New England Journal of Medicine, Longo and Drazen (2016) characterized scientists using previously published data as “research parasites” (backlash by a large part of the scientific community caused one of the authors to later retract the statement; Drazen 2016). Although community ecologists would have realized anyway that the presence of parasites indicates a healthy ecosystem (Marcogliese 2005, Hudson et al. 2006), this feeling that ecological data re-analysis confers an unfair benefit (Mills et al. 2015) has to be addressed, because it has no empirical support. The rate of data re-use in ecology is low and has a large delay (Evans 2016), and there are no instances of re-analysing existing data for the same (or similar) purpose they were produced for. There is a necessary delay between the moment data are available and the moment they are aggregated and re-purposed (especially considering that data are, at the earliest, published at the same time as the paper). This delay is introduced by the need to understand the data, see how they can be combined, develop a research hypothesis, and so on. On the other hand, there are multiple instances of combining multiple datasets collected at different scales to address an entirely different question (see GBIF 2016 for an excellent showcase); it is more likely that data re-use is done with the intent of exploring different questions.
It is also worth remembering that ecology as a whole, and macroecology and biogeography in particular, already benefit immensely from data re-use. For example, data collected by citizen scientists are used to generate estimates of biodiversity distribution, but also to set and refine conservation targets (Devictor et al. 2010); an overwhelming majority of our knowledge of bird richness and distribution comes from the eBird project (Sullivan et al. 2009, 2014), which is essentially fed by the unpaid work of citizen scientists. With this in mind, there is no tip-toeing around the fact that computational ecologists will be data consumers, and these data will have to come from ecologists with active field programs (in addition to government, industry, and citizens). Recognizing that computational ecology needs these data as a condition for its continued existence and relevance should motivate the establishment of a way to credit and recognize the role of data producers (which is discussed in Poisot et al. 2016, in particular in the context of massive dataset aggregation). Data re-users must be extremely pro-active in the establishment of crediting mechanisms for data producers; as the availability of these data is crucial to computational approaches, and as we do not share any of the cost of collecting these data, it behooves us to make sure that our research practices do not impose a cost on our colleagues with field or lab programs. Encouraging conversations between data producers and data consumers about what data will be shared, when, and how databases will be maintained will improve both collaborations and research quality. In parallel, data producers can benefit from the new analytical avenues opened by advances in computational ecology.
Research funders should develop financial incentives for these collaborations, specifically by dedicating a part of the funding to developing and implementing sound data archival and re-use strategies, and by encouraging researchers to re-use existing data when they exist.

Training data-minded ecologists

The fact that data re-use is not instantaneously convenient reveals another piece of information about computational ecology: it relies on different skills and different tools than those typically used by field ecologists. One of the most fruitful avenues for collaboration lies in recognizing the strengths of different domains: the skills required to assemble a dataset (taxonomic expertise, natural history knowledge, field know-how) and the skills required to develop robust computational studies (programming, applied mathematics) are different. Because these skills are so transversal to any form of ecological research, we are confident that they can be incorporated in any curriculum. If anything, this calls for increased collaboration, where these approaches are put to work in complementarity. Barraquand et al. (2014) highlighted the fact that professional ecologists received less training in quantitative and computational thinking than they consider necessary. Increasing the amount of such training does not necessarily imply that natural history or field practice will be sacrificed on the altar of mathematics: rather, ecology would benefit from introducing more quantitative skills and reasoning across all courses, and introductory ones in particular (Hoffman et al. 2016). Instead of dividing the field further between empirically and theoretically minded scientists, this would showcase quantitative skills as being transversal to all questions that ecology can address. What to teach, and how to integrate it into the existing curriculum, does of course require discussion and consensus building by the community.
A related problem is that most practising ecologists are terrible role models when it comes to showcasing good practices of data management (because there are no incentives to do this), and data management is a crucial step towards easier computational approaches. Even in the minority of cases where ecologists do share their data on public platforms, there are so few metadata that not being able to reproduce the original study is the rule (Roche et al. 2014, 2015). This is a worrying trend, because data management affects how easily research is done, regardless of whether the data are ultimately archived. Because the volume and variety of data we can collect tend to increase over time, and because we expect higher standards of analysis (therefore requiring more programmatic approaches relying on the use or development of purpose-specific code), data management has already become a core skill for ecologists to acquire. This view is echoed in recent proposals. Mislan et al. (2016) suggested that highlighting the importance of code in most ecological studies would be a way to bring the community to adopt higher standards, all the while de-mystifying the process of producing code. As with increasingly mandatory data release and stricter reproducibility requirements from funding agencies, mandatory code release would make science more reproducible and show how data were transformed during the analysis. This also requires teaching ecologists how to evaluate the quality of the software they use (Poisot 2015). Finally, Hampton et al. (2015) proposed that the “Tao of Open Science” would be particularly beneficial to the entire field of ecology; as part of the important changes in attitude, they identified the solicitation and integration of productive feedback throughout the research process. Regardless of the technical solution, this emphasizes the need to foster, in ecologists in training, a culture of discussion across disciplinary boundaries.
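The metadata gap described above can be narrowed with very little code. The following is a minimal sketch, not a community standard: the function `describe_dataset`, the field names of the record, and the toy dataset are all hypothetical and chosen for illustration. Established specifications such as the Ecological Metadata Language (Fegraus et al. 2005) are far richer.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def describe_dataset(data_bytes, source_citation, identifier, software):
    """Build a small provenance record for a dataset held in memory.

    The field names are hypothetical; they mirror the items that make a
    prediction traceable: which data, from whom, and which software.
    """
    return {
        # Checksum of the exact bytes analysed, to detect upstream changes.
        "sha256": hashlib.sha256(data_bytes).hexdigest(),
        "n_bytes": len(data_bytes),
        # Credit to the data producers and a stable identifier.
        "source_citation": source_citation,
        "identifier": identifier,
        # Versions of the tools used to transform the data.
        "software": software,
        "python_version": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Invented toy data, for illustration only.
example_csv = b"site,species,count\nA,Carabus auratus,3\nB,Carabus auratus,0\n"

record = describe_dataset(
    example_csv,
    source_citation="hypothetical field survey (placeholder citation)",
    identifier="placeholder-identifier",
    software={"analysis_script": "0.1.0"},
)
print(json.dumps(record, indent=2))
```

Storing such a record next to the data file, and regenerating it whenever the data change, gives re-users a way to check that they are analysing the same bytes as the original study and to cite the producers correctly.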
All of these points can be distilled into practical training recommendations for different groups in the community of ecologists. Classes based around lab or field experience should emphasize practical data management skills that have been validated as best practices by the community (Soyka et al. 2017), and introduce tools that would make the maintenance of data easier. Modelling classes, especially when concerned with purely mathematical models, should add modules on the way these models can be integrated with empirical data. Finally, computational classes should emphasize communication skills: what do these new tools do, and how can they be used by other fields in ecology; but also, how do we properly track citations to data, and give credit to data producers? Building these practices into training would ensure that the next generation of ecologists will be able to engage in a meaningful dialogue across methodological boundaries.

Fostering a culture of mutual respect and acceptance

While the origins of ecology are grounded in field research, the growth of computational ecology has been accompanied by an increasing segment of ecologists who do not conduct, or have never conducted, fieldwork. Anecdotally, this new class of researcher has caused some tensions between computational ecologists and field ecologists, at the level of individuals, mixed-method research groups, and the ecological community at large. Expectations of fieldwork are also sometimes embedded institutionally, such as with hiking equipment as prizes in ecological competitions. Part of these tensions may be driven by a view of computational ecologists as ‘research parasites’, and this is another reason to develop crediting mechanisms for data producers, as discussed above.
However, tensions are sometimes also predicated on two sequential assumptions: (i) that computational ecologists have less affinity for the natural world and/or less natural history knowledge of the systems they work in; and (ii) that these deficits reduce the ability of computational ecologists to carry out sound ecological science. Whether the first assumption is true will vary widely among individuals. However, it is important to note that interest in ecological research and enjoyment of outdoor pursuits are not necessarily collinear, and that computational skills do not preclude natural history knowledge. The second assumption is both incorrect and unhelpful. Such views must be addressed because they may negatively affect ecology as a discipline. For example, people in disciplines like mathematics or physics, who may have superb quantitative skills but little interest in fieldwork, may be less likely to become ecologists or to work with field ecologists, despite the potential to make profound contributions. Similarly, those who do not carry out fieldwork due to physical or mental disability, medical conditions, or simply personal preference, should not be made to feel as though they are unable to make valid scientific contributions. Criticizing, or ridiculing, the work or choices of early-career or student computational ecologists could be particularly damaging. It is important to note that such tensions may run the other way, and it is equally essential for computational ecologists to recognize that field ecologists make irreplaceable contributions, whether or not they possess advanced quantitative skills.

Ultimately, it is necessary for ecology to foster a culture where methodological specialization is accepted. Field knowledge, such as sampling and natural history, is undoubtedly important, as is advanced quantitative and computational knowledge.
That different individuals may hold these skills should motivate collaboration, not hostility. Such changes are urgently needed for computational ecology to flourish with, rather than alongside, field ecology.

Concluding remarks

None of the theoretical, mathematical, or computational approaches to ecological research has any intrinsic superiority; in the end, direct observation and experimentation trump all, and serve as the validation, rejection, or refinement of predictions derived in other ways, but lack the scaling power to be the only viable solution. The growing computational power, growing amount of data, and increasing computational literacy in ecology mean that producing theory and predictions is becoming cheaper and faster (regardless of the quality of these products). Yet the time needed to test any prediction is not decreasing (or at least not as fast). Computational science has resulted in the development of many tools and approaches that can be useful to ecology, since they allow ecologists of all kinds to wade through these predictions and data. Confronting theoretical predictions with data is a requirement, if not the core, of ecological synthesis; this is only possible if ecologists engage in meaningful dialogue across disciplines, and recognize the currencies of their collaborations. Discussing the place of computational ecology within the broader context of the ecological sciences will highlight areas of collaboration with other areas of science. Thessen (2016) makes the point that long-standing ecological problems would benefit from being examined through a variety of machine learning techniques; we fully concur, because these techniques usually make the most of existing data (Halevy et al. 2009). Reaching a point where these methods are routinely used by ecologists will require a shift in our culture: quantitative training is currently perceived as inadequate (Barraquand et al.
2014), and most graduate programs do not train ecology students in contemporary statistics (Touchon and McCoy 2016). Ultimately, any additional data collection has its scope limited by financial, human, and temporal constraints; in other words, we need to choose what to sample, because we cannot afford to sample it all. Computational approaches, because they can work through large amounts of data and integrate them with models that can generate predictions, might allow answering an all-important question: what do we sample, and where? Some rely on their ecological intuition to answer; although computational ecologists may be deprived of such intuitions, they have the know-how to couple data and models, and can meaningfully contribute to this answer. Computational ecology is also remarkably cost-effective. Although the reliance on advanced research computing incurs immense costs (including hardware maintenance, electrical power, and training of highly qualified personnel; these are often absorbed by local or national consortia), it allows the generation of predictions that are highly testable. Although the accuracy of these predictions is currently unknown (and will vary on a model/study/question basis), any additional empirical effort to validate predictions will improve their quality, reinforcing the need for dialogue and collaborations.

Acknowledgements

TP thanks Dr. Allison Barner and Dr. Andrew McDonald for stimulating discussions, and the Station de Biologie des Laurentides de l’Université de Montréal for hosting him during part of the writing process. TP thanks the Canadian Institute for Ecology and Evolution for financial support. BIS is supported by the Natural Environment Research Council as part of the Cambridge Earth System Science NERC DTP (NE/L002507/1). We thank the volunteers of Software Carpentry and Data Carpentry, whose work contributes to improving the skills of ecologists.
Carabid picture by Maxime Dahirel (CC-BY 4.0), spider image by Sidney Frederic Harmer, Arthur Everett Shipley, digitized by Maxime Dahirel (CC-BY 4.0).

References

Ackland, G.J., and I.D. Gallagher. 2004. Stabilization of large generalized Lotka-Volterra foodwebs by evolutionary feedback. Physical Review Letters 93.
Austin, M.P. 2002. Spatial prediction of species distribution: an interface between ecological theory and statistical modelling. Ecological Modelling 157:101–118.
Baron, J.S., Specht, A., Garnier, E., Bishop, P., Campbell, C.A., Davis, F.W., et al. 2017. Synthesis centers as critical research infrastructure. BioScience 67(8):750–759.
Barraquand, F., Ezard, T.H.G., Jørgensen, P.S., Zimmerman, N., Chamberlain, S., Salguero-Gómez, R., et al. 2014. Lack of quantitative training among early-career ecologists: a survey of the problem and potential solutions. PeerJ 2:e285.
Bay, R.A., Harrigan, R.J., Underwood, V.L., Gibbs, H.L., Smith, T.B., and K. Ruegg. 2018. Genomic signals of selection predict climate-driven population declines in a migratory bird. Science 359:83–86.
Beaumont, M.A. 2010. Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics 41:379–406.
Becks, L., Hilker, F.M., Malchow, H., Jürgens, K., and H. Arndt. 2005. Experimental demonstration of chaos in a microbial food web. Nature 435:1226–1229.
Benincà, E., Huisman, J., Heerkloss, R., Jöhnk, K.D., Branco, P., Van Nes, F.H., et al. 2008. Chaos in a long-term experiment with a plankton community. Nature 451:822–825.
Beverton, R.J., and S.J. Holt. 1957. On the dynamics of exploited fish populations. Springer Science & Business Media.
Bolker, B.M. 2008. Ecological models and data in R. Princeton University Press.
Bolnick, D.I., and E.L. Preisser. 2005.
Resource competition modifies the strength of trait-mediated predator–prey interactions: a meta-analysis. Ecology 86:2771–2779.
Borwein, J.M., and R.E. Crandall. 2013. Closed Forms: What they are and why we care. Notices of the American Mathematical Society 60:50.
Bourne, P.E. 2011. Ten simple rules for getting ahead as a computational biologist in academia. PLoS Comput Biol 7:e1002001.
Bush, A., Sollmann, R., Wilting, A., Bohmann, K., Cole, B., Balzter, H., et al. 2017. Connecting Earth observation to high-throughput biodiversity data. Nature Ecology & Evolution 1:s41559-017-0176–017.
Carpenter, S.R., Armbrust, E.V., Arzberger, P.W., Chapin, F.S., Elser, J.J., Hackett, E.J., et al. 2009. Accelerate synthesis in ecology and environmental sciences. BioScience 59:699–701.
Chen, Y., Angulo, M.T., and Y-Y. Liu. 2016. Revealing complex ecological dynamics via symbolic regression. bioRxiv:074617.
Colon, C., Claessen, D., and M. Ghil. 2015. Bifurcation analysis of an agent-based model for predator–prey interactions. Ecological Modelling 317:93–106.
Cordier, T., Esling, P., Lejzerowicz, R., Visco, J., Ouadahi, A., Martins, C., et al. 2017. Predicting the ecological quality status of marine environments from eDNA metabarcoding data using supervised machine learning. Environmental Science & Technology 51:9118–9126.
Coville, J., and F. Frederic. 2013. Convergence To The Equilibrium In A Lotka-Volterra Ode Competition System With Mutations. arXiv:1301.6237.
D’Amen, M., Mateo, R.G., Pottier, J., Thuiller, W., Maiorano, L., Pellissier, L., et al. 2017. Improving spatial predictions of taxonomic, functional and phylogenetic diversity. Journal of Ecology 106(1):76–86.
Desjardins-Proulx, P., Laigle, I., Poisot, T., and D. Gravel. 2017a. Ecological interactions and the Netflix problem. PeerJ 5.
Desjardins-Proulx, P., Poisot, T., and D. Gravel. 2017b.
Scientific theories and artificial intelligence. bioRxiv:161125.
Devictor, V., Whittaker, R.J., and C. Beltrame. 2010. Beyond scarcity: citizen science programmes as useful tools for conservation biogeography. Diversity and Distributions 16:354–362.
Dietze, M.C., Fox, A., Beck-Johnson, L.M., Betancourt, J.L., Hooten, M.B., Jarnevich, C.S., et al. 2018. Iterative near-term ecological forecasting: Needs, opportunities, and challenges. Proceedings of the National Academy of Sciences:201710231.
Donaldson, M.R., Burnett, N.J., Braun, D.C., Suski, C.D., Hinch, S.G., Cooke, S.J., and J.T. Kerr. 2016. Taxonomic bias and international biodiversity conservation research. FACETS 1:105–113.
Dörner, D., and J. Funke. 2017. Complex problem solving: What it is and what it is not. Frontiers in Psychology 8.
Drazen, J.M. 2016. Data sharing and the journal. New England Journal of Medicine 374:e24.
Elith, J., and J.R. Leathwick. 2009. Species Distribution Models: Ecological Explanation and Prediction Across Space and Time. Annual Review of Ecology, Evolution, and Systematics 40:677–697.
Evans, S.R. 2016. Gauging the purported costs of public data archiving for long-term population studies. PLOS Biology 14:e1002432.
Fawcett, T.W., and A.D. Higginson. 2012. Heavy use of equations impedes communication among biologists. Proceedings of the National Academy of Sciences 109:11735–11739.
Fegraus, E.H., Andelman, S., Jones, M.B., and M. Schildhauer. 2005. Maximizing the value of ecological data with structured metadata: An introduction to Ecological Metadata Language (EML) and principles for metadata creation. Bulletin of the Ecological Society of America 86:158–168.
Franklin, J. 2010a. Mapping species distributions: spatial inference and prediction. Cambridge University Press.
Franklin, J. 2010b. Moving beyond static species distribution models in support of conservation biogeography.
Diversity and Distributions 16:321–330.
GBIF. 2016. GBIF Science Review 2016. Text.
Geldmann, J., Heilmann-Clausen, J., Holm, T.E., Levinsky, I., Markussen, B., Olsen, K., et al. 2016. What determines spatial bias in citizen science? Exploring four recording schemes with different proficiency requirements. Diversity and Distributions 22:1139–1149.
Gil, I.S., Vanderbilt, K., and S.A. Harrington. 2011. Examples of ecological data synthesis driven by rich metadata, and practical guidelines to use the Ecological Metadata Language specification to this end. International Journal of Metadata, Semantics and Ontologies 6:46.
Gilpin, M.E. 1973. Do hares eat lynx? The American Naturalist 107:727–730.
Grilli, J., Adorisio, M., Suweis, S., Barabás, G., Banavar, J.R., Allesina, S., and A. Maritan. 2017. Feasibility and coexistence of large ecological communities. Nature Communications 8:14389.
Gyllenberg, M., Yan, P., and Y. Wang. 2006. Limit cycles for competitor–competitor–mutualist Lotka–Volterra systems. Physica D: Nonlinear Phenomena 221:135–145.
Halevy, A., Norvig, P., and F. Pereira. 2009. The unreasonable effectiveness of data. IEEE Intelligent Systems 24:8–12.
Hampton, S.E., Anderson, S.S., Bagby, S.C., Gries, C., Han, X., Hart, E.M., et al. 2015. The Tao of open science for ecology. Ecosphere 6:1–13.
Hampton, S.E., Strasser, C.A., Tewksbury, J.J., Gram, W.K., Budden, A.E., Batcheller, A.L., et al. 2013. Big data and the future of ecology. Frontiers in Ecology and the Environment 11:156–162.
Harris, D.J., Taylor, S., and E.P. White. 2017. Forecasting biodiversity in breeding birds using best practices. bioRxiv:191130.
Hijmans, R.J. 2012. Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model. Ecology 93:679–688.
Hoffman, K., Leupen, S., Dowell, K., Kephart, K., and J. Leips. 2016.
Development and assessment of modules to integrate quantitative skills in introductory biology courses. Cell Biology Education 15:ar14–ar14.
Houlahan, J.E., McKinney, S.T., Anderson, T.M., and B.J. McGill. 2017. The priority of prediction in ecological understanding. Oikos 126:1–7.
Hudson, P.J., Dobson, A.P., and K.D. Lafferty. 2006. Is a healthy ecosystem one that is rich in parasites? Trends in Ecology & Evolution 21:381–385.
Jørgensen, S.E. 2008. Overview of the model types available for development of ecological models. Ecological Modelling 215:3–9.
Legendre, P., and L. Legendre. 1998. Numerical ecology. Elsevier, Oxford, UK.
Letten, A.D., Ke, P.-J., and T. Fukami. 2016. Linking modern coexistence theory and contemporary niche theory. Ecological Monographs.
Levin, S.A. 2012. Towards the marriage of theory and data. Interface Focus 2:141–143.
Lindenmayer, D.F., and G.E. Likens. 2018. Maintaining the culture of ecology. Frontiers in Ecology and the Environment 16:195–195.
Longo, D.L., and J.M. Drazen. 2016. Data Sharing. New England Journal of Medicine 374:276–277.
Lortie, C.J., Stewart, G., Rothstein, H., and J. Lau. 2013. Practical interpretation of ecological meta-analyses. PeerJ PrePrints.
Marathe, M., and N. Ramakrishnan. 2013. Recent advances in computational epidemiology. IEEE Intelligent Systems 28:96–101.
Marcogliese, D.J. 2005. Parasites of the superorganism: Are they indicators of ecosystem health? International Journal for Parasitology 35:705–716.
Maris, V., Huneman, P., Coreau, A., Kéfi, S., Pradel, R., and V. Devictor. 2017. Prediction in ecology: promises, obstacles and clarifications. Oikos.
Markowetz, F. 2017. All biology is computational biology. PLOS Biology 15:e2002050.
May, R.M. 2004. Uses and abuses of mathematics in biology. Science 303:790–793.
Miller, J.A., and P. Holloway. 2015.
Incorporating movement in species distribution models. Progress in Physical Geography 39:837–849.
Mills, J.A., Teplitsky, C., Arroyo, B., Charmantier, A., Becker, P.H., Birkhead, T.R., et al. 2015. Archiving primary data: Solutions for long-term studies. Trends in Ecology & Evolution 30:581–589.
Mislan, K.A.S., Heer, J.M., and E.P. White. 2016. Elevating the status of code in ecology. Trends in Ecology & Evolution 31:4–7.
Morris, B.D., and E.P. White. 2013. The EcoData Retriever: Improving access to existing ecological data. PLoS ONE 8:e65848.
Nicholson, A.J., and V.A. Bailey. 1935. The balance of animal populations. Part I. Proceedings of the Zoological Society of London 105:551–598.
Otto, S.P., and T. Day. 2007. A biologist’s guide to mathematical modeling in ecology and evolution. Princeton University Press.
Ovaskainen, O., Tikhonov, G., Norberg, A., Guillaume Blanchet, F., Duan, L., Dunson, D., et al. 2017. How to make more out of community data? A conceptual framework and its implementation as models and software. Ecology Letters.
Papert, S. 1996. An exploration in the space of mathematics educations. International Journal of Computers for Mathematical Learning 1.
Pascual, M. 2005. Computational Ecology: From the complex to the simple and back. PLoS Comp Biol 1:e18.
Pellissier, L., Rohr, R.P., Ndiribe, C., Pradervand, J-N., Salamin, N., Guisan, A., and M. Wisz. 2013. Combining food web and species distribution models for improved community projections. Ecology and Evolution 3:4572–4583.
Perretti, C.T., Munch, S.B., and G. Sugihara. 2013. Model-free forecasting outperforms the correct mechanistic model for simulated and experimental data. Proceedings of the National Academy of Sciences 110:5253–5257.
Petrovskii, S., and N. Petrovskaya. 2012. Computational ecology as an emerging science. Interface Focus 2:241–254.
Poisot, T. 2015.
Best publishing practices to improve user confidence in scientific software. Ideas in Ecology and Evolution 8:50–54.
Poisot, T., Gravel, D., Leroux, S., Wood, S.A., Fortin, M-J., Baiser, B., et al. 2016. Synthetic datasets and community tools for the rapid testing of ecological hypotheses. Ecography 39:402–408.
Raghavan, B., Nardi, B., Lovell, S.T., Norton, J., Tomlinson, B., and D.J. Patterson. 2016. Computational Agroecology. Page Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems–CHI EA ’16. Association for Computing Machinery (ACM).
Roche, D.G., Kruuk, L.E.B., Lanfear, R., and S.A. Binning. 2015. Public data archiving in ecology and evolution: How well are we doing? PLOS Biology 13:e1002295.
Roche, D.G., Lanfear, R., Binning, S.A., Haff, T.M., Schwanz, L.E., Cain, K.E., et al. 2014. Troubleshooting public data archiving: Suggestions to increase participation. PLoS Biology 12:e1001779.
Rykiel, E.J. 1996. Testing ecological models: the meaning of validation. Ecological Modelling 90:229–244.
Sallan, L.C., Kammer, T.W., Ausich, W.I., and L.A. Cook. 2011. Persistent predator-prey dynamics revealed by mass extinction. Proceedings of the National Academy of Sciences 108:8335–8338.
Soetaert, K., and P.M.J. Herman. 2008. A practical guide to ecological modelling: Using R as a simulation platform. Springer Verlag.
Soranno, P.A., and D.S. Schimel. 2014. Macrosystems ecology: big data, big ecology. Frontiers in Ecology and the Environment 12:3–3.
Soyka, H., Budden, A., Hutchison, V., Bloom, D., Duckles, J., Hodge, A., et al. 2017. Using peer review to support development of community resources for research data management. Journal of eScience Librarianship 6.
Staniczenko, P.P.A., Sivasubramaniam, P., Suttle, K.P., and R.G. Pearson. 2017.
Linking macroecology and community ecology: refining predictions of species distributions using biotic interaction networks. Ecology Letters.
Stock, M., Poisot, T., Waegeman, W., and B.D. Baets. 2017. Linear filtering reveals false negatives in species interaction data. Scientific Reports 7:45908.
Sullivan, B.L., Aycrigg, J.L., Barry, J.H., Bonney, R.E., Bruns, N., Cooper, C.B., et al. 2014. The eBird enterprise: an integrated approach to development and application of citizen science. Biological Conservation 169:31–40.
Sullivan, B.L., Wood, C.L., Iliff, M.J., Bonney, R.E., Fink, O., and S. Kelling. 2009. eBird: A citizen-based bird observation network in the biological sciences. Biological Conservation 142:2282–2292.
Thessen, A. 2016. Adoption of machine learning techniques in ecology and earth science. One Ecosystem 1:e8621.
Tiago, P., Ceia-Hasse, A., Marques, T.A., Capinha, C., and H.M. Pereira. 2017. Spatial distribution of citizen science casuistic observations for different taxonomic groups. Scientific Reports 7:12832.
Touchon, J.C., and M.W. McCoy. 2016. The mismatch between current statistical practice and doctoral training in ecology. Ecosphere 7:e01394.
Troudet, J., Grandcolas, P., Blin, A., Vignes-Lebbe, R., and F. Legendre. 2017. Taxonomic bias in biodiversity data and societal preferences. Scientific Reports 7:9132.
Vines, T.H., Albert, A.Y.K., Andrew, R.L., Débarre, F., Bock, D.G., Franklin, M.T., et al. 2014. The availability of research data declines rapidly with article age. Current Biology 24:94–97.
Warton, D.I., Foster, S.D., De’ath, G., Stoklosa, J., and P.K. Dunstan. 2014. Model-based thinking for community ecology. Plant Ecology 216:669–682.
West, A.M., Kumar, S., Brown, C.S., Stohlgren, T.J., and J. Bromberg. 2016. Field validation of an invasive species Maxent model. Ecological Informatics 36:126–134.
White, E.P., Baldridge, E., Brym, Z.T., Locey, K.J., McGlinn, D.J., and S.R. Supp. 2013. Nine simple ways to make it easier to (re)use your data. Ideas in Ecology and Evolution 6:1–10.
White, E.P., Yenni, G.M., Taylor, S.D., Christensen, E.M., Bledsoe, F.K., Simonis, J.L., and S.K.M. Ernest. 2018. Developing an automated iterative near-term forecasting system for an ecological study. bioRxiv:268623.
White, R.L., Sutton, A.E., Salguero-Gómez, R., Bray, T.C., Campbell, H., Cieraad, F., et al. 2015. The next generation of action ecology: novel approaches towards global ecological research. Ecosphere 6:1–16.
Wisz, M.S., Pottier, J., Kissling, W.D., Pellissier, L., Lenoir, J., Damgaard, C.F., et al. 2012. The role of biotic interactions in shaping distributions and realised assemblages of species: implications for species distribution modelling. Biological Reviews 88:15–30.
Zhang, W. 2010. Computational ecology: artificial neural networks and their applications. World Scientific Publishing, Singapore.
Zhang, W. 2012. Computational ecology: graphs, networks and agent-based modeling. World Scientific, New Jersey.