Astroinformatics 2018

René Andrae, MPIA, Germany

[Thurs. 16:20 | 25+5] Stellar parameter estimates in Gaia DR3: Plans and challenges: In this talk, I will first give a brief overview of astrophysical parameter estimation within the Gaia processing pipeline, which has already contributed to Gaia DR2. I will then focus on stellar parameter estimation from Gaia's low-resolution spectra, presenting the chosen algorithmic design and its underlying rationale. Finally, I will discuss practical problems that we will face for Gaia DR3. Here, I will focus on a dilemma: an imperfect instrument model may prevent us from using model spectra, while empirically trained models suffer from erroneous or missing labels, which cause idiosyncratic training distributions that differ strongly from the stellar distribution as observed by Gaia.

Demián Arancibia, CORFO, Chile

[Thurs. 14:30 | 25+5] CORFO Astroinformatics Program: The Astroinformatics program is funded by the Strategic Investment Fund of the Economy Ministry, through the Digital Transformation Agency (CTD) of the Chilean Economy Development Agency (CORFO). Its mission is to identify and initiate measures and investments to diversify and grow the Chilean economy in Big Data, using natural advantages in Astronomy and its data-driven challenges. According to the strategy of CORFO and its CTD, the capacity to add data-driven value will be critical for competitive edges across industries over the next decade, and a driver for diversification of the Chilean productive matrix. We worked with scientific and industrial communities to facilitate multi-sectorial agreements about what our opportunities are to achieve this mission. This is a progress report of the work done over the last 9 months.

Dalya Baron, Tel Aviv University, Israel

[Wed. 14:30 | 25+5] Finding simple structures in complex astronomical datasets: Astronomy is going through a revolution. As surveys become larger and deeper, we face unprecedented data volumes which challenge the classical methods with which we extract information. This challenge is due not only to the volume of the data but also to its complexity. How do we find trends and high-dimensional correlations in these large, complex datasets? In this talk I will present the Sequencer, an algorithm developed by us to find one-dimensional sequences in complex datasets. The algorithm is unsupervised, has no tuning parameters, and provides a score for the extracted sequence. I will present the main building blocks of the algorithm, its strengths and its weaknesses. I will finish by presenting new scientific discoveries we made using the Sequencer.

George Djorgovski, California Institute of Technology, USA

[Mon. 11:15 | 30+15] Astroinformatics: Exploring Space in the Cyberspace:

Fabian Gieseke, University of Copenhagen, Denmark

[Tues. 14:30 | 25+5] Nearest Neighbours in Astronomy: Data volumes have increased dramatically in many domains. This is also the case in astronomy, where current projects produce data volumes in the terabyte range. Upcoming ones will produce such data volumes per day, per week, or even per hour. In many cases, the sheer data volumes render a manual analysis impossible, and machine learning methods have been identified as one of the key tools to handle the data flood in the future. Nearest neighbour methods give rise to simple yet powerful machine learning models. In my presentation, I will sketch the potential of these models in the context of astronomy by giving various examples, ranging from photometric redshift estimation to the visualization of high-dimensional data.

Matthew Graham, California Institute of Technology, USA

[Mon. 14:30 | 40+5] A primer on time series astronomy: We now have access to over 1 billion astronomical time series, most of which are irregularly and sparsely sampled, noisy, gappy, and heteroskedastic. However, we want to cluster, categorize, classify, and identify outliers in collections of these. In this talk, I will give an introduction to how we can work with these data and address some of the challenges that new surveys, such as ZTF and LSST, will face.

Ashish Mahabal, Caltech, USA

[Mon. 16:45 | 30+15] Deep Learning: A Broad Introduction and some explorations: Deep learning has started pervading most sciences, and thereby our lives. We provide a brief introduction to the underlying concepts with a focus on the search for structure and similarity in astronomy, where images and light curves are used alike. We mention various experiments, as readily available libraries make implementations easier by the day. We also describe the parallels with other sciences where structure is sought. Finally, we caution about issues like overlearning and undersampling, and about blindly using canned workflows.

Katharina Morik, TU Dortmund, Germany

[Thurs. 11:30 | 25+5] Dortmund Spectrum Estimation Algorithm with regularisation (DSEA+) for Cherenkov Telescopes: Estimating the density function f: Y → R of a physical quantity on the basis of measurements of another quantity g: X → R is a common problem in (astro-)physics. For Cherenkov telescopes, the gamma particle energy is to be estimated on the basis of the observed light shape and size. Tim Ruhe [1] has recast the reconstruction of f from g as a classification task. The DSEA algorithm has now been improved by a regularization. Building blocks of deconvolution approaches are determined so that they can be combined in numerous ways. In the talk, DSEA+ is compared with Iterative Bayesian Unfolding (IBU) and Regularized UNfolding. It is shown that IBU is a special case of DSEA+.

Ray Norris, Western Sydney University, Australia

[Tues. 9:30 | 25+5] Machine Learning approaches to measuring redshifts for large radio surveys: All-sky radio surveys pose a special challenge for redshift determination. First, about half the sources are AGN, which present notorious challenges for measuring photometric redshifts. Second, their median redshift is about 1.4, so most of the galaxies will be optically very faint, with consequently poor photometry. Third, the available all-sky photometry is far less sensitive than that available in the classic deep fields where most photometric redshift studies are conducted. On the other hand, for most science cases, we do not require accurate redshifts, but instead are content to place sources in a number of redshift bins. Our driving goal is, therefore, not to maximise the accuracy of the redshifts, but to minimise the number of outliers. Furthermore, machine learning techniques enable us to use features such as radio photometry and polarisation which are not normally used in photometric redshift determination. We describe some experiments using machine-learning techniques on simulated EMU data, which show that a surprisingly high fraction of radio sources will have scientifically valuable redshift determinations.

Kai Polsterer, Astroinformatics HITS, Germany

[Mon. 12:15 | 30+15] Machine Learning in Astronomy: lessons learned from learning machines: The number and size of astronomical data-sets have grown rapidly in recent decades. Now, with new technologies and dedicated survey telescopes, the databases are growing even faster. VO standards provide uniform access to this data. What is still required is a new way of analyzing these large data resources and tools to deal with them. For example, common diagnostic diagrams have proven to be good tools for answering questions in the past, but they fail for millions of objects in high-dimensional feature spaces. Besides dealing with poly-structured and complex data, the time domain has become a new field of scientific interest. By applying technologies from the field of computer science, astronomical data can be accessed more efficiently. Machine learning is a key tool for making use of the datasets that are now freely available. This talk presents examples of what we have learned when applying machine learning algorithms to data-sets in astronomy.

Pavlos Protopapas, Harvard University, USA

[Mon. 15:30 | 30+15] Transfer Learning Methods for Astronomical Datasets: It is often expensive and time-consuming to obtain labeled examples. In such cases, knowledge transfer from and between related domains would greatly boost performance, without the need for extensive labeling efforts. In this scenario, transfer and multi-task learning come in handy. In this talk, I will first present a Bayesian approach to transfer learning and then a deep variational autoencoder approach. Deep variational transfer (DVT) learning is able to tackle all major challenges posed by transfer learning: different feature spaces, different data distributions, different output spaces, different and highly imbalanced class distributions, and the presence of irrelevant source domains. We test DVT on image and star datasets. We perform both a quantitative evaluation of the model's discriminative ability and a qualitative exploration of its generative capacity.

André Schaaff, CDS, France

[Wed. 10:20 | 25+5] Chatting with the services. VO standards and NLP: Voice interaction is natural and increasingly present in our everyday life, through home assistants that handle heterogeneous requests (booking, weather, shopping, etc.). The purpose of this talk is to report on budding R&D work: Natural Language Processing applied to the querying of astronomical data services. We are not geeks and we would like to propose this new way of interaction in the future, as an alternative to the traditional forms exposing parameter fields, check boxes, etc. It is of course easy to prototype something. But our aim is to answer the fundamental question: is it possible to reach query results satisfying professional astronomers? We have not started from scratch: the Virtual Observatory (VO) brings us standards like TAP, UCDs, ..., implemented in the CDS services. The VO enables interoperability, which is a mandatory backbone, helping us to query our services in natural language (NL), and which will be useful in a further step to query the whole VO in this way. We will present our pragmatic approach exploiting all the resources we have (authors in Simbad, missions and wavelengths in VizieR, UCDs, ADQL/TAP, ...) and using a chatbot interface (involving Machine Learning) to reduce the gap between good and imprecise/ambiguous queries. We would be happy to collect, on this occasion, comments (if possible enthusiastic) or to initiate collaborations.

Dany Vohl, ASTRON, the Netherlands

[Wed. 09:30 | 25+5] New approaches to Volume and Velocity challenges of Modern Astronomy: Fundamental problems facing modern astronomy relate to processing, visualization, analysis, and remote access to data. As the volume and velocity at which data is generated and stored increase, new approaches, methods and analytical tools are required to let us fully explore information hidden within our data. In this talk, I discuss ways to enhance and streamline analysis tasks in surveys by adopting a set of informatics tricks (including concepts like display ecology, visual and immersive analytics, `single instruction, multiple visualizations`, graphics shaders, and data compression) to alleviate a number of bottlenecks and offer a better experience for individuals and research teams.

Rein Herm Warmels, ESO, Germany

[Tues. 16:30 | 25+5] Science, Data and Software - About Transparency, Credit, and Citation: Within the science community it is well understood that previous results that are relevant or used in research papers must be properly referenced. Similar practices are common when the science results are based on observational data obtained from, e.g., astronomical observing facilities. However, despite being a critical component for enabling scientific results, research software often lacks transparency, credit and citation. The presentation discusses the current situation regarding the recognition of research software and possible scenarios for further improvements.

Contributed Talks

Aryeh Brill, Columbia University, USA

[Thurs. 12:00 | 15+3] CTLearn: Deep Learning for Gamma-ray Astronomy: We introduce a Python package, CTLearn, for using deep learning to analyze data from arrays of Imaging Air Cherenkov Telescopes (IACTs), particularly, but not exclusively, the Cherenkov Telescope Array (CTA), the next-generation observatory for gamma-ray astronomy. CTLearn includes modules for loading and manipulating IACT data and for running machine learning models with TensorFlow. Its high-level interface provides a configuration-file-based workflow to drive reproducible training and prediction. IACTs detect very-high-energy gamma rays by capturing images of the particle showers they produce when they are absorbed by the atmosphere. The sensitivity of IACTs to gamma-ray sources depends strongly on efficiently rejecting the background of much more frequent cosmic-ray showers. We present preliminary results using CTLearn to train an event classification model using data from a simulated array of CTA telescopes.

Guillermo Cabrera-Vives, Department of Computer Science, University of Concepción, Chile

[Wed. 16:30 | 15+3] The Universe in a stream: brokering alerts with the ALeRCE system: With a new generation of large etendue survey telescopes there is a growing need for alert processing systems. This includes real-time processing of the raw data for alert generation, real-time annotation and classification of the alerts, and real-time reaction to interesting alerts using available astronomical resources. We present the Automatic Learning for the Rapid Classification of Events (ALeRCE), an integrated system aiming at the rapid classification of events from time domain surveys for the purpose of automatically selecting relevant candidates for follow-up. ALeRCE is an initiative led by an interdisciplinary and interinstitutional group of scientists in Chile in collaboration with international researchers from a wide range of institutions. The system considers state-of-the-art solutions in data orchestration, machine learning, databases, and visualization among others. ALeRCE will be adapted to different surveys including the Zwicky Transient Facility (ZTF), ATLAS, and the Large Synoptic Survey Telescope (LSST), among others.

Antonio D'Isanto, HITS, Germany

[Tues. 12:10 | 15+3] Return of the features: The explosion of data in recent years has generated an increasing need for new analysis techniques in order to extract knowledge from massive data-sets. Machine learning has proved particularly useful for this task. Fully automated methods (e.g. deep neural networks) have recently gained great popularity, even though those methods often lack physical interpretability. In contrast, feature-based approaches can provide both well-performing models and understandable causalities with respect to the correlations found between features and physical processes. In this view, an efficient feature selection is essential to boost the performance of machine learning models. Therefore we propose to compute, evaluate, and characterize better-performing features for the important task of photometric redshift estimation. We synthetically created 4,520 features by combining magnitudes, errors, radii, and ellipticities of quasars, taken from the SDSS. A forward selection strategy is then adopted, a recursive method in which a huge number of feature sets is tested through a k-Nearest-Neighbours algorithm, leading to a tree of feature sets. The branches of the feature tree are used to perform experiments with the random forest, in order to validate the best set with an alternative model. We demonstrate that the unusual sets of features determined with this approach improve the performance of the regression models significantly when compared to the performance of the classic features from the literature. Therefore, a method to interpret some of the found features in a physical context is presented. The feature selection methodology is very general and can be used to improve the performance of machine learning models for any regression or classification task.
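
The forward selection strategy described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it keeps only the single best feature set at each step instead of building a tree of feature sets, it runs on made-up toy data rather than SDSS quasars, and all function names (`knn_predict`, `loo_rmse`, `forward_select`) are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k):
    """Average the targets of the k training points closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def loo_rmse(rows, targets, columns, k=3):
    """Leave-one-out RMSE of a kNN regressor restricted to `columns`."""
    projected = [[row[c] for c in columns] for row in rows]
    squared_error = 0.0
    for i, query in enumerate(projected):
        rest_x = projected[:i] + projected[i + 1:]
        rest_y = targets[:i] + targets[i + 1:]
        squared_error += (knn_predict(rest_x, rest_y, query, k) - targets[i]) ** 2
    return math.sqrt(squared_error / len(rows))


def forward_select(rows, targets, n_select, k=3):
    """Greedily add the feature that gives the lowest leave-one-out RMSE."""
    selected = []
    remaining = list(range(len(rows[0])))
    while remaining and len(selected) < n_select:
        best = min(remaining, key=lambda c: loo_rmse(rows, targets, selected + [c], k))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): column 0 drives the target, column 1 is unrelated,
# column 2 is constant.
rows = [[i / 10.0, ((7 * i) % 11) / 11.0, 1.0] for i in range(30)]
targets = [2.0 * row[0] for row in rows]
chosen = forward_select(rows, targets, n_select=2)
```

In this toy setting the informative column is picked first. Which column is added second carries little meaning here, which is one reason a validation step with an alternative model, as the abstract describes, is useful.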

Tim Galvin, CSIRO, Australia

[Tues. 10:00 | 15+3] Applying PINK to Radio Galaxy Zoo Data: With the advent of large scale surveys the manual analysis and classification of individual radio source morphologies is rendered impossible as existing approaches do not scale. The analysis of complex morphological features in the spatial domain is a particularly important task. Here we discuss the challenges of transferring crowdsourced labels and introduce a proper transfer mechanism via modified random forest regression. By using parallelized rotation and flipping invariant Kohonen-maps, the images of radio galaxies are first projected down to a two-dimensional embedding in an unsupervised way. This embedding can be seen as a discretised space of shapes with the coordinates reflecting morphological features as expressed by the automatically derived prototypes. In the second step, images are compared with those prototypes to create a heat-map, which is the morphological fingerprint of each object and the basis for transferring the user generated labels. We also question the currently used discrete classification schema and introduce a continuous scale that better reflects the uncertainty in transition between two classes, caused by sensitivity and resolution limits.
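
The second step described in this abstract, comparing each image with a set of prototypes under rotation and flipping invariance, can be sketched as follows. This is an illustration only and not the PINK implementation: it assumes rotations restricted to multiples of 90 degrees, uses tiny made-up 3x3 images, returns a flat list rather than a two-dimensional map, and all function names are hypothetical.

```python
def rotate90(image):
    """Rotate a square image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def flip(image):
    """Mirror an image left to right."""
    return [row[::-1] for row in image]


def orientations(image):
    """Yield the 8 orientations reachable by 90-degree rotations and a flip."""
    for start in (image, flip(image)):
        current = start
        for _ in range(4):
            yield current
            current = rotate90(current)


def squared_distance(a, b):
    """Sum of squared pixel differences between two images of equal shape."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def invariant_distance(image, prototype):
    """Smallest distance between the prototype and any orientation of the image."""
    return min(squared_distance(o, prototype) for o in orientations(image))


def fingerprint(image, prototypes):
    """Distances from one image to every prototype (the heat-map values)."""
    return [invariant_distance(image, p) for p in prototypes]


# Made-up 3x3 prototypes: a horizontal bar and a single compact blob.
bar = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
blob = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
# A vertical bar is the horizontal bar rotated by 90 degrees.
vertical_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
distances = fingerprint(vertical_bar, [bar, blob])
```

Because the distance is minimised over orientations, the vertical bar matches the horizontal-bar prototype exactly, while its distance to the blob prototype stays non-zero.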

Nikos Gianniotis, Astroinformatics HITS, Germany

[Thurs. 10:20 | 15+3] :

Ping Guo, Beijing Normal University, China

[Thurs. 17:30 | 15+3] A Two-dimensional Sky Background Model for LAMOST with Self-similarity Algorithm Optimization: Sky subtraction, which removes the sky light from the target spectra observed by LAMOST, is one of the key procedures when dealing with two-dimensional (2D) fiber spectral data. Both the efficiency and the precision of sky subtraction affect subsequent data processing, such as spectral flux extraction. The approaches currently applied in the LAMOST system are the ‘super sky light’ method and the principal component analysis (PCA) method. However, both methods are modeled on the one-dimensional (1D) extracted spectra and ignore the similarity and continuity of the spectra in both the wavelength and spatial directions. In this paper, considering the characteristics of the spectra, we propose a novel approach to modeling the 2D sky background. We first initialize the sky background according to the positions of the sky fibers relative to any object, and define a sliding window to represent the flux matrix that is currently being corrected. Then, the self-similarity algorithm is applied around each target spectrum, considering the similarity in both the wavelength and spatial directions. We compare our method with the commonly used ‘super sky light’ method using the same criteria (the MSE between simulated and real skylight data, and the variance of the object spectrum after sky subtraction). The experimental results show that our method is effective: it not only ensures the correlation between spectra but also improves the precision of sky subtraction. It is a promising technique for sky background subtraction in CCD images.

Wolfgang Kerzendorf, European Southern Observatory, Germany

[Thurs. 10:00 | 15+3] Emulators - physics meets machine learning: Comparing sophisticated simulations with complex observations is one of the main challenges in modern astrophysics. The computational time requirements (several minutes for some of the simplest simulations) and large parameter spaces (sometimes several tens of parameters) prohibit an exploration using traditional techniques. The rapid growth of observational data exacerbates this data-analysis problem. Thus, it is essential to have tools that allow automated extraction of meaningful physical quantities from the large data vaults. I will present our supernova radiative transfer code (TARDIS - Kerzendorf & Sim 2014) that can quickly synthesize supernova spectra with some physical accuracy (using well-tested methods). We couple this code with a novel deep-learning-enabled emulator that can interpolate in the multi-dimensional space and allows us to extract posterior density functions given data. In this talk, I will introduce the code, give an overview of some of the preliminary results, and close with an overview of our future research.

Jakob Knollmüller, Max-Planck-Institute for Astrophysics, Germany

[Tues. 15:20 | 15+3] UBIK - a universal Bayesian imaging toolkit: The abstract problem of recovering signals from data is at the core of any scientific progress. In astrophysics, the signal of interest is the universe around us. It varies in space, time and frequency and is populated by a large variety of phenomena. To capture some of its aspects, large and complex instruments are built. We present an approach for combining information from various instruments in one inference machinery. A joint reconstruction from multiple instruments can provide a deeper picture of the same object, as it helps to distinguish between signal and instrumental effects. Joint imaging can be used to re-adjust the internal calibration of the instruments. Underlying this is a variational Bayesian inference, which allows for the quantification of uncertainty on all parameters. In many cases the full spatio-spectral-temporal problem is computationally too costly and one can decrease the dimensionality of the reconstruction significantly. The versatility of this approach is demonstrated by numerous examples.

Rafael Martínez-Galarza, Harvard-Smithsonian Center for Astrophysics, USA

[Wed. 15:20 | 15+3] Finding anomalous light curves in large surveys with machine learning: Upcoming large observational surveys such as the Large Synoptic Survey Telescope (LSST) will produce millions of irregularly-sampled astronomical light curves. The enhanced sensitivity and time-sampling strategies of LSST will open a new window for several fields of astronomy, including precision cosmology, variable stars, as well as the discovery and characterization of new solar system objects. However, the large volume of the data that LSST will produce will require sophisticated algorithms for processing and interpreting these light curves. One important related question is how to find the most anomalous light curves, those that are perhaps not explained by current models. In this talk I will discuss state-of-the-art anomaly detection methods that use machine learning to find needles in the upcoming haystack of data. I will discuss several approaches to feature extraction for irregular light curves, including the use of auto-encoding recurrent neural networks, and the performance of several anomaly detection algorithms with respect to the features used. I will show the results of applying these methods to several time domain surveys, including SDSS's Stripe 82 and the All Sky Automated Survey (ASAS) catalog of variable stars, and present some of the weirdest light curves found. Finally, I will present the results of our upcoming Photometric LSST Astronomical Time-Series Classification Challenge (PLAsTiCC), which will reach out to the community to spur development of algorithms to classify astronomical transients.

Gábor Marton, Konkoly Observatory, Hungary

[Thurs. 16:45 | 15+3] YSO candidates in the Gaia Photometric Alerts System: The second Gaia Data Release (DR2) has been available since 25 April 2018. It contains astrometry and photometry for more than 1.6 billion objects brighter than magnitude 21.4. The number of objects in the catalogue implies that many Young Stellar Objects (YSOs) were also observed and measured. These YSOs are in different stages of their evolution and sometimes produce sudden light changes that are caused by the dynamic processes of the protostellar disk around these objects. The Gaia Photometric Science Alerts System is able to identify such light variations, and has already issued alerts for some YSOs, but the system is currently not sensitive enough for most of the YSO light changes. Probability-based classification of >103 million sources of Gaia DR2 was done with different machine learning techniques. Our aim was to identify YSOs in the Gaia Photometric Science Alerts System and to help improve the system's sensitivity to YSO events by providing as complete a list of YSO candidates as possible.

Francesco Mauro, Universidad Catolica del Norte, Chile

[Wed. 15:00 | 15+3] SkZpipe: more than a simple photometric pipeline: In an era characterized by big sky surveys and the availability of large amounts of photometric data, it is important for astronomers to have tools to process their data in an efficient, accurate and easy way, minimizing processing time. We present SkZpipe, a Python 3 module designed mainly to process generic data, performing point-spread function fitting photometry with the DAOPhot suite (Stetson 1987). The software has already demonstrated its accuracy and efficiency with the adaptation VVV-SkZ_pipeline (Mauro et al. 2013) for the "VISTA Variables in the Vía Láctea" ESO survey, showing how it can stand in for the user and avoid repetitive interaction in all operations, retaining the power and accuracy of the DAOPhot suite while freeing users from the burden of data processing. SkZpipe is an evolution designed to increase ease of use and customization. The software provides not so much a pipeline as an environment with all the tools to run each atomic step of the photometric procedure, to match the results, to extract information from FITS headers, and to retrieve information from the internal instrumental database. We plan to add support for other photometric software in the future.

Rafaël Mostert, Leiden University, The Netherlands

[Thurs. 15:00 | 15+3] Unveiling the morphologies of the LOFAR Two-metre Sky Survey radio source population with self-organised maps: The Low Frequency Array (LOFAR) Two-metre Sky Survey (LoTSS) is undertaking a low-frequency radio continuum survey of the sky at unparalleled resolution and sensitivity. In order to fully exploit this huge dataset and the tens of Tbit/s produced by the SKA in the next decade, automated methods in machine learning and data-mining are going to become increasingly essential both for identifying optical counterparts to the radio sources and for morphological classifications. Using Self-Organizing Maps (SOMs), a form of unsupervised machine learning, we explore the diversity of radio morphologies for the ~20k extended radio continuum sources in the LoTSS first data release (a few percent of the final LoTSS survey). We make use of PINK (Polsterer et al.), a code which extends the SOM algorithm with rotation and flipping invariance, increasing its suitability and effectiveness for training on astronomical sources. After training, the SOMs can be used for a wide range of science exploitation and we present an illustration of their potential for mining the full LoTSS survey: finding an arbitrary number of the morphologically rarest sources in our data. Objects found this way span a wide range of morphological and physical categories: extended jets of radio AGN, diffuse cluster haloes/relics and nearby spiral galaxies. Finally, to enable accessible, interactive and intuitive data exploration we showcase the LOFAR-PINK Visualization Tool that allows users to easily explore the LoTSS dataset through the trained SOMs.

Ancla Müller, Ruhr-University Bochum, Germany

[Tues. 15:00 | 15+3] Sharpening up Galactic all-sky maps with complementary data; A machine learning approach: Galactic all-sky maps at very disparate frequencies, like in the radio and γ-ray regime, show similar morphological structures. This mutual information reflects the imprint of the various physical components of the interstellar medium (ISM). We want to use multifrequency all-sky observations to test resolution improvement and restoration of unobserved areas for maps in certain frequency ranges. To this end, we aim to reconstruct or predict, from sets of other maps, all-sky maps that in their original form have a low resolution compared to other available all-sky surveys or are incomplete in their spatial coverage. Additionally, we want to investigate the commonalities and differences that the ISM components exhibit over the electromagnetic spectrum. We build an n-dimensional representation of the joint pixel-brightness distribution of n maps using a Gaussian mixture model and see how predictive it is: How well can one map be reproduced based on subsets of other maps? Tests with mock data show that reconstructing the map of a certain frequency from other frequency regimes works astonishingly well, reliably predicting small-scale details well below the spatial resolution of the initially learned map. Applied to the observed multifrequency data sets of the Milky Way, this technique is able to improve the resolution of, e.g., the low-resolution Fermi LAT maps as well as to recover the sky from artifact-contaminated data like the ROSAT 0.855 keV map. The predicted maps generally show fewer imaging artifacts than the original ones. A comparison of predicted and original maps highlights surprising structures, imaging artifacts (fortunately not reproduced in the prediction), and features genuine to the respective frequency range that are not present in other frequency bands. We discuss limitations of this machine learning approach and ideas for how to overcome them.
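
One way to predict the brightness in one map from another with a Gaussian mixture model is to take the conditional mean under the mixture. The sketch below shows this for two maps (n = 2). It rests on several assumptions: the abstract does not state the prediction rule the authors use, the mixture parameters are hand-set rather than fitted to any data, and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, variance):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def predict_b_given_a(a, components):
    """Conditional mean E[b | a] under a two-dimensional Gaussian mixture.

    Each component is (weight, mean_a, mean_b, var_a, cov_ab, var_b).
    """
    # Unnormalised responsibility of each component for the observed value `a`.
    likelihoods = [w * normal_pdf(a, ma, va) for w, ma, _, va, _, _ in components]
    total = sum(likelihoods)
    prediction = 0.0
    for like, (_, ma, mb, va, cab, _) in zip(likelihoods, components):
        # Conditional mean of b within a single Gaussian component.
        conditional_mean = mb + cab / va * (a - ma)
        prediction += like / total * conditional_mean
    return prediction


# Hand-set (not fitted) mixture: a faint and a bright population of pixels.
components = [
    (0.5, 1.0, 2.0, 0.25, 0.20, 0.25),
    (0.5, 5.0, 9.0, 0.25, 0.20, 0.25),
]
faint = predict_b_given_a(1.0, components)
bright = predict_b_given_a(5.0, components)
```

For a pixel whose brightness in map a sits at the centre of one population, the prediction for map b is close to that population's mean in b. Between populations it is a responsibility-weighted blend of the per-component linear predictions.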

Szymon Nakoneczny, National Centre for Nuclear Research, Poland

[Tues. 11:30 | 15+3] Machine learning based quasar detection in Kilo-Degree Survey: We present a machine learning pipeline for quasar detection in the Kilo-Degree Survey (KIDS), a deep and wide-field imaging survey covering 447 sq. deg. on the sky. Presently, it contains 49 million sources, the vast majority of which, however, do not have any spectroscopically confirmed identification. Our aim was to design a method to successfully select quasars based solely on the KIDS photometric observations. We tested many machine learning models and successfully trained them on the KIDS data and a set of quasars identified thanks to the SDSS spectroscopic classification. Our final catalog was created with a Random Forest classifier; it consists of 190,000 quasar candidates and its training purity is 91%. Additional validation of the catalog was made by means of comparison with the second data release of Gaia, external quasar catalogs and number count analysis.

Robert Nikutta, NOAO, USA

[Wed. 10:00 | 15+3] The NOAO Data Lab: Data Lab (http://datalab.noao.edu) is NOAO's new science exploration platform. It provides open access to large-scale survey data (e.g. DES and Legacy Surveys), images and spectra acquired on NOAO facilities, and allows users to share and publish their own data with collaborators or the wider community. Data Lab furnishes users with compute resources, virtual storage (disk and database) and interfaces to assist in data analysis, filtering, processing and visualization. Additionally, access to local copies of high-value reference data (Gaia, AllWISE, SDSS, etc) as well as external data resources provides an integrated environment that is a great resource for anyone interested in large scale survey science, and especially for researchers seeking readiness for LSST now. I will introduce the Data Lab ecosystem, its functionalities, and database holdings including the newly released uniformly-processed, all-sky NOAO Source Catalog (NSC) of all public data obtained with NOAO instruments. In a separate demo/tutorial session participants can try Data Lab for themselves by working through several specific science examples ranging from extracting light curves of variable objects to the detection of Milky Way dwarf satellites. This will hopefully inspire participants to bring their own big-data science questions to an integrated science platform such as Data Lab.

Samson Ogagaoghene Ojako, University of KwaZulu-Natal Durban, South Africa

[Thurs. 10:20 | 15+3] Massless Scalar Field Gravitational Collapse: We study the gravitational collapse of a generalized Vaidya spacetime in the context of the massless Scalar Field and that of the structure and dynamics of a thin shell going at the speed of light which was acquired from a basic and solid solution that is a clear expansion and persistent breaking point of the natural extrinsic-curvature algorithm for subliminal shells. The prescription is applied to several examples of interest in general relativity and cosmology.

did not show up!

Artem Poliszczuk, National Centre for Nuclear Research, Poland

[Tues. 11:50 | 15+3] Fuzzy SVM algorithms applied for searching for AGN in the AKARI NEP Deep data: In the era of big-data-driven astronomy, the creation of reliable catalogs of astronomical objects plays a crucial role in observational cosmology. In particular, AGN catalogs still cause many difficulties for observers because type 2 AGN, which are obscured by dust, are strongly underrepresented. In this presentation, a selection of AGN candidates based on a modified version of the support vector machine (SVM) algorithm, a fuzzy SVM, will be presented. This modification allows measurement uncertainties to be incorporated into the classification process and, as a result, gives a more reliable and physically motivated result. Searching for AGN, and type 2 AGN specifically, is especially efficient in the mid-infrared passbands, which makes the AKARI NEP Deep source catalog a unique dataset for AGN selection.

Francisco Pozo Nunez, Haifa Research Center for Theoretical Physics and Astrophysics, University of Haifa, Israel

[Wed. 15:40 | 15+3] Automated observations, data reduction and time series analysis of Active Galactic Nuclei: Photometric reverberation mapping (PRM) is an efficient method to study the central engine of active galactic nuclei (AGNs) through the use of broad and narrow-band filters to trace the variations from the AGN accretion disk, broad emission line region (BLR) and dust distribution. Compared to spectroscopic reverberation mapping, which requires large 2-4m telescopes, the advantage of PRM lies in the high accuracy that can be reached with small telescopes and in the larger number of calibration stars in the field of view. However, to carry out successful PRM experiments, for instance to study the accretion disk in AGNs, one needs to measure the light echoes traced between continuum emission at different wavelengths with superb accuracy. Precise time delays can only be achieved using high-precision photometry from high-quality light curves, which in turn depend on good time sampling combined with efficient data reduction processes. Motivated by the recent progress in PRM, we have started a fully automated photometric monitoring of AGNs using robotic telescopes located at the Wise Observatory in Israel. The telescopes are specially equipped with broad and narrow-band filters to perform high-fidelity PRM of the accretion disk and the BLR in V < 17 mag sources up to z = 0.4. Here, we describe the autonomous operation of the telescopes together with the fully automatic pipeline used to achieve high-performance unassisted observations, data reduction, and time series analysis using different methods.
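As an illustration of what measuring a time delay between two light curves involves, the sketch below estimates the lag that maximises the correlation between a driving signal and its delayed echo. It is not the pipeline described in the abstract: it assumes evenly sampled, noise-free synthetic data, whereas real reverberation light curves are irregularly sampled and typically require interpolated or discrete cross-correlation methods. All names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y) if std_x and std_y else 0.0


def estimate_lag(driver, echo, max_lag):
    """Return (lag, r): the lag in samples maximising corr(driver[t], echo[t + lag])."""
    best_lag, best_r = 0, -2.0
    for lag in range(max_lag + 1):
        # Compare the overlapping parts of the two series for this trial lag.
        r = pearson(driver[: len(driver) - lag], echo[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Synthetic light curves: the echo is the driver delayed by 7 samples.
n_samples, true_lag = 200, 7
signal = [math.sin(0.11 * t) + 0.5 * math.sin(0.031 * t)
          for t in range(n_samples + true_lag)]
driver = signal[true_lag:]      # e.g. continuum emission
echo = signal[:n_samples]       # e.g. delayed response
lag, r = estimate_lag(driver, echo, max_lag=20)
print(lag, round(r, 3))
```

With noisy, gappy data the correlation peak broadens, which is why the abstract stresses good time sampling and high-precision photometry.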

Nima Sedaghat, University of Freiburg, Germany

[Wed. 16:50 | 15+3] TransiNet: deep feature learning for transient detection: Large sky surveys are increasingly relying on image subtraction pipelines for real-time (and archival) transient detection. In this process one has to contend with varying point-spread function (PSF) and small brightness variations in many sources, as well as artefacts resulting from saturated stars and, in general, matching errors. Very often the differencing is done with a reference image that is deeper than individual images and the attendant difference in noise characteristics can also lead to artefacts. We present here a deep-learning approach to transient detection that encapsulates all the steps of a traditional image-subtraction pipeline -- image registration, background subtraction, noise removal, PSF matching and subtraction -- in a single real-time convolutional network. Once trained, the method works lightning-fast and, given that it performs multiple steps in one go, the time saved and false positives eliminated for multi-CCD surveys like the Zwicky Transient Facility and the Large Synoptic Survey Telescope will be immense, as millions of subtractions will be needed per night.

Petr Škoda, Astronomical Institute of the Czech Academy of Sciences, Czech Republic

[Thurs. 17:10 | 15+3] A Heuristic Deep Learning Based Discovery of Emission-line stars in LAMOST Spectra Surveys: The current archives of the LAMOST telescope contain millions of spectra of stellar objects classified by two post-processing pipelines into a few object classes (star, galaxy, quasar, unknown), with an attempt to estimate an approximate spectral class for a subsample of stars. Such a catalogue is, however, insufficient for finding objects identified by characteristic spectral line profiles, namely emission-line stars such as Be or B[e] stars. We present a heuristic iterative method for finding such objects, combining the domain adaptation of spectra from high to low resolution spectrographs in the pre-training phase, deep learning using TensorFlow, and purification of the target class in the training set driven by a visual preview of a random sample of previously predicted candidates. The performance of the method is discussed and the resulting set of newly discovered Be star candidates is compared with other catalogues obtained by pixel-based statistics of predicted line positions. The deep learning approach proves to be better and more general, allowing the construction of a "query by example" engine tunable for the selection of objects with a particular spectral feature in upcoming mega-spectra surveys.

Hossen Teimoorinia, CADC, Canada

[Thurs. 12:20 | 15+3] Automated CFHT Data Quality Assessment With Machine Learning: Optical images taken by ground-based telescopes, such as those from the Megacam instrument mounted on CFHT, cover a wide range of quality. Low-quality images suffer from problems with telescope tracking, bad sky conditions (e.g., bad seeing and clouds) and various problems in the background, such as high background fluctuations, very bad object saturation and dead CCDs. At the Canadian Astronomy Data Center (CADC), we have developed a Machine Learning method to classify Megacam images into different categories in order to separate the 'good' and usable images from the low-quality ones. We use information from the pixels in all 36 or 40 CCDs of an exposure and present different probabilities associated with the different classes for each CCD. The good images are selected based on the obtained probabilities and a set of decision boundaries that take the probability distributions into account. We have tested our trained network on different, independent test data (randomly selected from ~220,000 exposures which contain ~9,000,000 CCDs) and obtained more than 97% accuracy. This method will be applied to more than 1,000,000 exposures in the CADC database, in a fully automated way.
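To make the idea of selecting exposures from per-CCD class probabilities concrete, here is a minimal sketch. The class names, the two thresholds and the rule itself are assumptions chosen for illustration; the abstract does not specify the actual decision boundaries used at CADC.

```python
def exposure_is_good(ccd_probs, threshold=0.9, min_fraction=0.95):
    """Decide whether an exposure is usable.

    ccd_probs: one dict per CCD, mapping class name -> probability.
    threshold, min_fraction: hypothetical decision boundaries.
    """
    if not ccd_probs:
        return False
    # Count CCDs whose probability of the 'good' class clears the threshold.
    n_good = sum(1 for probs in ccd_probs if probs.get("good", 0.0) >= threshold)
    return n_good / len(ccd_probs) >= min_fraction


# Toy exposures with 36 CCDs each (made-up probabilities, not CFHT data)
clean = [{"good": 0.98, "bad_seeing": 0.01, "other": 0.01}] * 36
cloudy = [{"good": 0.20, "bad_seeing": 0.75, "other": 0.05}] * 36
one_bad_ccd = clean[:35] + [{"good": 0.05, "bad_seeing": 0.05, "other": 0.90}]

print(exposure_is_good(clean))
print(exposure_is_good(cloudy))
print(exposure_is_good(one_bad_ccd))
```

In this toy rule an exposure survives a single bad CCD because 35 of 36 still exceeds the assumed 95% fraction; a stricter survey could set min_fraction to 1.0.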

Mattia Vaccari, IDIA/UWC, South Africa

[Thurs. 15:20 | 15+3] The IDIA Cloud and the HIPPO Project: The IDIA cloud is a cloud computing system being developed at the Institute for Data Intensive Astronomy (IDIA). The IDIA cloud is a data-intensive research facility whose main aim is to facilitate the reduction and the scientific exploitation of MeerKAT data. Building on the IDIA cloud, the HELP-IDIA Panchromatic Project (HIPPO) is developing an environment for the effective multi-wavelength characterization of radio sources detected by MeerKAT. In my talk I will introduce the IDIA cloud, detail the aims and the status of the HIPPO project and demonstrate some IDIA and HIPPO use cases in a Jupyter Notebook.

Christian Wolf, Australian National University, Australia

[Tues. 10:20 | 15+3] SkyMapper Southern Survey and All-Sky Virtual Observatory: I will present the SkyMapper Southern Survey: its legacy component covering 21,000 deg^2 of Southern sky, including the upcoming DR2, and time-domain aspects including the follow-up of LIGO alerts. I will comment on crucial aspects of the science data pipeline, its development, performance and database. I will discuss object classification, redshift estimation and variability detection among the billion brightest objects. Finally, I will present the Australian All-Sky Virtual Observatory (ASVO) and comment on its future. My talk addresses time-domain and survey astronomy, data-intensive processing, knowledge discovery in large databases and virtual-observatory infrastructure.

Time	Mon. 03	Tues. 04	Wed. 05	Thurs. 06	Fri. 07
09:00		registration	registration		registration
09:30		R. Norris	D. Vohl	registration	hackathon & unconference
10:00	registration	T. Galvin	R. Nikutta	W. Kerzendorf
10:20		C. Wolf	A. Schaaff	N. Gianniotis
10:40		discussion		discussion
10:50			discussion
11:00	welcome	coffee break	coffee break	coffee break	coffee break
11:15	G. Djorgovski
11:30		S. Nakoneczny	posters & demos	K. Morik	hackathon & unconference
11:45	q&a
11:50		A. Poliszczuk
12:00	short break			A. Brill
12:10		A. D'Isanto
12:15	K. Polsterer
12:20				H. Teimoorinia
12:30		discussion
12:40				discussion
12:50		conference picture
13:00	lunch break	lunch break	lunch break	lunch break	lunch break
14:10					closing
14:30	M. Graham	F. Gieseke	D. Baron	D. Arancibia
15:00		A. Müller	F. Mauro	R. t
15:15	short break
15:20		J. Knollmüller	R. Martinez-Galarza	M. Vaccari
15:30	P. Protopapas
15:40		discussion	F. Pozo Nunez	discussion
15:50				coffee break
16:00	q&a	coffee break	coffee break
16:15	coffee break
16:20				R. Andrae
16:30		R. Warmels	G. Cabrera-Vives
16:45	A. Mahabal
16:50			N. Sedaghat	G. Marton
17:00		discussion
17:10			discussion	P. Škoda
17:30	welcome reception 17:30-20:00			P. Guo
17:50				discussion
		city tour 18:30-20:00	dinner 19:00-24:00	pub-crawl 21:00-24:00

Introductory / Overview	Extragalactic	AGN / Quasars	Machine Learning / Data Reconstruction
Reproducibility / Research Software	Research Environments / Infrastructure	Complex Datasets / Time Series	Theory
CTA / CFHT / Survey Processing	GAIA / LAMOST

3-7 September | HITS | Heidelberg | Germany

Programme

Posters

Alexander Becker, Ruhr-University Bochum, Germany

Guillermo Cabrera-Vives, Department of Computer Science, University of Concepción, Chile

Manuel Dörr, University Würzburg, Germany

Miroslav Fedurco, Pavol Jozef Šafárik University, Slovakia

Erica Hopkins, HITS, Germany

Tumelo Mangena, University of the Western Cape, South Africa

Sibusiso Mdhluli, University of Western Cape, South Africa

Events

Welcome Reception

City Tour

Conference Dinner

Pub Crawl