N.B.: I reserve the right to make changes to the following schedule. Any revisions will be discussed in class, and you are encouraged to ask as many questions as needed.

Week 1: January 18

Teaching philosophy and class logistics.

Emerging trends in scientific data and their impact on information systems. Data science and data informatics.

Chris Anderson, "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete"

John Timmer, "Preserving science: what to do with raw research material?"
John Timmer, "Preserving science: what data do we keep? What do we discard?"

Week 2: January 25

What are scientific and technical data? Data forms, formats, properties, and sources.

Assignment: Data management case study (read the case, answer the quiz, and record your answers).

Preserving Scientific Data on Our Physical Universe: A New Strategy for Archiving the Nation's Scientific Information Resources. National Research Council (1995) ISBN: 0-309-52106-8. pp. 10-32

Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. National Science Board (NSF) (2005), pp. 17-23.

William Kent, The Many Forms of a Single Fact (1989)

Another look at data. George H. Mealy, Fall Joint Computer Conference (1967), pp. 525-534.

Week 3: February 1

Looking at specific datasets: describing, evaluating, and documenting S&T data.

Assignment: find and evaluate three different S&T datasets. Document findings on the wiki.

Due: February 8.

Bioinformatics in the Information Age. Sylvia J. Spengler, Science, 18 February 2000, Vol. 287, pp. 1221-1223. DOI: 10.1126/science.287.5456.1221

Scientific Data Management in the Coming Decade. Jim Gray, et al., ACM SIGMOD Record (Dec 2005), Vol. 34, No. 4, pp. 34-41.

Managing Scientific Data. Anastasia Ailamaki, et al., Communications of the ACM (June 2010), Vol. 53, No. 6, pp. 68-78. DOI: 10.1145/1743546.1743568

Policy-making for Research Data in Repositories: A Guide: [http://www.disc-uk.org/docs/guide.pdf] pp. 5-17.

Week 4: February 8

Looking at datasets (cont'd.): evaluation of S&T metadata and markup.

Assignment: find and evaluate three more S&T datasets. Document findings on the wiki.

Due: February 15.

Thomson, Judi, et al., Metadata's Role in a Scientific Archive, 2003.

The Unsolvable Identity Problem. William Kent, Proceedings of Extreme Markup Languages 2003.

Introduction to 'Taxonomy for the twenty-first century'. H.C.J. Godfray & S. Knapp, Phil. Trans. R. Soc. Lond. B (2004) 359, pp. 559-569. DOI: 10.1098/rstb.2003.1457

Week 5: February 15

Evaluating and documenting S&T data quality.

Lide, D., Data Quality: More Important than Ever in the Internet Age, CODATA Data Science Journal, Volume 6, 23 December 2007.

Zednik, Stephan, Characterizing quality for science data products, Tetherless World Weblog, 30 December 2011.

Quality of Research Data, an Operational Approach, Waaijers, L., D-Lib Magazine, Vol. 17, No. 1/2, Jan/Feb 2011 [http://www.dlib.org/dlib/january11/waaijers/01waaijers.html]

Data Archiving and Networked Services Data Seal of Approval 2009 [BlackBoard]

IMF Data Quality Framework 2003 [BlackBoard]

Week 6: February 22

Introduction to the Semantic Web and Linked Data initiatives in the sciences.

Which Semantic Web? Catherine C. Marshall and Frank M. Shipman, Hypertext '03 (2003). [BlackBoard]

Ontologies and the Semantic Web, Elin K. Jacob, Bulletin of the American Society for Information Science and Technology, April/May 2003, pp. 19-22.

The Semantic Web, Linked and Open Data, Lorna M. Campbell and Sheila MacNeill, JISC CETIS Briefing Paper (2010) [http://wiki.cetis.ac.uk/images/1/1a/The_Semantic_Web.pdf]

Ontological foundations for conceptual modelling, Giancarlo Guizzardi and Terry Halpin, Applied Ontology 3 (2008), pp. 1-2, 8-10. DOI: 10.3233/AO-2008-0049. [BlackBoard]

Week 7: February 29

Introduction to the Semantic Web and Linked Data initiatives in the sciences (redux).

Assignment: Concept map model of an S&T data informatics topic area.

Due: March 7.

Week 8: March 7

Semantic and Linked S&T data: ontologies in the wild.

Assignment: Abstract of term paper.
Due: March 21. Submit by e-mail.

Assignment: FOAF file fun.
Due: March 28.

Clay Shirky, Semantic Web, Syllogism, and Worldview. (2003)

Paul Ford, Response to Shirky (2003)

Joshua Tauberer, Quick Intro to RDF

David Shotton, CiTO, the Citation Typing Ontology, Journal of Biomedical Semantics (2010), 1(Suppl 1):S6

Week 9: March 21

Anderson traveling. Guest lecture: Infochimps.

Readings: TBA.

Week 10: March 28

Semantic and Linked S&T data: trends and outstanding informatics issues.

Bio-ontologies: current trends and future directions, Olivier Bodenreider and Robert Stevens, Briefings in Bioinformatics, Vol. 7, No. 3, pp. 256-274 (2006), [http://dx.doi.org/10.1093/bib/bbl027]

Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article, David Shotton, Katie Portwin, Graham Klyne, and Alastair Miles, PLoS Computational Biology 5(4) e1000361 (2009). doi: 10.1371/journal.pcbi.1000361.

Theoretical foundations and engineering tools for building ontologies as reference conceptual models, Giancarlo Guizzardi, Semantic Web, Vol. 1 (2010), pp. 3-10. DOI: 10.3233/SW-2010-0015 [BlackBoard]

Week 11: April 4

Standards for S&T data citation and publication.

A Vast Machine: Standards as Social Technology, P.N. Edwards (2004). Science 304: 827-828.

M.A. Parsons & P.A. Fox, Is Data Publication the Right Metaphor? (2011) [preprint blog post with comments]

Micah Altman and Gary King, (2007), A Proposed Standard for the Scholarly Citation of Quantitative Data

Bryan Lawrence, Catherine Jones, Brian Matthews, Sam Pepler, Sarah Callaghan. (2011) Citation and Peer Review of Data: Moving Towards Formal Data Publication

Paskin, N. (2005). Digital object identifiers for scientific data. Data Science Journal 4 (1): 12-20. [http://www.jstage.jst.go.jp/article/dsj/4/0/4_12/_article]

Week 12: April 11

Preservation of S&T data.

Reader: Boettcher; Discussant: Franco

Berman, F. (2008). Got data? A guide to data preservation in the information age. CACM 51 (12): 50-56. doi:10.1145/1409360.1409376

Extracting, Transforming and Archiving Scientific Data, Daniel Lemire and Andre Vellino (2011). arXiv:1108.4041v2 [cs.DL] 23 Aug 2011

Week 13: April 18

Preservation of S&T data (cont'd.): discipline-specific case studies.

Reader: Large; Discussant: Marquardt

Assignment: Term paper draft, submitted for review.
Due: April 25

Data Preservation in High Energy Physics. David M. South, on behalf of the ICFA DPHEP Study Group, 17 January 2011, arXiv:1101.3186v1

Wiser, S. K., Bellingham, P. J., and Burrows, L. E. (2001). Managing biodiversity information: Development of New Zealand's National Vegetation Survey databank. New Zealand Journal of Ecology 25 (2): 1-17. [http://nvs.landcareresearch.co.nz/html/Wiser_etal_2001_screen.pdf]

Week 14: April 25

Data provenance and data processing curation.

Reader: Travis; Discussant: Waelder

A Survey of Data Provenance in e-Science. Y.L. Simmhan, et al., ACM SIGMOD Record, Vol. 34, No. 3, Sept. 2005

Lineage Retrieval for Scientific Data Processing: A Survey, Bose, R. and Frew, J., ACM Computing Surveys, Vol. 37, No. 1, March 2005, pp. 1–28.

Provenance Management in Curated Databases, Buneman, P., Chapman, A.P., and Cheney, J., Proceedings of SIGMOD'06, June 2006, pp. 536-550.

Goble, C., Stevens R., Hull D., Wolstencroft K., and Lopez R. (2008). Data curation + process curation = data integration + science. Briefings in Bioinformatics 9 (6): 506-517, doi:10.1093/bib/bbn034

Week 15: May 2

Class presentations.

Assignment: Term paper, final revision.
Due: May 5.