NCSU GeoForAll Lab
at the
Center for Geospatial Analytics
North Carolina State University
GIS 710: Geospatial Analytics for Grand Challenges
November 11, 2024
Stodden et al. (PNAS, March 13, 2018)
204 computational articles from Science in 2011–2012
It would be preferable to have the time to do it right, while simultaneously allowing scientists to be human and make mistakes, instead of focusing on novelty, being first and publishing in highly selective journals.
Discussion questions: What research gets published? What research gets funded?
Holding, A. N. (2019). Novelty in science should not come at the cost of reproducibility. The FEBS Journal, 286(20), 3975–3979. DOI 10.1111/febs.14965
[...] irreproducible research [...] careers [...] personal cost: young scientists [...] their families [...] visas that are conditional [...] Running out of time [...] pressure on early-career researchers to deliver high-impact results. The outcome is an environment that pushes people to get across the line as quickly as possible, while the incentives to challenge or to reproduce previous studies are minimal.
Discussion questions: How important is it to challenge or reproduce previous studies? How important is being able to reproduce your own studies?
Holding, A. N. (2019). Novelty in science should not come at the cost of reproducibility. The FEBS Journal, 286(20), 3975–3979. DOI 10.1111/febs.14965
‘There is always this fear, that someone steals your ideas, or is doing the same thing at the same time, and some people fear it more than other people, I think especially younger people, also some older. I think this causes a lot of stress to the scientists, and it has happened to me. […] you try not to think about it, you still think that what if someone else is doing the same thing and this is useless work, so then it takes your energy.’
Discussion questions: What is scooping (being scooped) in science? Are you afraid of it?
— research participant in
Laine, Heidi (2017). Afraid of scooping: Case study on researcher strategies against fear of scooping in the context of open science. Data Science Journal. DOI 10.5334/dsj-2017-029
[...] releasing datasets as open data may threaten privacy, for instance if they contain personal or re-identifiable data. Potential privacy problems include chilling effects on people communicating with the public sector, a lack of individual control over personal information, and discriminatory practices enabled by the released data.
Discussion questions: Do you use or create personal or private data in your research or do you expect you will?
Borgesius, F. Z., Gray, J., & van Eechoud, M. (2016). Open Data, Privacy, and Fair Information Principles: Towards a Balancing Framework. 10.15779/Z389S18
Insights obtained by compiling public information from Open Data sources, may represent a risk to Critical Infrastructure Protection efforts. This knowledge can be obtained at any time and can be used to develop strategic plans of sabotage or even terrorism.
Discussion questions: Do you use or create sensitive data in your research or do you expect you will?
Fontana, R. (2014). Open Data analysis to retrieve sensitive information regarding national-centric critical infrastructures. http://open.nlnetlabs.nl/downloads/publications/...
[...] that’s going to be harder. [...] I’m expecting to get screenshots of MATLAB procedures and horrible Python code that even the author can’t read anymore, and I don’t know what we’re going to do about that. Because in some sense, you can’t push too hard because if they go back and rewrite the code or clean it up, then they might actually change it.
Discussion questions: Have you ever broadly shared source code or other internal parts of your work?
— An interviewed journal editor-in-chief in
Sholler, D., Ram, K., Boettiger, C., & Katz, D. S. (2019). Enforcing public data archiving policies in academic publishing: A study of ecology journals. Big Data & Society, 6(1). DOI 10.1177/2053951719836258
“That’s really the tragedy of the funding agencies in general,” says Carpenter. “They’ll fund 50 different groups to make 50 different algorithms, but they won’t pay for one software engineer.”
Discussion questions: What open source software that is highly relevant to research do you know? Any idea how it is funded?
— Anne Carpenter, a computational biologist at the Broad Institute of Harvard and MIT in Cambridge in
Nowogrodzki, Anna (2019). How to support open-source software and stay sane. Nature, 571(7763), 133–134. DOI 10.1038/d41586-019-02046-0
Discussion questions: Do you know GRASS GIS?
[around 1990] [...] GIS industry claimed that it was unfair for the Federal Government to be competing with them.
Westervelt, J. (2004). GRASS Roots. Proceedings of the FOSS/GRASS Users Conference. Bangkok, Thailand.
In 1996 USA/CERL, [...] announced that it was formally withdrawing support [...and...] announced agreements with several commercial GISs, and agreed to provide encouragement to commercialization of GRASS. [...] result is a migration of several former GRASS users to COTS [...] The first two agreements encouraged the incorporation of GRASS concepts into ESRI's and Intergraph's commercial GISs.
Hastings, D. A. (1997). The Geographic Information Systems: GRASS HOWTO. tldp.org/HOWTO/GIS-GRASS
Original announcement: grass.osgeo.org/news/cerl1996/grass.html
NIH expects that [...] researchers will maximize the appropriate sharing of scientific data, acknowledging certain factors (i.e., legal, ethical, or technical) [...] Shared scientific data should be made accessible as soon as possible [...]
Discussion questions: Would you expect a health-related organization to be on the forefront of sharing data?
NOT-OD-21-013: Final NIH Policy for Data Management and Sharing. Retrieved November 6, 2024, from https://grants.nih.gov/grants/guide/notice-files/NOT-OD-21-013.html
[The White House Office of Science and Technology Policy (OSTP)] is [...] launching the Year of Open Science, featuring actions across the federal government throughout 2023 to advance national open science policy, provide access to the results of the nation’s taxpayer-supported research, accelerate discovery and innovation, promote public trust, and drive more equitable outcomes.
Discussion questions: Anything you consider new?
FACT SHEET: Biden-Harris Administration Announces New Actions to Advance Open and Equitable Research. January 11, 2023. whitehouse.gov/...open-and-equitable-research
[In 2022,] NASA committed $20 million per year to advance open science, beginning in 2023.
Discussion questions: What do you think this will be spent on?
Why NASA and federal agencies are declaring this the Year of Open Science. Nature 613, 217 (2023). DOI 10.1038/d41586-023-00019-y
[...] all peer-reviewed scholarly publications [...] resulting from federally funded research are made freely available [...] without any [...] delay after publication.
Discussion questions: What open-science concept does this refer to?
White House Office of Science and Technology Policy (2022). Desirable Characteristics of Data Repositories for Federally Funded Research. DOI 10.5479/10088/113528
Scientific data underlying peer-reviewed scholarly publications resulting from federally funded research should be made freely available [...] at the time of publication, unless subject to limitations [...]
Discussion questions: What open-science concept or concepts does this refer to?
White House Office of Science and Technology Policy (2022). Desirable Characteristics of Data Repositories for Federally Funded Research. DOI 10.5479/10088/113528
[...] “scientific data” include the recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings. Such scientific data do not include laboratory notebooks, preliminary analyses, case report forms, drafts of scientific papers, plans for future research, peer-reviews, communications with colleagues, or physical objects and materials, such as laboratory specimens, artifacts, or field notes.
Open source became a movement – a mentality. Suddenly infrastructure software was nearly free [comparing to 1999]. We paid 10% of the normal costs for the software and that money was for software support. A 90% disruption in cost spawns innovation – believe me.
Discussion questions: Do you know any open-source software success stories?
— Mark Suster (2011) in
Eghbal, Nadia (2016). Roads and bridges: The unseen labor behind our digital infrastructure. Ford Foundation
First journal ever published:
Philosophical Transactions (of the Royal Society)
CC BY Stefan Janusz, Wikipedia
Discussion questions: How are these publishing goals fulfilled by journal papers?
Discussion questions: What is your experience with getting back to your own research or continuing research started by someone else? (See PhD Comics: Scratch.) How does open science relate to team science? How can making things public help us achieve the desired effect, and what challenges does that bring?
Image: “Free beer bottles” by free beer pool (CC BY 2.0)
Discussion questions: What is the difference between “free as in free beer” and “free as in freedom”? Have you seen “open” being used for something not fulfilling the Open Definition?
Discussion questions: What would you add to the list? What do you see here for the first time? What is openwashing?
Discussion questions: Is spatial special? Is recomputing the results useful for research? How long should it take to recompute results? Do dependencies need to be open source as well?
Publication Component | in the Petras et al. 2017 use case |
---|---|
Text | background, methods, results, discussion, conclusions, … (OA) |
Data | input data (formats readable by open source software) |
Reusable Code | methods as GRASS GIS modules (C & Python) |
Publication-specific Code | scripts to generate results (Bash & Python) |
Computational Environment | details about all dependencies and the code (Docker, Dockerfile*) |
Versions | repository with current and previous versions* (Git, GitHub) |
* The version associated with the publication is also included as a supplemental file.
Petras, V. (2018). Geospatial analytics for point clouds in an open science framework. Doctoral dissertation. URI http://www.lib.ncsu.edu/resolver/1840.20/35242
Discussion questions: What other technologies are a good fit for these components? Are there other components or categories? What parts of your research did you publish or try to publish, and what challenges did you face?
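The "Computational Environment" row in the table above is about recording dependencies so that results can be recomputed. The cited use case relies on Docker for this. As a much lighter illustration (not the method used in the cited work), the sketch below writes out the interpreter, operating system, and package versions with Python's standard library. The function name `describe_environment` and the package names passed to it are hypothetical and chosen only for the example.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages):
    """Return a dictionary describing the interpreter, OS, and package versions."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # package is not installed in this environment
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "system": platform.system(),
        "machine": platform.machine(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package names; these are not the dependencies of the cited work.
    report = describe_environment(["numpy", "pip"])
    print(json.dumps(report, indent=2, sort_keys=True))
```

A report like this could be stored next to the publication-specific scripts. Unlike a container image, it only describes the environment and does not reproduce it.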
Discussion questions: What is the skill set needed to publish results like this? What is the long-term sustainability of online recomputability tools such as Code Ocean?
Discussion questions: What software can play this role? What are the different levels of integration with a piece of software and their advantages and disadvantages?
“Creative Commons License Spectrum” by Shaddim (CC BY 4.0), Creative Commons: Understanding Free Cultural Works
Discussion questions: Do you read “terms and conditions”? Have you ever read any “terms and conditions” or end user license agreement (EULA)? What about an open source software license? (Read the license of GDAL right now! It's less than 170 words.)
The principles emphasize machine-actionability (…) because humans increasingly rely on computational support to deal with data (…)
[Wilkinson 2016]
Image: “FAIR guiding principles for data resources” by SangyaPundir (CC BY-SA 4.0)
Discussion questions: Which parts are unique to FAIR and not present in open science? Is source code part of data, data provenance, or is it a separate thing?
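To make "machine-actionability" concrete, the sketch below shows a program, rather than a person, checking whether a dataset description carries the information needed to find and reuse it. The field names, the `missing_fields` helper, and the example record (including its identifier) are hypothetical. They are not taken from the FAIR principles or from any metadata standard.

```python
import json

# Illustrative field names only; not defined by FAIR or by any metadata standard.
REQUIRED_FIELDS = ("identifier", "title", "license", "format", "access_url")


def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# A made-up metadata record, as it might be read from a JSON file.
record = json.loads("""
{
  "identifier": "doi:10.0000/example",
  "title": "Example point cloud dataset",
  "license": "CC-BY-4.0",
  "format": "LAS"
}
""")

print(missing_fields(record))  # ['access_url']
```

Because the record is structured, the same check can run over thousands of datasets without anyone reading each description by hand.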
A scientific publication needs to consist of text, data, source code, software environment, and reviews which are all openly licensed, in open formats, checked during the submission process, and publicly available without any delay after publication.