
Publishing and Archiving Large Datasets in Data Repositories

Researchers across most academic disciplines collect and analyze large datasets. At Purdue, we receive many requests from researchers who want to publish and share large datasets in the university’s data repository, the platform where researchers share and publish datasets and meet federal funding agencies’ recommendations and requirements for data sharing.

However, repositories usually impose a maximum size on the datasets researchers can upload and share publicly; most data repositories limit individual files to no more than 2.5 GB. So how can a data repository accommodate large datasets from various disciplines? Providing curation and long-term preservation for such data is becoming a critical issue that libraries must solve. Several aspects need investigation: which large datasets the data repository accepts; storage options and solutions for uploading, transferring, and sharing data; the cost model for these datasets; metadata description requirements; and the creation of archival information packages, along with secure network storage for long-term preservation. In this presentation, I will share the questions we are investigating for our data repository and hope to hear ideas from the library community.

Speaker(s)


11:50 AM
10 minutes