JCDL 2006 Conference Notes

Day 2 – Afternoon Panel Session

Augmenting Interoperability Across Scholarly Repositories

Don Waters – Mellon Foundation

history: late 1999 – provisional agreement – Santa Fe Convention – now known as the OAI protocol for metadata harvesting (OAI-PMH); simplicity; does allow for complex features – exchange of native metadata structures; growing frustration with Dublin Core; repositories need new ways to interchange complex objects; demand for something more than OAI; Microsoft, Mellon, etc. are interested in forming this new protocol; need new data model; framework must be intelligent about various objects; basic question: how do we enable communities who care?

Tony Hey – Microsoft

looking to support scientists and engineers in scholarly communities

new science paradigms: e-science / data-centric science; Microsoft understands it needs to embrace open standards

need more than text – working on IVO: astronomy data grid; skyserver.sdss.org

chemistry – e-prints: go from the text of a paper to its graphics to the raw data behind them; analyze the data yourself

PubMed Central – portable version by Microsoft; federate through web services

e-science mashups – combine services to give added value – combined datasets used to perform analysis

interoperable repositories?

arXiv at Cornell

NIH PubMed Central – Microsoft funded
EPrints project in Southampton – JISC-funded TARDis project

Herbert Van de Sompel – Los Alamos

– Pathways project – NSF grant – Cornell and Los Alamos
– context – emergence of repositories; institutional repositories (IR); publisher repositories; dataset repositories
– compound digital objects – multiple media and content: paper, dataset, simulations, software, etc

– leverage materials in IRs; use and reuse them; treat repositories not as stores accessible only to local users but as active nodes in a global environment

motivators for something other than OAI

– motivation 1: richer cross-repository services; objects as source materials; e.g. chemical search engine – machine-readable chemical formulas; no foundation today to achieve this; one would need a digital object representation of the formula; need semantics

– motivation 2: scholarly communication workflow; global workflow across repositories; recombine existing material, add value and store new object

– looking to a shared data model and services across repositories

– scholarly communication is a long-term endeavor; abstract definitions of repository interfaces; selective framework;

– new model: 3 interfaces: obtain, harvest, put; e.g. submit surrogates -> available through harvest and obtain interfaces -> service is populated by harvesting surrogates -> need lightweight service registry (like an object catalog in a federation – we don't need this as the surrogates carry their own information)
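
A minimal sketch of how the three interfaces might fit together, assuming an in-memory repository. The class names, the `Surrogate` fields, the `info:example/...` identifiers and the `populate_service` helper are all hypothetical illustrations; the talk did not specify signatures or a wire format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterator, Optional


@dataclass(frozen=True)
class Surrogate:
    """Hypothetical surrogate: a serialized description standing in for an object."""
    identifier: str
    datestamp: datetime
    body: str


class InMemoryRepository:
    """Toy repository exposing the three interfaces named in the talk."""

    def __init__(self) -> None:
        self._store: Dict[str, Surrogate] = {}

    def put(self, surrogate: Surrogate) -> None:
        # put: submit a surrogate to the repository.
        self._store[surrogate.identifier] = surrogate

    def obtain(self, identifier: str) -> Optional[Surrogate]:
        # obtain: request one surrogate by identifier (None if unknown).
        return self._store.get(identifier)

    def harvest(self, since: Optional[datetime] = None) -> Iterator[Surrogate]:
        # harvest: batch access, optionally limited to surrogates stamped at or after `since`.
        for s in sorted(self._store.values(), key=lambda s: s.datestamp):
            if since is None or s.datestamp >= since:
                yield s


def populate_service(source: InMemoryRepository, index: Dict[str, str]) -> Dict[str, str]:
    """A service fills its own index by harvesting surrogates from a repository."""
    for s in source.harvest():
        index[s.identifier] = s.body
    return index


repo = InMemoryRepository()
repo.put(Surrogate("info:example/1", datetime(2006, 6, 12, tzinfo=timezone.utc), "paper"))
repo.put(Surrogate("info:example/2", datetime(2006, 6, 13, tzinfo=timezone.utc), "dataset"))
service_index = populate_service(repo, {})
```

Here the service never needs a separate object catalog: everything it indexes arrives inside the harvested surrogates.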

Carl Lagoze – Cornell Information Science

– Pathways Project; NSF grant http://www.infosci.cornell.edu/pathways

– set of metadata like Dublin Core is not sufficient; want to address modeling complex objects; data models (e.g. DSpace, Fedora, METS, EPrints, etc.)

– Pathways core data model: sits above individual models; abstract model vs. package for asset transfer

– avoid IP issues; allow 'live' references rather than static objects;

– key requirements of data model: 1 – identity; 2 – persistence; 3 – lineage; 4 – semantics; 5 – recursion; 6 – link to concrete representation;

– serialize data model; ship surrogates back and forth between services; obtain and harvest; deposit via pdf;
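
The six requirements and the serialization step can be sketched roughly as follows. This is only an illustration: the `Entity` and `Datastream` names, the field names, and the choice of JSON are assumptions made here for brevity, not the Pathways project's actual model or format. Persistence is interpreted loosely as immutability of an entity once it is created.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class Datastream:
    """Requirement 6: link to a concrete representation (hypothetical fields)."""
    location: str
    media_type: str


@dataclass(frozen=True)  # requirement 2 (persistence), read here as immutability
class Entity:
    identifier: str                            # 1: identity
    semantic_type: str                         # 4: semantics
    lineage: Tuple[str, ...] = ()              # 3: identifiers this entity derives from
    parts: Tuple["Entity", ...] = ()           # 5: recursion, entities contain entities
    datastreams: Tuple[Datastream, ...] = ()   # 6: concrete representations


def serialize(entity: Entity) -> str:
    """Turn an entity (and its nested parts) into a JSON surrogate."""
    return json.dumps(asdict(entity), sort_keys=True)


def deserialize(text: str) -> Entity:
    """Rebuild an entity tree from a JSON surrogate."""
    def build(d: dict) -> Entity:
        return Entity(
            identifier=d["identifier"],
            semantic_type=d["semantic_type"],
            lineage=tuple(d["lineage"]),
            parts=tuple(build(p) for p in d["parts"]),
            datastreams=tuple(Datastream(**ds) for ds in d["datastreams"]),
        )
    return build(json.loads(text))


paper = Entity(
    "info:example/paper",
    "text",
    datastreams=(Datastream("http://example.org/paper.pdf", "application/pdf"),),
)
data = Entity("info:example/data", "dataset")
compound = Entity(
    "info:example/compound",
    "compound",
    lineage=("info:example/earlier",),
    parts=(paper, data),
)
surrogate = serialize(compound)
```

Note that the datastream holds a location rather than the bytes themselves, which matches the 'live' reference idea: the surrogate can travel between services while the asset stays where it is.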

meeting website: http://msc.mellon.org/Meetings/Interop

