Day 3 – Panel – NDIIPP Preservation Network
NDIIPP Preservation Network:
Progress, problems & promise
LC – William LeFurgy
US context for digital preservation:
Government at various levels
Universities
Non-profits
Corporations
Consortial domains
Decentralized in focus – the library is one node on a network
LC facilitates digital preservation in the US; the UK has a more centralized approach
3 areas of focus:
- network of preservation partners
- architectural framework for preservation
- digital preservation research
2 phases of investment:
2004 & 2006; the library has used this to fund official projects and research
Congress provided $100 million for the project;
network of preservation partners –
content scope: public TV, dot-com era documents
outcomes:
- identify and preserve significant at-risk content
- leverage resources through collaboration
- digital stewardship network
- technical infrastructure
- public policy issues
- 2010 report to Congress
national network – interoperability
value chain
projects:
digitalpreservation.gov/technical/aiht.html
Los Alamos tools
storage – distributed: San Diego Supercomputer Center; e-journal e-deposit
joint or shared repositories at the state level
ask Sam about VA – could write a grant for this
Section 108 Study Group – looks at IP issues; the group is ½ from libraries/archives and ½ from content industries; it will make recommendations for how to rewrite the law
phase 2 investments:
- Preserving Creative America – commercial content producers
- working with states
- additional business models
- projects:
o data replication
o risk assessment
o data integrity assurance
o content validation
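Data integrity assurance is only listed here, not described. A common building block for it is a fixity check: record a checksum for each file at ingest, then recompute and compare it later. A minimal sketch, assuming a hypothetical manifest that maps relative paths to SHA-256 digests; it illustrates the idea and is not how any NDIIPP project actually implements it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the paths whose current digest no longer matches the manifest.

    `manifest` is a hypothetical ingest record: relative path -> hex digest.
    Missing files are reported as failures too.
    """
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return failures
```

Reading in chunks keeps memory use flat even for large video or geospatial files.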
DigArch Program – Helen Tibbo
VidArch Team – preserving video; preserving meaning and context;
Goals: make video accessible and understandable in the future; context; preservation framework;
Background: OAIS reference model; finding aids merged with the rich nature of video
METS, NLNZ, PREMIS – current metadata schemas;
This project looks at long-term understandability;
Expensive to capture context;
Part of project is partnering with NASA
ACM also has a collection;
OAIS framework – need to develop better articulation
VidArch – typology of elements to be documented within video collection
FAs (finding aids) – considering these as digital objects that should be ingested into the repository
Collaborations: SILS, ibiblio, Open Video; Renaissance Computing Institute; Internet Archive & Prelinger Archives
Jim Tuttle – geospatial data librarian at NC State
NC Geospatial Data Archiving Project
state & local content
NC OneMap – provides framework
Content: vector data;
Local data often more detailed…
Enormous amount of data
Risks: future support for data formats; web services; missing metadata; geospatial databases are difficult to archive
Trying to influence data producers in NC
Using a DSpace repository;
Changing thinking: AJAX
Odum Institute:
Data-PASS: meeting the challenges of a digital data world
Survey and poll data – how to preserve these archives for social science purposes?
Largest repository: ICPSR
SAS data files
Today can do text searches of questionnaires
Day 3 – First Session – Time and Space
Talk 1 – Supporting Literary Scholars with Data Mining and Visual Interfaces:
visual interfaces: accessible, provocative
text mining just beginning in the humanities
NORA project: http://www.noraproject.org
systems today provide access, not necessarily text analysis
text analysis – a new area; classification problems; scholars typically need assistance;
other work being done to visualize metadata;
users: small group of computer programmers; broad base of scholars uninterested in computational tools themselves, but doing the work
users' needs: classifying documents; reading; finding indicators – what makes a document fall into one class or another
case study: Emily Dickinson's letters; 300 XML-encoded documents
demo:
manual classification -> automatic classification -> correlations with document metadata
manually rate documents through the system; this serves as the training set for the data mining classifier
start analysis -> data mining algorithm determines likelihood and ratio of being in 1 class or another
manual classification takes a bit of time;
found that the word indicators were not as helpful as the computational probability
after classification, want to understand the relationships between the documents you've classified; look for correlations
uses a naive Bayes algorithm;
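A minimal sketch of the kind of classifier described (manually rated documents as the training set, then a probability for each class), assuming a multinomial naive Bayes over word tokens with add-one smoothing. The class and method names, the labels and the toy documents are hypothetical; this is not the NORA project's code.

```python
import math
from collections import Counter


class NaiveBayes:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.word_counts: dict[str, Counter] = {}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    def train(self, tokens: list[str], label: str) -> None:
        """Add one manually rated document (a token list) to the training set."""
        self.word_counts.setdefault(label, Counter()).update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def _log_score(self, tokens: list[str], label: str) -> float:
        counts = self.word_counts[label]
        total = sum(counts.values())
        # log prior + sum of smoothed log likelihoods of each token
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + len(self.vocab)))
        return score

    def predict_proba(self, tokens: list[str]) -> dict[str, float]:
        """Posterior probability of each class, normalised via log-sum-exp."""
        scores = {label: self._log_score(tokens, label) for label in self.doc_counts}
        top = max(scores.values())
        exps = {label: math.exp(s - top) for label, s in scores.items()}
        norm = sum(exps.values())
        return {label: v / norm for label, v in exps.items()}


if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("dear sister love".split(), "class_a")
    nb.train("love kiss dear".split(), "class_a")
    nb.train("weather garden bread".split(), "class_b")
    print(nb.predict_proba("love dear".split()))
```

Scores stay in log space and are shifted by the maximum before exponentiating, so long letters do not underflow to zero probability.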
Talk 2 – Time Period Directories
search in humanities – chronology, geo, bio, subject
trying to develop capabilities to search across these 4 facets
want to try to use metadata as infrastructure; search across genres
what metadata to use for temporal aspect? chronology?
date/time standards, hard to put on a timeline
named time period problems: unstable; multiple names; ambiguous; how to disambiguate between periods and dates; all problems occur with places as well
place-name gazetteer as a model: use its structure to associate each period name with a date and with where it happened and the time of the event -> this becomes the time period directory
this was then put into an xml schema
prototype developed from LCSH authority records
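A minimal sketch of a time period directory entry and two lookups (by name, including alternate names, and by overlapping date range, optionally filtered by place). The field and method names are hypothetical; the project's actual XML schema is not described in these notes. Years are CE integers and the sample periods are only illustrations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimePeriod:
    """One hypothetical directory entry: a named period tied to dates and a place."""
    name: str
    start: int  # first year (CE)
    end: int    # last year (CE)
    place: str
    alt_names: tuple[str, ...] = ()


class TimePeriodDirectory:
    def __init__(self, periods: list[TimePeriod]) -> None:
        self.periods = periods

    def by_name(self, name: str) -> list[TimePeriod]:
        """All entries whose preferred or alternate name matches (may be several)."""
        key = name.casefold()
        return [p for p in self.periods
                if key == p.name.casefold()
                or key in (a.casefold() for a in p.alt_names)]

    def overlapping(self, year_from: int, year_to: int,
                    place: Optional[str] = None) -> list[TimePeriod]:
        """Entries whose date range overlaps [year_from, year_to], optionally at one place."""
        return [p for p in self.periods
                if p.start <= year_to and p.end >= year_from
                and (place is None or p.place == place)]


if __name__ == "__main__":
    directory = TimePeriodDirectory([
        TimePeriod("Edo period", 1603, 1868, "Japan", ("Tokugawa period",)),
        TimePeriod("Meiji period", 1868, 1912, "Japan", ("Meiji era",)),
        TimePeriod("Tudor period", 1485, 1603, "England"),
        TimePeriod("Victorian era", 1837, 1901, "United Kingdom"),
    ])
    print([p.name for p in directory.by_name("Tokugawa period")])
    print([p.name for p in directory.overlapping(1850, 1870, place="Japan")])
```

Returning a list from `by_name` leaves room for the ambiguity problem noted above: one name can resolve to several periods, and the caller disambiguates by place or date.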
demo:
map interface: location data and puts on a map
timeline browse
country browse – list
vivienp@sims.berkeley.edu
Day 3 – Opening Session
http://jonathan.law.harvard.edu/questions
open information: redaction | restriction | removal
keynote: Jonathan Zittrain – Harvard University and University of Oxford
google search: “milk supply terrorists”
Security Breach Information Act (California) – a law about metadata; since 2003, if you are a company with a lot of data and it could be compromised, you must alert the users
ways to protect personal data – borrow from IP?
Sysinternals blog – software can potentially spy on users to capture their usage habits
Solution is not to keep fighting the war (anti-spyware, etc.) but to think about privacy and the expression of your identity; often that means contextualizing data about you; different from the traditional view of privacy; more accepting of an open environment
YouTube – encourages folks to "broadcast yourself";
Mashups – podcasts, music, etc.; retracting any of this is difficult; people are willing to put themselves out there
What does redaction mean in an open environment?
Best example: Enron – shredding content: "accurate document destruction"
The technological future makes it more difficult to retract or recall information
e.g. Omniva – every email generated is encrypted, with a key generated for each day of the week; a company would only have to destroy the key to destroy all the relevant documents;
libraries are a decent point of control for distribution; libraries as "best friends" to publishers/content providers/booksellers rather than adversaries, creating systems that resemble systems like Omniva; libraries would be where you go to retrieve content rather than the "open jungle" of the web
libraries: what's a library for? Are there commonalities between public and academic libraries?
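A toy sketch of the per-day key idea ("crypto-shredding"): each day's messages are encrypted under that day's key, so deleting one key makes all of that day's messages unreadable. The class and function names are hypothetical, and the SHA-256 keystream is for illustration only: it is NOT secure encryption and not Omniva's actual design. A real system would use a vetted authenticated cipher such as AES-GCM.

```python
import hashlib
import secrets
from datetime import date


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks. NOT secure crypto."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class DailyKeyStore:
    """Hypothetical store holding one random key per calendar day."""

    def __init__(self) -> None:
        self._keys: dict[date, bytes] = {}

    def encrypt(self, day: date, plaintext: bytes) -> tuple[date, bytes, bytes]:
        if day not in self._keys:
            self._keys[day] = secrets.token_bytes(32)  # first message of the day
        nonce = secrets.token_bytes(16)
        stream = _keystream(self._keys[day], nonce, len(plaintext))
        return day, nonce, bytes(a ^ b for a, b in zip(plaintext, stream))

    def decrypt(self, message: tuple[date, bytes, bytes]) -> bytes:
        day, nonce, ciphertext = message
        key = self._keys[day]  # raises KeyError once the day's key is destroyed
        stream = _keystream(key, nonce, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, stream))

    def destroy_day(self, day: date) -> None:
        """Deleting one key makes every message from that day unreadable."""
        self._keys.pop(day, None)
```

The point of the design is that retention is enforced by deleting a few keys rather than by finding and shredding every copy of every message.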
LOCKSS mentioned: mirror and synchronize across libraries
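A much simplified sketch of the mirror-and-synchronize idea, assuming each library holds a copy of the same item: hash every copy, treat the majority digest as correct, and repair disagreeing copies from a matching peer. The real LOCKSS polling protocol is considerably more elaborate; the function names here are hypothetical.

```python
import hashlib
from collections import Counter


def audit_replicas(replicas: dict[str, bytes]) -> tuple[str, list[str]]:
    """Compare copies of one item held by several libraries.

    Returns the majority digest and the names of libraries whose copy
    disagrees, i.e. the copies that should be repaired from a peer.
    """
    digests = {lib: hashlib.sha256(data).hexdigest() for lib, data in replicas.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(replicas) // 2:
        raise ValueError("no majority; cannot decide which copy is correct")
    damaged = sorted(lib for lib, d in digests.items() if d != winner)
    return winner, damaged


def repair(replicas: dict[str, bytes], damaged: list[str], good_digest: str) -> None:
    """Overwrite damaged copies with a copy that matches the majority digest."""
    source = next(data for data in replicas.values()
                  if hashlib.sha256(data).hexdigest() == good_digest)
    for lib in damaged:
        replicas[lib] = source
```

Requiring a strict majority means a tie is reported rather than silently resolved in favour of a possibly damaged copy.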
Libraries are so far the best hope for those in a position to release something; privacy with libraries; largest advances in digital library space from “left-field”
When to pull something back?
is running a library just about indexing? or is it like Brewster Kahle?
what is the purpose of a library?
one conception: the fortress; keeping non-scholars away; filtering what's important and what's not; if there's no limit on what dig libraries store, is there a reason to discriminate?
Ask Jeeves – every time someone asks him instead of a librarian… Jeeves doesn't have authority control;
idea of collections – libraries have collections that become archives;
non-institutional collections that mirror the library:
- "Gawker Stalker": email Gawker if you're in NY and see a celebrity; it's up within 15 minutes
- Facebook; 90% of American college students have entries;
- Riya photo search – face recognition technology; new incoming photos are automatically tagged using face recognition; makes the library's "castle" seem like the outside; GPS tagging;
- databases that transform the way we understand information
- protest/gatherings you bring your identity to
non-institutional judgements
today: rudimentary systems like eBay's star system
Cyworld (Korea) – one of the most popular sites in the world; wake up in the morning and check the world's collective judgement about you; as you interact with people, they rank you;
systems of collective judgement in which the library can play a role in saying what information is credible; maybe the decision is not about whether to keep it; Wikipedia example: the Seigenthaler article that was removed – should the history have been removed? Muhammad cartoon controversy – one of Wikipedia's best moments, something libraries and news outlets have not done
Day 3 – Afternoon Session – DL Education
Supporting Digital Library Education:
Factors Motivating Use of Digital Libraries
- findings: faculty don't necessarily distinguish between a web page with a series of links and a digital library
- Google preferred overall by academics; they use it for pages they go to regularly
- using Google to find things quickly – looking to update existing lecture materials
- barriers:
- lack of awareness; information overload; priorities, not lack of time; no motivation to use digital learning materials
emerging questions
- should we match what faculty are using?
- granularity of items
- what are faculty dev strategies that work?
- faculty do their own analysis of information they find;
http://serc.carleton.edu/facultypart
recruiting institutions that might be interested in the survey
a lot of visitors are coming through Google