JCDL 2006 Conference Notes


Day 3 – Panel – NDIIPP Preservation Network

NDIIPP Preservation Network:

Progress, problems & promise


LC – William LeFurgy


US context for digital preservation

Government at various levels




Consortial domains


Decentralized in focus – library is a node on the network

Facilitating digital preservation in the US; the UK has a more centralized approach


3 areas of focus:

– network of preservation partners

– architectural framework for preservation

– digital preservation research


2 phases of investment

2004 & 2006; the Library has used this to fund official projects and research


Congress provided $100 million for the project;


network of preservation partners –

content scope: public tv, dot-com era documents



–         identify and preserve significant at-risk content

–         leverage resources through collaboration

–         digital stewardship network

–         technical infrastructure

–         public policy issues

–         2010 report to congress


national network – interoperability

value chain




Los Alamos tools

storage – distributed – San Diego Supercomputer Center, e-journal e-deposit

joint or shared repositories at the state level

          ask sam about va – could write grant for this

Section 108 Study Group – looks at IP issues – group is ½ from libraries/archives and ½ from content industries – will make recommendations for how to rewrite the law


phase 2 investments:

–         Preserving Creative America – commercial content producers

–         working with states

–         additional business models

–         projects:

o        data replication

o        risk assessment

o        data integrity assurance

o        content validation


DigArch Program – Helen Tibbo

VidArch Team – preserving video; preserving meaning and context;

Goals: make video accessible and understandable in the future; context; preservation framework;

Background: OAIS reference model; finding aids merged with rich nature of video

METS, NLNZ, PREMIS – metadata schemas today;

This project looks at long-term understandability;

Expensive to capture context;

Part of project is partnering with NASA

ACM also has a collection;

OAIS framework – need to develop better articulation

VidArch – typology of elements to be documented within video collection

Finding aids (FAs) – considering these as digital objects that should be ingested into the repository

Collaborations: SILS, ibiblio, Open Video; Renaissance Computing Center; Internet Archive & Prelinger Archive



Jim Tuttle – geospatial data librarian at NC State

NC Geospatial Data Archiving Project

state & local content

NC OneMap – provides framework

Content: vector data;

Local data often more detailed…

Enormous amount of data

Data at risk: future support for data formats; web services; no metadata; geospatial databases – difficult to archive


Trying to influence data producers in NC

Using DSpace repository;

Changing thinking: Ajax


Odum Institute:

Data-PASS: meeting the challenges of a digital data world

Survey, polls data – how to preserve these archives? Social science purposes

Largest repository: ICPSR

SAS data files

Today can do text searches of questionnaires


Day 3 – First Session – Time and Space

Talk 1 – Supporting Literary Scholars with Data Mining and Visual Interfaces:

visual interfaces: accessible, provocative

text mining just beginning in the humanities

nora project: http://www.noraproject.org

systems today provide access not necessarily text analysis

text analysis – new area; classification problems; scholars typically need assistance;

other work being done to visualize metadata;

users: small group of computer programmers; broad base of scholars uninterested in computational tools themselves, but doing the work

users' needs: classifying documents; reading; finding indicators – what makes a document fall into one class or another

case study: Emily Dickinson's letters; 300 XML-encoded documents


manual classification -> automatic classification -> correlations with document metadata

manually rate documents through the system; this serves as the training set for the data mining classifier

start analysis -> data mining algorithm determines likelihood and ratio of being in 1 class or another

manual classification takes a bit of time;

found that the word indicators were not as helpful as the computational probability

after classification, want to understand the relationship between the documents you've classified. look for correlations

uses naive Bayes algorithm;
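
The workflow above (hand-rated documents become a training set, then the classifier scores new documents by class) can be sketched roughly as follows. This is only a minimal illustration of naive Bayes with add-one smoothing, not the Nora project's actual code; the class labels "warm" and "formal" and the sample texts are invented for the example.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Build a model from (text, label) pairs that were rated by hand."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    doc_counts = Counter()              # label -> number of documents
    for text, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def log_scores(text, model):
    """Return a log-probability score per class for one document."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in doc_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word in vocab:  # ignore words never seen in training
                # add-one (Laplace) smoothing avoids log(0)
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return scores

def classify(text, model):
    """Pick the class with the highest score."""
    scores = log_scores(text, model)
    return max(scores, key=scores.get)

# Hypothetical hand-rated training set (invented text, not Dickinson's).
training = [
    ("dear friend my heart longs for you", "warm"),
    ("sweet sister i miss you dearly", "warm"),
    ("please send the accounts by monday", "formal"),
    ("the committee will meet on tuesday", "formal"),
]
model = train(training)
label = classify("my dear sister i miss you", model)
```

The difference between the two class scores is a log-odds ratio, which is one way to read the "likelihood and ratio" mentioned in the talk.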

Talk 2 – Time Period Directories

search in humanities – chronology, geography, biography, subject

trying to develop search capabilities across these 4 facets

want to try to use metadata as infrastructure; search across genres

what metadata to use for temporal aspect? chronology?

date/time standards, hard to put on a timeline

named time period problems: unstable; multiple names; ambiguous; how to disambiguate between periods and dates; all problems occur with places as well

place name gazetteer; use the same structure – associate a period name with a date, where it happened, and the time of the event -> this becomes the time period directory

this was then put into an xml schema
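
A rough sketch of what a time period directory entry and lookup might look like, modeled on the gazetteer idea above. The field names and the three sample entries are assumptions made for illustration; they are not taken from the project's XML schema or its prototype data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PeriodEntry:
    name: str        # named time period
    place: str       # where it happened
    start_year: int  # first year of the period
    end_year: int    # last year of the period

# Illustrative entries only; a real directory would be loaded from records.
DIRECTORY = [
    PeriodEntry("Civil War", "United States", 1861, 1865),
    PeriodEntry("Civil War", "England", 1642, 1651),
    PeriodEntry("Meiji period", "Japan", 1868, 1912),
]

def lookup(name, place=None):
    """Find entries by period name; a place narrows ambiguous names."""
    return [
        e for e in DIRECTORY
        if e.name.lower() == name.lower()
        and (place is None or e.place == place)
    ]

def periods_covering(year, place=None):
    """Find named periods that include a given year."""
    return [
        e for e in DIRECTORY
        if e.start_year <= year <= e.end_year
        and (place is None or e.place == place)
    ]

ambiguous = lookup("civil war")           # two candidates
resolved = lookup("Civil War", "England") # place disambiguates
```

Pairing each name with a place and a date range is what lets an ambiguous name such as "Civil War" be resolved, and it also supports the reverse query from a date to the periods that contain it.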

prototype developed from LCSH (Library of Congress Subject Headings) authority records


map interface: takes location data and puts it on a map

timeline browse

country browse – list



Day 3 – Opening Session


open information: redaction | restriction | removal

keynote: Jonathan Zittrain – Harvard University and University of Oxford

google search: “milk supply terrorists”

Security Breach Information Act; law about metadata; 2003: if you are a company with a lot of data and it could be compromised, you must alert the users

ways to protect personal data – borrow from IP?

Sysinternals blog – software can possibly spy to capture usage habits

Solution is to not continue fighting the war (anti-spyware, etc.) but to think about privacy and the expression of your identity; often that means contextualizing data about you; different from the traditional view of privacy; more accepting of open environment

YouTube – encourages folks to "broadcast yourself";

Mashups – podcast, music, etc. retracting any of this is difficult to do; people are willing to put themselves out there

What does redaction mean in an open environment?

Best example: Enron – shredding content: “accurate document destruction”

Technological future makes it more difficult to retract, recall information

e.g. Omniva – every email generated is encrypted; a key is generated for each day of the week; a company would only have to destroy the key, which effectively destroys all relevant documents;
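
The key-per-day idea can be sketched as below. This is a toy illustration only: it is not Omniva's actual design, and the hash-based XOR cipher is a stand-in that should not be used for real security. The class name, function names, and the sample date are all hypothetical.

```python
import hashlib
import secrets

class DailyKeyStore:
    """Holds one random key per day; destroying a key is irreversible."""

    def __init__(self):
        self._keys = {}  # day string -> key bytes

    def key_for(self, day):
        # Create the day's key on first use.
        return self._keys.setdefault(day, secrets.token_bytes(32))

    def get(self, day):
        # Raises KeyError once the day's key has been destroyed.
        return self._keys[day]

    def destroy(self, day):
        self._keys.pop(day, None)

def _keystream(seed, length):
    """Toy keystream: SHA-256 over seed plus a counter (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(seed + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])

def encrypt(store, day, message):
    nonce = secrets.token_bytes(16)  # fresh per message
    stream = _keystream(store.key_for(day) + nonce, len(message))
    return nonce + bytes(a ^ b for a, b in zip(message, stream))

def decrypt(store, day, blob):
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(store.get(day) + nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))

store = DailyKeyStore()
blob = encrypt(store, "2006-06-14", b"quarterly numbers")
readable = decrypt(store, "2006-06-14", blob)
store.destroy("2006-06-14")  # every message from that day is now unreadable
```

The stored ciphertext never has to be located or deleted; once the day's key is gone, every copy of every message from that day becomes unreadable wherever it lives.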

libraries are a decent point of control for distribution; libraries as “best friends” to publishers/content providers/book sellers rather than adversaries; creating systems that resemble systems like Omniva; libraries would be where you go to retrieve content rather than the “open jungle” of the web

libraries: what’s a library for? Are there commonalities between public, academic;

LOCKSS mentioned: mirror and synchronize across libraries

Libraries are so far the best hope for those in a position to release something; privacy with libraries; largest advances in digital library space from “left-field”

When to pull something back?

is running a library just about indexing? or is it like Brewster Kahle?

what is the purpose of a library?

one conception: the fortress; keeping non-scholars away; filtering what's important and what's not; if there's no limit on what dig libraries store, is there a reason to discriminate?

Ask Jeeves – every time someone asks him instead of a librarian…Jeeves doesn't have authority control;

idea of collections – libraries have collections that become archives;

non-institutional collections that mirror the library:

– "Gawker Stalker": email Gawker if you're in NY and see a celebrity; up within 15 minutes

– Facebook; 90% of American college students have entries;

– Riya photo search – face recognition technology; new incoming photos are automatically tagged using face recognition; makes the library's "castle" seem like the outside; GPS tagging;

– databases that transform the way we understand information

– protest/gatherings you bring your identity to

non-institutional judgements

today: rudimentary systems like eBay's star system

Cyworld – one of the most popular sites in the world, in Korea; wake up in the morning and check the world's collective judgement about you; as you interact with people, they rank you;

systems of collective judgement for which the library can play a role in saying what information is credible; maybe the decision is not about whether to keep it; Wikipedia example: Seigenthaler article that was removed. should the history have been removed? Muhammad cartoon controversy – one of Wikipedia's best moments, something that libraries and news have not done





Day 3 – Afternoon Session – DL Education

Supporting Digital Library Education:

Factors Motivating Use of Digital Libraries

– findings: faculty don't necessarily distinguish between a web page with a series of links and a digital library

– Google preferred overall by academics; they use it for pages they go to regularly

– using Google to find things quickly – looking to update existing lecture materials

– barriers:

– lack of awareness; information overload; priorities, not lack of time; no motivation to use digital learning materials

emerging questions

– should we match what faculty are using

– granularity of items

– what are faculty development strategies that work?

– faculty do their own analysis of information they find;


recruiting institutions that might be interested in survey

a lot of visitors are coming through Google