This is a great little piece comparing and contrasting how two relatively similar online communities and social silos are shutting down their services. One is going a much better route than the other by providing export tools and the ability to archive and preserve years of work and effort.
WithKnown is a fantastic, free, and open-source content management system that supports some of the most bleeding-edge technology on the internet. I’ve been playing with it for over two years and love it!
And today, there’s another reason to love it even more…
This is also a great reminder that developers can have a lasting and useful impact on the world around them–even in the political arena.
If you missed the notes from Day 1, see this post.
It may take me a week or so to finish putting some general thoughts and additional resources together based on the two-day conference so that I might give a more thorough accounting of my opinions as well as next steps. Until then, I hope that the details and mini-archive of content below may help others who attended, or provide a resource for those who couldn’t make the conference.
Overall, it was an incredibly well-programmed and well-run conference, so kudos to all those involved who kept things moving along. I’m now certainly much more aware of the gaping memory hole the internet is facing despite the heroic efforts of a small handful of people and institutions attempting to improve the situation. I’ll try to go into more detail later about a handful of specific topics and next steps, as well as a listing of resources I came across which may prove to be useful tools for both the archiving/preserving and IndieWeb communities.
Archive of materials for Day 2
Below are the recorded audio files embedded in .m4a format (using a Livescribe Pulse Pen) for several sessions held throughout the day. To my knowledge, none of the breakout sessions were recorded except for the one which appears below.
Summarizing archival collections using storytelling techniques
Presentation: Summarizing archival collections using storytelling techniques by Michael Nelson, Ph.D., Old Dominion University
Saving the first draft of history
Special guest speaker: Saving the first draft of history: The unlikely rescue of the AP’s Vietnam War files by Peter Arnett, winner of the Pulitzer Prize for journalism
Kiss your app goodbye: the fragility of data journalism
Panel: Kiss your app goodbye: the fragility of data journalism
Featuring Meredith Broussard, New York University; Regina Lee Roberts, Stanford University; Ben Welsh, The Los Angeles Times; moderator Martin Klein, Ph.D., Los Alamos National Laboratory
The future of the past: modernizing The New York Times archive
Panel: The future of the past: modernizing The New York Times archive
Featuring The New York Times Technology Team: Evan Sandhaus, Jane Cotler and Sophia Van Valkenburg; moderated by Edward McCain, RJI and MU Libraries
Lightning Rounds: Six Presenters
Lightning rounds (in two parts)
Six + one presenters: Jefferson Bailey, Terry Britt, Katherine Boss (and team), Cynthia Joyce, Mark Graham, Jennifer Younger and Kalev Leetaru
1: Jefferson Bailey, Internet Archive: “Supporting Data-Driven Research using News-Related Web Archives”
2: Terry Britt, University of Missouri: “News archives as cornerstones of collective memory”
3: Katherine Boss, Meredith Broussard and Eva Revear, New York University: “Challenges facing preservation of born-digital news applications”
4: Cynthia Joyce, University of Mississippi: “Keyword ‘Katrina’: Re-collecting the unsearchable past”
5: Mark Graham, Internet Archive/The Wayback Machine: “Archiving news at the Internet Archive”
6: Jennifer Younger, Catholic Research Resources Alliance: “Digital Preservation, Aggregated, Collaborative, Catholic”
7: Kalev Leetaru, senior fellow, The George Washington University and founder of the GDELT Project: “A Look Inside The World’s Largest Initiative To Understand And Archive The World’s News”
Technology and Community
Presentation: Technology and community: Why we need partners, collaborators, and friends by Kate Zwaard, Library of Congress
Breakout: Working with CMS
Working with CMS, led by Eric Weig, University of Kentucky
Alignment and reciprocity
Alignment & reciprocity by Katherine Skinner, Ph.D., executive director, the Educopia Institute
Closing remarks by Edward McCain, RJI and MU Libraries and Todd Grappone, associate university librarian, UCLA
Live Tweet Archive
Reminder: In many cases my tweets don’t reflect direct quotes of the attributed speaker, but are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many original words of the participant as possible. Typically, for speed, there wasn’t much editing of these notes. Below I’ve changed the attribution of one or two tweets to reflect the proper person(s). For convenience, I’ve also added, after the fact, a few hyperlinks to useful resources that I didn’t have time to include in the original tweets. I’ve attached .m4a audio files of most of the audio for the day (apologies for shaky quality as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it. Presumably they will release the video on their website for a more immersive experience.
Condoms were required issue in Vietnam–we used them to waterproof film containers in the field.
Do not stay close to the head of a column, medics, or radiomen. #warreportingadvice
I told the AP I would undertake the task of destroying all the reporters’ files from the war.
Instead the AP files moved around with me.
Eventually the 10 trunks of material went back to the AP when they hired a brilliant archivist.
“The negatives can outweigh the positives when you’re in trouble.”
Our first panel: Kiss your app goodbye: the fragility of data journalism
I teach data journalism at NYU
A news app is not what you’d install on your phone
Dollars for Docs is a good example of a news app
A news app is something that allows users to put themselves into the story.
Often there are three CMSs: web, print, and video.
News apps don’t live in any of the CMSs. They’re bespoke and live on a separate data server.
This has implications for crawlers which can’t handle them well.
Then how do we save news apps? We’re looking at examples and then generalizing.
Everyblock.com was a good example based on chicagocrime and later bought by NBC and shut down.
What?! The internet isn’t forever? Databases need to be saved differently than web pages.
Reprozip was developed by NYU Center for Data and we’re using it to save the code, data, and environment.
We make apps that serve our audience.
We also make internal tools that empower the newsroom.
We also use our nerdy skills to do cool things.
Most of us aren’t good programmers, we “cheat” by using frameworks.
Frameworks do a lot of basic things for you, so you don’t have to know how to do it yourself.
Archiving tools often aren’t built into these frameworks.
Instagram, Pinterest, Mozilla, and the LA Times all use Django as a framework.
Memento for WordPress is a great way to archive pages.
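The Memento protocol (RFC 7089) underlying tools like this works by content negotiation in the datetime dimension: a client asks a TimeGate for the archived version of a URL nearest a given moment via an Accept-Datetime header. A minimal sketch of how such a request is formed, using the public Time Travel aggregator as the TimeGate (the example URL is invented):

```python
from datetime import datetime, timezone

def timegate_request(url, when):
    """Build the pieces of a Memento TimeGate request (RFC 7089):
    the TimeGate URI for the resource and the Accept-Datetime header,
    which must be an RFC 1123 date in GMT."""
    timegate = "http://timetravel.mementoweb.org/timegate/" + url
    accept_datetime = when.strftime("%a, %d %b %Y %H:%M:%S GMT")
    return timegate, {"Accept-Datetime": accept_datetime}

gate, headers = timegate_request(
    "https://example.com/story",
    datetime(2016, 10, 14, 12, 0, 0, tzinfo=timezone.utc),
)
# headers["Accept-Datetime"] → "Fri, 14 Oct 2016 12:00:00 GMT"
```

A GET on the TimeGate URI with that header redirects to the memento closest to the requested datetime.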
We must do more. We need archiving baked into the systems from the start.
Slides at http://bit.ly/frameworkfix
Got data? I’m a librarian at Stanford University.
I’ll mention Christine Borgman’s book Big Data, Little Data, No Data.
Journalists are great data liberators: FOIA requests, cleaning data, visualizing, getting stories out of data.
But what happens to the data once the story is published?
BLDR: Big Local Digital Repository, an open repository for sharing open data.
For metadata: www.ddialliance.org, RDF, International Image Interoperability Framework (iiif) and MODS
We’ll open up for questions.
What’s more important: obeying copyright laws or preserving the content?
The new creative commons licenses are very helpful, but we have to be attentive to many issues.
Perhaps archiving it and embargoing for later?
Saving the published work is more important to me, and the rest of the byproduct is gravy.
I work for the New York Times, you may have heard of it…
Talking about modernizing the born-digital legacy content.
Our problem was how to make an article from 2004 look like it had been published today.
There were hundreds of thousands of articles missing.
There was no one definitive list of missing articles.
Outlining the workflow for reconciling the archive XML and the definitive list of URLs for conversion.
It’s important to use more than one source for building an archive.
I’m going to talk about all of “the little things” that came up along the way.
Article Matching: Fusion, or how to match print XML with web HTML that was scraped.
Primarily, we looked at common phrases between the corpus of the two different data sets.
We prioritized the print data over the digital data.
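The common-phrase matching described here can be approximated with word shingles: treat each article as the set of its n-word phrases and pair the print record with the scraped web text sharing the most phrases. A toy sketch of the idea, not the Times’s actual code:

```python
def shingles(text, n=5):
    """Break a text into the set of its n-word phrases ('shingles')."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def best_match(print_text, web_texts, n=5):
    """Return the index of the scraped web text sharing the most
    n-word phrases with the print text, plus its Jaccard score."""
    a = shingles(print_text, n)
    scores = []
    for b_text in web_texts:
        b = shingles(b_text, n)
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

Prioritizing print over digital, as the panel mentions, would then mean keeping the print text as canonical and using the matched web record only for its URL and metadata.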
We maintain a system called switchboard that redirects from old URLs to the new ones to prevent link rot.
The case of the missing sections: some sections of the content were blank and not transcribed.
We made the decision to take out data we had in favor of a better user experience around missing sections.
In the future, we’d also like to put photos back into the articles.
Can you discuss the decision to go with a more modern interface rather than a traditional archive of how it looked?
Some of the decision was to get the data into an accessible format for modern users.
We do need to continue work on preserving the original experience.
Is there a way to distinguish between the print version and the online versions in the archive?
Could a researcher do work on the entire corpus? Is it available for subscription?
We do have a sub-section of data available, but don’t have it prior to 1960.
Have you documented the process you’ve used on this preservation project?
We did save all of the code for the project within GitHub.
We do have meeting notes which provide some documentation, though they’re not thorough.
Today I spent the majority of the day attending the first of a two-day conference at UCLA’s Charles Young Research Library entitled “Dodging the Memory Hole: Saving Online News.” While I knew mostly what I was getting into, it hadn’t really occurred to me how much of what is on the web is not backed up or archived in any meaningful way. As a part of human nature, people neglect to back up any of their data, but huge swaths of really important data with newsworthy and historic value are being heavily neglected. Fortunately it’s an interesting enough problem to draw the 100 or so scholars, researchers, technologists, and journalists who showed up for the start of an interesting group convened by the Reynolds Journalism Institute and several sponsors of the event.
What particularly strikes me is how many of the philosophies of the IndieWeb movement and tools developed by it are applicable to some of the problems that online news faces. I suspect that if more journalists were practicing members of the IndieWeb and used their sites not only for collecting and storing the underlying data upon which they base their stories, but to publish them as well, then some of the (future) archival process may be easier to accomplish. I’ve got so many disparate thoughts running around my mind after the first day that it’ll take a bit of time to process before I write out some more detailed thoughts.
Twitter List for the Conference
As a reminder to those attending, I’ve accumulated a list of everyone who’s tweeted with the hashtag #DtMH2016, so that attendees can more easily follow each other as well as communicate online following our few days together in Los Angeles. Twitter also allows subscribing to entire lists, if that’s something in which people have interest.
Archiving the day
It seems only fitting that an attendee of a conference about saving and archiving digital news, would make a reasonable attempt to archive some of his experience right?! Toward that end, below is an archive of my tweetstorm during the day marked up with microformats and including hovercards for the speakers with appropriate available metadata. For those interested, I used a fantastic web app called Noter Live to capture, tweet, and more easily archive the stream.
Note that in many cases my tweets don’t reflect direct quotes of the attributed speaker, but are often slightly modified for clarity and length for posting to Twitter. I have made a reasonable attempt in all cases to capture the overall sentiment of individual statements while using as many original words of the participant as possible. Typically, for speed, there wasn’t much editing of these notes. I’m also attaching .m4a audio files of most of the audio for the day (apologies for shaky quality as it’s unedited) which can be used for more direct attribution if desired. The Reynolds Journalism Institute videotaped the entire day and livestreamed it. Presumably they will release the video on their website for a more immersive experience.
If you prefer to read the stream of notes in the original Twitter format, so that you can like/retweet/comment on individual pieces, this link should give you the entire stream. Naturally, comments are also welcome below.
Below are the audio files for several sessions held throughout the day.
Greetings and Keynote
Greetings: Edward McCain, digital curator of journalism, Donald W. Reynolds Journalism Institute (RJI) and University of Missouri Libraries and Ginny Steel, university librarian, UCLA
Keynote: Digital salvage operations — what’s worth saving? given by Hjalmar Gislason, vice president of data, Qlik
Why save online news? and NewsScape
Panel: “Why save online news?” featuring Chris Freeland, Washington University; Matt Weber, Ph.D., Rutgers, The State University of New Jersey; Laura Wrubel, The George Washington University; moderator Ana Krahmer, Ph.D., University of North Texas
Presentation: “NewsScape: preserving TV news” given by Tim Groeling, Ph.D., UCLA Communication Studies Department
Born-digital news preservation in perspective
Speaker: Clifford Lynch, Ph.D., executive director, Coalition for Networked Information on “Born-digital news preservation in perspective”
Live Tweet Archive
Getting Noter Live fired up for Dodging the Memory Hole 2016: Saving Online News https://www.rjionline.org/dtmh2016
I’m glad I’m not at NBC trying to figure out the details for releasing THE APPRENTICE tapes.
Let’s thank @UCLA and the library for hosting us all.
While you’re here, don’t forget to vote/provide feedback throughout the day for IMLS
Someone once pulled up behind me and said “Hi Tiiiigeeerrr!” #Mizzou
A server at the Missourian crashed as the system was obsolete and running on baling wire. We lost 15 years of archives
The dean & head of Libraries created a position to save born digital news.
We’d like to help define stake-holder roles in relation to the problem.
Newspaper is really an outmoded term now.
I’d like to celebrate that we have 14 student scholars here today.
We’d like to have you identify specific projects that we can take to funding sources to begin work after the conference
We’ll be going to our first speaker who will be introduced by Martin Klein from Los Alamos.
Hjalmar Gislason is a self-described digital nerd. He’s the Vice President of Data.
I wonder how one becomes the President of Data?
My Icelandic name may be the most complicated part of my talk this morning.
Speaking on “Digital Salvage Operations: What’s Worth Saving”
My father in law accidentally threw away my wife’s favorite stuffed animal. #DeafTeddy
Some people just throw everything away because it’s not being used. Others keep everything and don’t throw anything away.
The fundamental question: Do you want to save everything or do you want to get rid of everything?
I joined @qlik two years ago and moved to Boston.
Before that I was with spurl.net, which was about letting users save copies of webpages they’d previously visited.
I had also previously invested in kjarninn which is translated as core.
We used to have little data; now we have gigantic data and will be moving to gargantuan data soon.
One of my goals today is to broaden our perspective about what data needs saving.
There’s the Web, the “Deep” Web, then there’s “Other” data which is at the bottom of the pyramid.
I got to see into the process of #panamapapers but I’d like to discuss the consequences from April 3rd.
The number of meetings was almost more than could have been covered in real time in Iceland.
The #panamapapers were a soap opera, much like US politics.
Looking back at the process is highly interesting, but it’s difficult to look at all the data as it unfolded.
How can we capture all the media minute by minute as a story unfolds?
You can’t trust that you can go back to a story at a certain time and know that it hasn’t been changed. #1984 #Orwell
There was a relatively pro-HRC piece earlier this year @NYTimes that was changed.
Newsdiffs tracks changes in news over time. The HRC article had changed a lot.
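Trackers like Newsdiffs work by periodically fetching an article and diffing successive snapshots; Python’s standard difflib is enough to sketch the core comparison (the snapshot texts here are invented):

```python
import difflib

def changed_lines(old, new):
    """Diff two snapshots of an article and return only the
    lines that were added or removed between them."""
    diff = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile="snapshot-1", tofile="snapshot-2", lineterm="",
    )
    return [d for d in diff
            if d.startswith(("+", "-"))
            and not d.startswith(("+++", "---"))]

old = "Clinton holds a steady lead.\nPolls open Tuesday."
new = "Clinton's lead narrows sharply.\nPolls open Tuesday."
# changed_lines(old, new) →
# ['-Clinton holds a steady lead.', "+Clinton's lead narrows sharply."]
```

Storing each timestamped snapshot alongside these diffs is what makes it possible to prove an article read differently at a given moment.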
Let’s say you referenced @CNN 10 years ago, likely now, the CMS and the story have both changed.
8 years ago, I asked, wouldn’t we like to have the social media from Iceland’s only Nobel Laureate as a teenager?
What is private/public, ethical/unethical when dealing with data?
Much data is hidden behind passwords or on systems which are not easily accessed from a database perspective.
Most of the content published on Facebook isn’t public. It’s hard to archive in addition to being big.
We as archivists have no claim on the hidden data within Facebook.
The #indieweb could help archivists in the future in accessing more personal data.
Then there’s “other” data: 500 hours of video is uploaded to YouTube per minute.
No organization can go around watching all of this video data. Which parts are newsworthy?
Content could surface much later or could surface through later research.
Hornbjargsviti lighthouse recorded the weather every three hours for years creating lots of data.
And that was just one of hundreds of sites that recorded this type of data in Iceland.
Lots of this data is lost. Much that has been found was by coincidence. It was never thought to archive it.
This type of weather data could be very valuable to researchers later on.
There was also a large archive of Icelandic data that was found.
Showing a timelapse of Icelandic earthquakes https://vimeo.com/24442762
You can watch the magma working its way through the ground before it makes its way up through the land.
National Geographic featured this video in a documentary.
Sometimes context is important when it comes to data. What is archived today may be more important later.
As the economic crisis unfolded in Greece, it turned out the data that was used to allow them into the EU was wrong.
The data was published at the time of the crisis, but there was no record of what the data looked like 5 years earlier.
The only way to recreate the data was to take prior printed sources. This is usually only done in extraordinary circumstances.
We captured 150k+ data sets with more than 8 billion “facts” which was just a tiny fraction of what exists.
How can we delve deeper into large data sets, all with different configurations and proprietary systems?
“There’s a story in every piece of data.”
Once a year energy consumption seems to dip because February has fewer days than other months. Plotting it matters.
Year over year comparisons can be difficult because of things like 3 day weekends which shift over time.
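Both pitfalls come down to the calendar: a fair month-over-month comparison divides each total by the number of days in the month. A small illustration of the February dip, with invented consumption figures:

```python
import calendar

def per_day(monthly_total, year, month):
    """Normalize a monthly total by the number of days in that month,
    so a short February doesn't masquerade as a real dip."""
    return monthly_total / calendar.monthrange(year, month)[1]

usage = {1: 3100, 2: 2830, 3: 3110}  # raw totals: February appears to dip
normalized = {m: round(per_day(v, 2015, m), 1) for m, v in usage.items()}
# per-day rates are nearly flat: {1: 100.0, 2: 101.1, 3: 100.3}
```

Shifting holidays and three-day weekends need similar handling, e.g. comparing like weeks rather than like dates.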
Here’s a graph of the population of Iceland. We’ve had our fair share of diseases and volcanic eruptions.
To compare, here’s a graph of the population of sheep. They outnumber us by orders of magnitude.
In the 1780’s there was an event that killed off lots of sheep, so people had the upper hand.
Do we learn more from reading today’s “newspaper” or one from 30, 50, or 100 years ago?
There was a letter to the editor about an eruption and people had to move into the city.
letter: “We can’t have all these people come here, we need to build for our own people first.”
This isn’t too different from our problems today with respect to Syria. In that case, the people actually lived closer.
In the born-digital age, what will the experience look like trying to capture today 40 years hence?
Will it even be possible?
Machine data connections will outnumber “people” data connections by a factor of 10 or more very quickly.
With data, we need to analyze, store, and discard data. How do we decide in a split-second what to keep and discard?
We’re back to the father-in-law and mother-in-law question: What to get rid of and what to save?
Computing is continually beating human tasks: chess, Go, driving a car. They build on lots more experience based on data
Whoever has the most data on driving cars and landscape will be the ultimate winner in that particular space.
Data is valuable, sometimes we just don’t know which yet.
Hoarding is not a strategy.
You can only guess at what will be important.
“Commercial use in Doubt” The third sub-headline in a newspaper about an early test of television.
There’s more to it than just the web.
“Hoarding is not a strategy” really resonates with librarians; what could that relationship look like?
One should bring in data science, industry may be ahead of libraries.
Cross-disciplinary approaches may be best. How can you get a data scientist to look at your problem? Get their attention?
There are 60K+ books about the Vietnam War. How do we learn to integrate what we learn after an event (like that)?
Perspective always comes with time, as additional information arrives.
Scientific papers are archived in a good way, but the underlying data is a problem.
In the future you may have the ability to add supplementary data to what appears in a book (in a better way).
Archives can give the ability to have much greater depth on many topics.
Are there any centers of excellence on the topics we’re discussing today? This conference may be IT.
We need more people that come from the technical side of things to be watching this online news problem.
Hacks/Hackers is a meetup group that takes place all over the world.
It brings the journalists and computer scientists together regularly for beers. It’s some of the outreach we need.
If you’re not interested in money, this is a good area to explore. 10 minute break.
Don’t forget to leave your thoughts on the questions at the back of the room.
We’re going to get started with our first panel. Why is it important to save online news?
I’m Matt Weber from Rutgers University, in communications.
I’ll talk about web archives and news media and how they interact.
I worked at Tribune Corp. for several years and covered politics in DC.
I wanted to study the way in which the news media is changing.
We’re increasingly seeing digital-only media with no offline surrogate.
It’s becoming increasingly difficult to do anything but look at it now as it exists.
There was no large scale online repository of online news to do research.
#OccupyWallStreet is one of the first examples of stories that exist online in occurrence and reportage.
There’s a growing need to archive content around local news particularly politics and democracy.
When there is a rich and vibrant local news environment, people are more likely to become engaged.
Local news is one of the least thought about from an archive perspective.
I’m at GWU Libraries in the scholarly technology group.
I’m involved in social feed manager which allows archivists to put together archives from social services.
Kimberly Gross, a faculty member, studies tweets of news outlets and journalists.
We created a prototype tool to allow them to collect data from social media.
In 2011, journalists were primarily using their Twitter presences to direct people to articles rather than for conversation.
We collect data of political candidates.
I’m an associate librarian representing “Documenting the Now” with WashU, UC Riverside, & U of Md.
Documenting the Now revolves around Twitter documentation.
It started with the Ferguson story and documenting media, videos during the protests in the community.
What can we as memory institutions do to capture the data?
We gathered 14 million tweets relating to Ferguson within two weeks.
We tried to build a platform that others could use in the future for similar data capture relating to social.
Ethics is important in archiving this type of news data.
Digitally preserving pdfs from news organizations and hyper-local news in Texas.
We’re approaching 5 million pages of archived local news.
What is news that needs to be archived, and why?
First, what is news? The definition is unique to each individual.
We need to capture as much of the social news and social representation of news which is fragmented.
It’s an important part of society today.
We no longer produce hard copies like we did a decade ago. We need to capture the online portion.
We’d like to get the perspective of journalists, and don’t have one on the panel today.
We looked at how midterm election candidates used Twitter. Is that news itself? What tools do we use to archive it?
What does it mean to archive news by private citizens?
Twitter was THE place to find information in St. Louis during the Ferguson protests.
Local news outlets weren’t as good as Twitter during the protests.
I could hear the protest from 5 blocks away and only found news about it on Twitter.
The story was being covered very differently on Twitter than on the local (mainstream) news.
Alternate voices in the mix were very interesting and important.
Twitter was in the moment and wasn’t being edited and causing a delay.
What can we learn from this massive number of Ferguson tweets.
It gives us information about organizing, and what language was being used.
I think about the archival portion of this question. By whom does it need to be archived?
What do we archive next?
How are we representing the current population now?
Who is going to take on the burden of archiving? Should it be corporate? Cultural memory institution?
Someone needs to curate it; who does that?
Our next question: What do you view as primary barriers to news archiving?
How do we organize and staff? There’s no shortage of work.
Tools and software can help the process, but libraries are usually staffed very thinly.
No single institution can do this type of work alone. Collaboration is important.
Two barriers we deal with: terms of service are an issue with archiving. We don’t own it, but can use it.
Libraries want to own the data in perpetuity. We don’t own our data.
There’s a disconnect in some of the business models for commercialization and archiving.
Issues with accessing data.
People were worried about becoming targets or losing jobs because of participation.
What is role of ethics of archiving this type of data? Allowing opting out?
What about redacting portions? anonymizing the contributions?
Publishers have a responsibility for archiving their product. Permission from publishers can be difficult.
We have a lot of underserved communities. What do we do with comments on stories?
Corporations may not continue to exist in the future and data will be lost.
There’s a balance to be struck between the business side and the public good.
It’s hard to convince for profit about the value of archiving for the social good.
Next Q: What opportunities have revealed themselves in preserving news?
Finding commonalities and differences in projects is important.
What does it mean to us to archive different media types? (think diversity)
What’s happening in my community? in the nation? across the world?
The long-history in our archives will help us learn about each other.
We can only do so much with the resources we have.
We’ve worked on a cyber cemetery product in the past.
Someone else can use the tools we create within their initiatives.
Repeating the question: What are the issues in archiving longer-form video data with regard to stories on Periscope?
How do you channel the energy around archiving news archiving?
Research in the area is all so new.
Does anyone have any experience with legal wrangling with social services?
The ACLU is waging a lawsuit against Twitter about archived tweets.
Outreach to community papers is very rhizomic.
How do you take local examples and make them a national model?
We’re teenagers now in the evolution of what we’re doing.
Peter Arnett just said “This is all more interesting than I thought it would be.”
Next Presentation: NewsScape: preserving TV news
I’ll be talking about the NewsScape project of Francis Steen, Director, Communication Studies Archive
I’m leading the archiving of the analog portion of the collection.
The oldest of our collection dates from the 1950’s. We’ve hosted them on YouTube which has created some traction.
Commenters have been an issue with posting to YouTube as well as copyright.
NewsScape is the largest collection of TV news and public affairs programs (local & national).
Prior to 2006, we don’t know what we’ve got.
Paul said “I’ll record everything I can and someone in the future can deal with it.”
We have 50K hours of Betamax.
VHS are actually most threatened, despite being newest tapes.
Our budget was seriously strapped.
Maintaining closed captioning is important to our archiving efforts.
We’ve done 36k hours of encoding this year.
We use a layer of dead VCRs over our good VCRs to prevent RF interference and audio buzzing. 🙂
Post-2006 We’re now doing straight to digital
Preservation is the first step, but we need to be more than the world’s best DVR.
Searching the news is important too.
Showing a data visualization of news analysis with regard to the Healthcare Reform movement.
We’re doing facial analysis as well.
We have interactive tools at viz2016.com.
We’ve tracked how often candidates have smiled in election 2016. Hillary > Trump
We want to share details within our collection, but don’t have tools yet.
Having a good VCR repairman has helped us a lot.
Breaking for lunch…
Talk “Born-digital news preservation in perspective”
There’s a shared consensus that preserving scholarly publications is important.
While delivery models have shifted, there must be some fall back to allow content to survive publisher failure.
Preservation was a joint investment between memory institutions and publishers.
Keepers register their coverage of journals for redundancy.
In studying coverage, we’ve discovered Elsevier is REALLY well covered, but they’re not what we’re worried about.
It’s the small journals as edge cases that really need more coverage.
Smaller journals don’t have resources to get into the keeper services and it’s more expensive.
Many Open Access Journals are passion projects and heavily underfunded and they are poorly covered.
Being mindful of these business dynamics is key when thinking about archiving news.
There are a handful of large news outlets that are “too big to fail.”
There are huge numbers of small outlets like subject verticals, foreign diasporas, etc. that need to be watched
Different strategies should be used for different outlets.
The material on lots of links (as sources) disappears after a short period of time.
While Archive.org is a great resource, it can’t do everything.
Preserving underlying evidence is really important.
How we deal with massive databases and queries against them are a difficult problem.
I’m not aware of studies of link rot with relationship to online news.
Who steps up to preserve major data dumps like Snowden, PanamaPapers, or email breaches?
Social media is a collection of observations and small facts without necessarily being journalism.
Journalism is a deliberate act and is meant to be public while social media is not.
We need to come up with a consensus about what parts of social media should be preserved as news.
News does often delve into social media as part of its evidence base now.
Responsible journalism should include archival storage, but it doesn’t yet.
Under current law, we can’t protect a lot of this material without the permission of the creator(s).
The Library of Congress can demand deposit, but doesn’t.
With funding issues, I’m not wild about the Library of Congress being the only entity [for storage.]
In the UK, there are multiple repositories.
It’s the beginning of yet another quarter/semester (or ovester, if you prefer), and a new crop of inquiries has come up around selling back used textbooks and purchasing new ones for upcoming classes. I’m not talking about the philosophical discussion of choosing your own textbooks that I’ve mentioned before. I’m considering, in the digital era:
What are the best options for purchasing, renting, or utilizing textbook products in what is a relatively quickly shifting market?
The popular press runs a variety of evergreen stories at the beginning of each semester that scratch just the surface of the broader textbook issue, or that focus on one tiny upstart company that promises to drastically disrupt the market (yet somehow never does). These articles rarely delve deeper into the market to give a broader array of ideas and, more importantly, solutions for the students and parents who are spending the bulk of the money to support the inequalities the market has built.
I aim to facilitate some of this digging and revealing based on years of personal book buying experience as well as having specified textbooks as an instructor in the past.
Before looking at the pure economics of purchasing, reselling, or even renting, one should first figure out one’s preferred reading format. There are many different modes of learning (visual, auditory, experiential, etc.), so try to tailor your “texts” to your preferred learning style as much as possible. For those who prefer auditory learning, be sure to check out alternatives like Audible or the wealth of online video/audio materials that have proliferated in the MOOC revolution. For those who are visual learners, or who learn best by reading: do you prefer ebook formats over physical books? There are many studies weighing the benefits of one over the other, but some of this comes down to personal preference and how comfortable one is with a particular format. Most current students grew up before electronic texts were common enough to be the default preference over physical ones, but with practice and time, many will come to prefer electronic texts in the long term, particularly since one can highlight, mark up, and more easily search, store, and even carry them. It’s taken me (an avowed paper native) several years, but I now vastly prefer to have books in electronic format, for the reasons above as well as the fact that I can carry a library of 2,500+ books with me almost anywhere I go. I also love being able to almost instantly download anything I don’t currently own but may need or want.
The one caveat I’ll mention, particularly for visual learners (or those with pseudo-photographic or eidetic memory): try to keep a two-page reading format on your e-reading device, since long-term memory for reading improves with the ability to place knowledge on the part of the page where you originally encountered it (that is, I remember seeing a particular item on the top left, or middle right, portion of a particular page). This isn’t always possible, due to an e-reader’s formatting capabilities or the readability of the text at a given size (for example, a .pdf file on a Kindle DX would be preferable to the same file on a much smaller smartphone), but for many it can be quite helpful. Personally, I can still remember where particular words and grammatical constructs appeared in my 10th grade Latin text many years later, while I would be very unlikely to manage this with the presentation of some modern-day e-readers or alternate technologies like rapid serial visual presentation (RSVP).
Purchasing to Keep
Personally, as a student and a bibliophile (read: bibliomaniac), I would typically purchase all of the physical texts for all of my classes. I know this isn’t realistic for everyone, so, for the rest, I would recommend purchasing all of the texts (physical or electronic, depending on one’s preference) in one’s main area of study, which one could then keep for the long term and not sell back. This allows one to build a library that will serve as a long-term reference for one’s primary area(s) of study.
Renting vs Short-term Ownership
In general, I’m opposed to renting books, or purchasing them for a semester or year and then returning them for a partial refund. It’s rarely a great deal for the end consumer, who ends up losing the greater value of the textbook. Books turned in and resold as used often go for many multiples of their turn-in price the following term, so if it’s a newer or recent edition, it’s probably better to hold on to it for a few months and then sell it at a used price slightly lower than the college bookstore’s going rate.
For tangential texts in classes whose books I knew I didn’t want to keep for the long term, I’d usually find online versions or borrow them (for free) from the local college or public library (many books are available electronically through the library or are borrowable through the library reserve room).
Most public libraries use systems like OverDrive, Axis 360 (Baker & Taylor), Adobe Digital Editions, 3M Cloud Library, etc. to allow students to check out a broad array of fiction and non-fiction for free, with loan terms from as short as a week up to a month or more. Additionally, well-known websites like Project Gutenberg and Archive.org have many commonly used texts available for free download in a broad variety of formats, including a lot of the classic fiction, philosophy, and other texts used in the humanities. Essentially all works published in the United States prior to 1923, and many published after, can be found in the public domain. Additional information on what is in the public domain can be found here: Copyright Term and Public Domain in the United States.
Why pay $10-20 for a classic like Thomas Hobbes’ Leviathan when you can find copies for free online, unless, of course, you’re getting a huge amount of additional scholarship and notes along with it?
College students often forget that they’re not stuck with their institutional library alone, so I’ll remind everyone to check out their local public library(s) as well as other nearby institutional libraries and inter-library loan options, which may offer longer loan terms.
General Economics in the Textbook Market
One of the most important changes in the textbook market that every buyer should be aware of: last year, in Kirtsaeng v. John Wiley & Sons, Inc., the US Supreme Court upheld the right of US-based students to buy copies of textbooks printed in foreign countries (often at hugely cut rates) [see also Ars Technica]. This means that searching online bookstores in India, Indonesia, Pakistan, etc. will often turn up the EXACT same textbooks (usually with slightly different ISBNs and slightly cheaper paper) at HUGE discounts in the 60-95% range.
Example: I recently bought an international edition of Walter Rudin’s Principles of Mathematical Analysis (Amazon: $121) for $5 (and it even happened to ship from within the US for $3). Not only was this 96% off the cover price, it was 78% off Amazon’s rental price! How amazing is it that shipping a book to yourself can cost almost as much as the book itself!? I’ll also note that the first edition of this book appeared in 1964 and this very popular third edition dates to 1976, so it isn’t an example of “edition creep,” but it still carries a tremendous markup relative to other common analysis texts, which list on Amazon for $35-50.
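The percentages quoted above are easy to sanity-check. A minimal sketch (the helper name is mine; the prices are the ones from this example):

```python
def percent_savings(paid: float, reference_price: float) -> float:
    """Percent saved relative to a reference price."""
    return 100 * (1 - paid / reference_price)

# $5 international edition vs. Amazon's $121 list price.
print(round(percent_savings(5, 121)))  # -> 96
```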
Hint: AbeBooks (a subsidiary of Amazon) is better than most at finding/sourcing international editions of textbooks.
For some of the most expensive math/science/engineering texts, one can buy an edition one or two prior to the current one. In these cases the main text changes very little, if at all; the primary difference is usually additional problems in the homework sections (which causes small discrepancies in page counts). If necessary, the problem sets can easily be obtained via the library reserve room or by briefly borrowing/photocopying problems from classmates who have the current edition. The constant “edition churning” by publishers is meant to help prop up high textbook prices.
Definition: “Edition Churning” or “Edition Creep”: the common practice of textbook publishers adding scant new material, if any, to textbooks on a yearly or every-other-year basis, thereby making older editions seem prematurely obsolete and propping up the prices of their textbooks. Professors who blithely adopt the newest edition of a textbook are often unknowingly complicit in propping up prices in these situations.
One may find some usefulness or convenience in traditional bookstores, particularly Barnes & Noble, the last of the freestanding big-box retailers. If you’re a member of their affinity program and get an additional discount for ordering books directly through them, it may not be a horrible idea to do so. Still, they’re carrying relatively large overhead, and it’s likely you’ll find cheaper prices elsewhere.
Campus bookstores are becoming increasingly lean, and many may begin disappearing over the next decade or so, much the way many traditional bookstores have disappeared in the last decade under increasing online competition. Because many students aren’t the best at price comparison, however, and because of their position in the economic chain, many are managing to hang on quite well. Keep in mind that many campus bookstores have fine-print deals in which they’ll match or beat pricing you find online, so be sure to take advantage of this, particularly when shipping costs would make an equivalent online purchase a few dollars more expensive.
There are fewer and fewer used bookstores around these days, and even fewer of the textbook-specific stores that traditionally sprouted up next to major campuses. This last type may not be a horrible place to shop, but they’re likely to stock used copies of only the official texts. Otherwise, general used bookstores tend to specialize in paperbacks and popular used fiction and carry a very lean textbook selection, if any.
Naturally, when shopping for textbooks there is a veritable wealth of websites to shop around online, including Amazon, Alibris, Barnes & Noble, AbeBooks, Google Play, Half/eBay, Chegg, Valore, CampusBookRentals, TextBooks.com, and eCampus. But in the Web 2.0 world, we can now use websites with even larger volumes of data and metadata as a clearing-house for our shopping. So instead of doing price comparisons at dozens of competing sites, why not use a meta-site to do the comparison for us algorithmically and much more quickly?
There are a variety of meta-retailer shopping methods, including several browser plugins for Chrome and Firefox (InvisibleHand, PriceBlink, PriceGong, etc.) that provide pricing comparisons so that, for example, while shopping on Amazon, one will see lower-priced offerings from competitors. However, possibly the best website I’ve come across for cross-site book comparisons is GetTextbooks.com. One can easily search for textbooks (by author, title, ISBN, etc.) and get back a list of retailers with copies, sortable by price (including shipping) as well as by new/used and even by rental availability. They even highlight one entry algorithmically to indicate their recommended “best value.”
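The core of what such a comparison site does, as I understand it, is simply rank offers by total delivered price. A minimal sketch (the retailers, prices, and field names here are invented for illustration, not real data from any of these sites):

```python
from dataclasses import dataclass

@dataclass
class Offer:
    retailer: str
    price: float
    shipping: float
    condition: str  # "new", "used", or "rental"

    @property
    def total(self) -> float:
        # Rank by the price you actually pay, shipping included.
        return self.price + self.shipping

# Hypothetical offers for a single ISBN.
offers = [
    Offer("Amazon", 42.00, 0.00, "used"),
    Offer("AbeBooks", 35.50, 3.99, "used"),
    Offer("Chegg", 28.00, 7.99, "rental"),
]

# Cheapest purchasable copy first, rentals excluded.
best_value = sorted((o for o in offers if o.condition != "rental"),
                    key=lambda o: o.total)
print(best_value[0].retailer)  # -> AbeBooks
```

The point of sorting on the total rather than the sticker price is exactly the trap mentioned above: a cheap listing with expensive shipping can cost more than a pricier one that ships free.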
Similar to GetTextbooks is the webservice SlugBooks, though it doesn’t appear to search as many sites or present as much data.
When searching for potential textbooks, don’t forget that one can “showroom” the book in one’s local bookstore or even at one’s local library(s). This is particularly useful if one is debating whether or not to take a particular class, or kicking the tires to see whether it’s really the best book or whether one should be looking at other textbooks.
From an economic standpoint, keep in mind there is usually more availability and selection a month or so before the start of classes, since commonly used texts are bought by thousands of students around the world, creating a spot market for used texts at semester and quarter starts. Professors often list their textbooks when class listings for future semesters are released, so students surfing for the best deals can very often find used copies in mid-semester (or mid-quarter), well before the purchasing rush begins.
And finally, there is the black market (also known as outright theft), usually spoken of in back channels either online or in person. Most mainstream articles that touch on this portion of the market refer tangentially to a grey market in which one student passes along a .pdf or other pirated file to fellow students, rather than to individual students enterprising enough to go hunting for their own files.
Most will know of, or have heard about, websites like PirateBay, but there are a variety of lesser-known torrent sites, typically hosted in foreign countries beyond the reach of United States copyright enforcement. Increasingly, mega-pirate websites in the vein of the now-defunct Library.nu (previously Gigapedia) or the slowly dying empire of Library Genesis are hiding all over the web and have become quick and easy clearing houses for pirated copies of ebooks, typically in .pdf or .djvu formats, though many are in .epub, .mobi, .azw, or alternate e-book formats. The typical setup is one or more illegal file repositories for downloads, with one (or more) primary hubs that don’t necessarily store the pirated materials but instead serve as a searchable index pointing to the files.
Creative advanced searches for book authors, titles, or ISBNs along with the words .pdf, .djvu, torrent, etc. can often reveal portions of this dark web. Naturally, caveat emptor applies heavily to these sites, as files can be corrupted or contain viruses for unwary or unwitting thieves. Many of these sites attempt to extract a small token monthly subscription fee or rely heavily on banner advertising to offset the large hosting and traffic fees associated with their maintenance. It is posited that many of them make millions of dollars in profit annually through advertising arrangements, though this is incredibly hard to validate given the nature of these markets and how they operate.
Rather than stoop as low as finding textbooks on the black market this way, students should place pressure on their professors, the faculty of their departments, and their colleges or universities to help assist in smoothing out some of the pricing inequities in the system (see below). In the long run, this will not only tend to help them, but many future generations of students who will be left adrift in the market otherwise.
Long Term Solution(s) to Improving the Textbook Market
The primary issue facing the overpriced textbook market is that the end consumers of the textbooks aren’t really in charge of the decision of which textbook to purchase; instead, individual professors, or the departments for which they work, dictate the textbooks that will be bought. This is why I advocate that students research and decide for themselves which textbook they’re going to use and whether or not they really need to make the purchase at all. The game-theory dynamics behind this small decision are the massive fulcrum that allows the publishing industry to dictate its own terms. Students (and parents) should, in a sense, unionize and make their voices heard, not only to professors but to departments and even the colleges/universities they’re attending. If universities took a strong stance on how these markets work, for or against themselves and their students, they could create strong market-moving forces to drastically decrease the cost of textbooks.
The other large issue is that market forces aren’t allowed to play out naturally in the college textbook market. Publishers lean on professors and departments to “adopt” overpriced textbooks; those departments in turn “require” the texts, and students aren’t questioning enough to use other texts for fear of not succeeding in their courses. If students questioned the system, they’d realize that instead of a $200-300 textbook, they could easily purchase alternate, equivalent, and often even better textbooks for $20-50. To put things into perspective, the time, effort, energy, and production cost for a typical book isn’t drastically different from that of the average textbook, yet we’re not paying $250 for a copy of the average new hardcover on the best-seller list. I wouldn’t go so far as to say that universities, departments, and professors are colluding with publishers, but they’re certainly not helping to make the system better.
I’ve always taken the view that the ‘required’ textbook was really just a ‘suggestion’. (Have you ever known a professor to fail a student for not purchasing the ‘required’ textbook?!)
In past generations, one of the first jobs of a student was to select their own textbook. Reverting back to this paradigm may help to drastically change the economics of the situation. For the interested students, I’ve written a bit about the philosophy and mechanics here: On Choosing Your Own Textbooks.
Basic Economics 101 supply-and-demand theory would suggest that textbooks for subjects like calculus, intro physics, or chemistry, which are used by very large numbers of students, should be not only numerous but also very cheap, while more specialized books like Lie Groups and Lie Algebras or Electromagnetic Theory should be less numerous and more expensive. Unfortunately and remarkably, the most popular calculus textbooks are 2-5 times more expensive than their advanced abstract mathematical brethren, and similarly for introductory physics texts versus EM theory books.
To drastically cut down on these market inequities, when possible, Colleges and Universities should:
- Heavily discourage “edition creep” or “edition churning” when there really aren’t major changes to textbooks. In an online and connected society, it’s easy enough to add supplemental errata or small amounts of supplemental material by means of the web.
- Quit making institution-specific readers and sub-editions of books for a specific department.
- If they’re going to make departmental level textbook choices, they should shoulder the burden of purchasing all the textbooks in quantity (and taking quantity discounts). I’ll note here, that students shouldn’t encourage institutions to bundle the price of textbooks into their tuition as then there is a “dark curtain,” which allows institutions to take the drastic mark-ups for themselves instead of allowing the publishers to take it or passing it along to their students. Cross-reference Benjamin Ginsberg’s article Administrators Ate My Tuition or his much longer text The Fall of the Faculty (Oxford University Press, 2013).
- Discourage the use of little-adopted textbooks written by their own faculty. Perhaps a market share of 5-10% or more should be required for a textbook to be usable by a department, and, until that point, the professor should compete aggressively to build market share. This may help encourage professors to write genuinely original texts instead of producing yet another introductory calculus textbook that no one needs.
- Discourage packaged electronic supplemental materials, which
- are rarely used by students,
- could be supplied online for free as a supplement,
- and often double or triple the price of a textbook package.
- Strongly encourage professors to supply larger lists of relatively equivalent books and encourage their students to make their purchase choices individually.
- Consider barring textbook sales on campus and relying on the larger competitive market to supply textbooks to students.
Calibre: E-book and Document Management Made Simple
As an added bonus, for those with rather large (or rapidly growing) e-book collections, I highly recommend downloading and using the free Calibre Library software. For my 2000+ e-books and documents, this is an indispensable program that is to books as iTunes is to music. I also use it to download dozens of magazines and newspapers on a daily basis for reading on my Kindle. I love that it’s under constant development with weekly updates for improved functionality. It works on all major OSes and is compatible with almost every e-reader on the planet. Additionally, plug-ins and a myriad of settings allow for additional extensibility for integration with other e-book software and web services (for example: integration with GoodReads or the ability to add additional data and meta-data to one’s books.)
- Students Get Savvier About Textbook Buying | The Chronicle of Higher Education (1/27/13)
- Study Indicates College Textbook Piracy is on the Rise, But Fails to Call Out Publishers for Skyrocketing Prices | TechDirt.com (9/23/14)
- How Slugbooks Subverts the Dirty Tricks of the College Textbook Market | Fast Company (4/14/14)
- More students are illegally downloading college textbooks for free | Washington Post (9/17/14)
- Why digital natives prefer reading in print. Yes, you read that right. | Washington Post (2/22/15)
- A Smarter Approach to College Textbooks | Inside HigherEd (8/18/15)
- (Update 8/26/15): Great article via Attn: Here’s Exactly Why College Textbooks are So Expensive | Attn: (3/12/15)
Be sure to read through the commentary on some of these posts for some additional great information.
What other textbook purchasing services and advice can you offer the market?
I invite everyone to include their comments and advice below, as I’m sure I haven’t covered the topic completely, and there are bound to be new players in the space increasing competition as time goes by.