Dr Marie-Louise Ayres, director-general of the National Library of Australia, is the nation’s top librarian. She talks to Stephen Easton about the culture of innovation that has allowed the NLA to remain an international leader in its field for decades, relying entirely on its own in-house expertise.
“Radical incrementalism” is the term Marie-Louise Ayres uses to best describe the approach that delivered the new Australian Web Archive this year — charting a direction and taking small steps that lead to profound change over time, through learning and evaluation along the way. In theory this allows organisations to achieve goals that would have previously sounded too big, ambitious or risky.
“And that is, seeing a problem for which there are no obvious solutions, making a start anyway, setting a course and then sticking to it for the very long term.”
The “problem” was how to capture the internet. The NLA began collecting Australia’s earliest HTML publications, establishing the PANDORA archive in 1996. It started storing annual snapshots of the whole Australian domain (.au) in 2005, and government sites in 2011. All three are now in the AWA, a new addition to the Trove discovery platform, which Ayres managed from 2011-17.
“So I was asking myself, can you call it innovation if it’s taken us 20 years to get here? And in our case, I would say yes, because every step along the way has been really innovative, at the time.”
One of her predecessors once told her that “nothing really significant happens in less than 20 years.”
“I sort of think for our kind of work, that’s true. We have a great record of seeing a problem and sticking with it for a very long time, whereas others might fall by the wayside.”
She says her colleagues are culturally “wired” for this approach because statutory independence insulates them from the “short-term cycles of interest” that affect some of their international counterparts.
“Until you see what it’s like in other parts of the world where civil services, and particularly organisations like mine, are organised around particular professional streams with distinct career pathways — and they shall never meet – I think you don’t realise the freedom that the APS framework gives us to do what works here in the Library, which is rich cross-disciplinary teams. I think from a public-service perspective, the way the APS operates and the way statutory agencies have legislation that says ‘go forth and do this for the very long term’ creates the environment in which a culture like ours can flourish – and therefore can deliver those innovative programs time and time again.”
The new web archive is “the most technologically challenging search project” ever undertaken by the NLA … and it had no need to outsource the technical expertise.
“We have always had really strong in-house IT; about 10% of our budget and 10% of our staffing is allocated to IT. Many of them are at the top of their game. They are in demand elsewhere but they stay here because the work’s really interesting, and lots of them have been working with our digital systems and digital content for 10, 15 years, so they’ve got great minds, but they are completely imbued with the mission of the organisation.”
IT is not a separate team; it’s “completely integral” to all of the Library’s work. Software developers work in multi-disciplinary teams alongside project managers, policy and media officers, collections specialists, and user experience designers.
“Our purposes are to collect, and to make those collections accessible. And in a digital world a huge part of our collecting is digital. It’s born digital. It’s not just the web archive, it’s e-books, e-journals, music, maps, personal archives and so on. It’s not separate in any way, and therefore we do get this great strength from cross-disciplinary teams with this combination of long and deep experience, and then you do have to leaven that every so often by bringing in [new staff].”
The next big challenge is collecting from online walled gardens like social media websites, while also keeping up with the growth of Australia’s corner of the internet. Collecting websites is a selective, manual process, but the Library doesn’t do it alone.
“We work with partners around the country, all the state and territory libraries, the National Gallery, AIATSIS, the National War Memorial – we all sort of take responsibility for the parts of Australian society, as documented on the internet, that we’re interested in and we decide to intensively collect. Around the time of the same-sex marriage postal vote towards the end of November, we treated that here at the Library as if it was an election. And when there’s an election, we try to reach out across the community to say, ‘Right, all those posters, t-shirts, flyers, bookmarks — anything to do with the election from any perspective — send it to us.’ We also intensively collect websites for political parties and candidates, and so do our partners, so in the case of the same-sex marriage postal survey we did the same thing. We built a strong collection around that moment in time both in paper form — or t-shirt form — and in website form.”
The Library works with the Internet Archive, a US-based non-profit that also began recording online history in 1996, to store wholesale snapshots of the .au domain.
The Internet Archive’s Wayback Machine is a popular tool for finding pages that disappear but it doesn’t contain as much Australian content as the AWA. Both would be much the poorer if they didn’t work together.
Government websites are not themselves official records, but they are “grey literature” for the Library’s purposes and, in general, Ayres thinks agencies could do more to take care of them. Public servants occasionally ask the NLA if it has a copy of past web content that has gone missing – and it might – but its role is not to be this kind of backstop. It’s a historic moment when things change in government, however, so collecting goes into overdrive.
“What that means is, [at times like] last year when all the turmoil was going on, and not only were prime ministers and ministers changing but departments were quickly reorganising websites, well, we get busy then. Every time there’s an election or a [machinery of government] change, we just get right in there and capture everything we can. Now, that might not be in the four or six weeks a year that The Wayback Machine is looking at Australia, so it would disappear otherwise.”
One big challenge for the AWA project was the sheer scale of making around 9 billion files accessible through a customised full-text search engine, which meant configuring servers specially for the task.
“We knew we wanted to do the full-text searching and we had a bit of an experiment on it, and it then became clear that the index that we were going to need to serve up that content was 10 times bigger than our existing Trove index. Now that was the moment of thinking, ‘holy dooley, how are we going to do this?’”
It initially looked like the Library might not be able to afford all the hardware it needed. Ayres is sure it’s not the right time to move to cloud-based servers, but that remains an option in future. It took a lot of iterative work, testing and learning, to get to a point where the search engine’s performance was acceptable to go live.
“That was just something you had to chomp at and chomp at and chomp at, and just keep optimising, with the best minds working on it.”
Another big challenge was how to convey the dimension of time to the user, so they know they are viewing a website at a certain point in the past, displayed inside the Trove website. There are visual cues like the Trove border, messages explaining where you are going, and a kind of temporal navigation box that floats over the top, but can be minimised to a small circular icon and dragged around the page, across the border between past and present.
“Our approach normally when we’re serving up content is, ‘Get out of the way of the user. Don’t put anything in between the user and the information they want.’ In this case we had to decide to put some things in the face of the user to let them know, ‘Oh I’m at this point in time, and I might want to go there.’”
A third challenge was dealing with potentially objectionable or offensive content, in consultation with the e-safety commissioner, the information commissioner and the government via the Department of Communications and the Arts.
“It is our role to collect pornography if it’s legally published so that’s fine; we have to collect it. But it’s obviously a little different if you collect print pornography and it goes into your stacks, and really somebody’s got to give you a library card and request it … to possibly having a rather quaint 1990s [pornographic] site come up in a search engine, with a lovely National Library of Australia banner around it. So, thinking about what that might mean both for the community, and for the reputation of the Library was important. What we needed to do then was to think very carefully about relevance rankings. We haven’t removed that content, but we did do some work to make sure it wasn’t going to just be there all the time. Getting the relevance ranking right took a long time.”
With so much going online all the time, the mind boggles at how they decide what to collect and when …
“It is an impossible task but we started anyway! The internet’s dynamic and you can either say it’s endlessly dynamic, and therefore this is an impossible problem, so you’ll do nothing, or you can say it’s endlessly dynamic, it’s an impossible problem; let’s do the bits that we can. We would say if we’ve only collected a website once a year, it’s better than not collecting it at all. If it’s an Australian Government website and you’ve collected it 15 times in the year and something changed the day after, well, so be it. You can’t do everything.”
Website collecting is done quite selectively, like taking photos of what looks interesting … and the librarians spring into action when major events take place.
“As an example, when the Lindt cafe saga happened, all of a sudden we knew there was something going on in Sydney. We got on the phone to our colleagues at the State Library of NSW and we said, ‘We’re going to take care of all the website stuff.’ So we just [took snapshots] of everything as it came through and they focused on the social media, using their own very good tools for doing that … so between the two of us we didn’t get every dynamic change on that day, but you could say there’s a reasonable representation of it. So it’s the art of the possible, because the impossible paralyses people.”
Soon, the NLA needs to rebuild its digital tools for web collecting, which again were created in-house. The next generation will make greater use of machine learning and automation.
“We wrote the first tools in the world. They’re used all over the world, but we’ve actually probably [fallen] slightly behind at the moment. We’ve focused on other things, and we need to reinvest there.”
Another upgrade for the Library this year is the national e-deposit system, a joint project with state libraries that launched on May 30. It is a single platform for Australian publishers to deposit digital material, and for the libraries to manage it and provide access.
“So again it’s all part of our collecting and access in the digital world, it’s not as sexy as the Web Archive, but by working together we will undoubtedly get a much richer collection, because all of the small community organisations that you might not have got to if you were all doubling up the work, we’ll be able to get to now.”
One might think we now have more capacity than ever to capture the present for the people of the future, but document preservation experts typically say it’s much harder to preserve masses of digital information in a robust yet accessible collection than it is to preserve old manuscripts.
“Because we use digital all the time and it’s ubiquitous, we therefore think it’s kind of safe and it’s going to stick around. Look, I would far rather have 500-year-old books to look after than five-day-old social media messages.”