Anna McHugh: Hi, my name is Anna McHugh, and I'm a content librarian and curator at Red Hat, one of the world's first open source software companies. I work on the content team within the marketing organization, and I've been tasked with organizing our content, making sure it's accessible and findable, and bringing some governance and metadata strategy into our content strategy. Today, I want to take you on a short journey through the four and a half years I have been at Red Hat, and how a small sub-team within the content team has been working on our technology problems and our content management platforms, bringing performance data together with metadata to enrich our understanding of, and the context around, the content we create. So let's get started. I want to come back to something that I think is really important for librarians, knowledge managers, information architects, and user interface designers; there are a lot of different names for people in our related disciplines. But the one thing we can all come back to is the fundamental idea that knowledge is priceless. This idea is as old as time itself. Certainly, one of my greatest loves in life is to spend time in beautiful libraries like this one, exploring the stacks and communing with the great minds of the past. Now, it's really important to note that in today's age we have more knowledge, more information, and more data than ever. That being said, the value of that knowledge hasn't been diminished whatsoever. It's our task as librarians, curators, and others to bring context and understanding to data and information, and turn it into knowledge and insight. 
But enough of the corny stuff. I want to talk about where we're at today in our field, my perspective on some of the things we can really focus on, and some of the emerging trends that are going to greatly impact our work over the next few years. First and foremost, and I think this is something everybody has been considering a great deal, is the reality of all-remote work. Red Hat in particular is an engineering organization, but we have associates all around the world, more than 13,000 people, and we have actually been a remote-friendly organization for a long time. In my time at Red Hat, anywhere between 30 and 40% of Red Hat associates were remote-only employees. Now, obviously, in the middle of March 2020, our workforce for the most part became entirely remote, and we really had to start to adapt. This is an evolutionary process of getting our heads around how to help people do their jobs better. I think this is really important for people in our field, because we can carry forward the knowledge of processes, the wisdom, the insight, and the culture within our organizations to ensure that our values carry forward even if we're not sharing office spaces together. One of the examples I looked to when I was trying to understand and suss out how this kind of change would affect my work comes from GitLab. Because Red Hat is an open source software company, I spend a lot of time trying to educate myself about the subject matter and the technologies we work with. This is all very complicated stuff, and I definitely won't pretend to be an expert. But I do often look to open source communities for inspiration: for ideas about better ways of working, and ways of getting around some of the fundamental blockers that make it really, really difficult to make a long-term plan and stick with it. 
GitLab is a company I have been familiar with for a number of years, and I stumbled across a resource they have called The Remote Manifesto. GitLab is a very important piece of software for development tracking and project management; it's used across organizations of all sizes for version control, code control, and managing large software projects. Often these projects also follow an agile methodology. There's a variety of different practices, but fundamentally it's about small, multidisciplinary teams meeting with a representative of the business and understanding, on an ongoing basis, what they need to be building and what they need to be doing. Now, agile is not easy, and it's especially not easy when you have an all-remote workforce, which GitLab does. So in addition to what they call the Agile Manifesto, which basically outlines some of the ways software companies and other organizations can move more quickly and react dynamically to their realities, GitLab formed a series of principles around which this culture of openness and flexibility could be carried into an all-remote organization. There are nine key tenets of this manifesto, and I do recommend that you look it up and read through the entire document. The reason I bring it up is that two thirds of the fundamental core values, six out of nine, rely on the kind of work that knowledge managers, information architects, and digital librarians do. From here on in, I'm just going to call us beautiful nerds, because we go under so many different labels that I would waste your time and mind if I went through the whole list. In any case, us beautiful nerds have a very important role to play in a couple of the ways that remote work can be facilitated successfully. 
First of all, within a remote organization, you need to be far more inclusive and transparent, and far more mindful of how that works. Now, I'm not saying that if you're in an office, co-working with people, you're not mindful of your communications; but simply by physical proximity, you will often have conversations that folks who are off site are not privy to. So GitLab has made a very, very clear choice to articulate that their number one value is to establish an inclusive community and culture even though people are not in the same physical spaces. In order to facilitate that kind of environment, these are the two core areas where I think a lot of us do our work. First, instead of providing on-the-job training, working with people to point them to resources, providing clarifications on calls, or receiving tickets and responding to them, what GitLab really focuses on is clear documentation: ensuring that processes are clearly documented, and that people have an opportunity and a mechanism to give feedback on how those processes work. But the really important part of this, and one of the core responsibilities of anyone in this kind of job, is version control: ensuring that process documents are clear, that they are kept up to date, and that nothing remains on the company intranet that should not. Finally, asynchronous communication is incredibly important. When a lot of people think of an agile methodology, they think about meetings, because agile has what they call rituals: essentially, meetings at which this small team of people comes together and assesses what they're doing, what they're blocked on, and what they're going to be doing in the near future. 
Now, the reason they plan in short chunks is so they can, again, be reactive and responsive, and change the project or the product they're building as the needs of the business change. This can be quite brilliant. But the thing is that asynchronous communication is absolutely necessary for a remote workforce. You could be up at four o'clock in the morning because you have a young child who's ill; you could be working across the globe, which many Red Hat teams do. There's a wide array of personal circumstances, and especially in this era, where we're starting to shift how we think about work, and in many ways taking greater personal responsibility for blending our lives and our work together in a healthy way, it's really important for good employers to foster asynchronous communication. You don't need to be in the exact same room with someone to have meaningful and thoughtful communication. And again, that's very, very important in the documentation space: being able to point people to the right resources and discuss with them, not at the commencement of the day over your coffee, but over the course of the reasonable time that you're both available, in order to teach, to learn, to provide feedback, and to move knowledge in the organization forward. So obviously, we're in a very new world. I really love those libraries, like the one I had on my first slide; they make me feel very erudite and well educated. But the reality is that information, at this point in time, is primarily stored in data centers. 
And so we're not moving the value of knowledge at all, but we are moving the set and the setting, the ways in which people encounter knowledge. Data centers, and the wide array of tools that allow people to access the internet and get information very rapidly, allow us to create more knowledge, absorb more knowledge, and additionally cut up the knowledge we have into little chunks. No longer are you wandering through the stacks and stumbling over sleeping graduate students to find that one book from that one authoritative author; you can now gather a great many different perspectives and pieces of information in order to form insights and judgments. Now, I think this is a really wonderful thing, and I'm not suggesting that all knowledge managers and beautiful nerds become data center system admins. Instead, I am suggesting that we need to become familiar with the different methods people use to access knowledge, and we need to master the tools to bring these very large data sets, and these very, very diverse chunks of information, together in a coherent way. This is especially important in our own organizations. There are some folks who say, okay, the Google algorithm will bring some sense of context to exactly what our company or organization is all about. The reality is that you absolutely have to collaborate, as an organization, on who you are and what you're saying. One of the things we did at Red Hat that I think was a terrific project was actually personifying the Red Hat voice. So, instead of saying, well, we just have the following traits when we're speaking in the Red Hat brand... 
Instead, we sat down, over the course of a very long time, because Red Hatters like to debate, and came up with an individual whose voice was really representative of the warmth, the openness, the transparency, and the curiosity that we want Red Hat to represent. So I want to move on a little bit. I think there's a lot of anxiety, not just in our field but in many others, around being replaced by artificial intelligence and software that can do all of the tasks we are able to do. Now, the reality is that there are a lot of really interesting developments, and there are jobs that are going to change fundamentally or go away. That being said, I think that artificial intelligence and machine learning, on the one hand, and human beings, on the other, are very, very complementary to one another. One of the things human beings do far, far better than machines at this point is to look at an image, or absorb a piece of text, and understand the larger context in which it exists. And so the idea I like to talk to people about, when it comes to the value of metadata, is that metadata assigns labels to the different pieces of content you have. Each piece of content is a star within a constellation, and using metadata to tie all of those pieces together gives you a coherent whole. It superimposes another layer of meaning, saying: this is what we're all about; this is how all of our ideas are connected. This is something the human brain is uniquely good at. The image here is of a human neuron assessing, processing, and recognizing an image. Now, according to MIT, it takes only 13 milliseconds for a human being to look at an image, recognize it, and also form an impression about what it means. Machines can do that almost as fast, but one of the things they can't do is distinguish a lot of things around symbolic logic, distortions, and abstractions. 
A really great example of this is that machines are getting increasingly good at recognizing a physical object. However, if you were to show a machine a table that included, let's see, a banana, a vase, and then a picture of an apple, and you asked that machine to identify the apple, it would have far more difficulty and take much, much longer to identify that object than a human being, who could look at it and, at the snap of the fingers, say: okay, I know what that is, and I understand that it's a picture, and that the picture is the context around it. There are a couple of really interesting examples I find fascinating that really accentuate how human beings and these machine learning algorithms can work together in complementary ways. First is the reCAPTCHA project, which you've probably seen before: when you need to verify that you're a human being, you see these distorted images of text. Now, as you may be aware, this is actually a massive digitization project, digitizing tons and tons of books that will be available within our great knowledge database that is the internet. So this is basically harnessing the power of the human mind and the human eye to look at a distorted piece of text and say, I know exactly what that is. It's also a really interesting and innovative idea because it keeps malicious bots and other computer programs out. So I think that, at the end of the day, we really do have a complementary situation, and a lot of people are becoming very, very anxious that all of the tasks on the internet are being done by AI or machine learning. Well, first of all, I want to tell you that that is simply not the case. The Mechanical Turk is one of the most interesting areas where you can do a whole lot of work and not make a lot of money on the internet. 
It is a way that individuals can contract for what are called micro tasks. They go onto a marketplace, and there are a variety of bids for things like looking at databases of images and identifying and tagging those images, activities that may sound very familiar to you if you've ever done a content audit. Now, some people presume that a lot of the information coming out of Mechanical Turk is coming from AI, but the reality is that, again, these things are complementary to each other. As these neural networks and machine learning algorithms become more sophisticated, we are also responsible for helping them establish context, and I think that's a really important thing. Now, you don't have to take my word for it; let's talk about Google for a second. Google has recognized that their algorithm being exceptionally good at giving people very specific bits of information can be both a benefit and a curse. So in order to shape and train their algorithm in a way that favors humanity, deals with hate speech, and tries to remove untrustworthy or untrue information from the web, they've decided that a lot of their training data needs to be derived from humans who can look at different pieces of content and judge their quality. And that quality is actually based upon, again, the context the information exists within. The Google quality rater guidelines is a 168-page document that human beings actually use to look at thousands of websites. It's a very sophisticated taxonomy that, again, allows human beings to say: okay, this item is not appropriate for consumption on the web, because it is false news, or it is exploitative, or it is hate speech. 
And I think it's really important to note, and I know that in this conference there are a lot of other presentations about our inherent biases, and the issues that can come up when we're not hands-on in providing data that places humane values within these algorithms, because otherwise we can occasionally get ourselves into deep trouble. So Google recognizes that there is a lot of value in human beings looking at and reviewing content, which I think is super important. So that's kind of my overview. I think the robots are not coming for our jobs, but we do need to evolve and become more technologically savvy, especially in the tools department; I'll get to that very soon. Now I want to very briefly give you an overview of where my team has been over the past several years. I joined Red Hat at the end of 2015. In 2014 and 2015, we migrated our website from one content management system to Drupal and built the entire implementation. One of the things my boss said to me when I first came on board was: we inherited this website, we have a library of marketing documents, and we don't know which ones are good and which ones are bad. And some of them are really old. I said, well, why can't we just look at the published date? And my boss said: aha, the problem is that the publish date reflected in the metadata on our documents is the date they were migrated into the CMS. Furthermore, we had an internal repository used by marketers and product managers to distribute their documents to sales and to members of our consulting organization, and that system was completely out of sync with our website. The way our internal repository is, and was, set up, it's basically a series of file folders, a really traditional directory structure. 
The problem is that each of these individual file folders is owned by a different person, with a different subdirectory structure. Some of them have products, and then different components of products people may be interested in; others have programs, and then different content types within them. So it can become a very chaotic experience. Furthermore, the content lifecycle was clearly delineated by the business, but it was contingent upon those content owners responding to an automated email saying: your document has been live for six months; if you do not respond to this email, it will be archived. So on the one hand, we had an internal repository that was very difficult to navigate, even for content owners, and a lifecycle that was expiring documents too rapidly, because people simply didn't have time to keep up with that automated lifecycle. On the other side, we had redhat.com with tons and tons of documents, and we just didn't know about their quality, or what we needed to do with them. So we tackled this through the end of 2015, and then all of 2016 was the year of the audit in my life. We went through and audited every piece of content within our library. We removed duplicates, looked at actual published dates, and removed items that were very clearly out of date. During that time, we also assessed the taxonomy we had inherited. We had a series of terms inherited from the old content management system, and some of them were really good: we had a parent category for products, and we had most of the Red Hat products, which was great. We had a topic list, which was kind of fuzzy, but there were a number of things in there that worked reasonably well. There were a couple of fields we were missing, however. 
And at that time, we were moving our marketing efforts, our sales efforts, and our overall communication style with the world to what we call the Red Hat conversations. These are discussions we have with our customers that really cut to the core of their IT problems. Unfortunately, even though we were creating all of our content around trying to foster these conversations, we had no metadata to support that and to identify that content within our own content ecosystem. So we spent some time auditing the taxonomy and adding those labels as well, because that was a really important way of saying: okay, this is how we operate as a broader team. We're not talking about 45 products. We're not talking about different product lines. We're not even talking about niche topics. We are talking about, collectively, five things we hear from our customers, just five. And we want to put ourselves in their shoes and foster a web experience and a conversation with them that is focused on their needs. So that's what we were doing throughout 2016. 2017 and 2018 were really interesting, because we focused a lot on data access. Basically, we were trying to make sure we could extract data from all of the different software systems we use and then integrate them, bringing them together in giant spreadsheets using common keys and identifying numbers between them. As a consequence, we were able to work really closely with some of our data analysts and do content analysis and curation. Essentially, we would take the performance data for pieces of content, then take the corrected metadata we had assigned to those pieces, and say: okay, we want to promote Red Hat Enterprise Linux, our flagship product, so we want to look at the highest performing assets. 
But we also want to exclude anything that is not related to Red Hat Enterprise Linux. This made it a lot easier to slice our content down into much smaller chunks, which allowed us to do analysis. We also worked on search engine optimization and migrating our content into web-friendly formats. Additionally, in 2017 and 2018, we founded a metadata committee. One of the things we've discovered about managing metadata is that everybody has a different understanding of what these concepts mean. There are also a lot of implications when you're working with multiple software systems and multiple teams that are creating content. So we formed a committee, open to anyone who wants to participate, where we discuss and implement taxonomy changes, and those changes apply across tools throughout the organization. So in 2017 and 2018 we were doing a whole lot of that. Then, finally, we started to realize that some of our systems and software really weren't connected as well as we wanted. We were pulling data out of different systems, throwing it together into spreadsheets, and doing a lot of data scrubbing. It was pretty gross. So I have a colleague, Brian Fahey, very near and dear to my heart, who started coming up with homebrew technical solutions to automate some of these unpleasant tasks using basic Python scripts; I'll get into that story a little later. 2019 and 2020 were characterized by upgrades in tools, and also upgrades in the personalization of, and relationships between, our content. We now have a digital asset manager, which gives us a library of content where related assets are associated with one another. So if you have a white paper, and there is a LinkedIn ad, and there's modular copy for an email, you can find those assets side by side, which used to be a big struggle for us. 
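The join described above, bringing performance data and metadata together on common keys and then slicing by product, can be sketched in a few lines of Python. This is a minimal illustration, not the team's actual scripts: the asset IDs, field names, and numbers are all invented.

```python
# Hypothetical extracts from two systems, keyed on a shared asset ID.
# Field names and numbers are invented, not Red Hat's actual schema.
performance = {
    "A1": {"pageviews": 12000},
    "A2": {"pageviews": 300},
    "A3": {"pageviews": 4500},
    "A4": {"pageviews": 80},
}
metadata = {
    "A1": {"product": "Red Hat Enterprise Linux"},
    "A2": {"product": "OpenShift"},
    "A3": {"product": "Red Hat Enterprise Linux"},
    "A4": {"product": "Ansible"},
}

# Join the two extracts on the common key, like a spreadsheet lookup.
combined = {aid: {**performance[aid], **metadata[aid]} for aid in performance}

# Slice down to one product, highest-performing first.
rhel = sorted(
    (aid for aid, row in combined.items()
     if row["product"] == "Red Hat Enterprise Linux"),
    key=lambda aid: combined[aid]["pageviews"],
    reverse=True,
)
print(rhel)  # → ['A1', 'A3']
```

The same pattern scales up with a data-frame library once the common keys are reliable; the hard part, as the talk notes, is the data scrubbing that makes the keys match in the first place.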
We also started to automate workflows so people can request these kinds of assets. And we received some support from leadership saying: we need to align these tools and keep the software tools in alignment, because without this, all of these other efforts become dubious at best and impossible at worst. Also in 2020, we're using a lot of our metadata to power personalization and audience segmentation projects through Adobe Audience Manager and Adobe Target. Essentially, we receive visitors, those visitors go to certain pages that are labeled with different tags, and when they return, if they have a cookie on them, if they've seen us over the course of the last 30 days, we can say: oh, this individual interacted with Red Hat Enterprise Linux content, they are interested in open source, and they are looking at fairly high-level content. So maybe they're a student, or they're not a deeply technical person; they're not looking for an architecture that will show them how to build a server rack. So that's what we've been working on, essentially: trying to understand the full journey people take with us, and using metadata to foster these conversations in a way that is meaningful. Now, I'm not going to sugarcoat it: my team has encountered a whole lot of problems, and I think Red Hat in general creates too much content. This graph shows the ramp-up in the raw numbers of marketing collateral assets created over the last several years. As you can see, it is a very, very steady, inclining line. Now, that would be all well and good if our marketing collateral did well. 
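The segmentation idea above, inferring a returning visitor's interests from the metadata tags on the pages they viewed, can be sketched like this. It's a toy version of what tools like Adobe Audience Manager do at scale; the page URLs, tag names, and threshold are invented for illustration.

```python
from collections import Counter

# Hypothetical visit history: each page a returning visitor saw
# carries the metadata tags assigned to that content.
visit_history = [
    {"url": "/rhel/what-is-linux", "tags": ["rhel", "open-source", "beginner"]},
    {"url": "/rhel/overview",      "tags": ["rhel", "beginner"]},
    {"url": "/topics/open-source", "tags": ["open-source", "beginner"]},
]

def segment(visits, min_hits=2):
    """Infer interest segments from tags seen at least `min_hits` times."""
    counts = Counter(tag for v in visits for tag in v["tags"])
    return sorted(tag for tag, n in counts.items() if n >= min_hits)

print(segment(visit_history))  # → ['beginner', 'open-source', 'rhel']
```

In other words, consistent metadata on pages is what makes the visitor profile ("interested in RHEL, open source, high-level content") possible at all.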
However, one of the things that is critical to understand is that even if you're doing audits, you are at risk for what some people call content ROT. Even if you are reasonably good at maintaining your library, if you just have a lot of content that says the same stuff, you end up with ROT, which stands for redundant, outdated, or, worst of all, trivial. My team has worked very hard to use the metadata and the performance data at our disposal to inform our stakeholders and try to encourage them to make a little bit less. What we really discovered in bringing this information together is a very well-known principle, the Pareto principle: the top 20% of our assets contribute 80% of the effect. Be that organic traffic, it's 80%; be it lead generation, still hovering around 80%. By the same token, 80% of our work creates 20% of the traffic and conversions. So again, we've been trying to push back against this using the software we have, the access to data we have, and the information that helps us build context around the things we own and promote. Now I want to talk about the basics, the things we've done at Red Hat, and a couple of details that may be helpful to you in establishing your own practices, if you haven't already. There are three main things I have worked on consistently. If I were to become a zombie and have to give up my job for one reason or another and hand over my responsibilities to somebody else, at least my soggy, brain-craving zombie brain would be able to say: these are the core responsibilities. There's a lot of other sophisticated stuff, but my undead self can't possibly tell you about it. All right, that was a little bit weird. 
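The Pareto check described above, what share of total traffic comes from the top 20% of assets, is easy to compute once performance data is unified. A minimal sketch, with invented traffic numbers that happen to show the long tail:

```python
def top_share(values, top_fraction=0.2):
    """Share of the total contributed by the top `top_fraction` of items."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return sum(ranked[:k]) / sum(ranked)

# Invented per-asset traffic numbers with a long tail, for illustration only.
traffic = [8000, 4000, 1200, 500, 300, 200, 100, 80, 60, 40]
print(round(top_share(traffic), 2))  # → 0.83
```

Running the same function over lead-generation counts instead of traffic is how you'd confirm the "still hovering around 80%" observation for a second metric.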
I'm a little bit nervous; being in front of a camera is a little bit strange. That being said, these are the three things that, if I were a zombie, I would tell my successor to do. First of all, you have to do your content audits and your metadata audits, and you have to do them consistently. When I first got hired, my boss said: well, this is an ABA organization, or it ought to be. And I said: ABA? I'm wondering what kind of professional organization this is. And she said: well, we need to Always Be Auditing. I could not agree more. Second, I think it's really important to develop a taxonomy that is flat and usable and also has parallel terms within it. By parallel terms, I mean making sure that if you have a topics list, the topics are all singular if they're going to be singular, or all plural if they're going to be plural, but also that they're the same type of thing, especially if you want to use your metadata on the front end of your website or on the front end of certain tools, so that people can use search boxes or any variety of ways of filtering their content down. You need to make sure those lists are coherent, and that the terms hang together. Usable, flat taxonomies make that a lot easier. They make auditing and revision a lot easier, and they also make things easier for your content authors and for people on the front end of your website. And then finally, but probably most importantly, is access to data and the integrity of that data. Everything we do in terms of auditing, and in terms of going through piles of content as it comes in and making sure we're labeling things properly before they're even published, comes down to being able to provide reliable information. 
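The parallel-terms check described above can even be partially automated. Here is a deliberately crude sketch that flags a topics list mixing singular and plural terms, using a naive trailing-"s" heuristic; a real check would need proper linguistic tooling, and the example terms are invented.

```python
def inconsistent_number(terms):
    """Flag a topics list that mixes singular and plural terms.

    Crude heuristic (trailing 's' means plural); real-world checks
    would need linguistic tooling. Terms here are invented examples.
    """
    plural = [t for t in terms if t.endswith("s")]
    singular = [t for t in terms if not t.endswith("s")]
    return bool(plural) and bool(singular)

print(inconsistent_number(["containers", "automation", "clouds"]))      # → True (mixed)
print(inconsistent_number(["containers", "clouds", "microservices"]))   # → False (all plural)
```

Even a rough lint like this, run as part of a metadata audit, catches the drift that creeps in when many teams add terms independently.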
About what something is about: why is it important to our business, why is it important to our audience, are a lot of people seeing it, are we making money because of it? These are the kinds of questions we're trying to answer, and we simply cannot do that without unified performance information and great metadata to back it up. So, to give you a little sense of how we have proceeded on our auditing journey: as I mentioned, when I joined the company there was just a whole lot of content chaos. So we established a rolling audit to begin with, basically to clear out the deadwood, and after that we established a quarterly audit. On a quarterly basis, I send out a report of any item that is older than 18 months and has fewer than 25 downloads over the course of a 12-month period. Additionally, we ensure that the content is not being promoted out in the field; especially because we're a global organization, we have to acknowledge that occasionally people are going to promote content somewhere other than our website. So we capture that information: is there a single person who filled out a lead form, anywhere in the world, related to this piece of content? If so, it is worth hanging on to, at least for the time being. We do this on a quarterly basis, with feedback and collaboration from the stakeholders, and it often prompts conversations like: hmm, this is a piece of content we really thought would do well, but it just tanked; what can we learn from this? Or an even better understanding of how we ought to position our marketing content around products that have gone out of date. Over the course of these audits, we have archived 2,040 PDFs. That archive button is one of my best friends, and I think it's also something that has really helped us build trust and recognition within Red Hat. 
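The quarterly report rule just described, older than 18 months, fewer than 25 downloads in 12 months, and no lead forms anywhere, translates directly into a filter. This is a sketch, not the team's actual tooling; the asset records and field names are invented.

```python
from datetime import date, timedelta

def audit_candidates(assets, today, max_age_days=18 * 30,
                     min_downloads=25):
    """Flag assets older than ~18 months with fewer than `min_downloads`
    downloads in the last 12 months and no lead-form fills anywhere.

    `assets` is a list of dicts; the field names are invented for this sketch.
    """
    cutoff = today - timedelta(days=max_age_days)
    return [a for a in assets
            if a["published"] < cutoff
            and a["downloads_12mo"] < min_downloads
            and a["lead_forms"] == 0]

assets = [
    {"id": "wp-01", "published": date(2017, 1, 10),
     "downloads_12mo": 3, "lead_forms": 0},    # old, unread → flagged
    {"id": "wp-02", "published": date(2017, 2, 1),
     "downloads_12mo": 3, "lead_forms": 1},    # kept: one lead form somewhere
    {"id": "wp-03", "published": date(2019, 11, 1),
     "downloads_12mo": 500, "lead_forms": 12}, # kept: recent and popular
]
print([a["id"] for a in audit_candidates(assets, date(2020, 3, 1))])  # → ['wp-01']
```

The point of the single-lead-form escape hatch is visible in `wp-02`: one fill anywhere in the world is enough to keep an asset out of the archive report for another quarter.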
At first, people were a little bit dubious about the idea of somebody reviewing their content and saying, "You should get rid of this because it's not doing very well." But because we tried to foster a collaborative conversation — "Okay, these are things that are just detracting from the cool new ideas that you have" — that really helped us build the relationship, and we backed it up with the fact that we were willing to drink the seven gallons of coffee it takes to review and read thousands and thousands of PDFs. But I do want to emphasize: audits are often going to be manual. You can use different tools and augment a lot of the elbow grease with technology, but oftentimes, when you compare the effort it would require from developers or IT staff against just sitting down and grinding it out, the balance will be in favor of you grinding it out on a spreadsheet. The real benefit, though, is that this gives you an opportunity to capture the context and the meaning of your library. Because information in the world is so distributed, a lot of our collections need to be very specific, very relevant, and up to date. Auditing gives us a chance to look at what's there and ask: what does this say about us? What is the audience we're really trying to serve? And does all of the content really meet that need? So that's the auditing component — I think it's one of the most important things we can focus on within our careers. But there are a couple of other things as well. I definitely recommend a flat taxonomy. Now, I am an amateur mycologist, which means that I like to collect wild mushrooms — I'm kind of like Egon the Ghostbuster, who collects spores, molds, and fungus. I don't use that as a pickup line like Egon does.
That being said, it goes without saying that I'm a bit of a taxonomy nerd. When I am doing my wild mushroom thing, the biological taxonomy that goes all the way down from domain to species is really convenient — well, convenient is the wrong word — really specific, if you're trying to be an expert, so you can understand every last relationship between different organisms and how their genetics are related. And so, oftentimes when you say the word "taxonomy," especially to a group of people who create marketing and sales content or technical documentation, they'll say, "Oh no, I heard about this in high school biology, it was a real bummer." Or even, "That's a really long word. And metadata sounds a little scary — I know the NSA collects it, and I don't feel all that comfortable." That's not to say you should abandon the word taxonomy, but usually I try to pull the teeth out of it by saying, hey, taxonomy, metadata — it's just classified information. But if you're trying to get people to use a taxonomy, you really need it to be flat. We only go one layer deep. We just don't nest, unless we're building something more sophisticated for experts who need really, really granular information — like snippets of code instead of a white paper that's 15 pages long. So I want to give you a quick mycology example, because again, this is my talk and I can talk about my hobby for at least a minute and a half. This is just an example of why a flat taxonomy is useful. Shiitake mushrooms are among the most common culinary mushrooms in the world. They're absolutely lovely — big, big fan. But because they're so yummy, and they've been cultivated for thousands of years, they have changed their species and genus designation 13 times. It's just impossible to keep up with that — though it is a good intellectual exercise.
But if you're talking about a user experience, or about search, or even just general access to knowledge and being able to open up knowledge to anybody, that is inscrutable and very, very problematic. For 99.999% of people, you're not going to want to say, "Well, it is currently called Lentinula edodes, and here is the list of 1,200 other names it has gone by, and a very long list of the different sub-categorizations." No — all you need is a taxonomy parent category for mushrooms, with shiitake being one of the items on that list. Next: data access and integrity. This is something that my team has worked on a tremendous amount. If you're building a new content management system or another piece of software, working with a vendor, or brewing it up yourself, I cannot make this recommendation more highly — everything else I say you could probably ignore, except for this one thing: you need to be able to extract information from your content management system in CSV format. The reason being, you need to be able to blend that information with other software outputs; otherwise, it is functionally useless to you. And I say this with a little bit of pain in my heart, because there are certain parts of redhat.com that we can label and tag as beautifully as we want, but we have to do a lot of hacking and a lot of data cleaning to actually get that metadata out. So if you're working with a team and trying to put together just the basic priorities for a project, say, "I need CSVs" — because CSVs are my superhero. I'll give you an example of what my team has been able to do with these capacities — reasonably simple stuff, but very helpful to our stakeholders. What you'll see at the bottom of the slide is what we call our collateral inventory: basically a list of assets that people can use to market our products and services.
It gives you a lot of dynamic information — in fact, everything that's within the taxonomy — that allows you to filter and sort using multiple criteria, to really get down to the area you want to work on. And then you can see the interactions people have had, so you get a sense of performance. We have a couple of different measures that we use. Downloads are basically organic traffic: is this something that's interesting to a wide audience? We give people that information. We also give them information about the value of that asset when it comes to making deals — basically closing contracts and having people buy Red Hat subscriptions. So it gives people a sense of the organic value and appeal of an item, but also how well it does for lead generation and conversion opportunities. We could not do any of that without a basic CSV export; all of our metadata gets pulled out of the CMS. One of the things this also allows us to do is content analysis and curation, and much more granular content optimization, which is really exciting to us because it gives us greater insight into what content we're making that's worthwhile, and what content really isn't. Here's an example of one of the projects we've been doing. Since we've been working on the collateral inventory, we've been able to use our taxonomy to say: okay, these are the metadata fields that directly correspond to business objectives right now. For instance, this quarter Red Hat as a whole is talking about cloud-native development — how to use agile and how to build cool applications really quickly. Because we have that label within our taxonomy, we can say: okay, these are the items that relate to it. A lot of our outbound organic efforts are also going to focus on this topic.
So we can select the highest-performing items related to the topic that's relevant right now, and we can migrate them from PDF to HTML. This may sound like a simple thing, but it makes our content far more accessible, and it also gives us more granular insights. When we start to look at tools, for example — this item here was migrated, and we're able to look at the heat map that shows where people's attention goes, where their mouse moves. So we can start to extract understanding: not just "this is a popular piece of content because someone hit a download button," but "this is an interesting piece of content because, hmm, everyone looked at this graphic but functionally ignored all of the text." So again, access to data — and good data — is really important. Finally, in terms of projects, governance and alignment have been really critical as well. When I joined Red Hat, there were a lot of people who were invested in and understood metadata, but they were distributed across different teams. Red Hat has a very vibrant, collaborative culture, but we also add a lot of new people very regularly, and as I mentioned before, there's a large remote workforce, with associates spread around the globe. It can be really difficult to know what other people are working on. So in order to kick off a conversation about how to align our different software systems, and how to ensure that we were empowering people to create high-quality, contextual metadata that we could use, we formed a beautiful nerd group. We are called the Metadata Initiative for Structured Taxonomies, also known as MIST — and that is a backronym. We decided we wanted to be MIST because we wanted to be kind of shadowy heroes behind the scenes, supporting everything but kind of unseen. So we asked ourselves, well, what does MIST actually stand for?
And we came up with Metadata Initiative for Structured Taxonomies, which actually is very descriptive and really helpful, because even though it can be a little bit overwhelming, people know that we're beautiful nerds and that we're willing to welcome them if they want to talk about labels, or the cool things they've done with their sock drawer. I included this image just as an homage to the founding of the group. We have a lot of different teams — not just marketing and sales, but engineering folks, people from IT, our technical documentation folks — all kinds of Red Hatters participate. When we were founding the group, we were trying to come up with what to call ourselves, besides beautiful nerds and MIST, and we also decided we wanted to call ourselves superhero librarians. So I made an image of the old Red Hat Shadowman logo in the mist, because he was always kind of a mysterious figure. We identify very strongly with him as a symbol of freedom, but also as someone just a little bit shadowy around the edges, trying to challenge the status quo. So, that's MIST. It's something that has received executive support, empowering us to say: okay, your responsibility is to align these tools, and if people want to participate and contribute, you absolutely need to do outreach with them. But additionally, you need to be clear with people that at a certain point a decision will be made, and we will implement it in the software systems simultaneously, so all the data that we're generating stays in sync. And that leads very much into my next point, which is the sort of advanced stuff our team is working on — work I'm always inspired by, but that is very, very difficult. By no means do I want to imply that Red Hat has solved these riddles.
But these are just my observations about why people like us need to be technologists. I think that, at the very foundation, one of the greatest things we can bring to any organization as technologists is the fact that we can identify and document bugs. Now, that may sound trivial, but very frequently I hear from somebody within the business who says, "Hey, X piece of software is broken, I can't find Y piece of content," and I don't really grill them very much, because I trust that that is the experience they're having. But because I have experience with that software, I can do some investigation. So instead of just taking that random "it don't work" kind of message and throwing it over to the IT team, I spend a little bit of time investigating — and also educating myself about the mechanics, limitations, and benefits of the system I'm working with. I think it's really important to take initiative in this regard, because it prepares you with a great deal of knowledge about the system as it exists, especially if you inherited it. It gives you the ability to identify needed new features, and it allows you to have really positive relationships with both ends of the spectrum. On one side, you have your business folks: they want their tools to work, and they don't particularly care about what's under the hood — nor should they. On the other side, you have developers and IT, and they don't really enjoy hearing vague complaints, because that requires them to do the investigation. That's not to say you should just take on their work and do it for them. But because we have responsibility for the content and a lot of the other mechanics of the system, and we understand not only how it's supposed to work but what it's supposed to be communicating, we're often better equipped than IT to dig in and do that investigation.
And that makes both of those sides very, very happy. So I think it's really good for fostering a culture of meaningful interpersonal relationships, being open, and constantly educating yourself about how the system can be iterated upon and improved. That said, I do want to talk about a concept that I love, called the paradox of the active user. This is something we run into all the time with folks who work in our different business units: they will report that something is broken, and we have to presume there's a really good chance — in fact, an abundant chance — that they have not read the manual and not looked at the documentation. And I am as guilty of this as anybody else. When I download a new application, if I get a little demo, or tooltips, or a tutorial wizard that will walk me through it, nine times out of ten I dismiss it. I start messing around in the app, I get frustrated, and either I go back to that tutorial or, worst case, I just delete the app because I assume it was poorly designed. The reality is that people are wonderful, but they're also irrational. Especially when it comes to tech, people want to get to the information they need — the job they would like to have done. They don't particularly care about the software, which is one of the reasons that when we hear feedback from people, we have to assume they're expressing a need and a frustration, but probably don't know how to clearly articulate it — likely because they haven't had the opportunity to dig into the documentation. People who are a little less kindly than the IBM User Interface Institute — especially on the developer side — occasionally call this the principle of "users always lie." They do. But it is not intentional, and it is never malicious.
And it's also an opportunity for us to work better as a team and make the most of our different strengths and the things we already know. Another part of becoming a great technologist within an organization — in the kind of role where you work regularly with content repositories and metadata — is looking to your passionate amateurs. Besides the information you're getting from users who may not have read the documentation, you also need to look to the people who will always read the documentation, and who will do very inventive and innovative things to get their jobs done better. I'm going to go back to the example of my colleague Brian, who put together our collateral inventory using CSVs and Python scripts. Essentially, he realized that we had metadata in our CMS. We had it in the internal repository I referred to. We had it in Adobe Analytics, which is our web analytics platform. We also had it in Salesforce — Salesforce is where we track revenue, lead generation, and closing accounts. We needed to bring all that information together, and there was simply no way to do it. Now, in an organization as large as Red Hat, it wasn't as though this was some sort of revelation from on high and nobody knew there was a problem — we simply didn't have the tools to address it. So Brian, in his brilliance, bought himself a Raspberry Pi — all of $55 worth of hardware — wrote Python scripts, pulls automated reports out of all of these different systems, and jams them together. Now, this is abundantly useful, but it also tells you that this is a need that is probably of higher priority than other needs. When people start to hack together solutions, you're not only seeing what the biggest problems are — you're also finding really good advocates for solutions, because these are people who have had to think through the problem and say, "I just need to come up with something that will work for me today."
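To make that concrete, here is a rough sketch of the kind of blending those scripts do — joining CSV exports from several systems on a shared asset ID. The column names, IDs, and inline CSV text are illustrative stand-ins, not the real report formats from our CMS, Adobe Analytics, or Salesforce:

```python
import csv
import io

# Illustrative CSV exports -- the columns and values are assumptions,
# sketched after the kind of reports pulled automatically from each system.
cms_csv = """asset_id,title,topic
A-100,Cloud-native dev e-book,cloud-native
A-101,Java performance guide,java
"""

analytics_csv = """asset_id,downloads_12mo
A-100,340
A-101,7100
"""

salesforce_csv = """asset_id,influenced_leads
A-100,9
A-101,42
"""

def load(text, key="asset_id"):
    """Parse one CSV export into a dict keyed by asset ID."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

def blend(*sources):
    """Join any number of keyed CSV exports into one inventory,
    merging columns for matching asset IDs."""
    inventory = {}
    for source in sources:
        for asset_id, row in source.items():
            inventory.setdefault(asset_id, {}).update(row)
    return inventory

inventory = blend(load(cms_csv), load(analytics_csv), load(salesforce_csv))
print(inventory["A-101"]["title"], inventory["A-101"]["downloads_12mo"])
```

The same join logic scales from three inline strings to three automated report downloads — which is part of why $55 of Raspberry Pi was enough hardware to run it.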
One of the things you'll frequently see is Software as a Service becoming more and more important in all of our workplaces. Software as a Service, if you're not familiar with the term, simply means web-based applications where people get their work done — think of things like Gmail, Google Drive, a project management system like Jira, or Slack (not that Slack is project management). Basically, all of these different tools that we use that are hosted on the internet are very, very popular, and at this point there are a lot of reasons for their popularity — a lot of them very good reasons. They are highly flexible and customizable, so you can build an implementation that really works, and you can update the software as time goes on. It's a lot easier to say "we're going to change our entire data structure and start fresh" than it would have been with a giant physical library in days of old. Software as a Service is also, to be perfectly honest, a lot easier for people spending money on technology who are not IT experts — and according to research from IDC, nearly 50% of spending on technology does not come from IT or technology leaders at all. Instead, it's business leaders saying, "Okay, this is a problem I need to solve; I need to find a software solution that will do that for me." And very frequently, you end up with a Software as a Service solution. There is a tremendous role we can play in making sure these projects are successful, because a SaaS project pulled off poorly can be more of a pain than it's worth, and a tremendous amount of money spent on a tool that's not really useful. So this is a good example.
This is our digital asset manager, which I referred to in our long strange trip on the dancing-bears timeline, of which I'm so very fond. It's a digital asset manager that we launched this past year. Our primary objective was to bring together the assets that people use and the assets they use to promote them, so people can build and mix and match campaigns without going through "here's the social media spreadsheet, here's the list of approved email templates, here's the list of assets." One of the values of our DAM is that we've been able to say: okay, here are the different components and parts, you have metadata that's very clearly articulated, and you can use it to put together and configure your own campaign. One of the things our team was profoundly involved with — really, we were involved with the entire project from start to finish, because we're content management people — was the configuration of the tool: working with the vendor to ensure we have the right filters exposed for search, that we are able to extract that metadata we are so obsessed with having access to, and also staying involved with data governance on an ongoing basis. Data governance is not only making sure that our labels are consistent and stay up to date with everything else; it's also being responsive to the feedback we get from users. Because sometimes we say, "These are great search filters, these are the taxonomies people are going to care about," and then we start to ask around and people say, "I don't use that tool — it's impossible to find anything." Being able to gather information about people's search experiences is really, really critical.
It's also an opportunity to start to learn cool things about natural language processing and machine learning, because then you can start to poke at some of these vendors and say, "Hey, how exactly does your search engine work under the hood? Does it just ingest the text of all of these different PDFs and come up with a relevancy score on its own? Or is it looking at your labels and your tags? Or does it do both?" These are the kinds of questions I think we're really well equipped to ask, and to collaborate with our peers to get answers to. I've talked a little bit about machine learning and AI, but I do want to emphasize that I honestly don't think the robots are coming for our jobs tomorrow — not only because human beings are really good at context and can recognize images super fast, but because we're interesting and fun to hang out with, by way of comparison with data center server racks. The reality is that machine learning and artificial intelligence are starting to make their way into the tools we use every day. But the amount of data involved is so vast that, as human beings, we need to be actively participating in training these algorithms. So not only is it a matter of establishing an understanding of the knowledge we think is important; it's also about our common principles around what knowledge is not, and how we can encourage the creation of knowledge that is valuable — not just valuable to one group of people, but valuable globally. Additionally, I think people are super excited about AI and machine learning, myself included. And then occasionally we'll say, "We're going to try a natural language processing project," and we end up with a report where we're like, well, that's interesting.
We have 40,000 pages that the machine read, and it tells us that our sentiment and tone are neutral. Well, that does tell us that maybe we need to be a little more cheerful — but we're a tech company, so yeah, maybe we are a little neutral. This is not to say that you can't get insights out of it, but I think we're in a very nascent age of understanding how machine learning and artificial intelligence will complement — not replace — human beings in guiding how knowledge becomes wisdom. And I think that's one of the things we can really bring to the table. There's a great concept from Robert Rose, who is a mentor to our team and has helped us with a lot of different things: he talks about moving from being just a knowledge worker to being a wisdom worker. That is fundamentally about understanding the processes and systems within your organization, being knowledgeable about the subject matter that's important to it, and moving to a place where you understand the global impact of those things. You start to understand how they can be improved and where the chinks in the armor are — those moments when your algorithm is going to spit out results that are not the information someone wants, in a way that's not just irrelevant and irritating but actually deleterious. So I've got to skip back one slide. Basically, I want to sum up with the four key areas I've talked about — these four red boxes, which are essentially what I view as my core deliverables within my job. First is meaningful user interfaces. Second is mechanisms for governance and control: making sure that change logs and revision histories all work the way they're supposed to. The information you need in order to get this done is, first of all, what is your audience trying to get done? There is a particular way of gathering requirements for software called "jobs to be done," and it really backs away from all the crazy demographic stuff and focus groups and all the sophisticated ways we try to assess who people are and what they want. It gets very blunt: what do you want to get done, and what's standing in your way? That's the kind of research you need to do to be successful. Additionally, you need to develop a metadata strategy, and you need to broadcast it. Usually that strategy can be something as simple as: we want things to be organized, and when the business shifts and pivots — when we start to have a new conversation or there's a new campaign — we're ready to react and respond, because we understand how to identify and tag the content relevant to it, so it can be paired up with the new strategy. Third and fourth — the two final red boxes — are what I consider the difficult things we're working on: automating workflows so people can create metadata and feed it into different software systems, which can automate publishing, for instance, or automate personalized experiences, so people who are interested in one product will encounter information about that same product in the future. These kinds of automations are very contingent upon, again, high-quality metadata that establishes the meaning of the content. And then, additionally, managing and analyzing really large data sets is very important as well. As we create more content — which invariably we will — we need to be very mindful of how we can look at the historical performance of content and say: it's not just the next and the new; it's not always the information we think is the coolest stuff we've just discovered.
As a tech company, we have a lot of very brilliant and curious engineers, and sometimes they want to write about technologies that are not going to be feasible for our customers for five or ten years. They'll also occasionally say, "Well, no one's interested in Java anymore." And I'm like, look, I know that's not the case, and here's a white paper with 7,000 downloads in the last several months to prove you wrong. So it's really important to be able to harness and capture that data to have these meaningful analytical conversations. In order to get those things done, you need to be able to integrate your applications and your data — again, pulling things together, making sure you can connect these different Software as a Service systems in a lightweight way, because they will proliferate. I think that's a reality: I don't think we're going to see the single-source-of-truth repository, the Library of Congress, that all of us want. So when you're working with a new vendor or a new system, it's not just "how can we create a good internal design that matches up with our labeling systems and taxonomies?" It's "how do we pass that metadata back and forth?" — so that ultimately we can have one thing that feels like the single source of truth, but pushes out to all these other nodes of relevant tools and things we need access to. That's a really important part of it. And then finally, consolidating your data.
One of the areas where we've really been successful is bringing all of our data together into essentially a big data lake that can be queried by all of our different tools, and also pushing that data around by way of lightweight application programming interfaces, or APIs. That way you're not writing complicated scripts or building software that just connects one piece of software to another; you're using modular, reusable calls to say, "Okay, I want to get information out of this project management software — a tracking number that I'm going to assign and stamp onto an item published to our CMS." Now, a lot of this is stuff we're very much working on, and some of it is pipe-dream stuff we're putting in our roadmaps. We're working aggressively toward these integrations, toward figuring out how we can process and manage these large data sets. But it's really difficult, and there's a tremendous amount of progress still to be made in understanding the best ways to use our software systems in a complementary way — one that serves the audience with a specific job to be done that the software can assist with. At the end of the day, and in conclusion, I simply want to share that my greatest inspiration in my job is that I feel like I am a steward of knowledge. I really hope that the information I give people access to, and the search experiences I help design, give them moments not just of joy and renewal, but of curiosity and an understanding of where they exist within the world — and bring those insights and collaborative ideas to life. That's where innovation comes from: this iterative, collaborative, inclusive, and transparent way of approaching knowledge. And we can't do that without stewardship.
We can't do that without people who are willing to dive into spreadsheets up to their elbows and spend a lot of time understanding the why of content. And I just want to leave you with this final thought. Our new CEO, Paul Cormier, who has been a Red Hatter for many, many years but only recently became CEO, likes to impress upon us that truth happens. That doesn't mean there's some fundamental underlying truth of the universe that will be revealed to us over time; the reality is that there is a truth out there that unfolds, that none of us can fully see. It's important that we acknowledge that — but it's also important that we keep our eyes open to that truth as it emerges, share it with our fellows, and tweak our perspectives if the things we think are true turn out not to be. I think the thing that best exemplifies this is the great Bill Gates — probably one of the greatest technological minds of all time — who, back in the '80s, is said to have claimed that 640K was going to be more memory than anyone could possibly need on a computer, personal or otherwise. As we know, this is untrue. The future is bright, interesting, strange, and unpredictable, but we need to be there to understand what the value of knowledge is, how to share it, how to be tedious and detailed, integrate our software, and fall in love with the new ways that people are able to learn. In conclusion, I appreciate your time, I wish you well, and I encourage you to contact me if you have any questions, feedback, or commentary. Thank you.
Transcribed by https://otter.ai