For example, an employee who queries/owns datasets, views/owns dashboards, authors research reports, and/or runs A/B test experiments related to the given keyword will be returned in the list of results. How fantastic is that? To kick things off, we spent time conducting user research to learn more about our users, their needs, and their specific pain points regarding data discovery. viewing a dashboard). Don’t have enough data? In this role, you will help drive the roadmap and development of Spotify’s Ads ecosystem data and analytics products. Compare to last visit See how your personal ranking changes over … With the first iteration of Lexikon, we used the knowledge management strategy of codification, which is based on the objectivist perspective of knowledge. So the conclusion is to rely on data whenever possible. But to make use of it is actually really easy. Using these features on the artist page after your first listen allows you to truly discover and build a connection with the artist. The first release allowed users to search and browse available BigQuery tables (i.e. We found there were a few issues with this approach. With Spotify’s option to export your personal data, and Google’s free, easy-to-use tool to visualize data called Google Data Studio, we’re going to show you just how to do that. More than half of them are free, … In early 2017, we released Lexikon, a library for data and insights, as the solution to this problem. You strike up a conversation and learn that she is a jazz aficionado. Skiley.net. However, in some cases, data scientists found it difficult to find the right person to talk to about a particular topic. “show me queries on this table that reference this specific field”). At this time, we also drastically increased our hiring of insights specialists (data scientists, analysts, user researchers, etc.) Get a detailed audio analysis for a single track identified by its unique Spotify ID. 
I also participated in a hackathon where I developed a Spotify App code-named Genderify that tapped into our massive data-set to determine exactly how “manly” a playlist is. If you have yet to set up your Spotify … datasets)— as well as discover knowledge generated through past research and analysis. You happen to notice that your coworker has a jazz album on Spotify pulled up on her desktop screen. We’ve learned a lot since we first launched this product. owning a dashboard) rather than insights consumption (e.g. New engineers at Spotify will notice that the culture has a way of engulfing you in a data-driven mindset. Our People Analytics model is set up for tracking HR data and metrics for getting informed better and faster, for progressive thinking, planning, acting, and leading. It was mostly a joke, but utilized listening data to provide an accurate statistical map of a playlist and displayed a result of 0-100, 100 representing an extreme edge case where a person registered as female had never listened to any tracks on your playlist. Data Warehouse is a more complex system that allows you to access our data-set directly. We do our best to base every decision, programmatic and managerial, on data and this extends into the culture. It allows us to recognize trends, discover bugs, and analyze the effect of an event on a user and the entire ecosystem. Within a few weeks we knew which email templates worked best and, more importantly, we could see the impact these email campaigns had on our users. So, you go to the artist page on Spotify where you can check out the most popular tracks across different albums, read an artist bio, check out playlists where people tend to discover the artist, and explore similar artists. We will share your personal data for activities such as statistical analysis and academic study but only in a pseudonymised format. The typical data scientist at Spotify works with ~25-30 different datasets in a month. 
For instance, an example query might be outdated because it included a join to a deprecated table. find popular datasets used widely across the company, find datasets that are relevant to the work my team is doing, and/or. We were able to see if an email had any effect on your listening habits, your account status and so on. Matching data is compressed and periodically synced to HDFS. For more complex operations, we have Luigi at our disposal, governing a zoo of Python, Pig and other animals which can be made to talk to any storage system, run machine learning algorithms and even provide daily reports. Decisions that cannot be made by data alone are meticulously tracked and fed back into the system so future decisions can be based on them. Since launching these new entity pages, we’ve seen that they’ve proven to be a critical pathway for discovery, with 44% of Lexikon’s monthly active users visiting these types of pages. First, we ran into challenges encouraging data producers to share example queries for all datasets. Welcome to podcast from Dun & Bradstreet: The Power of Data, powering decisions with data. Rather than fight this, we decided to embrace the idea by (1) mapping expertise within the insights community and (2) providing supplemental information in collaboration tools. Our Analytics Pipeline powers far more than satirical apps. Lexikon’s user base has organically grown from ~550 to ~870 monthly active users as it has proven to be useful to data consumers in non-insights specialist roles (e.g. Engineers can easily add data to our analytics pipeline by adding a new message to our log parser and simply logging information to syslog using the correct format. Since launching the Lexikon Slack Bot, we’ve seen a sustained 25% increase in the number of Lexikon links shared on Slack per week. Without big data, Spotify would not have turned out the way it did, and with a growing user base only more data will be generated in the future.
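As a rough illustration of the syslog-based logging described above (daemons that parse syslog for defined messages), here is a minimal sketch. The `ANALYTICS` marker, the `key=value` layout, the `email_sent` message, and the names `KNOWN_MESSAGES` and `parse_line` are all assumptions made for this example; the text does not describe the actual message format or parser.

```python
import re

# Hypothetical registry: message name -> fields that message must carry.
# Adding a new message type would mean adding an entry here.
KNOWN_MESSAGES = {
    "email_sent": ("user_id", "template"),
}

# Hypothetical line layout: "<syslog prefix> ANALYTICS <name> key=value ..."
LINE_RE = re.compile(r"ANALYTICS (?P<name>\w+)(?P<fields>(?: \w+=\S+)*)$")


def parse_line(line):
    """Return a dict for a recognized analytics message, else None."""
    match = LINE_RE.search(line)
    if match is None:
        return None  # not an analytics line at all
    name = match.group("name")
    expected = KNOWN_MESSAGES.get(name)
    if expected is None:
        return None  # message type has not been defined
    fields = dict(pair.split("=", 1) for pair in match.group("fields").split())
    if set(fields) != set(expected):
        return None  # wrong or missing fields for this message type
    return {"message": name, **fields}
```

A daemon built along these lines would call `parse_line` on each syslog line and forward the non-`None` results to storage; lines that do not match a defined message are ignored.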
recommendations for datasets you haven’t used, but might find useful. These results are powered by summarizing an employee’s insight production and consumption activity related to the given keyword. Python is beautifully complemented by Pandas when it comes to data analysis. In addition to basic metadata about the schema fields, we included consumption statistics at the schema field level. You just listened to a track by a new artist on your Discover Weekly and you’re hooked. In addition to improving the search rank, we also introduced new types of entities (e.g. Most of our recurring data is added to our analytics pipeline by a set of daemons that constantly parse the syslog on production machines looking for messages we have defined along with the associated data for each message. We’ve had a number of folks help get this product to where it is today. One of these products is Lexikon, a library of data and insights that helps employees find and understand the data and knowledge generated by members of our insights community. Our team decided to focus on this specific issue by iterating on Lexikon, with the goal of improving the data discovery experience for data scientists and ultimately accelerating insights production. As we know Spotify … We believed that the crux of the problem was that we lacked a centralized catalog of these data and insights resources. So, we built a Lexikon Slack Bot to improve discussions about datasets. Subscribe and listen to hear insights from business and industry leaders who share a passion for the power of data & analytics. The Audio Analysis … To enable Spotifiers to make faster, smarter decisions, we’ve developed a suite of internal products to accelerate the production and consumption of insights.
Through user research, we learned that data scientists who failed to discover the data they were looking for would often fall back to finding an expert in the insights community on a given topic and connecting with them in person or online. While this isn’t the most widely used feature, we’ve seen that it is consistently used by 15% of users who visit a dataset page. You want to hear more and learn about the artist. For comparison, more people report using Lexikon than BigQuery UI, Python, or Tableau at Spotify. If you’re interested in helping us tackle similar problems or you’re a data scientist that’s looking to work at a company where producing impactful insights is becoming easier every day, visit the Join the Band page to view open roles. So, we adjusted our search algorithm to weight search results more heavily based on popularity. For instance, we have dashboards that show us user growth in particular regions, or user engagement, or even the number of emails we deliver. So you pull up Spotify on your phone, search for the track, and play it (on repeat). We are a company full of ambitious, highly intelligent, and highly opinionated people and yet as often as possible decisions are made using data. Imagine you’re starting to explore the genre of jazz. The Audio Analysis endpoint provides low-level audio analysis for all of the tracks in the Spotify catalog. Sounds robotic, but humans cannot be trusted so it’s cool. The only reason that’s possible is because Spotify now knows what to create—thanks to data. Dashboards provides an interface similar to Google Analytics and allows users to create their own custom screens containing data they are interested in from our pipeline. Once this data made its way into HDFS, we had all the data we needed to determine the best performing email template for a campaign and we could track the effect a single email had on a user’s experience. 
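The popularity-weighted search ranking mentioned above could look something like the following minimal sketch. The text only says that results were weighted more heavily by popularity; the scoring formula, the `monthly_queries` signal, the `popularity_weight` parameter, and the names `Dataset`, `text_score`, and `rank` are assumptions made for illustration, not Lexikon's actual algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    description: str
    monthly_queries: int  # hypothetical popularity signal


def text_score(query: str, dataset: Dataset) -> float:
    """Fraction of query terms found in the dataset name or description."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    haystack = f"{dataset.name} {dataset.description}".lower()
    return sum(term in haystack for term in terms) / len(terms)


def rank(query: str, datasets: list[Dataset], popularity_weight: float = 0.5) -> list[Dataset]:
    """Order matching datasets by text relevance boosted by log-scaled popularity."""
    scored = []
    for ds in datasets:
        relevance = text_score(query, ds)
        if relevance == 0.0:
            continue  # drop datasets that match no query term
        # log1p keeps a heavily used table from drowning out relevance entirely
        boost = 1.0 + popularity_weight * math.log1p(ds.monthly_queries)
        scored.append((relevance * boost, ds))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [ds for _, ds in scored]
```

With two tables that match a query equally well on text, the one with more recorded usage sorts first, which is the behavior a user in a high-intent search for a top dataset would want.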
So, we built a feature on a BigQuery table page that allows the user to see tables that are most commonly joined with the given dataset. Since launching this feature, we’ve seen that 25% of users who visit a dataset page use the queries feature. Ek was sharing the detail to highlight the success of Spotify for Artists, the company’s analytics dashboard for musicians, which provides information such as playlist inclusion, streams by … So, we developed the features Schema-field consumption statistics, Queries, and Tables commonly joined to address this last mile of discovery. For example, a data scientist might be looking for the best dataset to use that contains a track’s URI, track_uri. We could clearly see that these emails were having a positive effect on user engagement. The research and learnings from Spotify’s insights community help make Spotify the best it can be. Shortly after I joined Spotify, we decided as a company that we wanted to send users emails telling them if their friends joined and if new songs were added to a playlist they subscribed to. You can’t get the song out of your head and need to listen to it immediately. If data discovery is time-consuming, it significantly increases the time it takes to produce insights, which means either it might take longer to make a decision informed by those insights, or worse, we won’t have enough data and insights to inform a decision. Make data the most important asset you have because it is the only reliable decision maker that can scale your company. This will give you even more valuable insights into your episode performance, demographics, and more. In the first version of Lexikon, we introduced example queries that allowed data producers to submit example queries to give data scientists an idea of how they might use the available dataset. She has become your new genre guide.
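The "tables commonly joined" feature described above can be approximated by counting which tables appear alongside a given table in past queries. This is only a sketch under stated assumptions: the set of tables referenced by each query is taken as already extracted, co-reference in the same query is treated as a join, and the function name `commonly_joined` is hypothetical. The text does not say how the real feature is computed.

```python
from collections import Counter


def commonly_joined(target: str, query_tables: list[set[str]], top_n: int = 3) -> list[tuple[str, int]]:
    """Return the tables that most often appear in the same query as `target`."""
    counts = Counter()
    for tables in query_tables:
        if target in tables:
            # Count every other table referenced by this query once
            counts.update(t for t in tables if t != target)
    return counts.most_common(top_n)
```

Each entry in `query_tables` stands for one historical query; the result pairs a table name with the number of queries in which it appeared together with the target table.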
In addition to using learnings from user surveys, feedback sessions, and exploratory analysis to drive product development, we also conducted research on knowledge management theory to better understand how we might adjust our approach (recommended reading: Knowledge Management in Organizations: a critical introduction by Hislop, Bosua, and Helms). So, how did we know the effect these emails had on users? links to view more information in Lexikon, request access, or open directly in BigQuery. Your data is updated approximately every day. find datasets that I might not be using, but I should know about. an overview of the most used schema fields in the table, and. Spotify is all the music you’ll ever need. In this blog post, we want to share the story of how we iterated on Lexikon to better support data discovery. Once you’ve mastered Spotify’s analytics tool, with the power of data science, our tools can take your streaming analytics game to the next level by expanding your scope to include market-level data. At Spotify, we believe strongly in data-informed decision making. First, we focused on the search ranking algorithm. At the heart of Spotify lives a massive and growing data-set. It’s likely the case that they’ll need to join a dataset with others in order to answer the question they have. In the case of Lexikon, we initially believed that if data producers did a great job describing their datasets there would be little-to-no need for person-to-person knowledge exchange. In addition to the schema field page, we’ve added BigQuery Project, people, and team pages, which can serve as a similar stepping stone on the pathway to data discovery. After working at Spotify for only a few months, I was talking about term weighting and signing up for internal courses on the R programming language. at Spotify, resulting in more research and insights being produced across the company. Get more. Datasets often contain dozens or even hundreds of schema fields. 
Rather than discourage this discussion, we felt like we could help improve the person-to-person knowledge exchange by providing supplemental information. Through its desktop site and mobile app, Spotify logs over 100 billion data points per day based on the activities of its 207 million active users around the world. And I assure you, to build a pipeline and infrastructure like we have, it is. The homepage provides users with a number of potentially relevant, algorithmically generated suggestions for datasets including: While we did experiment with more advanced methods for serving recommendations, including using natural language processing and topic modeling on the dataset metadata to provide content-based recommendations, we determined through user feedback that relatively simple heuristics leveraging data consumption statistics worked quite well. Exploratory Data Analysis is often the most essential step of any Data Science project as it provides a great deal of insight towards building further analytics. Data scientists are often curious to see how a dataset is actually used in practice. So… we needed a transactional email system. You’ve just had a high-intent discovery! Datasets lacked clear ownership or documentation making it difficult for data scientists to find them. “experts”). Powerful stuff. Our belief was that by making these types of entities more explorable, we would open up new pathways for data discovery. Hey Guys, Yesterday a friend told me, that he got a pretty long email with his personal stats for 2016, including most heard songs (with numbers) and genres. In the first version of Lexikon, most traffic to BigQuery table pages was driven by search. You can query the data, create map/reduce jobs using Hive, and even create mini data pipelines if that’s the kind of thing you’re into. You’ve just had a low-intent discovery experience! 
The hypothesis we wanted to test was that sending these emails would have a positive impact on user engagement and help more users come back to the app more often. Data scientists in a high-intent mode of discovery were often looking for one of these top used datasets that met their needs. find the top datasets that a team has used because I’m collaborating on a new project with them. You’re walking down the street and hear a passing car blasting a great song you haven’t heard in a while. By understanding the user’s intent, enabling knowledge exchange through people, and by helping people get started with a dataset they’ve discovered, we’ve been able to significantly improve the data discovery experience for data scientists at Spotify. This is how we collect people data and put it to work. At Spotify, we take data very seriously and we try to make every decision data-informed. Spotify Audio Features. We learned through data analysis that although we have tens of thousands of datasets on BigQuery, the majority of consumption occurred on a relatively small share of top datasets. Most data is user-centric and allows us to provide music recommendations, choose the next song you hear on radio and many other things. engineers, data-savvy product managers, etc.). View your most listened tracks and artists and switch between 3 different time periods. However, months after the initial launch, we surveyed the insights community and learned that data scientists still reported data discovery as a major pain point, reporting significant time spent on finding the right dataset. We’ve found that there are similar opportunities for person-to-person knowledge exchange with data discovery. In doing so, we were able to gain a better understanding of our users’ intent within the context of data discovery, and use this understanding to drive product development.
In addition to these encouraging adoption and engagement metrics, we’ve learned from surveying data scientists that after making these improvements data discovery is no longer identified as a primary pain point in insights production. Analytics at Spotify May 13, 2013 Published by Jason Palmer At the heart of Spotify lives a massive and growing data-set. Typically data is available in our Data Warehouse and Dashboards within 24 hours, but in some cases data is available within a few hours or even instantly through tools like Storm. However, research would often only have a localized impact in certain parts of the business, going unseen by others that might find it useful to influence their decision making. The insights community at Spotify was quite excited to have this new tool and it quickly became one of the most widely used tools amongst data scientists, with ~75% of data scientists using it regularly, and ~550 monthly active users. For example, as a data scientist with high-intent, I may want to: To better serve the use case of high-intent data discovery, we iterated on the search experience. Whether we’re considering a big shift in our product strategy or we’re making a relatively quick decision about which track to add to one of our editorially-programmed playlists, data provides a foundation for sound decision making. Other Spotify group companies: We will share your personal data with other Spotify group companies to carry out our daily business operations and to enable us to maintain and provide the Spotify … We see our different data … to Lexikon to better represent the landscape of insights production. This mode of discovery is particularly important for new employees or for people who are starting on a new project or team. 
Although Spotify approaches this process from a variety of angles, the overarching goal is to provide a music-listening experience that is unique to each user, and that will inspire them to continue listening and discovering new music that they will be engaged with we…