Content Metadata: the Fuel for Streaming Discovery

Why this matters

If you run or are building a streaming service, you will spend most of your budget on two things: licensing or making content, and delivering it. Metadata is the cheap, neglected third thing that decides whether the other two pay off, because a title nobody can find is a title nobody watches and a license fee you wasted. The recommendation system, the search box, the "Action Thrillers" row, and your visibility in Google all sit on top of metadata and inherit its quality exactly: feed them a catalog described only by title and genre and they will behave like a library with the index torn out. This article is written for the founder, product manager, or first-time streaming CTO who has to decide what metadata to capture, which standards to adopt, and what the tagging pipeline should cost, so that the recommendation engine from the previous article has fuel to burn. You will not write the schema by hand, but you do have to know enough to specify it and to refuse a catalog that arrives with empty fields.

What metadata actually is

Start with the plainest possible definition. The data that describes a piece of content — everything you can know about a title without watching it — is called metadata, which literally means "data about data." If the video file is the book, the metadata is the library card: the title, the author, the subject, the shelf it sits on, and the note about who is allowed to borrow it. The card is not the book, but without the card you cannot find the book, and a library is only as usable as its cards are complete and correct.

Here is the point that surprises people, and it is the whole reason this article exists: software cannot watch your videos. A recommendation model does not see a thriller and feel the tension; a search engine does not understand a plot. They read fields — text and numbers attached to the title — and act on those. When a viewer types "space documentaries," the search box is not scanning footage for stars; it is matching the word documentary and the word space against fields someone filled in. When the home screen builds a "Because you watched" row of similar titles, it is comparing the genre, cast, and tags of one title against another. Every discovery decision is a decision made on metadata. The video is the thing you deliver; the metadata is the thing you are found by.

That is why the title of this article calls metadata the fuel. The engine — the recommender, the search index, the merchandising logic — is impressive machinery, but it burns metadata, and it cannot run on an empty tank. A brilliant recommendation model fed three fields per title produces dull, repetitive rows; an ordinary model fed twenty rich fields per title feels personal. The advantage is in the fuel, not only the engine, and the fuel is the part most teams under-budget.

The four kinds of metadata

Metadata is not one thing; it comes in four kinds, and keeping them separate helps you see which ones you are neglecting. The four are descriptive, structural, technical, and administrative. Only the first two directly drive discovery, but all four are needed to run a platform, and the discovery ones are the ones teams most often leave thin.

Descriptive metadata is the human-meaningful description of the title: its name, the synopsis, the genre, the cast and crew, the language, the release year, the keywords and tags, the mood, the themes. This is the fuel for discovery in the most direct sense — it is what a content-based recommender compares, what search matches against, and what the rows are built from. When this article says "rich metadata," it mostly means descriptive metadata that goes well beyond title-and-genre into the texture of the title: not just "comedy" but "workplace comedy, ensemble cast, set in the 1990s, feel-good ending."

Structural metadata describes how the content is organized and how its parts relate. A streaming catalog is not a flat list of files; it is a hierarchy — a series contains seasons, a season contains episodes, an episode may have chapters, and a film may have a trailer, a dubbed version, and a director's cut that all describe the same work. Structural metadata is what lets the app show "Season 2, Episode 3," resume you at the right place, group a franchise, and know that the trailer and the feature are related. Get it wrong and you ship the embarrassing bugs: episodes out of order, a dubbed version treated as a separate unrelated film, "next episode" jumping to the wrong season.
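A minimal sketch of why structural fields prevent the "wrong next episode" bug, assuming a hypothetical Episode record that carries season and episode numbers. The ordering comes from the structural metadata, not from the order in which the supplier delivered the files.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Episode:
    season: int   # structural position: season number
    number: int   # structural position: episode number within the season
    title: str

def next_episode(episodes, current):
    """Return the episode after `current` in season/episode order, or None after the finale."""
    # Order by structural metadata, never by delivery or upload order.
    ordered = sorted(episodes, key=lambda e: (e.season, e.number))
    position = ordered.index(current)
    return ordered[position + 1] if position + 1 < len(ordered) else None

# Episodes arrive from the supplier out of order.
episodes = [Episode(2, 1, "New Term"), Episode(1, 2, "Midterms"), Episode(1, 1, "Pilot")]
print(next_episode(episodes, Episode(1, 2, "Midterms")))
# Episode(season=2, number=1, title='New Term')
```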

Technical metadata describes the file and the stream: the resolution, the codec, the bitrate ladder, the audio tracks, the subtitle tracks, the duration, the aspect ratio, the DRM scheme applied. The player and the delivery pipeline read this to pick the right rendition and to know a 4K HDR version exists. It rarely drives discovery directly, but it does surface as badges — the "4K," "HDR," "5.1" labels a viewer filters on — so it leaks into discovery at the edges.
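As a rough illustration of how technical metadata leaks into discovery, the hypothetical function below turns a title's technical fields into the badges a viewer filters on. The field names (max_height, hdr_format, audio_tracks) are assumptions for this sketch, not names from any standard.

```python
def badges(tech: dict) -> list:
    """Derive viewer-facing filter badges from a title's technical metadata."""
    found = []
    if tech.get("max_height", 0) >= 2160:  # 2160 lines = UHD, marketed as "4K"
        found.append("4K")
    if tech.get("hdr_format"):             # e.g. "HDR10"; None or "" for SDR-only titles
        found.append("HDR")
    # 5.1 surround = six channels (five full-range plus one LFE).
    if any(track.get("channels") == 6 for track in tech.get("audio_tracks", [])):
        found.append("5.1")
    return found

print(badges({"max_height": 2160, "hdr_format": "HDR10",
              "audio_tracks": [{"channels": 2}, {"channels": 6}]}))  # ['4K', 'HDR', '5.1']
```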

Administrative metadata covers ownership, rights, and availability: who owns or licensed the title, in which territories it may stream, between which dates, on which business model, and at what maturity rating. This is the metadata that decides whether a title may even appear for a given viewer, which means it gates discovery before discovery begins: a title perfectly matched to a viewer must still be hidden if the license does not cover that viewer's country today. Rights and availability are covered in depth in the licensing and windowing block; here it is enough to know that availability is a metadata field the discovery layer must always check.
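A minimal sketch of that availability gate, assuming rights arrive as per-territory windows with inclusive start and end dates; Window, is_available, and visible are hypothetical names. The point is the ordering: ranking can happen anywhere upstream, but this filter runs before anything reaches the screen.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Window:
    territory: str   # ISO 3166-1 alpha-2 country code, e.g. "DE"
    start: date      # first day the title may stream
    end: date        # last day the title may stream (inclusive)

def is_available(windows, country: str, today: date) -> bool:
    """True if any license window covers this country on this date."""
    return any(w.territory == country and w.start <= today <= w.end for w in windows)

def visible(ranked_titles, rights, country: str, today: date) -> list:
    """Drop ranked titles the viewer may not see; titles with no rights record are hidden."""
    return [t for t in ranked_titles if is_available(rights.get(t, []), country, today)]

rights = {
    "title-1": [Window("DE", date(2026, 1, 1), date(2026, 12, 31))],
    "title-2": [Window("US", date(2026, 1, 1), date(2026, 12, 31))],
}
print(visible(["title-2", "title-1", "title-3"], rights, "DE", date(2026, 6, 18)))  # ['title-1']
```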

Figure 1. The four kinds of metadata. Descriptive (what it is) and structural (how it is organized) are the fuel for discovery; technical (the file and stream) and administrative (rights and availability) keep the platform legal and playable. Most catalogs arrive rich in technical metadata and thin in descriptive, which is exactly backwards for discovery.

The table puts the four side by side, with the one column that matters most for this article: does it drive discovery?

Kind	Plain-language question it answers	Example fields	Drives discovery?
Descriptive	What is this title about?	Title, synopsis, genre, cast, tags, mood, theme	Yes — directly, the main fuel
Structural	How are its parts organized?	Series → season → episode, chapters, trailer-to-feature	Yes — series logic, grouping, resume
Technical	What is the file and stream?	Resolution, codec, bitrate, audio/subtitle tracks, DRM	Rarely — only as filter badges
Administrative	Who owns it and where can it play?	Owner, territory, window dates, business model, rating	Gates it — availability is checked first

Table 1. The four kinds of metadata and whether each drives discovery. The cruel asymmetry of streaming: catalogs almost always arrive complete on technical metadata (the encoder fills it in automatically) and thin on descriptive metadata (a human has to write it) — yet descriptive is the fuel discovery actually burns.

Why metadata is the fuel for discovery

It is worth being concrete about every place metadata gets burned, because the list is longer than most teams expect, and each item degrades the moment the metadata is thin.

The first consumer is the content-based recommender. As covered in the recommendation-systems article, content-based filtering recommends titles whose attributes resemble what a viewer already liked, and those attributes are the descriptive metadata. With only genre to compare, the recommender can say "you watched a comedy, here is another comedy" and little else. With cast, director, mood, theme, era, and twenty tags, it can say "you finished three slow-burn character dramas set in small towns, here is a fourth nobody else has connected to them." The recommender is the same; the difference is entirely the fuel.
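To make the point concrete, here is a small content-based similarity sketch. It uses Jaccard similarity (shared attributes divided by all attributes of the pair), a common simple measure swapped in for illustration; production recommenders typically use weighted or learned similarities. The titles and tag strings are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Shared attributes divided by all attributes of the pair (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(target: str, catalog: dict, k: int = 3) -> list:
    """Titles ranked by attribute overlap with `target`; ties broken by title."""
    scored = [(jaccard(catalog[target], attrs), title)
              for title, attrs in catalog.items() if title != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:k]]

catalog = {
    "A": {"genre:drama", "mood:slow-burn", "setting:small-town", "decade:2010s"},
    "B": {"genre:drama", "mood:slow-burn", "setting:small-town", "decade:2000s"},
    "C": {"genre:drama", "mood:upbeat", "setting:city", "decade:1990s"},
    "D": {"genre:comedy", "mood:upbeat", "setting:city", "decade:2010s"},
}
print(most_similar("A", catalog))  # ['B', 'C', 'D']
```

If each set held only its genre tag, A would tie equally with every other drama; the mood, setting, and decade attributes are what lift B clearly above C.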

The second consumer is the cold-start path for new titles. The previous article showed that a brand-new title has no co-watch history, so collaborative filtering (the "people who watched this also watched that" approach) is blind to it. The only thing that can surface a new title on day one is its metadata, because content-based filtering can place it next to similar titles immediately. This is the sharpest business consequence of thin metadata: your newest, most heavily promoted, most expensive-to-license titles are exactly the ones with no behavior data, so they live or die on description alone. Skimp on metadata and you have built a platform that is worst at showing the content you most want watched.

The third consumer is search. When a viewer types a query, the search index matches it against descriptive fields. A catalog tagged only with title and genre answers "comedy" but fails "feel-good workplace comedy with an ensemble cast" — a query a richly tagged catalog answers easily. Search and its relationship to discovery get a full treatment in the search article; the point here is that search quality is a downstream effect of metadata depth.
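A deliberately naive keyword scorer shows why search depth tracks metadata depth. It assumes a title's descriptive fields sit in a flat dictionary and simply measures how many meaningful query words appear in any field; real search engines add stemming, synonyms, and ranking signals, but the dependency on populated fields is the same. The stopword list, titles, and field names are hypothetical.

```python
import re

STOPWORDS = {"a", "an", "the", "with", "and", "of", "in"}

def words(text: str) -> set:
    """Lowercase alphanumeric words, minus a few filler words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def match_score(query: str, record: dict) -> float:
    """Fraction of meaningful query words that appear in any descriptive field."""
    wanted = words(query)
    if not wanted:
        return 0.0
    found = set()
    for value in record.values():
        for item in (value if isinstance(value, list) else [value]):
            found |= words(str(item))
    return len(wanted & found) / len(wanted)

thin = {"title": "The Office Hours", "genre": "comedy"}
rich = {**thin, "tags": ["feel-good", "workplace", "ensemble cast"]}
query = "feel-good workplace comedy with an ensemble cast"
print(match_score(query, thin), match_score(query, rich))
```

The thin record matches only "comedy" (one of six meaningful words); the rich record matches all six.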

The fourth consumer is merchandising — the rows and artwork on the home screen, covered in the merchandising article. A row called "Critically Acclaimed Dramas from the 2010s" can only exist if titles carry the fields acclaim, genre=drama, and decade=2010s. Every themed row is a query over metadata; the richer the metadata, the more rows you can build and the more personal the home screen feels.
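A themed row really is just a filter over metadata. The sketch below assumes a hypothetical catalog of dictionaries with acclaimed, genre, and decade fields; a title missing one of those fields can never appear in the row, however well it would fit.

```python
catalog = [
    {"title": "Northern Line", "genre": "drama", "decade": "2010s", "acclaimed": True},
    {"title": "Quiet Harbor", "genre": "drama", "decade": "2000s", "acclaimed": True},
    {"title": "Desk Jockeys", "genre": "comedy", "decade": "2010s", "acclaimed": False},
]

def build_row(catalog, **criteria):
    """Return titles whose metadata matches every field=value pair in `criteria`."""
    return [t["title"] for t in catalog
            if all(t.get(field) == value for field, value in criteria.items())]

print(build_row(catalog, acclaimed=True, genre="drama", decade="2010s"))  # ['Northern Line']
```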

The fifth consumer sits outside your app entirely: search engines and AI assistants. When someone searches Google for your title, or asks an assistant "where can I watch X," the answer is assembled from structured metadata you publish on your web pages — the Schema.org fields covered later in this article. Discovery does not stop at your app's edge; a chunk of it happens on Google, and that chunk runs on metadata too.

Figure 2. One catalog of metadata fuels every discovery surface. Content-based recommendations, search, merchandising rows, the cold-start path for new titles, and off-platform visibility in Google and AI assistants all read the same descriptive and structural fields. Thin metadata starves all five at once; enriching it lifts all five at once.

The unifying idea is the oldest rule in computing: garbage in, garbage out. The discovery layer cannot be better than the metadata under it. This is why a clean metadata pipeline is the unglamorous core of personalization — it is not the part anyone demos, but it is the part that decides whether the demo works on a real catalog.

The identifier problem: every title needs a name machines agree on

Before metadata can be useful, every system has to agree on which title a given record describes — and that turns out to be surprisingly hard. Your encoder calls it asset A-44871. The studio that licensed it to you calls it WB-2019-0345. Your ad server has its own ID. A third-party metadata provider has another. When a rights report, a recommendation, and an availability check all refer to "the same movie," nothing technically guarantees they actually mean the same movie, and mismatches cause real damage: the wrong title goes dark in a territory, royalties are miscounted, two copies of one film clutter the catalog as if they were different works.

The industry's answer is a shared, universal identifier called EIDR — the Entertainment Identifier Registry. EIDR assigns one permanent ID to a work that every company can use, the way an ISBN identifies a book regardless of which shop sells it. Technically, an EIDR ID is a form of DOI — the same persistent-identifier system used for academic papers — and it resolves to a metadata record describing the work (EIDR, 2026). A Content ID looks like 10.5240/XXXX-XXXX-XXXX-XXXX-XXXX-C, and the registry is hierarchical in exactly the way structural metadata needs: a series has an ID, its seasons have IDs that point to the series, and each episode has an ID that points to its season (EIDR, 2026). The IETF even standardized how to write an EIDR identifier as a URN so it can travel cleanly between systems (IETF RFC 7972, 2016).
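If your ingest pipeline needs to sanity-check incoming IDs, a format check is straightforward. This sketch validates only the shape described above and splits out the check character; it does not verify the check character against the registry's checksum algorithm, the allowed check-character alphabet (0-9, A-Z) is an assumption, and the example ID is made up.

```python
import re

# The 10.5240 DOI prefix, five hyphen-separated groups of four hex digits,
# then one check character (assumed here to be 0-9 or A-Z).
EIDR_CONTENT_ID = re.compile(r"10\.5240/((?:[0-9A-F]{4}-){5})([0-9A-Z])")

def parse_eidr(eidr_id: str) -> tuple:
    """Split an EIDR Content ID into its 20-hex-digit body and check character.
    Checks the shape only; it does not verify the check character."""
    match = EIDR_CONTENT_ID.fullmatch(eidr_id.strip().upper())
    if match is None:
        raise ValueError(f"not an EIDR Content ID: {eidr_id!r}")
    return match.group(1).replace("-", ""), match.group(2)

print(parse_eidr("10.5240/1A2B-3C4D-5E6F-7A8B-9C0D-K"))  # ('1A2B3C4D5E6F7A8B9C0D', 'K')
```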

Why does a founder care about a registry of IDs? Because proprietary IDs cost money. EIDR's own stated purpose is to eliminate "costly translations between proprietary ID systems," lower the "risks of misidentification caused by duplication," and improve the "ability to match assets and metadata from different databases, service providers, or metadata suppliers" (EIDR, 2026). In plain terms: adopt the shared ID and a title licensed from three studios, tracked by your ad server, and reported in your royalty statements all line up automatically; skip it and you pay engineers to maintain brittle translation tables forever, and you eat the cost of every mismatch. The general public can resolve any EIDR ID for free and get back its descriptive metadata, which is part of why it works as common ground (EIDR, 2026).

The practical instruction is short: insist that licensed content arrives with EIDR IDs where they exist, and assign canonical IDs internally for everything else, so that "is this the same title?" is never a guess.
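One way to implement that instruction is a crosswalk that maps every (source, source ID) pair to one canonical ID, preferring EIDR when the supplier provides it and minting an internal ID otherwise. IdRegistry and the "internal:" ID format are hypothetical; the proprietary IDs are the examples used earlier in this section, and the EIDR-shaped ID is made up.

```python
import itertools
from typing import Optional

class IdRegistry:
    """Crosswalk from every supplier's proprietary ID to one canonical title ID."""

    def __init__(self):
        self._canonical = {}               # (source, source_id) -> canonical ID
        self._next = itertools.count(1)    # sequence for internally minted IDs

    def resolve(self, source: str, source_id: str, eidr: Optional[str] = None) -> str:
        key = (source, source_id)
        if key not in self._canonical:
            # Prefer the shared EIDR ID; mint an internal one only when none exists.
            self._canonical[key] = eidr or f"internal:{next(self._next):08d}"
        return self._canonical[key]

registry = IdRegistry()
shared = "10.5240/1A2B-3C4D-5E6F-7A8B-9C0D-K"  # made-up EIDR-shaped ID
print(registry.resolve("encoder", "A-44871", eidr=shared) ==
      registry.resolve("studio", "WB-2019-0345", eidr=shared))  # True
```

A real system would persist this mapping and merge two internal IDs when they later turn out to describe the same work, for example after a fingerprint match.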

The interchange problem: speaking one metadata language

The identifier says which title. A second problem is how the description itself is written, because metadata arrives from many suppliers in many shapes. One studio sends a spreadsheet with a column called "Genre"; another sends XML with "category"; a third writes "Sci-Fi" where the first wrote "Science Fiction." Multiply that across thousands of titles and dozens of fields and you get the single most common operational headache in streaming: every catalog you ingest speaks a slightly different dialect, and your platform has to translate all of them into one.
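The translation step can be sketched as two lookup tables, one for field names and one for controlled values. FIELD_ALIASES and GENRE_VOCAB are hypothetical and tiny; in practice they would be seeded from a shared vocabulary and grown as new suppliers arrive. Failing loudly on an unmapped value, rather than passing it through, is the design choice that stops a new spelling from silently becoming a new genre.

```python
# Supplier column names mapped to canonical field names (hypothetical vocabulary).
FIELD_ALIASES = {"title": "title", "name": "title", "genre": "genre", "category": "genre"}
# Supplier genre spellings mapped to one canonical genre value.
GENRE_VOCAB = {"sci-fi": "Science Fiction", "science fiction": "Science Fiction",
               "sf": "Science Fiction", "comedy": "Comedy"}

def normalize(record: dict) -> dict:
    """Translate one supplier record into the platform's canonical fields and values."""
    out = {}
    for key, value in record.items():
        field = FIELD_ALIASES.get(key.strip().lower())
        if field is None:
            continue  # unknown column; a real pipeline would log or quarantine it
        if field == "genre":
            canonical = GENRE_VOCAB.get(str(value).strip().lower())
            if canonical is None:
                raise ValueError(f"unmapped genre {value!r}; extend GENRE_VOCAB")
            value = canonical
        out[field] = value
    return out

print(normalize({"Genre": "Sci-Fi", "Title": "Orbit"}))  # {'genre': 'Science Fiction', 'title': 'Orbit'}
```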

The industry's answer here is a shared schema: an agreed list of fields, their meanings, and their allowed values. The dominant one in film and television is MovieLabs Common Metadata and its delivery profile, Media Entertainment Core (MEC). These specifications, defined by the Entertainment Merchants Association (EMA) and the Digital Entertainment Group (DEG) together with MovieLabs, exist specifically "for transferring metadata from Publishers to Retailers," that is, from the studio that owns a title to the platform that streams it (MovieLabs, 2025). MEC is an XML schema with formal field definitions and even controlled vocabularies, such as a standard genre list, so that "Science Fiction" means one agreed thing rather than five spellings (MovieLabs MEC v2.25, 2025).

The value of a shared schema is the same as the value of a shared ID: it removes a translation step that is otherwise paid for in engineering time and quality bugs. When a publisher delivers a catalog in MEC and your platform ingests MEC, the genre field lands in the genre field, the cast lands in cast, the season-episode structure arrives intact — no custom parser per supplier, no "Sci-Fi versus Science Fiction" cleanup. You will still normalize and enrich (covered below), but you start from a known shape rather than a pile of mismatched spreadsheets. For a platform licensing content from many sources, adopting the industry schema is the difference between an ingest pipeline that scales and one that needs a new custom importer for every deal.

Figure 3. The three standards a streaming catalog leans on, each solving a different problem. EIDR answers "which title is this?" with one universal ID; MovieLabs Common Metadata / MEC answers "how do we exchange the description?" with a shared schema; Schema.org answers "how does the open web read it?" with structured data. They stack: adopt all three and identity, interchange, and external discovery are solved with standards instead of custom code.

Metadata for the open web: structured data and discovery beyond your app

A large share of discovery never happens inside your application. Someone searches Google for a film, or asks an AI assistant where to watch a show, and a result appears — ideally yours. That result is built from a third metadata standard, this one aimed at search engines rather than at studios: Schema.org structured data, expressed in a format called JSON-LD.

Schema.org defines vocabulary types for media (VideoObject for an individual video, Movie and TVSeries for works) that you embed in your web pages so a search engine can read your catalog without guessing. Google's documentation recommends JSON-LD as the format and requires, at minimum, a name, a thumbnail URL, and an upload date for a video; it recommends adding a link to the content itself (contentUrl or embedUrl), a description, and a duration (Google Search Central, 2026; Schema.org, 2026). Mark a page up correctly and it becomes eligible for video rich results, the thumbnail-with-play-button treatment in search, and you can even expose key moments within a video with Clip or SeekToAction markup so Google can link to the right timestamp (Google Search Central, 2026).
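A minimal sketch of generating that markup server-side with Python's standard json module. The property names come from Schema.org's VideoObject; the function name and every value are hypothetical, and the required/recommended notes in the comments should be checked against Google's current documentation before shipping. The returned string would go inside a `<script type="application/ld+json">` element on the title's page.

```python
import json

def video_object_jsonld(name, description, thumbnail_url, upload_date, content_url, duration):
    """Return a Schema.org VideoObject as a JSON-LD string for a title's watch page."""
    data = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,                      # required by Google
        "thumbnailUrl": [thumbnail_url],   # required by Google
        "uploadDate": upload_date,         # required by Google; ISO 8601 date-time
        "description": description,        # recommended
        "contentUrl": content_url,         # recommended: link to the video file
        "duration": duration,              # recommended; ISO 8601 duration
    }
    return json.dumps(data, indent=2)

print(video_object_jsonld(
    name="Orbit: A Space Documentary",
    description="A feature documentary about life aboard a space station.",
    thumbnail_url="https://example.com/thumbs/orbit.jpg",
    upload_date="2026-03-01T08:00:00+00:00",
    content_url="https://example.com/video/orbit.mp4",
    duration="PT1H32M",
))
```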

The reason this belongs in a discovery article and not only an SEO checklist is that it is the same fuel doing the same job in a different engine. Inside your app, descriptive metadata feeds the recommender; on the open web, the same descriptive facts (title, description, thumbnail, duration, cast) feed Google's index and the answers AI assistants assemble. A platform that captures rich metadata for its internal discovery has, almost for free, the raw material for external discovery too. This Learn section practices what it preaches: every article here ships its own Schema.org structured data (Article, FAQPage, and BreadcrumbList) for exactly this reason.

How the catalog actually gets tagged: humans, vendors, and machines

Rich metadata does not appear by itself. Someone, or something, has to look at each title and write down its mood, its themes, its cast, its tags. There are three ways this happens, and a mature platform uses all three.

The first is human tagging. People watch content and record structured judgments about it. The most famous example is Netflix, whose metadata effort became legendary: the company hired and trained "taggers" working from a detailed manual to rate titles on dimensions far beyond genre (tone, romance level, the personality of the lead characters, even how conclusively the plot ends) and combined those tags into tens of thousands of hyper-specific "alt-genres" like "Critically Acclaimed Emotional Underdog Movies." The reporting around this counts roughly 76,000 alt-genres, generated from the micro-tags by a grammar of region, adjective, genre, and theme. Treat the exact numbers as journalism rather than spec, but the principle is sound and instructive: human tagging produces the nuanced descriptive metadata that makes content-based recommendation feel uncannily precise, and it is the kind of fuel a machine cannot yet fully generate. The cost is that human tagging is slow, expensive, and, across different people and suppliers, inconsistent unless governed by a strict vocabulary.

The second is third-party metadata providers. Specialist companies sell ready-made descriptive metadata — cast, crew, synopses, genres, artwork, and IDs — for large catalogs of commercial film and television, so you buy a clean description instead of writing it. This is fast and consistent for mainstream content; it is weaker for niche, regional, or original content the provider has never catalogued, which is exactly the content a differentiated platform tends to carry. Providers typically key their data to shared IDs (including EIDR), which is another reason the identifier problem comes first.

The third, and the fastest-growing, is automated enrichment — using software to generate metadata from the content itself. Modern tools analyze the video and audio to detect scenes, objects, faces, logos, spoken words (via speech-to-text), and mood, then write those out as tags — the computer-vision and speech-recognition models that do this are the subject of the AI for Video Engineering section, so we stay at the metadata-product layer here. A specific and important sub-technique is content fingerprinting, also called automatic content recognition (ACR): the system computes a compact "fingerprint" from a title's visual frames or audio signal and matches it against a database to identify the content, even with no filename or ID attached (Tatari, 2024). Fingerprinting is what lets a platform recognize a duplicate upload, identify unlabeled content, and link an asset to its canonical record. AI enrichment scales in a way human tagging never can — it can tag a 50,000-title catalog in the time a human team tags a few hundred — but it needs quality control, because a model that mislabels "thriller" as "comedy" pollutes every downstream recommendation. The realistic 2026 pattern is a blend: machines do the broad, high-volume first pass and the fingerprinting; humans curate, correct, and add the nuanced tags that matter most; providers fill in the mainstream catalog.
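To show the idea behind fingerprinting without a vendor SDK, here is a toy perceptual hash (the "average hash" technique) over one 8x8 grayscale frame. It is a simplified stand-in, not how commercial ACR systems work in detail; it assumes frames have already been decoded and downscaled, and the 5-bit distance threshold is an arbitrary choice. The property it shares with real fingerprints is that small changes such as a brightness shift leave the hash nearly unchanged, so matching uses a distance rather than exact equality.

```python
def average_hash(frame) -> int:
    """64-bit average hash of an 8x8 grayscale frame (8 rows of 8 ints, 0-255).
    Each bit records whether a pixel is brighter than the frame's mean."""
    pixels = [p for row in frame for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def same_content(frame_a, frame_b, max_distance: int = 5) -> bool:
    """Treat two frames as the same content if their hashes are close, not identical."""
    return hamming(average_hash(frame_a), average_hash(frame_b)) <= max_distance

frame = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
brighter = [[p + 3 for p in row] for row in frame]  # same shot, re-encoded slightly brighter
print(same_content(frame, brighter))                 # True
```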

Figure 4. The metadata pipeline, the unglamorous core of discovery. Messy metadata from studios, providers, and encoders is ingested, normalized to one canonical vocabulary and ID (EIDR), enriched by human taggers and AI/fingerprinting tools, governed for accuracy and freshness, then served to every discovery surface. The recommender is only as good as what comes out of this pipe.

A worked example: thin metadata versus rich metadata

Make the cost concrete, because the argument for spending on metadata is easy to wave away until you see the arithmetic. Imagine a catalog of 10,000 titles and a content-based recommender that links two titles when they share descriptive attributes.

In a thin-metadata catalog, each title carries three useful descriptive fields: title, one genre, and release year. The genre is the only field with much matching power, and suppose the catalog uses 12 genres. If titles were spread evenly across them, a title would share its genre with roughly 10,000 ÷ 12 ≈ 833 other titles, a match so coarse that "similar" means little more than "also a drama." The recommender's content-based rows will be broad and repetitive, and a new title gets dropped into an 833-title bucket with nothing to distinguish it.

Now enrich the same catalog so each title carries, say, 25 descriptive attributes: genre, sub-genre, mood, theme, decade, setting, three cast members, director, tone, pacing, and a dozen or so tags. Two titles are now "similar" when they overlap on several of those attributes, not just one. The number of titles that match on, for example, genre=drama AND mood=slow-burn AND setting=small-town AND decade=2010s collapses from 833 to perhaps a handful, which is exactly the precise, surprising, "how did it know" recommendation that retains subscribers. The model did not change. The catalog size did not change. Only the fuel changed, and the quality of every content-based recommendation changed with it.
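The arithmetic behind "collapses to a handful" can be sketched under two strong assumptions: every attribute's values are spread evenly across the catalog, and attributes are independent of each other. The cardinalities for mood (8), setting (10), and decade (8) are hypothetical. Real catalogs are lumpy and correlated, so treat the result as an order-of-magnitude illustration rather than a prediction.

```python
from math import prod

def expected_matches(catalog_size: int, cardinalities: list) -> float:
    """Rough count of titles sharing ALL given attribute values with a target title.
    Assumes each attribute's values are spread evenly and attributes are independent."""
    return catalog_size / prod(cardinalities)

print(round(expected_matches(10_000, [12])))                # genre only: 833
print(round(expected_matches(10_000, [12, 8, 10, 8]), 1))   # + mood, setting, decade: 1.3
```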

The same arithmetic explains a pure business loss. If thin metadata leaves, say, 15% of a 10,000-title catalog effectively unsurfaceable (too generically described for the recommender or search to ever place it well), that is 1,500 titles you licensed or produced and then hid. At even a modest average license cost, that is a large, recurring write-off with a cheap fix: describe the catalog properly. Metadata is the rare line item where a small investment directly rescues a large one.

A common mistake: treating metadata as an afterthought

The most expensive metadata error is also the most common: treating it as paperwork to be done later. A team launches with titles described by name and genre, promises to "add tags after launch," and ships a recommender and a search box that have nothing to work with — then blames the algorithms. The fix is to treat descriptive metadata as a launch requirement, not a backlog item, because every discovery feature you built depends on it.
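A launch gate can be as simple as a completeness check that runs on every ingest. The required-field list, launch_gate, and the sample records below are hypothetical; the design choice is that each incomplete title is reported by ID with its missing fields, so a gap becomes a ticket rather than a silent weakness in recommendations.

```python
REQUIRED_DESCRIPTIVE = ("title", "synopsis", "genre", "cast", "language", "release_year", "tags")

def missing_fields(record: dict) -> list:
    """Required descriptive fields that are absent or empty in one record."""
    return [f for f in REQUIRED_DESCRIPTIVE if not record.get(f)]

def launch_gate(catalog: list, min_complete: float = 1.0):
    """Return (passes, gaps): whether enough titles are complete, and what each one lacks."""
    gaps = {r["id"]: missing_fields(r) for r in catalog}
    gaps = {title_id: fields for title_id, fields in gaps.items() if fields}
    complete_share = 1 - len(gaps) / len(catalog) if catalog else 0.0
    return complete_share >= min_complete, gaps

catalog = [
    {"id": "t1", "title": "A", "synopsis": "...", "genre": "drama", "cast": ["X"],
     "language": "en", "release_year": 2019, "tags": ["slow-burn"]},
    {"id": "t2", "title": "B", "genre": "comedy"},
]
print(launch_gate(catalog))
```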

The second mistake is skipping normalization — letting each supplier keep its own vocabulary. One feed says "Sci-Fi," another "Science Fiction," a third "SF," and the recommender treats them as three unrelated genres, fragmenting the catalog invisibly. The fix is a single canonical vocabulary that every ingested field is mapped to on the way in, which is precisely what a shared schema like MEC gives you a head start on.

The third mistake is no canonical identifier — ingesting the same film from two sources and storing it twice because nothing told the system they were the same work. The result is duplicate catalog entries, split viewing data, and confused recommendations. The fix is EIDR or an internal canonical ID applied at ingest, with fingerprinting to catch the duplicates that slip through. Skip the description and discovery starves; skip normalization and the catalog fragments; skip the identifier and it duplicates. All three failures are cheap to prevent at ingest and expensive to repair after launch.

Where Fora Soft fits in

A metadata system is the kind of unglamorous, scale-defining plumbing Fora Soft has built repeatedly: an ingest layer that accepts catalogs from many suppliers, a normalization step that maps every supplier's dialect to one canonical vocabulary and one identifier, an enrichment stage that blends human tagging with AI and content-fingerprinting tools, and a governance layer that keeps the catalog accurate as it grows. Across 625+ shipped projects for 400+ clients since 2005 in video streaming, OTT/Internet TV, e-learning, and surveillance, the recurring lesson is that discovery quality is set long before the recommendation model — it is set in the metadata pipeline. Our approach is scalability-first and vendor-neutral: we start from the size and messiness of your catalog and the number of sources you ingest, then build the ingest-normalize-enrich-govern pipeline and wire it into search, recommendations, and your public Schema.org markup across web, mobile, and TV, so the discovery engine downstream has clean, rich fuel to burn.

Call to action

Talk to a streaming engineer — book a 30-minute scoping call to talk through your content metadata plan.
See our case studies — 250+ shipped projects across video streaming, WebRTC, OTT, telemedicine, e-learning, surveillance, and AR/VR.
Download the Metadata Readiness Checklist (one page): the four metadata kinds to capture, the EIDR / MovieLabs MEC / Schema.org standards to adopt, the human/provider/AI tagging mix, and the ingest-normalize-enrich-govern pipeline to specify before launch, all on a single sheet.

References

EIDR — The Universal Media Identifier (registry help, data model, and Content ID structure). Entertainment ID Registry Association (EIDR), 2026. Tier 1 (industry-standard registry / first-party). Source of: EIDR as a DOI-based universal identifier resolving to a descriptive metadata record; the hierarchical Content ID structure (series → season → episode) and the 10.5240/… form; the stated purpose of eliminating costly proprietary-ID translation, reducing misidentification from duplication, and matching assets across databases and suppliers; free public resolution of any EIDR ID. https://ui.eidr.org/help and https://www.eidr.org/ — accessed 2026-06-18.
RFC 7972 — Entertainment Identifier Registry (EIDR) URN Namespace Definition. Lemieux, P., IETF, August 2016. Tier 1 (IETF primary standard). Source of: the standardized URN form for EIDR identifiers, enabling EIDR IDs to be carried unambiguously between systems. https://datatracker.ietf.org/doc/html/rfc7972 — accessed 2026-06-18.
Media Entertainment Core (MEC) Metadata, TR-META-MEC, Version 2.25. Motion Picture Laboratories (MovieLabs), with the Entertainment Merchants Association (EMA) and Digital Entertainment Group (DEG), December 2025. Tier 1 (industry-standard specification). Source of: MEC and Common Metadata as the schema for transferring metadata from publishers to retailers; XML schema with formal field definitions and controlled vocabularies (e.g. a standard genre list); the current version and governance. https://movielabs.com/md/mec/ and https://movielabs.com/md/mec/v2.25/Media_Ent_Core_Metadata_v2.25.pdf — accessed 2026-06-18.
Video (VideoObject, Clip, BroadcastEvent) structured data documentation. Google Search Central, 2026. Tier 1 (first-party platform documentation for the controlling discovery surface). Source of: JSON-LD as the recommended format; the required name, thumbnailUrl, and uploadDate fields plus the recommended contentUrl/embedUrl, description, and duration; eligibility for video rich results; Clip and SeekToAction markup for key moments. https://developers.google.com/search/docs/appearance/structured-data/video — accessed 2026-06-18.
VideoObject. Schema.org, 2026. Tier 1 (web-vocabulary standard). Source of: the VideoObject, Movie, and TVSeries types and their descriptive properties used to mark up media for search engines. https://schema.org/VideoObject — accessed 2026-06-18.
The Netflix Recommender System: Algorithms, Business Value, and Innovation. Gomez-Uribe, C. A. & Hunt, N. ACM Transactions on Management Information Systems, 6(4), Article 13, 2015. DOI 10.1145/2843948. Tier 1 (peer-reviewed, first-party engineering). Source of: the role of structured metadata and tagging in driving personalization, and recommendations influencing ~80% of streamed hours — the business case for investing in descriptive metadata. https://dl.acm.org/doi/10.1145/2843948 — accessed 2026-06-18.
Content metadata and how to manage OTT video metadata. SymphonyAI Media (glossary and engineering blog), 2025. Tier 7 (vendor, orientation only). Used for the four-category framing (descriptive, structural, administrative, technical) and the operational reality of normalizing metadata from many sources; the substantive standards claims are anchored to the primary sources above. https://www.symphonyai.com/glossary/media/content-metadata/ — accessed 2026-06-18.
Automatic Content Recognition (ACR): Whys and Hows. Tatari, 2024. Tier 6 (industry explainer, orientation only). Source of: how content fingerprinting / ACR computes a fingerprint from visual frames or the audio signal and matches it to a database to identify content and attach metadata. https://www.tatari.tv/insights/automatic-content-recognition-acr-whys-and-hows — accessed 2026-06-18.
Behind Netflix's ~76,897 Altgenres / On the Netflix Quantum Theory. Posner, M. (DH101) and related reporting, 2014–2017. Tier 6 (journalism / educational, illustrative only). Used for the human-tagging example — taggers, micro-tags, and the alt-genre grammar — presented as illustrative reporting rather than specification; the numbers are treated as approximate. https://miriamposner.com/dh101f14/?p=605 — accessed 2026-06-18.
The content-discovery problem caused by poor metadata; metadata enrichment. Coactive AI and Alpha Networks (vendor engineering blogs), 2024–2025. Tier 7 (vendor, orientation only). Used for the "poor metadata hides good content / enrichment improves discoverability" framing and the role of AI in scaling enrichment; not relied on for any standards claim. https://www.coactive.ai/blog/introducing-ai-powered-metadata-enrichment---for-better-content-discovery — accessed 2026-06-18.

Where sources disagreed, the industry-standard and first-party documents were followed. Identifier and interchange claims are cited directly from EIDR's registry documentation and RFC 7972 (refs 1–2) and the MovieLabs MEC specification (ref 3), not from vendor paraphrases; structured-data claims are cited from Google Search Central and Schema.org directly (refs 4–5). The Netflix tagging example (ref 9) is treated as illustrative journalism; the substantive "metadata drives personalization" claim is anchored to the peer-reviewed Netflix paper (ref 6). Vendor and industry explainers (refs 7, 8, 10) are used only for orientation (the four-category framing, the enrichment picture, and a plain-language description of fingerprinting), never as the source for a standards claim.

Related glossary terms

Metadata

Ingest

Content-based filtering

Bitrate

Codec

Collaborative filtering

Recommendation system

Cold-start problem