ETL / data pipeline

An ETL (Extract, Transform, Load) pipeline is the set of processes that move data from one or more source systems (an LMS database, an LRS (Learning Record Store) containing xAPI (Experience API) statements, a video platform's event log, or a business system) into a centralised analytical data store such as a data warehouse or data lake, where it can be queried for dashboards, cohort analysis, and learning KPI reporting. The Extract stage reads raw data from each source via API calls, direct database queries, or file exports. The Transform stage cleans, joins, and enriches the data: for example, mapping xAPI actor identifiers to internal user IDs, normalising timestamps to a single timezone, and computing derived fields such as percentage watched from raw play and pause events. The Load stage writes the processed records to the target store in a schema designed for analytical queries. In modern architectures the pattern is sometimes called ELT (load first, then transform inside the warehouse using tools like dbt), but the conceptual challenge is the same. For learning analytics the most common pain point is identity resolution: xAPI statements in an LRS identify actors by an inverse functional identifier such as a mailto: IRI or an account object, while the LMS tracks the same learner by a numeric ID and the HR system by an email address, so the transform stage needs a stable join key that links all three. Without a reliable pipeline, analytics teams work from stale exports, miss events that occurred between runs, and produce dashboards whose numbers cannot be reconciled, which undermines trust in the entire metrics programme.
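The three transform steps named above (identity resolution, timestamp normalisation, and a derived percentage-watched field) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the identity map, the function names, the `homePage|name` key format, the lowercasing of mailto IRIs, and the `(verb, position)` event shape are all hypothetical, and a real pipeline would read the map from a maintained lookup table.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical identity map: learner identifier -> internal user ID.
# In practice this would be a maintained lookup table, not a literal.
IDENTITY_MAP = {
    "mailto:ana@example.com": 1001,
    "https://lms.example.com|4821": 1001,
}


def actor_key(actor: dict) -> Optional[str]:
    """Build a lookup key from an xAPI actor's mbox or account identifier."""
    if "mbox" in actor:
        # Assumption: addresses are compared case-insensitively.
        return actor["mbox"].lower()
    account = actor.get("account")
    if account:
        return f"{account['homePage']}|{account['name']}"
    return None


def resolve_user_id(actor: dict, identity_map: dict = IDENTITY_MAP) -> Optional[int]:
    """Return the internal user ID, or None if the actor is unknown."""
    key = actor_key(actor)
    return identity_map.get(key) if key else None


def to_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp that carries an offset and convert it to UTC."""
    # Older Python versions reject a trailing "Z", so rewrite it as an offset.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc)


def percent_watched(events: list, duration: float) -> float:
    """Percentage of distinct video seconds covered by play/pause pairs.

    events: time-ordered (verb, position_seconds) tuples, verb in {"play", "pause"}.
    """
    intervals = []
    start = None
    for verb, position in events:
        if verb == "play":
            start = position
        elif verb == "pause" and start is not None:
            if position > start:
                intervals.append((start, position))
            start = None

    # Merge overlapping intervals so rewatched segments count only once.
    watched = 0.0
    current = None
    for s, e in sorted(intervals):
        if current is None or s > current[1]:
            if current is not None:
                watched += current[1] - current[0]
            current = [s, e]
        else:
            current[1] = max(current[1], e)
    if current is not None:
        watched += current[1] - current[0]
    return min(100.0, 100.0 * watched / duration)


def transform(statement: dict) -> dict:
    """Flatten one xAPI statement into a row ready for loading."""
    return {
        "user_id": resolve_user_id(statement["actor"]),
        "occurred_at_utc": to_utc(statement["timestamp"]),
        "verb": statement["verb"]["id"],
    }
```

Rows whose `user_id` comes back as `None` are better routed to a review queue than dropped, since unresolved identities are one of the ways dashboard totals drift apart from source-system counts.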

Related terms

Learning analytics

LRS (Learning Record Store)

xAPI statement