Hydropower eLibrary / Dam License and Document Repository
The Hydropower eLibrary is a free online repository for documents and data related to U.S. hydropower projects and infrastructure. It includes all hydropower documents from the FERC eLibrary, a comprehensive list of FERC hydropower projects, and an interactive map of existing U.S. hydro installations.
Problem
FERC Hydropower Licensing Documents are Difficult to Find
The Federal Energy Regulatory Commission (FERC)’s eLibrary is an online records information system containing millions of documents for the four industries FERC regulates: electric, hydropower, natural gas, and oil. These documents are particularly valuable to the hydropower community as they offer current and historical data on environmental studies and licensing conditions for individual hydropower projects.
However, users have reported frustration with finding and accessing relevant documents within the eLibrary due to its poor user interface and confusing data structure. The U.S. Department of Energy's Water Power Technologies Office (WPTO) tasked PNNL with creating a hydropower-focused version of the FERC eLibrary. This new version aims to provide hydropower industry professionals with a better user experience in locating FERC documents and additional hydropower data not included in the original FERC eLibrary.
User Research & Context
The Original FERC eLibrary
First, I interviewed and observed hydropower professionals using the FERC eLibrary and noted several key issues common to most users:
- Project ID Barrier: Finding a hydropower project's P-0000 ID number is a prerequisite for any search related to that project. There is no list of these ids, creating a significant barrier for users before they can even begin a document search.
- Inefficient Search Results: A general search yields too many results, while a more targeted search excludes too much.
- Irrelevant Industry Data: The categories and filters are designed for compatibility with all FERC-regulated industries, but our users are only interested in hydropower data (not transmission, oil, or natural gas).
- Confusing Layout: Results are displayed in a table with a confusing layout. Users often do not understand why a result is returned, how sorting works, or how to navigate to view a result.
- Slow File Download Requirement: Reviewing the actual contents of a document requires downloading the original file. These files can be quite large, resulting in slow download times.
- Irrelevant Document Types: Users typically search for a small number of document types associated with the licensing process, such as License Issuances, License Amendments, and Environmental Reports. Most other records are obstacles to finding these "Key Documents."
- Poor Data Quality: Data quality on the FERC eLibrary is generally poor, with frequent mis-categorizations, inconsistencies in naming conventions, and typos that make keyword searches extremely difficult.
FERC eLibrary Screenshots
Iteration 0
Initial Designs, Layout, and Prototyping
Low Fi Mockups
I began my design exploration with a large set of low-fidelity mockups to compare different layout options. The top five concepts are shown here. From these, I selected two to advance to high-fidelity prototypes: "2 Column Alt" as the primary option, and "1-2 Column Hybrid" as a backup.
2 Column Layout - High Fi Prototypes
Next, I created high-fidelity interactive prototypes to develop my preferred "2 Column Alt" layout.
Single Page Scrolling Alternate
I also explored the "1-2 Column Hybrid" layout as a backup. Although I liked this more simplified concept, discussions with users and engineers led to the decision to go with the more complex layout.
Switching Themes to Avoid Mis-Crediting Our Client
Initially, I adopted the style of the FERC.gov website by theming Blueprint.js using my Blueprint Styler library. However, after the first round of feedback, the client (WPTO) requested a style change to avoid any brand association that might imply credit to FERC.
Iteration 1
Proof of Concept
Getting the Data from the FERC eLibrary
As the product owner, I worked closely with our data scientists and engineers to plan the data ingestion pipeline. We discovered that we could use the same API that the FERC eLibrary UI search uses to pull data. We considered two approaches:
Data Approach 1
Use the FERC eLibrary as Our Database
With this minimal approach, our UI would search the FERC eLibrary via their search API but provide a better search experience by translating certain elements like tags into advanced FERC filter sets on a proxy server.
- Pros: No need to host our own data. Dataset is always up-to-date.
- Cons: Requires setting up a server proxy to transform data into a different structure. Dependence on another site for everything; if they block us or go offline, we are affected. No display of documents raw text.
Data Approach 2
Copy and Host Our Own Data
With this more robust and traditional approach, we'd pull, enrich, and save a copy of all relevant data from the FERC eLibrary (except for original files). We opted for this approach to have more control.
- Pros: Ability to enrich ingested data to improve data quality, including parsing raw text from documents. Ability to manage how data is queried. Independence from FERC eLibrary's servers and API.
- Cons: Initial need to pull a large volume of data (1.4 million documents). Daily updates required, so data will be one day behind. Costs associated with hosting a large amount of data.
Hydropower Projects Dataset
User interviews revealed the need for a list of FERC Hydropower Projects to accompany and cross-link with the documents dataset. We identified a set of regularly updated Excel sheets published on the Hydropower Licensing Page of the FERC website. I then collaborated with our data engineers to create an ingestion script that pulls these Excel sheets weekly, merges them into a unified schema, and publishes a new Projects dataset.
Parsing Raw Text
One of the primary user requests was the ability to quickly preview document contents without having to download them. Users wanted to scan a snippet of text to decide if the document was worth a deeper look.
Parsing raw text also allowed us to perform better keyword searches, although the extreme volume of text later proved to be a performance issue.
I collaborated with our engineers to devise an approach to parse as much raw text as possible. Despite our efforts, data gaps remained. Some files were image scans requiring OCR, and some raw text was formatted irregularly. We decided that achieving around 80% coverage was acceptable and moved on.
Functional React Prototype
The first iteration concluded with a functional application that successfully displayed all the collected data.
Iteration 2
Private Alpha
Project Search Features & Key Document Tagging
In iteration 2, we introduced automatic document tagging for the most important "Key Documents" and a project search feature that allows users to search for projects by name, organization, or waterway, rather than just the P-0000 project ID number. We also experimented with a simplified "minimal input" search homepage, but it tested poorly with users and was removed in iteration 3.
Additionally, I adjusted the app's theme in iteration 2. However, most users didn't like the vibrant orange accent color, leading me to adopt a more neutral style in the next iteration.
User Testing
Cutting Features: User Personalization and Saved Filter Features
To inform iteration 2, I conducted observational user testing of the first iteration. Users initially requested the ability to save filter sets and tag specific documents. Implementing these personalization features would require user account management and saving user data. Additionally, the functionality for saving, editing, and overwriting filters and tags turned out to be quite complex.
In iteration 2, we prototyped and user-tested these personalization features. Surprisingly, the filter sets users wanted to save were extremely simple, sometimes just a single filter. When I asked about this, users referenced their difficult experience with entering filters in the original FERC eLibrary tool. Because our search features were much easier to use, the time and effort to enter a search was less than the configuring a and using a saved filter.
Regarding the document tagging feature, users mostly tagged the same Key Document types that we were already auto-tagging during data ingestion. Again, our enhanced data UX made this feature largely unnecessary.
Ultimately, we decided to cut these personalization features. The marginal benefits did not justify the overhead of managing user accounts.
Iteration 3
Public Beta
Usability and Discoveribility Testing
After releasing the public beta, I conducted a more quantifiable observational user test with 5 participants from the hydropower industry. Each participant was asked to complete a common set of tasks designed to test each major feature of the app. Each feature was graded on a 4 point scale to help prioritize which features and tasks needed more attention.
Test Grading
- 🟢 Pass Fast - the task was completed in a single or a few attempts
- 🟨 Pass Slow - the task was completed without a hint after several unsuccessful attempts
- 🔶 Partial Fail - the user became stuck and required a hint to complete the task
- 🔻 Fail - user could not complete task even after one or more hints were provided
- 🕙 Skip - task was not attempted due to time constraints or other restrictions
Task | Feature Tested | Grade |
---|---|---|
Login/Homepage - Where are you? What can you do? Who made this? | Does the homepage accurately describe the tool and data sources? Would a user understand them? | --- |
There is a project named “Felt” on the Teton River. Find the most recent “License Application” for that project. | Find a Document | 🟢1 🟨2 🔶2 🔻0 🕙0 |
(2 Part) Please save this document to easily find it later. Then, Could you save this in another way? | 'Favorite' document and/or 'Download' document | |
- Download | Used first 4 times | 🟢4 🟨1 🔶0 🔻0 🕙0 |
- Favorite | Used first 1 time | 🟢3 🟨0 🔶1 🔻0 🕙1 |
New Search | Clear out the current search and start from scratch | --- |
Find all the Projects in “Montana” “Utah” and “Idaho” - how many are there? | Searching in Projects Dataset, Finding and using "State" filter | 🟢2 🟨1 🔶2 🔻0 🕙0 |
Please find all the projects in the current list that have an "Active License" | Applying more filters, specifically the (License) "Status" Filter | 🟢3 🟨1 🔶1 🔻0 🕙0 |
Please find "Active Licenses" in "Idaho" only | Remove filters | 🟢5 🟨0 🔶0 🔻0 🕙0 |
Find the project with the earliest future expiration date. The next active license to expire (Felt) | Sorting by property, reversing sort | 🟢3 🟨1 🔶1 🔻0 🕙0 |
Find the License Issuance for that project (Felt) | 'Search this Project for Documents' button, Document search (again). | 🟢3 🟨1 🔶1 🔻0 🕙0 |
New Search | Clear out the current search and start from scratch | --- |
Find all NEPA Documents | 'Key Documents' filter and data tagging | 🟢1 🟨2 🔶2 🔻0 🕙0 |
Search for all NEPA documents that mention “Salmon” | Keyword search and understanding of why results were returned, dataset trust | 🟢1 🟨3 🔶1 🔻0 🕙0 |
Search for all NEPA documents that mention “Salmon” or "Trout" | Advanced keyword search | 🟢0 🟨0 🔶0 🔻4 🕙1 |
Totals | 11 Tasks / 5 Participants | 🟢26 🟨12 🔶11 🔻4 🕙2 |
Interactive Dam Map of Existing Hydropower Assets in the United States.
One of the final and most requested features was an interactive dam map. We sourced data from Oak Ridge National Laboratory's (ORNL) Existing Hydropower Assets (EHA) Database. This data source enabled us to plot the positions of dams nationwide and provided additional metadata for our Projects dataset.
Dedicated Project Page
We also added a dedicated Project page that includes all the metadata, dams, and Key Documents tagged to that project.
Additional Tweaks and Refinements
Based on user feedback, we implemented numerous feature refinements, including a new homepage, an about page, maps integrated into most pages, improved auto-tagging, added more Projects, enabled CSV export, enhanced cross-linking between Document and Project pages, and many other improvements.
Conclusion
A Generally Successful Launch
Overall, this project was a reasonable success. It's currently deployed and receives an stead 150 average monthly users, with an average session duration of 8 minutes, and about 3,000 total monthly page views. Feedback from direct contact has been positive, with most requests asking for more data rather than additional features. WPTO was pleased with the tool and traffic. Although I was underwhelmed by the traffic numbers, our client seemed quite impressed, indicating that the app is extremely helpful to a small but dedicated user base.
You can access the app now at:
Retrospective
What Could Have Been Better
Static Site Generation
The current site is a React SPA that (slowly) loads a large react bundle and then requests all the data it needs. No content is rendered for several seconds, and Google's crawling capabilities are limited due to the JavaScript resources required to render each page.
In hindsight, using Server-Side Rendering (SSR) or a Static Site Generator (SSG) like Next.js would have been better. Sending pre-rendered html with this approach would improve page loading speed and enhance our chances of Google indexing our project and key document pages, potentially improving organic search traffic for searches like "Vernon Hydropower License."
Keyword Search Coverage and Relevance Sorting
Our keyword search turned out to be too slow to search all our document raw text. We had to limit the search scope to only include the full raw text of Key Documents, while other documents only had the first few thousand characters available to keyword search. This limitation was a significant hit to our UX since keyword search is the most common method of filtering.
We also attempted to sort results based on "Relevance," which is a pretty fuzzy criteria. We developed rudimentary relevance scoring based on keyword frequency and location, but it did not perform particularly well.
(Database and server architecture is bit beyond my depth, but...) Utilizing other databases or search engines like ElasticSearch could vastly improve search coverage, speed, and sorting. However, supporting this along with Static Site Generation would require a significant rewrite of our current infrastructure.
URL Search Parameters for Filter Sets
Our filters are not saved or restored from URL search parameters. Adding this feature would have allowed users to keep their "Saved Filters" as a URL without requiring us to support user accounts.
Mobile Design
The initial scope of this project deprioritized a responsive mobile layout to cut development time. As the project progressed, our frontend architecture proved prohibitively difficult to retrofit with a mobile layer. If mobile design had been considered from the beginning, it would no have been difficult to maintain long-term.
Thanks for reading. Return Home to continue.