Signal Data Mining

Learn about data mining methods and Signal detection scores that determine Product-Event pairs.

Note: Beginning with 24R1 in April 2024, this site will no longer be available for Limited Release content. The new Vault Safety Help site is the official site for Vault Safety Limited Release Help content. For the latest information, visit the new site.

About Data Mining

Data mining runs are used to detect whether there is an unexpectedly high occurrence of adverse events associated with a particular product, in other words, a Product-Event pair.

When you navigate to the Signal Detection Dashboard for the first time, you must click Run to retrieve the data. Vault Signal generates Signal detection scores using the Empirical Bayesian Geometric Mean (EBGM), the Proportional Reporting Ratio (PRR) disproportionality method, and the Reporting Odds Ratio (ROR) disproportionality method to measure the strength of a Product-Event pair.

These scores are also displayed on the Signal Workbench, where you can monitor tracked Product-Events.

Disproportionality methods are used to identify statistical associations between Products and Adverse Events. Vault Signal compares the frequency of reports for the Product and Adverse Event with a baseline reporting frequency, which is calculated using other Products. A Product-Event pair is flagged when the reporting frequency for the Product exceeds the baseline frequency, indicating a possible causal relationship between the Product and the Adverse Event.

About the Proportional Reporting Ratio (PRR)

The Proportional Reporting Ratio (PRR) is a statistical method used to determine if there is a higher number of reports received for a particular Product and Adverse Event compared to the number of reports for the same Adverse Event and all the other Products in the database.

The PRR is a two-dimensional model (2 variables) that measures the strength of the association between a particular Product and a particular Adverse Event.

The following table summarizes the PRR disproportionality comparison for a given Adverse Event (X) that was experienced and a Product (Y) that was taken:

The PRR Contingency Table

                     | Adverse Event X | All other Adverse Events
  Product Y          |        a        |            b
  All other Products |        c        |            d

  • a: The number of Cases for a particular event (Adverse Event X) and a particular Product (Product Y).
  • b: The number of Cases for all other events for Product Y.
  • c: The number of Cases for all other Products for Adverse Event X.
  • d: The number of Cases for all other Products and all other Adverse Events.

The PRR can be expressed as follows:

\[ \mathrm{PRR} = \frac{a / (a + b)}{c / (c + d)} \]

A PRR value greater than 1 indicates a higher probability that the Adverse Event of interest occurs in patients taking the particular Product compared to the same event occurring with other Products in the database.
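As a minimal sketch of the calculation described above, the following Python function computes the PRR from the four contingency counts. The function name and the example counts are hypothetical and are not taken from Vault Signal.

```python
def prr(a: int, b: int, c: int, d: int) -> float:
    """Proportional Reporting Ratio from the 2x2 contingency counts.

    a: Cases with Adverse Event X and Product Y
    b: Cases with all other events for Product Y
    c: Cases with Adverse Event X for all other Products
    d: Cases with all other events for all other Products
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined when a + b or c is zero")
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts, for illustration only.
example_prr = prr(a=20, b=180, c=100, d=9700)
print(f"PRR = {example_prr:.2f}")
```

With these illustrative counts, 10% of the reports for Product Y mention Adverse Event X, compared with about 1% of the reports for all other Products, so the PRR is well above 1.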

PRR Considerations

The PRR does not adjust for small Case counts or stratification variables, and it does not account for multiple Product-Event pairs. Further, the PRR on its own does not convey the level of uncertainty in the result. To address these limitations, the X2 statistic is used to assess the reliability of the data, and the PRR025 confidence score is used to assess the reliability of the PRR estimate.

Calculating the PRR025 Score

In addition to the PRR score, Vault Signal calculates the lower confidence bound (PRR025) on the PRR 95% confidence interval. The confidence interval outlines the uncertainty in a Product-Event estimate to address small sample sizes.

PRR025 represents the lower bound of the PRR score, where the probability of an even lower score is less than 2.5%.
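This page does not state how the lower bound is computed. The following is a minimal sketch, assuming the commonly used log-normal approximation for the standard error of ln(PRR); the function name and counts are hypothetical, and Vault Signal may use a different method.

```python
import math


def prr_lower_bound(a: int, b: int, c: int, d: int, z: float = 1.96) -> float:
    """Lower bound of the two-sided 95% interval for the PRR.

    Assumption: ln(PRR) is approximately normal with standard error
    sqrt(1/a - 1/(a+b) + 1/c - 1/(c+d)). z = 1.96 leaves 2.5% in each tail.
    """
    if a == 0 or c == 0:
        raise ValueError("the approximation needs a > 0 and c > 0")
    prr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return math.exp(math.log(prr) - z * se)


# Same PRR (9.8) in both cases, but ten times fewer Cases in the second.
many_cases = prr_lower_bound(a=20, b=180, c=100, d=9700)
few_cases = prr_lower_bound(a=2, b=18, c=100, d=9700)
print(f"{many_cases:.2f} vs {few_cases:.2f}")
```

Both examples have the same PRR, but the lower bound drops sharply when the estimate rests on only two Cases, which is how the bound reflects small sample sizes.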

About the Reporting Odds Ratio (ROR)

The Reporting Odds Ratio (ROR) is a statistical method used to determine the odds of a particular event occurring for a particular product, compared to the odds of the same adverse event occurring for all the other products in the database.

The ROR is similar to the PRR in that both are disproportionality scores. The difference is that the ROR uses odds instead of probabilities, although both methods generally produce similar results.

The following table summarizes the ROR disproportionality comparison for a given Adverse Event (X) that was experienced and a Product (Y) that was taken:

The ROR Contingency Table

                     | Adverse Event X | All other Adverse Events
  Product Y          |        a        |            b
  All other Products |        c        |            d

  • a: The number of Cases for a particular event (Adverse Event X) and a particular Product (Product Y).
  • b: The number of Cases for all other events for Product Y.
  • c: The number of Cases for all other Products for Adverse Event X.
  • d: The number of Cases for all other Products and all other Adverse Events.

The ROR can be expressed as:

\[ \mathrm{ROR} = \frac{a / b}{c / d} = \frac{a \, d}{b \, c} \]

An ROR value greater than 1 indicates that the odds of the Adverse Event of interest occurring in patients taking the particular Product are higher than the odds of the same event occurring with other Products in the database. If the ROR is equal to 2, the odds of the event being reported for the Product of interest are twice as high as the odds of the event being reported among all other Products in the database.
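The odds-based comparison can be sketched in Python as follows. The function name and counts are hypothetical and serve only to illustrate the ratio of odds built from the contingency counts.

```python
def ror(a: int, b: int, c: int, d: int) -> float:
    """Reporting Odds Ratio: (a / b) / (c / d), written as (a * d) / (b * c)."""
    if b == 0 or c == 0:
        raise ValueError("ROR is undefined when b or c is zero")
    return (a * d) / (b * c)


# Hypothetical counts, for illustration only.
example_ror = ror(a=20, b=180, c=100, d=9700)
print(f"ROR = {example_ror:.2f}")
```

For the same counts, the ROR is slightly larger than the PRR because odds (a / b) are always at least as large as the corresponding proportion (a / (a + b)); the two scores converge when the event is rare.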

ROR Considerations

The ROR does not adjust for small Case counts or for stratification variables. As with the PRR score, the X2 statistic is used to address these limitations by assessing the reliability of the data. The ROR025 confidence score is used to assess the reliability of the ROR estimate.

Calculating the ROR025 Score

In addition to the ROR score, Vault Signal calculates the lower confidence bound (ROR025) on the ROR 95% confidence interval. The confidence interval outlines the uncertainty in a Product-Event estimate to address small sample sizes.

ROR025 represents the lower bound of the ROR score, where the probability of an even lower score is less than 2.5%.
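As with the PRR, the exact computation is not given here. The sketch below assumes the standard log-normal (Woolf) approximation for the standard error of ln(ROR); the function name and counts are hypothetical, and Vault Signal may compute the bound differently.

```python
import math


def ror_lower_bound(a: int, b: int, c: int, d: int, z: float = 1.96) -> float:
    """Lower bound of the two-sided 95% interval for the ROR.

    Assumption: ln(ROR) is approximately normal with standard error
    sqrt(1/a + 1/b + 1/c + 1/d).
    """
    if min(a, b, c, d) == 0:
        raise ValueError("the approximation needs all four counts > 0")
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(math.log(ror) - z * se)


# Hypothetical counts, for illustration only.
example_ror025 = ror_lower_bound(a=20, b=180, c=100, d=9700)
print(f"ROR025 = {example_ror025:.2f}")
```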

About the Chi-Square (X2) Statistic

The Chi-square (X2) is another statistical method of measuring the association between a specific Adverse Event and a specific Product.

The X2 statistic is represented by the following formula:

\[ X^2 = \frac{N \, (ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}, \qquad N = a + b + c + d \]
  • a: The number of Cases for a particular event (Adverse Event X) and a particular Product (Product Y).
  • b: The number of Cases for all other events for Product Y.
  • c: The number of Cases for all other Products for Adverse Event X.
  • d: The number of Cases for all other Products and all other Adverse Events.

A PRR or ROR value accompanied by a large X2 value (equal to or larger than 4) is considered reliable.
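A minimal sketch of the reliability check follows, assuming the Pearson chi-square statistic for a 2x2 table. Whether Vault Signal applies a continuity correction is not stated here, so the Yates correction is offered as an optional, hypothetical parameter; the function names and counts are also illustrative.

```python
def chi_square(a: int, b: int, c: int, d: int, yates: bool = False) -> float:
    """Pearson chi-square for a 2x2 table, optionally with Yates's correction."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("chi-square is undefined when a row or column total is zero")
    difference = abs(a * d - b * c)
    if yates:
        # Continuity correction: shrink |ad - bc| by N/2, never below zero.
        difference = max(difference - n / 2, 0.0)
    return n * difference ** 2 / denominator


def is_reliable(x2: float, minimum: float = 4.0) -> bool:
    """Apply the 'equal to or larger than 4' rule described above."""
    return x2 >= minimum


# Hypothetical counts, for illustration only.
x2 = chi_square(a=20, b=180, c=100, d=9700)
print(f"X2 = {x2:.1f}, reliable = {is_reliable(x2)}")
```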

About the Empirical Bayesian Geometric Mean (EBGM)

Vault Signal uses the Multi-Item Gamma Poisson Shrinker (MGPS) statistical methodology to mine the Signal database (including FAERS and Safety data) to produce the Empirical Bayesian Geometric Mean (EBGM) scores.

The EBGM improves on other disproportionality methods (such as the PRR) by using Bayesian shrinkage to account for the greater uncertainty that arises when there are few Cases.

The key difference between the EBGM and the PRR lies in how the number of "expected" Cases is derived. With the PRR, the Adverse Events reported for the specific Product do not contribute to the expected number, whereas with the EBGM, all Adverse Events from the Product contribute to it.
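The difference in the expected count can be illustrated with the same contingency counts. This is a sketch under stated assumptions: the PRR-style expected count uses the event rate among other Products only, while the second function uses the independence-based expected count over the whole database, which is the usual starting point for the relative reporting ratio that MGPS then shrinks. The function names and counts are hypothetical.

```python
def expected_prr_style(a: int, b: int, c: int, d: int) -> float:
    """Expected Cases for the pair using only the other Products' event rate."""
    return (a + b) * c / (c + d)


def expected_database_wide(a: int, b: int, c: int, d: int) -> float:
    """Expected Cases under independence, using every report in the database."""
    n = a + b + c + d
    return (a + b) * (a + c) / n


# Hypothetical counts, for illustration only.
counts = dict(a=20, b=180, c=100, d=9700)
e_prr = expected_prr_style(**counts)
e_all = expected_database_wide(**counts)
print(f"observed / expected: {20 / e_prr:.2f} (PRR style) vs {20 / e_all:.2f} (database-wide)")
```

Because the Product's own reports of the event raise the database-wide event rate, the database-wide expected count is larger and the resulting observed-to-expected ratio is smaller than the PRR.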

EBGM scores are supplemented by the EB05 and EB95 scores, which are the lower and upper limits of the 90% confidence interval.

About the Multi-Item Gamma Poisson Shrinker (MGPS) and Bayesian Modelling

The Multi-Item Gamma Poisson Shrinker (MGPS) algorithm is used to apply Bayesian Modelling to address statistical variability and uncertainty.

Unlike the PRR and X2 scores, Bayesian Modelling is not limited to the association between one Adverse Event and one Product. This statistical method also considers other Adverse Event-Product pairs to estimate an EBGM score.

The MGPS consolidates the data into a single EBGM score with confidence limits. MGPS uses a Bayesian probability formula to shrink reporting ratios that are based on small amounts of data toward the value of 1. As a result, the estimated EBGM score tends to be less variable than the raw ratio. This reduction in variability is called Bayesian Shrinkage.
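The shrinkage effect can be illustrated with a deliberately simplified model. The sketch below is not MGPS: it assumes a single Gamma(alpha, beta) prior with mean 1 on the reporting ratio and returns the posterior mean, whereas MGPS fits a two-component gamma mixture to the data and reports a geometric mean. The prior values and function name are hypothetical.

```python
def shrunk_ratio(observed: float, expected: float,
                 alpha: float = 2.0, beta: float = 2.0) -> float:
    """Posterior mean of the reporting ratio under a Gamma(alpha, beta) prior.

    With a Poisson likelihood for the observed count, the posterior is
    Gamma(alpha + observed, beta + expected), whose mean is returned here.
    alpha == beta gives a prior mean of 1, so sparse data is pulled toward 1.
    """
    return (observed + alpha) / (expected + beta)


# Both pairs have a raw observed / expected ratio of 4.0.
sparse = shrunk_ratio(observed=2, expected=0.5)
dense = shrunk_ratio(observed=200, expected=50)
print(f"sparse: {sparse:.2f}, dense: {dense:.2f}")
```

The pair with only two observed Cases is pulled most of the way toward 1, while the pair with 200 observed Cases stays close to its raw ratio of 4.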

Calculating EB05 and EB95 Scores

EB05 and EB95 scores are used as lower and upper confidence bounds, respectively, for the EBGM 90% confidence interval. The confidence interval outlines the uncertainty in a Product-Event estimate to address small sample sizes.

EB05 represents the lower bound of the EBGM score, where the probability of an even lower score is less than 5%.

EB95 represents the upper bound of the EBGM score, where the probability of an even higher score is less than 5%.

Vault Signal calculates these scores by using the MGPS for each Product-Event combination in a data source.

Stratification

Stratification is a method for countering the effect of confounding variables that can influence the probability of experiencing an adverse event when taking a particular product. Confounding variables include patient gender, patient age, event country, and more. Note that Stratification is only available on the Signal Workbench and not the Signal Detection Dashboard.

Vault Signal uses Age Group and Gender strata to calculate “stratified” scores.

The following diagram demonstrates an example of how Vault Signal stratifies data for gender:

Figure: Gender Stratification Example
  1. Vault Signal partitions the dataset into strata (female and male).
  2. Scores are calculated for each stratum within the gender group.
  3. The scores from each stratum are respectively pooled together to produce the stratified scores.

Stratification helps prevent Simpson's paradox, a phenomenon that occurs when confounding variables (the stratification groups) are not considered. In that situation, the pooled data can lead to the opposite conclusion from the one reached when the confounding variables are included.
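The effect of a confounder can be sketched with made-up counts. How Vault Signal pools the per-stratum scores is not specified here; this example assumes one common approach, in which the expected counts are computed within each stratum and then summed. All counts and names are hypothetical.

```python
# (a, b, c, d) contingency counts per stratum; hypothetical data in which
# the event is far more common in females, who also use Product Y more.
strata = {
    "female": (40, 60, 40, 60),
    "male": (1, 99, 10, 990),
}


def expected(a: int, b: int, c: int, d: int) -> float:
    """Expected Cases for the pair under independence within one table."""
    return (a + b) * (a + c) / (a + b + c + d)


observed = sum(counts[0] for counts in strata.values())

# Crude analysis: add the strata together first, then compute one ratio.
crude_counts = tuple(sum(column) for column in zip(*strata.values()))
crude_ratio = observed / expected(*crude_counts)

# Stratified analysis: compute expected counts per stratum, then pool them.
stratified_ratio = observed / sum(expected(*counts) for counts in strata.values())

print(f"crude: {crude_ratio:.2f}, stratified: {stratified_ratio:.2f}")
```

In this constructed dataset the Product shows no excess reporting within either gender, yet the crude ratio is close to 3 because gender is associated with both the Product and the event. Pooling stratum-level expected counts removes that distortion.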

Stratification and Filters

Both stratification and filters break the dataset down into subgroups that share similar traits (for example, females and males).

The difference between the two methods is that filtering restricts your view to this subset of data. For example, if you select the Adult filter under Age Group, you are looking at the association between the adverse event and the medicinal product for this subgroup only. However, for stratification, if you select the Age Group strata, the system analyzes each subgroup within the Age Group before pooling them together. In other words, you are looking at the entire dataset during stratification.

Since filtering restricts the data to one subgroup and stratification must consider all the subgroups, you cannot filter and stratify data on the same variable.

Apply Detection Score Thresholds

You can apply detection score thresholds when you edit or create a Signal Product Profile. Based on regulatory guidelines, we recommend the following threshold values:

  • Threshold PRR: Greater than 2
  • Threshold PRR025: 1
  • Threshold ROR: 3
  • Threshold ROR025: 1
  • Threshold X2: Greater than 4
  • Threshold EBGM: 2
  • Threshold EB05: 1

Figure: Detection Score Thresholds
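The recommended values can be expressed as a simple lookup, sketched below. The comparison direction is an assumption: scores listed as "Greater than" use a strict comparison, and the others are treated as "at least". The dictionary, function name, and example scores are hypothetical and do not represent a Vault Signal API.

```python
import operator

# (comparison, threshold) per score, based on the recommended values above.
RECOMMENDED_THRESHOLDS = {
    "PRR": (operator.gt, 2),
    "PRR025": (operator.ge, 1),
    "ROR": (operator.ge, 3),
    "ROR025": (operator.ge, 1),
    "X2": (operator.gt, 4),
    "EBGM": (operator.ge, 2),
    "EB05": (operator.ge, 1),
}


def scores_below_threshold(scores: dict) -> list:
    """Return the names of supplied scores that do not meet their threshold."""
    return sorted(
        name
        for name, (meets, threshold) in RECOMMENDED_THRESHOLDS.items()
        if name in scores and not meets(scores[name], threshold)
    )


# Hypothetical scores for one Product-Event pair.
flagged = scores_below_threshold({"PRR": 9.8, "PRR025": 6.19, "ROR": 2.5, "X2": 3.0})
print(flagged)
```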

The PRR and X2 threshold values set on the Signal Product Profile position the threshold lines on the PRR vs X2 Distribution chart on the Signal Detection Dashboard. This allows you to easily identify the scores that fall below the threshold.

Figure: Threshold Lines on Bubble Chart

The Product-Event Combinations grid on the Signal Detection Dashboard also informs you when scores fall below the specified threshold values by graying them out.

Figure: Scores Below the Threshold in the Product-Event Combinations Grid

Minimum Detection Scores

The Detection Summary panel on the Signal Detection Dashboard includes several minimum values that you can specify to filter the detection data. Applying the minimums updates the detection screen to exclude Product-Event pairs with scores below the specified minimum values.

Note that when you apply the minimum filters, the Product-Event pair must meet all minimum parameters to be displayed.
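The "must meet all minimum parameters" rule is a logical AND across the filters. A minimal sketch follows, assuming each Product-Event pair is a dictionary of scores and that a score equal to the minimum passes; the field names and data are hypothetical.

```python
def apply_minimums(pairs: list, minimums: dict) -> list:
    """Keep only the pairs whose scores meet every specified minimum."""
    return [
        pair
        for pair in pairs
        if all(pair[name] >= minimum for name, minimum in minimums.items())
    ]


# Hypothetical Product-Event pairs, for illustration only.
pairs = [
    {"pair": "Product Y / Event X", "PRR": 9.8, "X2": 133.3, "EBGM": 6.1},
    {"pair": "Product Y / Event Z", "PRR": 3.2, "X2": 2.1, "EBGM": 1.4},
]
visible = apply_minimums(pairs, {"PRR": 2, "X2": 4})
print([pair["pair"] for pair in visible])
```

The second pair passes the PRR minimum but fails the X2 minimum, so it is excluded.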

