OpenVigil Caveat document

Version 2.0.2 (2014-09-15,

The practical usage of pharmacovigilance data like the FDA FOI AERS data (LAERS, FAERS) is limited by the following methodological problems and shortcomings of the spontaneous reporting system:



consequences for data analyses



In several jurisdictions health care professionals are not legally bound to report adverse events. Overlooked adverse events and non-reporting because of heavy workload often lead to under-reporting, which is estimated to range from 1:10 to 1:100.

Absolute numbers of cases might be 10 to 100 times higher than reported. Cases for rarely used drugs might be missing.

Phenprocoumon is commonly used in Germany but not in the US. There are some records for this drug in the US FDA data but they probably do not allow any further analysis.
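The under-reporting estimate can be turned into a simple range calculation. The following is a minimal sketch that only applies the factors 10 and 100 quoted above; the function name and the example count of 25 reports are hypothetical.

```python
def estimated_true_cases(reported, low_factor=10, high_factor=100):
    """Return the (low, high) range of plausible true case counts.

    Assumes the under-reporting ratio lies between 1:10 and 1:100,
    as estimated in this document.
    """
    return reported * low_factor, reported * high_factor


# Hypothetical example: 25 reports found in the database.
low, high = estimated_true_cases(25)
print(f"25 reports suggest roughly {low} to {high} actual cases")
```

The width of this range illustrates why absolute case numbers from spontaneous reports should not be read as incidence figures.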


The act of reporting and the choice of a primary and secondary suspected drug depend on how important and plausible the issue appears to the physician or patient. Differing familiarity with the literature could skew the number of reports per drug or per adverse event.
More extensively used drugs have higher total numbers of records.

Queries should not always rely on the item that specifies which drugs are suspected to cause the reaction (DRUG.ROLE_COD in the AERS database). This is particularly important for signal detection, which aims to discover hitherto unknown relations.

If a certain problematic adverse reaction is finally reported in the media, spontaneous reporting peaks. E.g., once the propofol infusion syndrome was reported, it was seen everywhere.
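The effect of relying on the suspected-drug item can be sketched by running the same query with and without a role filter. The rows below are invented, and the role codes (PS, SS, C, I) are assumed here to be the values stored in DRUG.ROLE_COD; the function name is hypothetical.

```python
# Hypothetical, simplified drug rows: (ISR, drug name, ROLE_COD).
# Assumed role codes: PS = primary suspect, SS = secondary suspect,
# C = concomitant, I = interacting.
drug_rows = [
    (1, "drug_a", "PS"),
    (1, "drug_b", "C"),
    (2, "drug_b", "SS"),
    (3, "drug_b", "C"),
]


def reports_with_drug(rows, drug, roles=None):
    """Return the set of ISRs mentioning `drug`.

    If `roles` is given, only rows whose role code is in `roles` count.
    """
    return {
        isr
        for isr, name, role in rows
        if name == drug and (roles is None or role in roles)
    }


suspect_only = reports_with_drug(drug_rows, "drug_b", roles={"PS", "SS"})
any_role = reports_with_drug(drug_rows, "drug_b")
print(len(suspect_only), "report(s) as suspect,", len(any_role), "in any role")
```

In this toy data the role filter hides two of the three reports that mention the drug, which is the kind of loss that matters when searching for unknown relations.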

drug usages vs reports vs cases

As of today, counting is usually done on individual safety reports. However, several drug usages make up one report. Several reports belong to one patient.

Most researchers are probably interested in counting affected patients. However, as of today, OpenVigil and others perform counting on single reports.
If you focus on dosage-dependent adverse reactions, looking at drug usages might be the best option. If you focus on allergic reactions, numbers of patients might be most appropriate.

See the varenicline/darvocet example below at “duplicate or multiplicate records”: 13 reports originate from only 3 patients.
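The three counting levels can be illustrated with a small sketch. The rows are invented, and the case number is used as a stand-in for the patient, which is an assumption that ignores multiplicates.

```python
# Hypothetical rows, one per drug usage: (CASENO, ISR, drug name).
usages = [
    (100, 1, "drug_a"),
    (100, 1, "drug_a"),  # second dosing entry within the same report
    (100, 1, "drug_b"),
    (100, 2, "drug_a"),  # follow-up report for the same case
    (200, 3, "drug_a"),
]


def count_levels(rows, drug):
    """Count drug usages, distinct reports and distinct cases for one drug."""
    hits = [(case, isr) for case, isr, name in rows if name == drug]
    return {
        "drug_usages": len(hits),
        "reports": len({isr for _, isr in hits}),
        "cases": len({case for case, _ in hits}),
    }


levels = count_levels(usages, "drug_a")
print(levels)
```

Which of the three numbers is appropriate depends on the question asked, as described above.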

missing denominator

The total amount of use for a drug (e.g., defined daily dose (DDD) or total number of applications) is not gathered in traditional pharmacovigilance data. Therefore, any normalizations or relations to the real world (e.g., odds ratio, risks) are difficult.

A rough estimate of drug usage, and therefore a substitute for DDD, might be the total number of reports which include this drug.

The German arznei-telegramm 2001(42):47 shows prescription data of metamizol (USAN dipyrone) and reported agranulocytosis.

Keller 2006 tries to estimate the incidence of adverse reactions using drug dispensing data from pharmacies.
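When report counts are used in place of real usage data, relative measures are typically computed from a 2x2 table of reports. The sketch below shows two common disproportionality measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR). It is not claimed to be the way OpenVigil computes them; the function names and example counts are hypothetical.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR from a 2x2 table of report counts.

    a: reports with the drug and the event
    b: reports with the drug, without the event
    c: reports without the drug, with the event
    d: reports without the drug, without the event
    """
    if a + b == 0 or c == 0:
        raise ValueError("PRR is undefined for these counts")
    return (a / (a + b)) / (c / (c + d))


def reporting_odds_ratio(a, b, c, d):
    """ROR from the same 2x2 table: (a * d) / (b * c)."""
    if b == 0 or c == 0:
        raise ValueError("ROR is undefined for these counts")
    return (a * d) / (b * c)


# Hypothetical counts; the denominators are numbers of reports, not drug usage.
prr = proportional_reporting_ratio(20, 80, 100, 9800)
ror = reporting_odds_ratio(20, 80, 100, 9800)
print(f"PRR = {prr:.2f}, ROR = {ror:.2f}")
```

Both values describe disproportionality within the report database only; they are not risks in the treated population.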

wrong data

Reporters might accidentally use wrong form fields for items or mix up cases.

Cleaning the data might catch some of these cases.

The antidepressant paroxetine flags a signal for the adverse event depression, i.e., statistically, depression has to be considered an adverse reaction to paroxetine. However, depression might have been coded as an adverse event together with “drug failure” or something similar. Additionally, in some reports, indication and reaction might have been mixed up. Finally, clinical trials show that there might be a subgroup that does not benefit from paroxetine but instead develops further psychiatric symptoms. Since at least these 3 groups are mixed together in the data, no further analysis can be done.

missing data

Reporters might not have all necessary data available, or they cannot afford the time to enter all available data due to their workload.
Some important data (e.g., magnesium level on Torsades de pointes onset) are not gathered in traditional pharmacovigilance data.

These records with missing data can be filtered out or some kind of extrapolation might be applied.

The WHO ranks reports according to their quality.
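Filtering out incomplete records can be sketched as follows. The field names and the choice of required fields are hypothetical and depend on the analysis; this is not a WHO quality grading.

```python
# Hypothetical records; None or an empty string marks a missing value.
records = [
    {"ISR": 1, "age": 54, "sex": "F", "event_date": "2008-03-01"},
    {"ISR": 2, "age": None, "sex": "M", "event_date": "2008-03-04"},
    {"ISR": 3, "age": 71, "sex": "", "event_date": None},
]

# Assumed set of fields that the analysis cannot do without.
REQUIRED = ("age", "sex", "event_date")


def is_complete(record, required=REQUIRED):
    """Return True if every required field is present and non-empty."""
    return all(record.get(field) not in (None, "") for field in required)


complete = [r for r in records if is_complete(r)]
print(f"{len(complete)} of {len(records)} records are complete")
```

Reporting how many records were dropped by such a filter helps readers judge how selective the remaining dataset is.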

duplicate or multiplicate records

An event might be reported by the sponsor of a trial, the affected participant and the participant's general practitioner.
Reports might be sent to a domestic and a foreign database and consequently be reported in duplicate to larger multi-national databases (e.g., VigiBase).

Checks for records with different case numbers but the same age, sex, date of onset of the adverse event and other database items can catch this kind of multiplicate.

Harpaz et al. (2010) found that in the 2008 data, searching for the combination of drug “varenicline”, pharmaproduct “darvocet” and adverse event “abnormal dreams” shows a strong signal. OpenVigil 2.0 finds 13 reports. However, looking at the demographic data and the CASE_ID field, these 13 reports originate from only 3 patients.
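A check for suspected multiplicates, as described above, can be sketched by grouping reports on a demographic signature and flagging signatures shared by more than one case number. The rows, field names and choice of key fields are hypothetical; a matching signature is only a hint and needs manual review.

```python
from collections import defaultdict

# Hypothetical reports with demographic fields.
reports = [
    {"ISR": 1, "CASENO": 100, "age": 45, "sex": "F", "event_date": "2008-05-02"},
    {"ISR": 2, "CASENO": 101, "age": 45, "sex": "F", "event_date": "2008-05-02"},
    {"ISR": 3, "CASENO": 102, "age": 45, "sex": "F", "event_date": "2008-05-02"},
    {"ISR": 4, "CASENO": 200, "age": 61, "sex": "M", "event_date": "2008-07-19"},
]


def suspected_multiplicates(rows, keys=("age", "sex", "event_date")):
    """Map each demographic signature to its case numbers if more than one."""
    groups = defaultdict(set)
    for row in rows:
        signature = tuple(row[k] for k in keys)
        groups[signature].add(row["CASENO"])
    return {
        signature: sorted(cases)
        for signature, cases in groups.items()
        if len(cases) > 1
    }


suspects = suspected_multiplicates(reports)
for signature, cases in suspects.items():
    print(signature, "appears under case numbers", cases)
```

Adding more key fields makes false matches less likely but also lets multiplicates with slightly different entries slip through.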

no use of dictionaries or strict formatting

Data from some pharmacovigilance databases like FAERS is not completely sanitized and does not follow a single naming scheme for drugs (like WHO-DD or XEVMPD) or a single format for dosages. It does, however, use MedDRA to code the adverse event.

Sanitize your data by using external data sources like drugname databases. This is done automatically in OpenVigil 2; however, approx. 30% of the raw FAERS data is discarded because no drugname could be recognized.

Be sure to manually sanitize certain data items if you rely on them for an analysis!

As of 2014-08-22, OpenVigil 2.0 cannot recognize the putative brandname “sudafed 12 hour”, thus missing approx. 1766 individual safety records. Our primary drugname-mapping source, DrugBank, cannot precisely map this to one drugname.

As of 2014-08-22, OpenVigil 2.0 cannot parse the dosage “75 MG EACH MORNING, 150 MG EACH EVENING”, although such a notation is easily understood by human users.
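To illustrate what manual sanitizing of such a free-text dosage might look like, the following sketch splits the notation into amount, unit and schedule with a regular expression. It is not OpenVigil's parser; the pattern and the short unit list are assumptions that cover only simple cases like the one quoted above.

```python
import re

# Assumed pattern: a number, a unit from a short list, then free text
# up to the next comma or semicolon.
DOSE_PART = re.compile(
    r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>MCG|MG|ML|G)\b\s*(?P<schedule>[^,;]*)",
    re.IGNORECASE,
)


def parse_dosage(text):
    """Return a list of (amount, unit, schedule) tuples found in `text`."""
    return [
        (
            float(match.group("amount")),
            match.group("unit").upper(),
            match.group("schedule").strip().upper(),
        )
        for match in DOSE_PART.finditer(text)
    ]


parts = parse_dosage("75 MG EACH MORNING, 150 MG EACH EVENING")
for amount, unit, schedule in parts:
    print(amount, unit, schedule)
```

Strings that do not fit the pattern yield an empty list, so unparsed dosages remain visible instead of being silently guessed.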


Besides formal inconsistency (e.g., the gas xenon cannot be applied by the intravenous route), the mechanism needs to be explained: some licensed lipid-lowering drugs (e.g., orlistat) are connected to pharmacologically unexplainable adverse reactions (e.g., influenza).

Health care professionals should browse the list of case reports and manually correct implausibilities and inconsistencies. This implies that any statistical findings cannot be used in legal proceedings.

A lawyer sought to show that patients lacking an enzyme who are on low dosages of a drug containing this enzyme as a supplement experience more adverse reactions than those who get the higher dosage. This approach has several errors: no biological plausibility, hypothesis testing instead of generation, reliance solely on statistics to find true adverse reactions.

hypothesis generation only

Pharmacovigilance data mostly allow only hypothesis generation, not hypothesis testing, due to the above-mentioned shortcomings.

Hypothesis generation is fine. If you aim to test a hypothesis, check very carefully whether this is possible or whether the available pharmacovigilance data might be misleading (e.g., over-reporting, comparing results to a different population).


Pharmaproducts can contain several drugs. Unrelated drugs might pop up as a signal because they are always used in combination with the drug causing the reaction. The same applies to two separate drugs/pharmaproducts that are often used for the same indication.

When a case series is identified, the medication lists should be examined closely.

Hydrochlorothiazide is routinely added to various other (potassium-sparing) diuretics and antihypertensives. Thus, hydrochlorothiazide might get associated with adverse reactions to antihypertensives.
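A closer look at the medication lists of a case series can start with counting which other drugs accompany the drug in question. The sketch below uses invented medication lists with placeholder drug names; the function name is hypothetical.

```python
from collections import Counter

# Hypothetical case series: ISR -> set of drugs listed in the report.
case_series = {
    1: {"hydrochlorothiazide", "drug_x"},
    2: {"hydrochlorothiazide", "drug_x", "drug_y"},
    3: {"hydrochlorothiazide", "drug_x"},
    4: {"hydrochlorothiazide", "drug_z"},
}


def co_medication_share(series, index_drug):
    """Share of reports with `index_drug` that also list each other drug."""
    with_index = [drugs for drugs in series.values() if index_drug in drugs]
    counts = Counter(
        drug for drugs in with_index for drug in drugs if drug != index_drug
    )
    return {drug: n / len(with_index) for drug, n in counts.most_common()}


shares = co_medication_share(case_series, "hydrochlorothiazide")
for drug, share in shares.items():
    print(f"{drug}: listed in {share:.0%} of the reports")
```

A co-medication present in most reports of the series is a candidate explanation for the signal and should be examined before attributing the reaction to the index drug.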

human errors

Human errors during both software development and software usage can distort the data.

Your method section must list the exact access date of data sources and the exact version number of any software used to analyse the data.
Any scripts used to manipulate or analyse the dataset must be published.

An error in OpenVigil prior to version 1.2.6 caused 145 reports to be omitted from an analysis of warfarin and haematemesis (see tutorial to learn more about the consequences).

counting issues

Pharmacovigilance data sources like AERS consist of reports, each linked to an ISR, which in turn is linked to a CASENO. One case can consist of several ISRs submitted at several stages.

Most types of queries should use unique cases (in SQL terms: DISTINCT DEMO.CASENO).
Additional checking for multiplicates is advised (see above).

DEMO contains 5,337,037 reports, 5,332,211 of which are unique, referring to 4,139,662 individuals. (Without checking for multiplicates.)
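The difference between counting rows, unique reports and unique cases can be reproduced on a toy table. The sketch below uses Python's built-in sqlite3 module with an in-memory DEMO table that has only the ISR and CASENO columns; the inserted rows are invented, and the real DEMO table has many more columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE DEMO (ISR INTEGER, CASENO INTEGER)")
conn.executemany(
    "INSERT INTO DEMO VALUES (?, ?)",
    [
        (1, 100),
        (2, 100),  # follow-up report for case 100
        (3, 200),
        (3, 200),  # the same report imported twice
        (4, 300),
    ],
)

# Count all rows, distinct reports and distinct cases.
total_rows = conn.execute("SELECT COUNT(*) FROM DEMO").fetchone()[0]
unique_reports = conn.execute("SELECT COUNT(DISTINCT ISR) FROM DEMO").fetchone()[0]
unique_cases = conn.execute("SELECT COUNT(DISTINCT CASENO) FROM DEMO").fetchone()[0]
conn.close()

print(total_rows, "rows,", unique_reports, "unique reports,", unique_cases, "cases")
```

As with the figures above, the distinct case count does not include any check for multiplicates filed under different case numbers.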

All examples are based on FDA LAERS and FAERS data from 2003-10-06 to 2013-12-31 (OpenVigil 1) or FDA LAERS data from 2003-10-06 to 2012-06-30 (OpenVigil 2), extracted prior to 2014-09-15. Figures may change during further development of the software and the data import filter.