Hit Identification in Drug Discovery

Hit identification (Hit ID) is the first decision gate in small-molecule discovery: finding chemical matter that measurably modulates a biological target or phenotype and is suitable for optimization. In practice, Hit ID narrows very large chemical collections to a small, structurally diverse set of “hits” that can be validated and evolved.


Hit ID vs. Lead ID

A hit is any compound with confirmed, reproducible activity and tractable chemistry; a lead goes further—meeting stricter thresholds for potency, selectivity, preliminary ADME/DMPK, and chemical developability that justify preclinical investment. The sections below outline the main methods to generate hits, how we validate them, and how Vipergen accelerates the path from hit to lead (Hughes 2011).

What is a hit?

High-quality hit compounds are small molecules, peptides, or biologics that satisfy several criteria (Hughes 2011):

Confirmed activity in the primary assay (and retested), with a concentration-response (typical hits are μM; exact thresholds are target/assay dependent).
Selectivity: clean in counter-screens vs. close homologs/anti-targets; not a PAINS motif; non-aggregating; appropriate redox/fluorescence behavior.
Tractability: synthetically accessible; clear points for analogue design; freedom-to-operate or IP novelty.
Early ADME flags: solubility and stability compatible with follow-up assays; acceptable basic physicochemical properties.
Verified identity & purity of the resynthesized material (off-DNA for DEL hits).

If not all of these criteria are met, the compound should be regarded as a challenging hit. Such hits can make hit-to-lead and later steps of the drug discovery workflow more complex and can ultimately lead to failed projects.

What is hit identification?

Hit identification (HitID) comprises the screening strategy, assay design, data analysis, and triage used to move from millions/billions of candidates to a few dozen validated hits. Approaches include High-Throughput Screening (HTS), Virtual Screening (VS), DNA-Encoded Library (DEL) Screening, Fragment-Based Screening (FBS), and phenotypic screening, often combined in an integrated workflow tailored to the target class and data available (Ashraf 2024).

How is a hit identified?

The core of hit identification is the ability to screen large compound libraries against biological targets. Identifying promising compounds early helps prioritize research effort and streamlines development. A successful hit identification campaign yields several structurally diverse compounds amenable to hit-to-lead work, which saves time and resources later in development and, in turn, leads to better preclinical candidates. One would typically aim for multiple, diverse chemotypes to seed parallel hit-to-lead tracks.

Methods of Hit Identification

Early-stage drug discovery can start from a known ligand originating from academic literature, natural products, or previous campaigns. Alternatively, de novo hit identification can be applied to find novel chemical modulators. A range of methods is employed for hit ID, all of which center on screening compound collections. Each method has strengths and weaknesses; the most promising methods are summarized below.

Practical compound design most often follows Lipinski’s Rule of Five to stay within oral drug-like space. These physicochemical guidelines are widely used both in library design and during hit-to-lead to balance potency with permeability and exposure (Lipinski 1997). Fragment-based drug discovery, by contrast, usually follows the Rule of Three, which favors low molecular weight, low lipophilicity, and minimal H-bond features to maximize solubility and biophysical detectability in FBS.

Parameter	Rule of 5	Veber rules	Ghose Filter	Rule of 3
Molecular Weight	≤ 500 Da	–	160 – 480 Da	< 300 Da
cLogP	≤ 5	–	-0.4 – 5.6	≤ 3
Hydrogen Bond Donors	≤ 5	–	–	≤ 3
Hydrogen Bond Acceptors	≤ 10	–	–	≤ 3
# of rotatable bonds	–	≤ 10	–	≤ 3
PSA	–	≤ 140 Å²	–	–
# of atoms	–	–	20 – 70	–

Table 1: Commonly used design rules for oral bioavailability. The rule of 3 is used for fragment libraries.
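The rules in Table 1 can be applied as simple filters once descriptors are available. The following is a minimal sketch, assuming the descriptors have already been computed elsewhere (for example with a cheminformatics toolkit); the `Descriptors` class, the function names, and the example values are hypothetical, and the inclusive boundary handling (≤) is an assumption that follows the commonly stated form of each rule.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one compound (hypothetical container)."""
    mol_weight: float   # Da
    clogp: float        # calculated logP
    h_donors: int       # hydrogen bond donors
    h_acceptors: int    # hydrogen bond acceptors
    rot_bonds: int      # rotatable bonds
    psa: float          # polar surface area, square angstroms


def passes_rule_of_five(d: Descriptors) -> bool:
    # Strict reading: every criterion must hold (some groups allow one violation).
    return (d.mol_weight <= 500 and d.clogp <= 5
            and d.h_donors <= 5 and d.h_acceptors <= 10)


def passes_veber(d: Descriptors) -> bool:
    return d.rot_bonds <= 10 and d.psa <= 140


def passes_rule_of_three(d: Descriptors) -> bool:
    return (d.mol_weight < 300 and d.clogp <= 3
            and d.h_donors <= 3 and d.h_acceptors <= 3
            and d.rot_bonds <= 3)


# Illustrative values only, not measured data.
example = Descriptors(mol_weight=180.2, clogp=1.2, h_donors=1,
                      h_acceptors=4, rot_bonds=3, psa=63.6)
flags = {
    "rule_of_five": passes_rule_of_five(example),
    "veber": passes_veber(example),
    "rule_of_three": passes_rule_of_three(example),
}
print(flags)
```

In this example the compound sits inside drug-like space but fails the Rule of Three because it has four hydrogen bond acceptors, which shows how the fragment criteria are deliberately tighter than the oral drug-like ones.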

High-Throughput Screening

Principle: assay large plated libraries (96/384/1536-well) with automated liquid handling and high-density readouts (luminescence, fluorescence, FRET/TR-FRET, absorbance, HTRF, Alpha).

Figure 2: High-throughput screening workflow, which in short requires assaying hundreds of thousands of compounds in individual wells. Hits need to be validated to sort out false positives.

High-throughput screening (HTS) is a cornerstone of hit identification in drug discovery and has been the primary method for identifying potential hits since the 1990s. HTS involves testing large libraries of compounds against a biological target to identify those that exhibit activity. Typically, a compound library stored in multi-well plates is screened against a recombinantly expressed target protein or enzyme. Compounds that produce the strongest change in a readout such as fluorescence or luminescence can then be quickly identified, isolated, and validated.

However, HTS is often associated with high cost: libraries need to be acquired or synthesized, and a robotic setup is typically used to handle the compound collection. Furthermore, compound collections have most often been synthesized in parallel, which can limit their diversity. HTS also frequently produces false positives, which may arise from nonselective chemical reactivity. Finally, a biochemical or cellular assay that generates a readout needs to be developed. While this is typically straightforward for common drug targets such as kinases and other enzymes, it can be more challenging for non-standard targets such as protein-protein interactions.

Typical library size: hundreds of thousands to a few million individually plated compounds for primary HTS, depending on the collection and assay format.

Typical instrumentation: automated liquid handlers/dispensers; microplate readers; acoustic dispensers; plate washers; robotics and LIMS for data integrity.

Advantages:

Direct measurement in a biochemical or cellular assay
Mature automation; throughput of 10⁴–10⁶ tests/day
Broad assay menu; easy multiplexing

Limitations:

Assay development burden (especially PPIs/membrane targets)
Cost of library curation
False positives (aggregation, autofluorescence, redox etc.)
Library bias can limit chemical diversity
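Primary HTS data are usually normalized against plate controls before hits are called. The sketch below is one common convention and not a method prescribed by this article: raw signals are converted to percent inhibition, and wells are flagged when they exceed the mean plus three standard deviations of the uninhibited controls. The function names, the 3-SD cutoff, and the plate values are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def percent_inhibition(signal, neg_mean, pos_mean):
    """neg = uninhibited (e.g. DMSO) control, pos = fully inhibited control."""
    return 100.0 * (neg_mean - signal) / (neg_mean - pos_mean)


def call_hits(wells, neg_controls, pos_controls, n_sd=3.0):
    """Return ({well: % inhibition} for called hits, threshold in % inhibition)."""
    neg_mean, pos_mean = mean(neg_controls), mean(pos_controls)
    # Spread of the uninhibited controls, expressed on the % inhibition scale.
    neg_inh = [percent_inhibition(s, neg_mean, pos_mean) for s in neg_controls]
    threshold = mean(neg_inh) + n_sd * stdev(neg_inh)
    hits = {}
    for well, signal in wells.items():
        inh = percent_inhibition(signal, neg_mean, pos_mean)
        if inh > threshold:
            hits[well] = inh
    return hits, threshold


# Illustrative raw signals (arbitrary units), not real screening data.
neg_controls = [1000, 1020, 980, 1010, 990]
pos_controls = [100, 110, 90, 105, 95]
wells = {"A1": 995, "A2": 550, "A3": 960}

hits, threshold = call_hits(wells, neg_controls, pos_controls)
print(f"threshold = {threshold:.1f}% inhibition, hits = {sorted(hits)}")
```

A statistical cutoff like this only nominates wells for retesting; the false-positive mechanisms listed above (aggregation, autofluorescence, redox activity) still have to be ruled out experimentally.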

Virtual Screening

Principle: in silico triage of large chemical spaces using structure-based docking, ligand-based pharmacophores, QSAR/ML, or hybrids.
Representative tools: Glide, AutoDock Vina, GOLD, OpenEye FRED, MOE; ML-accelerated docking and AI pharmacophore modeling are increasingly used to shorten compute cycles.

Figure 3: Basic principle of Virtual Screening followed up by compound resynthesis and hit validation.

With the exponentially growing capabilities of computational tools, in silico methods such as virtual screening have emerged as powerful tools for hit ID. Virtual screening can be used to predict a small molecule's interaction with a target protein before any physical compounds are screened.

While it might seem appealing to screen a virtual library to reduce the number of compounds needed for physical screening, several challenges must be overcome to achieve high quality hit compounds. First of all, high-quality structural data is necessary for the target protein as this will be essential for the subsequent docking studies. For example, a low-resolution crystal structure could potentially yield false positives and a lot of wasted resources.

Typical library size searched: millions→billions in silico (vendor + enumerated spaces), prior to wet-lab confirmation.

Typical instruments: docking (Glide, AutoDock Vina), pharmacophore/QSAR, ML accelerators for docking scoring.

Advantages:

Cost-effective way to rank/cluster before wet screening
Enables design-make-test-learn loops and scaffold hopping

Limitations:

Dependent on target structure quality, protonation/tautomer states, and scoring functions
Requires rigorous prospective validation and decoy testing
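Decoy testing is often summarized with an enrichment factor (EF): how much more frequently known actives appear in the top-ranked fraction of a screen than in the library as a whole. The following is a minimal sketch of that calculation; the function name, the convention that a lower docking score is better, and the toy data are assumptions made for illustration.

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """EF = (actives in top fraction / size of top fraction) / (all actives / library size).

    Assumes lower scores are better, as with most docking energies.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n = len(scores)
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("at least one known active is required")
    n_top = max(1, round(n * top_fraction))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0])
    actives_top = sum(1 for _, active in ranked[:n_top] if active)
    # Rearranged to integer products to avoid floating-point noise.
    return (actives_top * n) / (n_top * total_actives)


# Toy retrospective set: 5 known actives seeded among 95 decoys.
active_scores = [-12.0, -11.5, -11.0, -6.0, -5.0]
decoy_scores = [-10.0 + 0.05 * i for i in range(95)]
scores = active_scores + decoy_scores
labels = [True] * 5 + [False] * 95

ef10 = enrichment_factor(scores, labels, top_fraction=0.10)
print(f"EF at 10% = {ef10:.1f}")
```

Here three of the five actives rank in the top ten compounds, giving an EF of 6 at 10%. An EF near 1 would mean the scoring function performs no better than random selection on that target.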

Screening of DNA-Encoded Libraries

Principle: affinity selection of DNA-barcoded small molecules against a target; binders are identified by PCR/NGS counting and then resynthesized off-DNA for confirmation. Selections are quantified by NGS counts and gated against a mathematical background model; hits proceed to off-DNA resynthesis and IC₅₀/KD confirmation, followed by cellular EC₅₀/IC₅₀ and selectivity paneling (e.g., kinome maps) to prioritize series for SAR.
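To illustrate what gating NGS counts against a background model can look like, the sketch below uses a deliberately simple assumption: every library member is equally represented, so background counts follow a Poisson distribution with mean equal to total reads divided by library size. This is a generic textbook model and not a description of Vipergen's actual analysis; the function names, the Bonferroni correction, and the counts are all hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail."""
    if k <= 0:
        return 1.0
    # First term P(X = k), computed in log space for numerical stability.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    for i in range(k, k + 1000):
        total += term
        term *= lam / (i + 1)          # P(X = i + 1) from P(X = i)
        if term < total * 1e-17:       # remaining tail is negligible
            break
    return min(total, 1.0)


def enriched_members(counts, library_size, total_reads, alpha=0.01):
    """Return {member: (fold enrichment, p-value)} for counts above background."""
    lam = total_reads / library_size    # expected reads per member (uniform assumption)
    cutoff = alpha / library_size       # Bonferroni correction across the library
    called = {}
    for member, k in counts.items():
        p = poisson_sf(k, lam)
        if p < cutoff:
            called[member] = (k / lam, p)
    return called


# Hypothetical selection: 2 million reads over a 1-million-member library.
counts = {"cmpd_A": 30, "cmpd_B": 3, "cmpd_C": 12}
called = enriched_members(counts, library_size=1_000_000, total_reads=2_000_000)
for member, (fold, p) in called.items():
    print(f"{member}: {fold:.0f}-fold over background, p = {p:.1e}")
```

Under these assumptions only the 30-read compound clears the corrected cutoff. A 12-read compound looks enriched by eye but is not significant once a million members are tested, which is one reason resynthesis and orthogonal confirmation follow the sequencing step.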

Figure 4: Basic overview of DNA Encoded Library Screening

DNA-Encoded Libraries (DELs) have emerged as a powerful way to generate compound collections ranging from a few million to billions of compounds. Typically, a purified protein target is immobilized on a solid support and incubated with the DEL. Washing steps remove non-binding library members, and subsequent elution isolates the binding compounds, which are identified by PCR amplification followed by next-generation sequencing (NGS).

One challenge with DELs arises from split-and-pool synthesis, where the small molecule is built at one end of the DNA strand and the encoding DNA is ligated at the other end. Incomplete reactions can leave truncated products that still carry a full code, which leads to a high rate of false positives when hits are resynthesized. Vipergen uses the YoctoReactor® to synthesize libraries, which ensures that non-reacted building blocks fall out of the library during purification, giving high library fidelity and a low false-positive rate in screening.

Vipergen has developed binder trap enrichment (BTE) technology, which allows screening without immobilizing the target protein. Furthermore, Vipergen can now perform screening in live cells using cellular binder trap enrichment (cBTE), removing the need to express recombinant protein (Petersen 2021). This also allows screening against membrane proteins, which are traditionally difficult to express recombinantly.

Typical library size: millions to billions of DNA-encoded compounds can be screened in a single tube, leveraging NGS readout.

Typical instruments: selection, NGS, and off-DNA resynthesis. At Vipergen, selections are run on either our BTE or cBTE platform.

Advantages:

Screens hundreds of millions of compounds rapidly
Powerful for challenging targets and selectivity profiling

Vipergen advantages:

YoctoReactor® synthesis delivers 100% code-to-compound fidelity by purifying after each step, minimizing truncates and improving hit validation rates.
BTE® avoids target immobilization and counts ligand–target binding events in emulsion droplets—enabling residence-time-aware discovery and instant selectivity (multiplexed anti-targets).
DELs-in-Cells (cBTE®) is the first and only method for screening DELs inside living cells, expanding target space and physiological relevance.

Figure 5: Schematic showing Vipergen’s cellular Binder Trap Enrichment in detail. Adapted from Petersen 2021.

Limitations:

Requires careful hit resynthesis and rigorous orthogonal validation
Selection statistics and library design influence false negatives/positives
Off-DNA translation essential before SAR

Phenotypic Screening

Principle: unbiased cell-based or organismal assays that measure phenotype (e.g., viability, morphology, reporter pathways) without prior target assignment. Can be part of a high throughput screening campaign.

Typical instruments: high-content imaging, transcriptomic profiling, CRISPR/ORF tools, proteomics.

Advantages:

Finds first-in-class mechanisms and polypharmacology
Captures cellular context (permeability, metabolism)

Limitations:

Target deconvolution required (chemoproteomics, CRISPR perturb-seq, thermal proteome profiling)
Assay noise/biology can complicate triage

Fragment-Based Screening

Principle: screen low-MW fragments (≈150–250 Da) at higher concentrations to sample chemical space efficiently; merge/grow/link fragments into leads guided by structure.
Primary readouts / instrumentation: SPR (real-time binding), NMR (STD/WaterLOGSY), X-ray crystallography (soak/co-crystal), ITC (thermodynamics), DSF/nanoDSF (stability shifts), MST.

Figure 6: Basic workflow of Fragment Based Screening.

Fragment-based screening (FBS) has emerged as an alternative to traditional HTS. After small chemical fragments are screened and identified, subsequent work focuses on growing, merging, or linking them into more complex structures with high affinity. While the initial screening is usually straightforward, this elaboration can be a time-consuming challenge that often relies on structural data for the target.

Typical library size: 1–5K fragments (low-MW, high-solubility) screened with sensitive biophysical readouts.

Typical Instruments: SPR, ITC, DSF, NMR, MST, X-ray.

Advantages:

High hit rates; efficient exploration of binding pockets
Structural methods provide clear design hypotheses

Limitations:

Fragments are weak binders; requires sensitive biophysics and structure access
Chemical merging/growing can be resource-intensive

Common challenges in hit identification:

False positives/negatives: assay interference (autofluorescence, redox cyclers, aggregation), library artifacts, and truncates in poorly controlled DELs.
Assay design risk: target construct/format, detection chemistry, and counter-screen choices profoundly influence outcomes.
Compound/library quality: identity, purity, and diversity balance are critical to avoid fishing in a narrow chemical pond.
Data analysis complexity: statistics, curve-fitting, and multi-condition selection analysis (e.g., selectivity panels) require robust pipelines.
Translatability: bridging from biochemical hits to cellular engagement (permeability, efflux, metabolism).

How Vipergen helps: YoctoReactor® fidelity, BTE® residence-time awareness and multiplexed selectivity, and cBTE® in-cell screening directly address these pain points.

Even with robust libraries and high-quality assays, signal triage is critical: in our p38α campaign, >90% (22/24) of resynthesized DEL hits confirmed biochemically, yet potency rank did not strictly track read count—affinity, off-rate, and library member frequency each influence enrichment. This illustrates why orthogonal validation + early ADME are essential to convert “signal” into decision-ready chemical matter (Petersen 2016).

Hit validation

Hits identified from any of the above screening campaigns first need to be validated. Compounds with apparent activity need to be isolated in high purity and retested in the primary assay. At this stage, false positives arising from impurities (often seen in HTS), fluorescence, or aggregation can be removed. After this filtering, a range of secondary assays needs to be employed to demonstrate the desired biological activity. A typical set of assays is the following:

Flagging of Pan Assay Interference Compounds (PAINS), compounds with low solubility and compounds which have been observed as hits in prior screens.
Counter screens against targets where the compounds should not be active and cytotoxicity assays.
Orthogonal assays which serve to confirm target engagement.

The assays used to validate a hit range from biophysical methods such as NMR and thermal shift assays to demonstrations of target engagement in cells. Ultimately, more disease-relevant assays should be used to establish the mode of action of the identified hit. Beyond the hits identified in the primary screen, analogues can be made and optimized through structure-activity relationship (SAR) studies.

Early ADME/Tox screens de-risk hits before scale-up: solubility and chemical stability in assay buffers; permeability/efflux flags for cellular assays; microsomal stability for clearance risk; and basic cytotoxicity or mechanism-agnostic counterscreens to catch liabilities. Combining orthogonal biophysics (SPR/ITC/DSF) with early ADME helps separate true binders from assay artefacts and prioritize tractable chemotypes for SAR. Hits from our yR-DEL + BTE campaigns routinely progress from biochemical potency to nanomolar cellular IC₅₀ with strong selectivity, minimizing false positives and accelerating H2L (Petersen 2016).

Biophysical methods

SPR (surface plasmon resonance): Label-free, real-time kinetics (kon/koff), affinity, and stoichiometry
ITC (isothermal titration calorimetry): Direct thermodynamics (ΔH, ΔS, Kd)
DSF/nanoDSF (Differential Scanning Fluorimetry): Monitors protein stability shifts upon ligand binding for rapid triage
NMR: Epitope mapping and weak-binder detection
MST (microscale thermophoresis): Changes in thermophoretic mobility reporting on binding

Assay & binding metrics at a glance

IC50/EC50: Concentration producing 50% inhibition/effect in the assay; used to rank hits and monitor SAR
Ki: Enzyme inhibition constant; model dependent but closer to mechanism than IC50
KD: Equilibrium dissociation constant (binding affinity) from biophysics (e.g. SPR, ITC, or nanoDSF); useful across assay formats
Z-factor: Assay quality statistic capturing the separation and variability of positive/negative controls; used to qualify HTS readiness
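Two of these metrics reduce to short formulas. The sketch below computes the control-based form of the Z-factor (often written Z′), 1 − 3(σpos + σneg)/|μpos − μneg|, and converts an IC50 to a Ki with the Cheng-Prusoff relation, which holds only for a competitive inhibitor. The function names and all numerical values are illustrative assumptions, not data from the campaigns described here.

```python
from statistics import mean, stdev


def z_prime(pos_controls, neg_controls):
    """Control-based Z-factor: 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    spread = stdev(pos_controls) + stdev(neg_controls)
    window = abs(mean(pos_controls) - mean(neg_controls))
    return 1.0 - 3.0 * spread / window


def ki_cheng_prusoff(ic50, substrate_conc, km):
    """Ki = IC50 / (1 + [S]/Km); valid for competitive inhibition only.

    ic50, substrate_conc and km must share consistent concentration units.
    """
    return ic50 / (1.0 + substrate_conc / km)


# Illustrative control wells (arbitrary signal units).
pos = [100, 110, 90, 105, 95]        # fully inhibited
neg = [1000, 1020, 980, 1010, 990]   # uninhibited

zp = z_prime(pos, neg)
ki = ki_cheng_prusoff(ic50=1.0, substrate_conc=10.0, km=10.0)  # all in µM
print(f"Z' = {zp:.2f}, Ki = {ki:.2f} µM")
```

Values of Z′ above roughly 0.5 are conventionally taken to indicate an assay robust enough for screening. The Ki example shows why IC50 values measured at different substrate concentrations are not directly comparable: at [S] = Km the IC50 of a competitive inhibitor is twice its Ki.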

Example from our DEL workflow: resynthesized off-DNA hits proceeded from biochemical IC50 to cellular IC50 confirmation, including the discovery of a p38α inhibitor with a 7 nM cellular IC50 directly from a 12.6-million-member yR library via BTE (Petersen 2016).

Hit-To-Lead Process

The goal of hit-to-lead (H2L) optimization is to convert validated hits into chemically coherent lead series with improved potency, selectivity, and developability. After initial confirmation and profiling, each hit undergoes systematic structure–activity relationship (SAR) exploration, physicochemical refinement, and biological evaluation to progress toward lead status.

What distinguishes a lead compound?

A lead is a hit that meets clearly defined thresholds in several key areas:

Potency: Demonstrates strong and reproducible activity, typically in the sub-micromolar range, with well-behaved concentration–response curves.
Selectivity: Shows a clean profile in target family panels and key off-target assays, minimizing potential liabilities.
ADME/DMPK properties: Exhibits suitable solubility, permeability, and metabolic stability, often supported by early in vitro or in vivo pharmacokinetic studies.
Safety and chemical stability: Lacks reactive or toxicophoric groups and displays acceptable cytotoxicity and chemical stability.
Synthetic tractability and IP position: Can be readily synthesized, modified, and protected, ensuring scalability and freedom-to-operate.

From hit to lead – an iterative optimization workflow:

Hit reconfirmation and triage – resynthesis and purity verification, followed by confirmatory assays and orthogonal biophysical validation (e.g., SPR, ITC, DSF).
Initial SAR exploration – analogue synthesis guided by structural or ligand-based insights to map potency and selectivity trends.
Property optimization – adjust physicochemical and ADME parameters (solubility, lipophilicity, stability) to balance efficacy with developability.
Selectivity and safety profiling – expand counter-screening and secondary assays to identify clean, specific chemotypes.
Lead series selection – prioritize 1–3 promising series for detailed optimization and potential structural biology studies.

Vipergen’s integrated approach:

Vipergen’s DNA-encoded library (DEL) platforms—YoctoReactor®, BTE®, and cBTE®—streamline the hit-to-lead transition by delivering high-quality, well-validated hits with immediate structural diversity. These platforms reduce false positives, enable residence-time-aware and cellularly relevant hit validation, and shorten the path to robust, patentable lead candidates (Petersen 2021).

Through close collaboration, Vipergen supports partners from initial hit validation through SAR expansion and lead optimization, ensuring that each project advances with efficiency, selectivity, and scientific confidence.

Future trends in hit identification

AI/ML-accelerated screening: ML models that pre-score docking or learn from prior campaigns to triage libraries faster; AI pharmacophore modeling complements structure-based methods (Singh 2024, Hayek-Orduz 2025).
In-cell selection & target engagement: technologies like cBTE® extend discovery inside living cells, improving physiological relevance (Petersen 2021).
Next-gen assays: high-content phenotypic imaging, microfluidics, and multiplexed biochemical/cell panels to increase information per screen.
Integrated workflows: combining VS → DEL/HTS → biophysics/phenotypic back-validation to exploit orthogonal strengths (Ashraf 2024).

Why Vipergen for Hit ID?

YoctoReactor® DELs: 100% code–compound match by design → cleaner data, fewer truncates, higher hit validation rates.
BTE® (in vitro): no immobilization, residence-time-aware selections, instant selectivity vs. anti-targets with μg protein consumption.
cBTE® (in living cells): first and only DEL screening inside living cells—broadens target scope and improves translational relevance.
Selectivity Direct & partner network: multiplexed target/anti-target screening and structure support to accelerate SAR.
Engagement models from fee-for-service to integrated collaborations; Express and Snap options for turnaround and budget flexibility.

Explore our In vitro and In living cell Hit Identification Services or get in touch via Inquiry.

References

Ashraf, S. N. et al., Hit me with your best shot: Integrated hit discovery for the next generation of drug targets, Drug Discov Today, 29 (10), 104143 (2024). https://doi.org/10.1016/j.drudis.2024.104143
Ghose, A. K. et al., A Knowledge-Based Approach in Designing Combinatorial or Medicinal Chemistry Libraries for Drug Discovery. 1. A Qualitative and Quantitative Characterization of Known Drug Databases, J Comb Chem, 1 (1), 55-68 (1999). https://doi.org/10.1021/cc9800071
Hayek-Orduz, Y. et al., dyphAI dynamic pharmacophore modeling with AI: a tool for efficient screening of new acetylcholinesterase inhibitors, Front Chem, 13, 1479763 (2025). https://doi.org/10.3389/fchem.2025.1479763
Hughes, J. P. et al., Principles of early drug discovery, Br J Pharmacol, 162 (6), 1239-1249 (2011). https://doi.org/10.1111/j.1476-5381.2010.01127.x
Lipinski, C. A. et al., Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings, Adv Drug Deliv Rev, 23 (1-3), 3-25 (1997). https://doi.org/10.1016/S0169-409X(96)00423-1
Petersen, L. K. et al., Novel p38α MAP kinase inhibitors identified from yoctoReactor DNA-encoded small molecule library, MedChemComm, 7, 1332-1339 (2016). https://doi.org/10.1039/C6MD00241B
Petersen, L. K. et al., Screening of DNA-Encoded Small Molecule Libraries inside a Living Cell, J Am Chem Soc, 143 (7), 2751-2756 (2021). https://doi.org/10.1021/jacs.0c09213
Singh, S. et al., Advances in Artificial Intelligence (AI)-assisted approaches in drug screening, Artif Intell Chem, 2 (1), 100039 (2024). https://doi.org/10.1016/j.aichem.2023.100039
Veber, D. F. et al., Molecular Properties That Influence the Oral Bioavailability of Drug Candidates, J Med Chem, 45 (12), 2615-2623 (2002). https://doi.org/10.1021/jm020017n
