By now you are surely convinced that the structures of glycans are extraordinarily complex, and in many ways much more complex than those of proteins and nucleic acids. Their structural diversity is staggering, given the number of different sugar monomers, stereocenters, linkages, lengths, conformers, dynamic flexibility and chemical modifications. Yet evolution has allowed this astronomical diversity, which must serve more than just simple functions such as, to pick one example, protection of proteins from degradation. Much of the diversity derives from the lack of an equivalent of a genetic code for glycan synthesis.
Since all events in biology start with a binding interaction, let's ponder the binding interactions of glycans with partners ("ligands") such as proteins, lipids, nucleic acids, etc. A binding site on a glycan can range from something as simple as a single monosaccharide, which you could easily imagine hydrogen bonding with water, to a much larger and more complex interface. Consider the structure of one of the few glycoproteins that has PDB coordinates, the unliganded simian immunodeficiency virus (SIV) gp120 core protein (pdbid: 3fus), whose structure is shown below.
The protein surface is shown in ivory and just the glycans are shown as colored sticks with the correct symbolic color-coded spheres or cubes around them.
Now let's convert an image file showing one face of the protein to a black and white QR code, as shown in the figure below.
Computers can recognize information encoded in QR codes and decode it into another form of information, such as a menu at a restaurant. Likewise, organisms have evolved readers to decode the glycan code, which is written by enzymes (glycan synthases, hydrolases, and modifying enzymes). The glycan code is written onto the 3D surfaces of polysaccharides, glycoproteins, glycolipids, and proteoglycans. It should be no surprise that the biological readers of the glycan code are mostly proteins, which locate and bind to the correct "QR" code displayed on the surface of a glycan.
Luckily, the QR code metaphor for the glycan code is a bit exaggerated, since the readers of the glycan code, glycan binding proteins, seem to recognize just small sections of a glycan. They can be compared to protein antibodies that bind to foreign molecules such as proteins. The binding site on a foreign protein recognized by an antibody is called an epitope. Epitopes can be continuous (linear) stretches of the foreign protein sequence, or discontinuous (conformational), made of some continuous stretches of amino acids and some that are further away in the sequence but close in the 3D folded protein. The average continuous epitope is often given as 5-6 amino acids long. Yet that might be an underestimate, since an analysis of all contact residues (within a conservative 4 Å distance) for target proteins and their bound antibodies in the Protein Data Bank found about 18-19 amino acids each (Stave and Lindpaintner). Glycan binding proteins presumably also bind a mixture of continuous and discontinuous glycan sequences. Linear ones would be much easier to determine and study.
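The 4 Å contact criterion mentioned above can be illustrated with a minimal Python sketch. The coordinates and residue labels below are made up for illustration (they do not come from any PDB entry), and the function name `contact_residues` is hypothetical; a real analysis would read atomic coordinates from a structure file.

```python
import math

def contact_residues(antigen_atoms, antibody_atoms, cutoff=4.0):
    """Return sorted antigen residue labels having any atom within
    `cutoff` angstroms of any antibody atom.

    Each atom is a tuple (residue_label, (x, y, z)), coordinates in angstroms.
    """
    contacts = set()
    for res_label, xyz in antigen_atoms:
        for _, other_xyz in antibody_atoms:
            if math.dist(xyz, other_xyz) <= cutoff:
                contacts.add(res_label)
                break  # one close atom is enough to count the residue
    return sorted(contacts)

# Toy coordinates (hypothetical, for illustration only)
antigen = [
    ("LYS12", (0.0, 0.0, 0.0)),
    ("ASP13", (3.0, 0.0, 0.0)),
    ("GLY40", (20.0, 0.0, 0.0)),
]
antibody = [
    ("TYR101", (0.0, 3.5, 0.0)),
    ("SER52", (5.0, 2.0, 0.0)),
]

print(contact_residues(antigen, antibody))
```

Residues that end up in the contact list but are far apart in sequence would make up a discontinuous epitope; the same distance test could in principle be applied to the sugar residues of a glycan bound to a glycan binding protein.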
Now, let's explore the family of these glycan binding proteins (GBP), the readers of the glycan code.
Glycan Binding Proteins (GBPs)
There appear to be nine different types of glycan binding proteins (GBPs). These include nonenveloped capsid virus GBPs, enveloped-virus GBPs (e.g., influenza and coronaviruses), eukaryotic microbial GBPs (e.g., yeast), and bacterial toxin GBPs (e.g., botulinum toxin). Bacterial adhesins (parts of organelles like flagella) and lectins (soluble proteins and lectin domain-containing proteins) are also examples of GBPs. We will discuss in more detail three other types: C-type lectins, galectins, and siglecs.
In the broadest sense, if a lectin is a protein that binds a specific carbohydrate motif (i.e., a glycan code) without modifying the motif, then any glycan binding protein could be called a lectin. This definition would exclude enzymes that synthesize, degrade or modify glycans, as well as antibodies that recognize foreign or self glycan sequences. The table below shows some lectins and their target glycan ligands from plants, animals, viruses and bacteria.
|Lectin||Abbreviation||Glycan ligand|
|Concanavalin A||ConA||Manα1-OCH3|
|Griffonia simplicifolia lectin 4||GS4||Lewis b (Leb) tetrasaccharide|
|Wheat germ agglutinin||WGA||Neu5Ac(α 2,3)Gal(β 1,4)Glc; GlcNAc(β1,4)GlcNAc|
|Mannose-binding protein||MBP-A||High-mannose octasaccharide|
|Influenza Virus hemagglutinin||HA||Neu5Ac(α 2,6)Gal(β1,4)Glc|
|Polyoma virus protein 1||VP1||Neu5Ac(α 2,3)Gal(β1,4)Glc|
|Cholera toxin||CT||GM1 pentasaccharide|
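The condensed linkage notation used in the table can be split into residues and linkages programmatically. The following is a minimal sketch that assumes a linear string of the form Residue(anomer carbon,carbon)Residue...; the function name `parse_linear_glycan` is hypothetical, and branched glycans or named epitopes such as "Lewis b" are outside its scope.

```python
import re

# One residue name, optionally followed by a linkage such as (α2,6) or (β 1,4)
TOKEN = re.compile(r"([A-Za-z0-9]+)(?:\(\s*([αβ])\s*(\d)\s*,\s*(\d)\s*\))?")

def parse_linear_glycan(text):
    """Split a linear glycan string into residue names and linkages.

    Returns (residues, linkages), where each linkage is a tuple
    (anomer, donor_carbon, acceptor_carbon) joining consecutive residues.
    """
    residues, linkages = [], []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse glycan at position {pos}: {text!r}")
        name, anomer, from_carbon, to_carbon = match.groups()
        residues.append(name)
        if anomer is not None:
            linkages.append((anomer, int(from_carbon), int(to_carbon)))
        pos = match.end()
    return residues, linkages

# Ligand of influenza virus hemagglutinin, as written in the table
residues, linkages = parse_linear_glycan("Neu5Ac(α 2,6)Gal(β1,4)Glc")
print(residues)   # ['Neu5Ac', 'Gal', 'Glc']
print(linkages)   # [('α', 2, 6), ('β', 1, 4)]
```

Comparing the parsed output for the hemagglutinin and polyoma VP1 ligands makes the point of the table concrete: the two glycans differ only in the carbon of Gal to which Neu5Ac is attached (6 versus 3), yet they are read by different proteins.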
In animals, lectins facilitate cell-cell interactions by forming multiple, but weak interactions (also called multivalent interactions) between the protein and many sugars on the ligand to which it binds.
Now let's consider the other three classes of glycan binding proteins (or lectins), C-type lectins, galectins, and siglecs, in more detail. Focus on the very different structures of the carbohydrate binding domains.
C-type lectins comprise the largest number of glycan binding proteins. These proteins have a glycan or carbohydrate recognition domain that depends on Ca2+ ions. They bind self glycans as well as those on pathogens, which can target viruses to specific cells. Many are on the surface of immune cells. They have an N-terminal glycan binding domain, also called a C-lectin (CLECT) domain or a carbohydrate recognition domain (CRD). However, some proteins with the domain do not appear to bind either Ca2+ or glycans. They serve as adhesion molecules and are also involved in cell signaling. Some residues in the lectin domain appear critical for binding glycans. These include an EPN motif, which interacts with Man, GlcNAc, Fuc, and Glc, and a WND motif involved in binding to Gal and GalNAc.
Let's look now at one example of a C-type lectin, the selectins.
These are involved in the interaction of immune cells in the blood with endothelial cells that line the blood vessel wall. Think of the challenges faced by an immune cell that must move from the blood into a tissue where an infection might occur. Blood flows in vessels at a velocity inversely proportional to the total cross-sectional area of the blood vessels. That velocity is about 5-20 cm/sec in arteries, 1.5-7 cm/sec in veins and about 1 mm (or 1000 μm)/sec in capillaries. Assuming an average diameter of 10 μm for a lymphocyte, that would be the equivalent of moving about 100 cell lengths per second. An equivalent speed for a human with an arm span of 6 feet (approximately fitting into a circle of diameter = 6 feet as drawn by Leonardo da Vinci) would be around 600 feet/second. The cell must go from its typical circulating speed to a stop before it can move through the blood vessel wall into tissues. Nature has solved this by providing a way to slow down the moving cell until its final capture. The cell rolls along the endothelial cells, making transient low affinity interactions, which slow it down enough that high affinity interactions can effectively stop it (unless it dissociates first).
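The scaling argument above is simple arithmetic, sketched here in Python using only the numbers quoted in the text (the variable names are illustrative, and the conversion to miles per hour is added for perspective).

```python
# Values quoted in the text
capillary_speed_um_per_s = 1000.0   # about 1 mm/s in capillaries
lymphocyte_diameter_um = 10.0       # assumed average cell diameter
human_span_ft = 6.0                 # arm span used as the human "body length"

# Speed expressed in body lengths per second
cell_lengths_per_s = capillary_speed_um_per_s / lymphocyte_diameter_um

# The same relative speed for a human
human_equivalent_ft_per_s = cell_lengths_per_s * human_span_ft

# Convert ft/s to miles per hour (5280 ft per mile, 3600 s per hour)
human_equivalent_mph = human_equivalent_ft_per_s * 3600.0 / 5280.0

print(cell_lengths_per_s)               # 100.0
print(human_equivalent_ft_per_s)        # 600.0
print(round(human_equivalent_mph, 1))   # 409.1
```

Using the faster arterial or venous velocities in place of the capillary value would make the equivalent human speed correspondingly larger.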
Also, you wouldn't want immune cells to stop and move into tissue in the absence of an infection, which is signaled by mediator molecules. Another problem solved! P-selectins are stored in the intracellular granules of platelets (the source of the name P-selectin) and endothelial cells, so moving immune cells are not spuriously captured in the absence of some signal. In the presence of the right chemical signal, endothelial cells and platelets become activated and P-selectin is transported rapidly to the cell surface, where "capture" occurs before the cell can move into the underlying tissue. P-selectin mediates the first transient interactions and rolling of immune cells on activated platelets and endothelial cells.
Here is a video animating the rolling and "capture" of a lymphocyte by endothelial cells. See the video for the reference. Note that cancer cells can also move through the endothelial cells of blood vessels in the process of forming metastases.
P-selectins hence are receptors for molecules on immune cells. They bind Ca2+ ions, which helps create an active conformation. Their binding ligands are glycan codes and nearby sections of the protein connected to the glycan. The glycan ligand on the surface of a circulating immune cell is the sialyl-Lewis X glycan (abbreviated SLeX, and also written SLewx) or a derivative of it. One of the immune cell membrane proteins that carries SLeX is P-selectin glycoprotein ligand 1 (PSGL-1), also called SELPLG, which is the name of the gene for PSGL-1. P-selectin mediates the rapid rolling of leukocytes over vascular surfaces during the initial steps of inflammation through its interaction with SELPLG.
P-selectin is a mediator of cell adhesion (to other cells). As such it could also be classified as an adhesion protein. The three main types of selectins:
- L-selectins: found on leukocytes ("white" blood cells that are circulating immune cells)
- P-selectins: found on activated platelets (which can aggregate to form a type of blood clot) and activated endothelial cells. Activation occurs during the inflammatory response which can lead to the quick movement of pre-formed selectins stored within the cytoplasm to the membrane. In addition, their expression can be induced.
- E-selectins: found on activated endothelial cells only after the cells have been induced to form them by certain immune hormones called cytokines, which are released by immune cells during an inflammatory response.
The figure below shows the domain structure of human P-selectin (from https://smart.embl.de/)
It contains an N-terminal C-lectin (CLECT) domain, which is also called the carbohydrate recognition domain (CRD) or the C-type lectin domain (CTLD). In addition, it has an epidermal growth factor (EGF) domain, 9 complement control protein (CCP) domains, and a transmembrane domain (shown in blue).
Here is the structure of the SLeX glycan along with its symbol nomenclature for glycans (SNFG) diagram.
The model below shows the crystal structure of the P-selectin lectin/EGF domains complexed with SLeX (1g1r), to which P-selectin binds with weak affinity. Fucose interacts with the Ca2+ ion. The glycan interacts with the CLECT domain.
SLeX is not present in isolation but rather is attached to a membrane protein on an immune cell, which serves as a ligand for the P-selectin on activated endothelial cells or platelets. (SLeX can also be part of a glycolipid.) Now let's contrast the interactions of the P-selectin lectin/EGF (LE) domains with "naked" SLeX with those between P-selectin LE and a higher affinity natural binding ligand, human P-selectin glycoprotein ligand 1 (PSGL-1), an immune cell integral membrane protein. PSGL-1 is expressed on neutrophils, monocytes and most lymphocytes. The P-selectin:PSGL-1 complex (1g1s) has a much lower KD (higher affinity) than the complex with unmodified SLeX. PSGL-1 is a disulfide-linked homodimer. When sulfated on a specific Tyr (48), the protein displays high affinity for P-selectin. In contrast, when sulfated on a different Tyr (51), it displays high affinity for L-selectin instead.
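To see why a lower KD means higher affinity, recall that for simple 1:1 binding the fraction of receptor occupied is [L]/(KD + [L]). The sketch below assumes this simple equilibrium model; the KD values are placeholders chosen only to contrast a weak and a tight binder, and are not measured values for SLeX or PSGL-1.

```python
def fraction_bound(ligand_conc, kd):
    """Fraction of receptor occupied for simple 1:1 binding at equilibrium.

    `ligand_conc` is the free ligand concentration and `kd` the
    dissociation constant, both in the same units.
    """
    if ligand_conc < 0 or kd <= 0:
        raise ValueError("ligand_conc must be >= 0 and kd must be > 0")
    return ligand_conc / (kd + ligand_conc)

# Placeholder KD values (hypothetical, in micromolar), NOT experimental data
weak_kd_uM = 1000.0    # stands in for a weak, glycan-only interaction
tight_kd_uM = 1.0      # stands in for a tighter, glycan-plus-peptide interaction

ligand_uM = 10.0
print(fraction_bound(ligand_uM, weak_kd_uM))
print(fraction_bound(ligand_uM, tight_kd_uM))
```

At any given ligand concentration, the binder with the lower KD has the higher occupancy, and occupancy is exactly one half when the ligand concentration equals KD.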
The SLeX-type glycan O-linked to the peptide is a bit more complicated than the simple SLeX ligand, as it is connected to the protein through an O-glycosidic bond to a threonine. The SNFG is shown below.
The crystal structure of a trisulfated, SLewx-modified peptide from the N terminal region of PSGL-1 (1G1S) bound to P-selectin lectin and EGF domains (P-LE) has been solved. The model below shows some of the interactions between the PSGL-1 peptide (green backbone) and P-LE (magenta backbone).
In the crystal structure, the peptide from the P-selectin ligand (which again is a membrane protein) contains 3 sulfated tyrosine (Tys) residues (605, 607, and 610, which correspond to amino acids 5, 7, and 10 in the peptide). No electron density for the side chain of Tys 605 was seen. Tys 607 binds through multiple interactions to the P-selectin LE domain, and is most likely responsible for the high affinity interaction of P-selectin with the P-selectin glycoprotein ligand (again represented by the green chain). In contrast, Tys 610 interacts through an intermediary water molecule with the SLeX glycan of the peptide.
The image below shows the electrostatic surface potential map of one of the P-selectin LE domains. Blue represents positive potential and red negative. The backbone of the P-selectin ligand peptide is shown in green, with all of the negatively charged side chains (Tys, Asp and Glu) shown as sticks in CPK colors. Note that these amino acids are all bound in blue (positive) regions of P-selectin. Note also that the glycan portion attached to the peptide (sticks, CPK colors) is positioned mostly over negative potential, allowing hydrogen bonding between the hydrogen bond donors of the sugar OHs and the protein.
You could surmise that the blue region of positive potential could also bind other strongly negatively charged ligands (such as heparin and other glycosaminoglycans), which could inhibit the function of this protein by preventing the binding of PSGL-1.
Here is an iCn3D showing a model of the above image.
There are also nonpolar interactions not shown in the figure and model above. The aromatic ring of Tys 607 (7) interacts with the nonpolar parts of a Ser (-CH2-) and a Lys (-(CH2)4-) side chain, and the ring of Tys 610 (10) interacts with two leucine side chains.
The selectins are also part of a class of molecules called adhesion molecules. As described for the selectins, adhesion molecules contain
- an extracellular carbohydrate (CHO) binding domain (the lectin domain), which mediates binding to adjacent cells or to the extracellular matrix;
- a transmembrane domain;
- and a cytoplasmic domain which often interacts with the cytoskeleton within the cell.
This initial binding mediated by selectin-CHO interactions activates the expression of another adhesion molecule on the leukocyte, integrin, a heterodimer with an α and a β chain. Integrins cause strong leukocyte-endothelial cell interactions, leading to the ultimate movement of the leukocytes through the vessel wall. Other classes of adhesion molecules (in addition to selectins and integrins) are cadherins (calcium-dependent adhesion molecules) and the immunoglobulin-like superfamily (ICAM1, ICAM2, VCAM). VCAM (Vascular Cell Adhesion Molecule) binds the integrin expressed on activated lymphocytes, leading to passage of the lymphocyte from the lumen of the vessel into the tissues. Integrins appear to bind proteins in the extracellular matrix through RGD (Arg-Gly-Asp) and also through LDV (Leu-Asp-Val) motifs on the proteins, including fibronectin (RGD), thrombospondin (RGD and LDV), fibrinogen (RGD and LDV), von Willebrand factor (RGD), and vitronectin (RGD). They also bind other matrix proteins with an "alpha domain", including collagen and laminin. Integrin/adhesion molecule interactions involve protein/protein interactions.
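Short linear motifs such as RGD and LDV are easy to locate in a protein sequence written in one-letter code. The following is a minimal sketch; the function name `find_motifs` is hypothetical and the sequence is an invented toy string, not the sequence of fibronectin or any other real protein. Finding a motif in a sequence does not by itself show that an integrin binds there, since the motif must also be exposed on the folded protein.

```python
INTEGRIN_MOTIFS = ("RGD", "LDV")

def find_motifs(sequence, motifs=INTEGRIN_MOTIFS):
    """Return a dict mapping each motif to the 1-based start positions
    at which it occurs in a one-letter amino acid sequence."""
    sequence = sequence.upper()
    hits = {}
    for motif in motifs:
        positions = []
        start = sequence.find(motif)
        while start != -1:
            positions.append(start + 1)          # report 1-based positions
            start = sequence.find(motif, start + 1)
        hits[motif] = positions
    return hits

# Invented toy sequence (hypothetical, for illustration only)
toy_sequence = "MKTAYRGDSPLDVAAGRGDK"
print(find_motifs(toy_sequence))   # {'RGD': [6, 17], 'LDV': [11]}
```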
A fertilized egg (in the blastocyst stage, which is ready for implantation in the uterine wall) expresses L-selectin, which allows a low affinity (rolling-type) interaction of the fertilized egg with the uterine epithelial cells. These cells express the CHO ligands on their surface, which bind to the L-selectin on the blastocyst. The CHO ligands are only transiently expressed on the surface of the epithelial cells of the uterus, presumably only when the uterus is primed for implantation. After the initial interaction of the blastocyst and epithelial cells, further expression of integrins on the blastocyst surface might result. Problems in any of these molecular steps could result in infertility.
Post-translational modifications of proteins (like glycosylation) can confer new binding and biological functions on a protein. Site-directed mutagenesis can be used to replace surface amino acids with cysteine, or to replace methionine with nonnatural amino acid analogs that contain azide or alkyne groups. These modified groups can then direct chemical modifying reagents (such as sugars) to these sites. A protein completely unrelated to PSGL-1 has been selectively modified using this approach to contain covalently attached glycans and a sulfated tyrosine side chain. The unrelated protein bound to P-selectin.
What do you do with a protein that no longer has the correct structure(s) to perform its designed function(s)? Proteins, as with any molecules, undergo chemical changes during their biological lifetime. They must be recognized as aberrant and then removed from "service", ultimately being degraded into component amino acids for reuse. There are no repair enzymes for proteins as there are for DNA. One modification that changes glycoproteins and signals the need for their removal is the removal of terminal sialic acid residues, forming asialoglycoproteins, whose glycans end in galactose, as you can envision from the figure below, which shows a typical structure of an N-linked glycoprotein.
The asialoglycoprotein receptor, a member of the C-type lectin family, is a transmembrane protein that binds terminal galactose and N-acetylgalactosamine sugars on the ends of circulating asialoglycoproteins, leading to their endocytosis into the cell. It is expressed on the surface of hepatocytes (liver cells). Receptors of this type are also called scavenger receptors, as they remove proteins from circulation.
Another C-type lectin involved in binding and removal of glycoproteins from the circulation is the mannose receptor (also called CD206), which is also expressed in liver endothelial cells. It binds both sulfated and non-sulfated glycans. It also is a receptor that allows binding and phagocytosis of bacterial and fungal pathogens by immune cells called macrophages and dendritic cells. Unfortunately, tumor cells can use the same process for uptake into macrophages, which can actually promote tumor cell growth. The protein can bind and scavenge sulfated glycoprotein hormones, mannose-bearing glycoproteins released during inflammation, lysosomal enzymes released from cells on injury, and fragments of collagen.
The figure below shows the domain structure of the human mannose receptor.
You might imagine, given the large number of CLECT domains, that this protein could bind a number of different target glycans from both self and pathogens. What's different about the domain structure compared to P-selectin is the presence of an N-terminal ricin domain and a fibronectin type 2 (FN2) domain. The FN2 domain has four conserved cysteines that form two disulfide bonds (cystines). What's so interesting about the mannose receptor is that it binds glycans both in the CLECT domains and in the FN2 domain.
Glycan binding at the CLECT domain: The CLECT domains bind targets containing mannose, fucose and N-acetylglucosamine, with a preference for Man(α1,2)Man or fucose. Here is a model of the CLECT 4 domain of the mannose receptor complexed with Man(α1,2)Man (7jue). Interactions with fucose in ligands such as the Lewis a trisaccharide strengthen the binding.
The receptor can bind a variety of glycans. Both mannose and N-acetylglucosamine interact with bound Ca2+ through equatorial OHs on carbon 3 and 4 of the ring while fucose uses OHs on carbon 2 and 3, or 3 and 4.
The interaction with fungal pathogens is obviously medically important. Fungi like yeast have an outer structure composed of a membrane bilayer and a mixture of glycans, which presents an incredibly complex "glycan code" to the hosts they infect, as illustrated in the figure below.
Kang, X., Kirui, A., Muszyński, A. et al. Molecular architecture of fungal cell walls revealed by solid-state NMR. Nat Commun 9, 2747 (2018). https://doi.org/10.1038/s41467-018-05199-0. Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0/
Mannans, polymers of just mannose, differ widely in structure. Their main backbone can be DMan(α-1,6)DMan or DMan(β-1,4)DMan with many branches.
Glycan binding at the FN2 (Cysteine-Rich) Domain (1FWU): The mannose receptor can also bind non-mannose sulfated glycans, such as 3-SO4-LEWIS(X), for which the SNFG representation is shown below.
What is interesting is that the mannose receptor binds this glycan, which does not even contain mannose, through the FN2 domain (which contains conserved disulfide bonds) and not through the calcium-dependent CLECT carbohydrate binding domains. Hence the protein can bind both sulfated and nonsulfated glycans.
Here is an iCn3D model showing the complex of the FN2 domain of the mannose receptor with the non-mannose containing 3-SO4-LEWIS(X) glycan.
Look at the number of CLECT domains in the domain structure diagram for the mannose receptor above. Along with interactions of sulfated glycans at the FN2 domain, these would enable the binding of widely diverse glycan structures. Some of the reported ligands for the mannose receptor include those with high mannose content released during inflammation (lysosomal hydrolases, collagen peptides, and tissue plasminogen activator) and sulfated ones (including the pituitary hormones lutropin and thyrotropin).
The galectin family of glycan binding proteins contains a common carbohydrate recognition domain (CRD) of about 130 amino acids, which binds Galβ1,3GlcNAc or Galβ1,4GlcNAc disaccharides (hence the name galectins) as well as other glycan motifs. They are expressed in almost all cells and multicellular organisms. There are 15 different types, grouped by how the CRD is functionally expressed (as dimers, tandem repeats, or chimeras), as illustrated in the figure below. The figure also shows their role in cancer biology.
Shimada, C.; Xu, R.; Al-Alem, L.; Stasenko, M.; Spriggs, D.R.; Rueda, B.R. Galectins and Ovarian Cancer. Cancers 2020, 12, 1421. https://doi.org/10.3390/cancers12061421. Creative Commons Attribution License
The carbohydrate binding domain of the galectins has a jellyroll-like protein architecture with two anti-parallel β-sheets forming a β-sandwich.
Galectin-1 is secreted and is found in the extracellular matrix as well as the cytoplasm. It induces apoptosis in T cells. It binds beta-galactosides as well as other glycans. The main ligand of galectin-1 has a Galβ1-4GlcNAc (or LacNAc) structure.
Here is an iCn3D model of human galectin-1 in complex with Type 1 N-acetyllactosamine (Gal(β1,3)GlcNAc), which binds less tightly than Galβ1,4GlcNAc (Type 2).
A comparison of crystal structures shows different psi (ψ) angles for bound Type 1 (135°) versus the more tightly bound Type 2 (-108°), which illustrates the nuance in binding conformations in the interactions of glycans with glycan binding proteins.
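Torsion angles such as the ψ values quoted above are dihedral angles defined by four consecutive atoms. The sketch below shows one common way to compute a dihedral angle from four sets of Cartesian coordinates. The function name and the coordinates are illustrative only; the specific four atoms that define ψ for a glycosidic linkage depend on the convention used and are not specified here.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180 to 180) about the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)          # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

# Toy coordinates (hypothetical): the outer atoms are at right angles
# when viewed down the central bond
angle = dihedral_degrees((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(round(angle, 1))
```

Applied to the appropriate atoms of two bound ligands, a calculation like this is how two conformations of the same linkage type can be compared numerically.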
Siglecs are sialic acid-binding immunoglobulin (Ig)-like lectins found on immune cells like basophils, macrophages, mast cells and eosinophils. One type (Siglec-4) is found in myelinated structures in the central and peripheral nervous system. They all have an N-terminal extracellular immunoglobulin domain (abbreviated as IG or V-Set) and a differing number of IG-like domains, also called C2-set Ig domains. The glycan epitopes recognized by Siglecs are sialylated oligosaccharides, which bind to a section of the Siglec containing a conserved arginine. The figure below compares the domain structures of the human Siglec family.
Siddiqui, S.S.; Matar, R.; Merheb, M.; Hodeify, R.; Vazhappilly, C.G.; Marton, J.; Shamsuddin, S.A.; Al Zouabi, H. Siglecs in Brain Function and Neurological Disorders. Cells 2019, 8, 1125. https://doi.org/10.3390/cells8101125. Open access article distributed under the Creative Commons Attribution License
Here is one example of a Siglec.
Siglec-8 is expressed on immune cells like basophils, mast cells and eosinophils. When activated by infection and prolonged inflammation, these cells release the contents of intracellular granules, which have potent physiological effects that can lead to allergic and asthmatic responses. On infection and in other inflammatory states, immune cytokines are released that, through a signaling process, lead to the release of sialoglycans that act as ligands, binding to Siglec-8 on the surface of the immune cells. One type of sialoglycan released is the mucins, which are very large glycoproteins with many 6′S sLex glycans attached. These "multivalent" glycan epitopes can bind to Siglec-8, leading to signaling in the cells and ultimate inhibition of cell function (including by death or apoptosis). The mucins in mucus (cross-linked mucins), which covers the epithelial cells of airways, also act as a first line of defense, as they can bind viruses through multiple-contact (multivalent) binding sites, effectively trapping the viruses. The glycan structure recognized by Siglec-8 contains both sialic acid and sulfate (NeuAcα2-3[6S]Galβ1-4[Fucα1-3]GlcNAc-). Given their role in inhibiting immune cells and inducing apoptosis in them, the siglecs are likely involved as checkpoints that affect cancer and inflammatory conditions.
Here is the domain structure of Siglec-8
Note that there is no CLECT domain, but rather immunoglobulin (IG) or IG-like domains, which seems logical given their role in binding glycan "epitopes". The IG domain is also called the immunoglobulin V-set domain (V-Set). The blue rectangle represents the transmembrane domain (a single helix). The cytoplasmic domain has an immunoreceptor tyrosine-based inhibitory motif (ITIM) involved in transducing the signal on binding of 6′S sLex glycans to the IG domains.
As discussed above, humans lack a functional hydroxylase gene necessary for the hydroxylation of Neu5Ac to Neu5Gc, which is found in chimps, who possess the enzyme. Chimps' immune systems seem to confer protection from acquiring the simian version of AIDS, cirrhosis, and other diseases that humans acquire when they are infected with HIV, hepatitis B or C, or other viruses. These diseases and others associated with overactive T cells (rheumatoid arthritis, asthma, type I diabetes) are not common in chimps. It turns out that there is a link between the type of sialic acid and the expression of siglecs that influences the difference in our disease propensity. Varki et al. have shown that chimps and gorillas show much higher levels of expression of siglecs on T cells, which are critical regulatory and effector cells in the immune system. When siglecs on T cells are activated, T-cell responses are down-regulated. Although HIV ultimately kills T helper cells, the virus initially activates them on infection, leading to their proliferation and the production of a larger number of cells for the virus to infect.