bionty.Protein¶
- class bionty.Protein(name: str | None, uniprotkb_id: str | None, synonyms: str | None, length: int | None, gene_symbol: str | None, ensembl_gene_ids: str | None, organism: Organism | None, source: Source | None)¶
Bases: BioRecord, TracksRun, TracksUpdates
Proteins - UniProt.
Notes
For more info, see tutorials Manage biological registries and Protein.
Bulk create records via from_values().
Examples
>>> record = bionty.Protein.from_source(name="Synaptotagmin-15B", organism="human")
>>> record = bionty.Protein.from_source(gene_symbol="SYT15B", organism="human")
Simple fields¶
- uid: str¶
A universal id (hash of selected field).
- name: str | None¶
Unique name of a protein.
- uniprotkb_id: str | None¶
UniProt protein ID, 6 alphanumeric characters, possibly suffixed by 4 more.
- synonyms: str | None¶
Bar-separated (|) synonyms that correspond to this protein.
- description: str | None¶
Description of the protein.
- length: int | None¶
Length of the protein sequence.
- gene_symbol: str | None¶
The primary gene symbol that corresponds to this protein.
- ensembl_gene_ids: str | None¶
Bar-separated (|) Ensembl Gene IDs that correspond to this protein.
- created_at: datetime¶
Time of creation of record.
- updated_at: datetime¶
Time of last update to record.
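The uniprotkb_id, synonyms and ensembl_gene_ids fields above are plain strings with a fixed shape. The following is a minimal sketch of how such values could be checked and split in client code. The helper names are hypothetical and the pattern is derived only from the field descriptions above; it is not the validation bionty itself applies.

```python
import re
from typing import List, Optional

# Loose shape check taken from the field description above:
# 6 alphanumeric characters, optionally followed by 4 more.
# Assumption for illustration only; bionty may apply a stricter rule.
UNIPROTKB_ID_SHAPE = re.compile(r"[A-Z0-9]{6}(?:[A-Z0-9]{4})?")


def looks_like_uniprotkb_id(value: str) -> bool:
    """Return True if `value` has the 6- or 10-character shape."""
    return UNIPROTKB_ID_SHAPE.fullmatch(value) is not None


def split_bar_separated(value: Optional[str]) -> List[str]:
    """Split a bar-separated field such as `synonyms` into its entries."""
    if not value:
        return []
    return [part for part in value.split("|") if part]


print(looks_like_uniprotkb_id("P12345"))  # True
print(looks_like_uniprotkb_id("P1234"))  # False
print(split_bar_separated("T-cell|T lymphocyte"))  # ['T-cell', 'T lymphocyte']
```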
Relational fields¶
- created_by: User¶
Creator of record.
- run: Run¶
Last run that created or updated the record.
- artifacts: Artifact¶
Artifacts linked to the protein.
- feature_sets: FeatureSet¶
Feature sets linked to this protein.
Class methods¶
- classmethod add_source(source, currently_used=True)¶
Configure a source of the entity.
- Return type:
- classmethod df(include=None, join='inner', limit=100)¶
Convert to pd.DataFrame.
By default, shows all direct fields, except updated_at. Use parameter include to include other fields.
- Parameters:
include (str | list[str] | None, default: None) – Related fields to include as columns. Takes strings of form "labels__name", "cell_types__name", etc. or a list of such strings.
join (str, default: 'inner') – The join parameter of pandas.
limit (int, default: 100) – Maximum number of rows to display from a Pandas DataFrame. Defaults to 100 to reduce database load.
- Return type:
DataFrame
Examples
>>> labels = [ln.ULabel(name=f"Label {i}") for i in range(3)]
>>> ln.save(labels)
>>> ln.ULabel.filter().df(include=["created_by__name"])
- classmethod filter(*queries, **expressions)¶
Query records.
- Parameters:
queries – One or multiple Q objects.
expressions – Fields and values passed as Django query expressions.
- Return type:
QuerySet
- Returns:
A QuerySet.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ln.ULabel(name="my ulabel").save()
>>> ulabel = ln.ULabel.get(name="my ulabel")
- classmethod from_source(*, mute=False, **kwargs)¶
Create a record or records from source based on a single field value.
Notes
For more info, see tutorial bionty.
Bulk create records via from_values().
Examples
Create a record by passing a field value:
>>> record = bionty.Gene.from_source(symbol="TCF7", organism="human")
Create a record from non-default source:
>>> source = bionty.Source.get(entity="CellType", source="cl", version="2022-08-16")  # noqa
>>> record = bionty.CellType.from_source(name="T cell", source=source)
- classmethod from_values(values, field=None, create=False, organism=None, source=None, mute=False)¶
Bulk create validated records by parsing values for an identifier (such as a name or an id).
- Parameters:
values (List[str] | Series | array) – A list of values for an identifier, e.g. ["name1", "name2"].
field (str | DeferredAttribute | None, default: None) – A Record field to look up, e.g., bt.CellMarker.name.
create (bool, default: False) – Whether to create records if they don’t exist.
organism (Record | str | None, default: None) – A bionty.Organism name or record.
source (Record | None, default: None) – A bionty.Source record to validate against to create records for.
mute (bool, default: False) – Whether to mute logging.
- Return type:
list[Record]
- Returns:
A list of validated records. For bionty registries, also returns knowledge-coupled records.
Notes
For more info, see tutorial: Manage biological registries.
Examples
Bulk creating from non-validated values logs warnings and returns an empty list:
>>> ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], field="name")
>>> assert len(ulabels) == 0
Bulk creating records from validated values returns the corresponding existing records:
>>> ln.save([ln.ULabel(name=name) for name in ["benchmark", "prediction", "test"]])
>>> ulabels = ln.ULabel.from_values(["benchmark", "prediction", "test"], field="name")
>>> assert len(ulabels) == 3
Bulk create records from public reference:
>>> import bionty as bt
>>> records = bt.CellType.from_values(["T cell", "B cell"], field="name")
>>> records
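The behaviour described in the from_values() examples can be modelled with a toy in-memory registry. This is a sketch under stated assumptions, not the bionty implementation: `from_values_sketch` and the dictionary-based registry are hypothetical, and the real method also consults the public source, which this model leaves out.

```python
def from_values_sketch(values, existing, create=False):
    """Toy model of bulk creation against an in-memory registry.

    `existing` maps an identifier to a record (a plain dict here).
    Values found in `existing` yield the stored record. Other values are
    reported and skipped, unless `create` is True, in which case a new
    record is built for them.
    """
    records = []
    for value in values:
        if value in existing:
            records.append(existing[value])
        elif create:
            records.append({"name": value})
        else:
            print(f"warning: {value!r} is not validated")
    return records


registry = {"benchmark": {"name": "benchmark"}}
print(len(from_values_sketch(["benchmark", "prediction"], registry)))  # 1
print(len(from_values_sketch(["benchmark", "prediction"], registry, create=True)))  # 2
```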
- classmethod get(idlike=None, **expressions)¶
Get a single record.
- Parameters:
idlike (int | str | None, default: None) – Either a uid stub, uid or an integer id.
expressions – Fields and values passed as Django query expressions.
- Return type:
- Returns:
A record.
- Raises:
lamindb.core.exceptions.DoesNotExist – In case no matching record is found.
See also
Guide: Query & search registries
Django documentation: Queries
Examples
>>> ulabel = ln.ULabel.get("2riu039")
>>> ulabel = ln.ULabel.get(name="my-label")
- classmethod import_from_source(source=None, ontology_ids=None, organism=None, ignore_conflicts=True)¶
Bulk save records from a Pandas DataFrame.
Use this method to initialize your registry with public ontology.
- Parameters:
ontology_ids (list[str] | None, default: None) – List of ontology ids to save.
organism (str | Record | None, default: None) – Organism name or record.
source (Source | None, default: None) – Source record to import records from.
ignore_conflicts (bool, default: True) – Whether to ignore conflicts during bulk record creation.
Examples
>>> bionty.CellType.import_from_source()
- classmethod inspect(values, field=None, *, mute=False, organism=None, source=None)¶
Inspect if values are mappable to a field.
Being mappable means that an exact match exists.
- Parameters:
values (List[str] | Series | array) – Values that will be checked against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.
mute (bool, default: False) – Whether to mute logging.
organism (str | Record | None, default: None) – An Organism name or record.
source (Record | None, default: None) – A bionty.Source record that specifies the version to inspect against.
- Return type:
See also
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> result = bt.Gene.inspect(gene_symbols, field=bt.Gene.symbol)
>>> result.validated
['A1CF', 'A1BG']
>>> result.non_validated
['FANCD1', 'FANCD20']
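Since being mappable means that an exact match exists, the split into validated and non-validated values can be sketched in plain Python. `InspectResultSketch` and `inspect_sketch` are hypothetical names for illustration; the object returned by the real inspect() may carry more attributes than the two shown in the example above.

```python
from typing import List, NamedTuple


class InspectResultSketch(NamedTuple):
    """Hypothetical stand-in for the result object of inspect()."""

    validated: List[str]
    non_validated: List[str]


def inspect_sketch(values, known):
    """Partition `values` by exact membership in `known`, keeping input order."""
    known = set(known)
    validated = [value for value in values if value in known]
    non_validated = [value for value in values if value not in known]
    return InspectResultSketch(validated, non_validated)


result = inspect_sketch(
    ["A1CF", "A1BG", "FANCD1", "FANCD20"],
    known=["A1CF", "A1BG", "BRCA2"],
)
print(result.validated)  # ['A1CF', 'A1BG']
print(result.non_validated)  # ['FANCD1', 'FANCD20']
```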
- classmethod lookup(field=None, return_field=None)¶
Return an auto-complete object for a field.
- Parameters:
field (str | DeferredAttribute | None, default: None) – The field to look up the values for. Defaults to first string field.
return_field (str | DeferredAttribute | None, default: None) – The field to return. If None, returns the whole record.
- Return type:
NamedTuple
- Returns:
A NamedTuple of lookup information of the field values with a dictionary converter.
See also
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> bt.Gene.from_source(symbol="ADGB-DT").save()
>>> lookup = bt.Gene.lookup()
>>> lookup.adgb_dt
>>> lookup_dict = lookup.dict()
>>> lookup_dict['ADGB-DT']
>>> lookup_by_ensembl_id = bt.Gene.lookup(field="ensembl_gene_id")
>>> lookup_by_ensembl_id.ensg00000002745
>>> lookup_return_symbols = bt.Gene.lookup(field="ensembl_gene_id", return_field="symbol")
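In the example above, the symbol "ADGB-DT" becomes the attribute adgb_dt. A rough sketch of how such an auto-complete object could be built with a named tuple follows. The normalization rule (lowercase, non-alphanumeric runs replaced by an underscore) is an assumption inferred from that example, and `to_attribute_name` and `build_lookup` are hypothetical helpers, not bionty functions.

```python
import keyword
import re
from collections import namedtuple


def to_attribute_name(value: str) -> str:
    """Turn a field value into a valid attribute name (assumed rule)."""
    key = re.sub(r"[^0-9a-z]+", "_", value.lower()).strip("_")
    # namedtuple fields must be identifiers: guard empty keys and leading digits
    if not key or key[0].isdigit():
        key = f"n{key}"
    # reserved words such as "if" are not allowed as field names either
    if keyword.iskeyword(key):
        key = f"{key}_"
    return key


def build_lookup(records, field="symbol"):
    """Build a named tuple whose attributes point at the given records.

    Assumes the normalized keys are distinct; namedtuple raises ValueError
    on duplicates.
    """
    keys = [to_attribute_name(record[field]) for record in records]
    Lookup = namedtuple("Lookup", keys)
    return Lookup(*records)


lookup = build_lookup([{"symbol": "ADGB-DT"}, {"symbol": "TCF7"}])
print(lookup.adgb_dt)  # {'symbol': 'ADGB-DT'}
```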
- classmethod public(organism=None, source=None)¶
The corresponding bionty.base.PublicOntology object.
Note that the source is auto-configured and tracked via bionty.Source.
- Return type:
PublicOntology | StaticReference
See also
Examples
>>> celltype_pub = bionty.CellType.public()
>>> celltype_pub
PublicOntology
Entity: CellType
Organism: all
Source: cl, 2023-04-20
#terms: 2698
- classmethod search(string, *, field=None, limit=20, case_sensitive=False)¶
Search.
- Parameters:
string (str) – The input string to match against the field ontology values.
field (str | DeferredAttribute | None, default: None) – The field or fields to search. Search all string fields by default.
limit (int | None, default: 20) – Maximum amount of top results to return.
case_sensitive (bool, default: False) – Whether the match is case sensitive.
- Return type:
QuerySet
- Returns:
A sorted DataFrame of search results with a score in column score. If return_queryset is True, a QuerySet.
Examples
>>> ulabels = ln.ULabel.from_values(["ULabel1", "ULabel2", "ULabel3"], field="name")
>>> ln.save(ulabels)
>>> ln.ULabel.search("ULabel2")
- classmethod standardize(values, field=None, *, return_field=None, return_mapper=False, case_sensitive=False, mute=False, public_aware=True, keep='first', synonyms_field='synonyms', organism=None, source=None)¶
Maps input synonyms to standardized names.
- Parameters:
values (List[str] | Series | array) – Identifiers that will be standardized.
field (str | DeferredAttribute | None, default: None) – The field representing the standardized names.
return_field (str, default: None) – The field to return. Defaults to field.
return_mapper (bool, default: False) – If True, returns {input_value: standardized_name}.
case_sensitive (bool, default: False) – Whether the mapping is case sensitive.
mute (bool, default: False) – Whether to mute logging.
public_aware (bool, default: True) – Whether to standardize from Bionty reference. Defaults to True for Bionty registries.
keep (Literal['first', 'last', False], default: 'first') – When a synonym maps to multiple names, determines which duplicates to mark as pd.DataFrame.duplicated:
"first": returns the first mapped standardized name
"last": returns the last mapped standardized name
False: returns all mapped standardized names.
When keep is False, the returned list of standardized names will contain nested lists in case of duplicates.
When a field is converted into return_field, keep marks which matches to keep when multiple return_field values map to the same field value.
synonyms_field (str, default: 'synonyms') – A field containing the concatenated synonyms.
organism (str | Record | None, default: None) – An Organism name or record.
source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
- Return type:
list[str] | dict[str, str]
- Returns:
If return_mapper is False, a list of standardized names. Otherwise, a dictionary of mapped values with mappable synonyms as keys and standardized names as values.
See also
add_synonym()
Add synonyms.
remove_synonym()
Remove synonyms.
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_synonyms = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> standardized_names = bt.Gene.standardize(gene_synonyms)
>>> standardized_names
['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
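The mapping in the example above, where the synonym FANCD1 resolves to BRCA2 and the unknown FANCD20 passes through unchanged, can be sketched with a small in-memory registry. `standardize_sketch` is a hypothetical helper, the registry is toy data, and only the keep='first' behaviour is modelled; the real method also handles return_field, public_aware and the other keep modes.

```python
def standardize_sketch(values, registry, case_sensitive=False, return_mapper=False):
    """Map synonyms to standardized names; unknown values pass through.

    `registry` maps a standardized name to its bar-separated synonyms
    string (or None). If a synonym belongs to several names, the first
    name encountered wins, similar to keep='first'.
    """

    def norm(text):
        return text if case_sensitive else text.lower()

    canonical = {norm(name): name for name in registry}
    synonym_to_name = {}
    for name, synonyms in registry.items():
        for synonym in (synonyms or "").split("|"):
            if synonym:
                synonym_to_name.setdefault(norm(synonym), name)

    standardized, mapper = [], {}
    for value in values:
        key = norm(value)
        if key in canonical:
            standardized.append(canonical[key])
        elif key in synonym_to_name:
            # the mapper records only values resolved through a synonym
            mapper[value] = synonym_to_name[key]
            standardized.append(synonym_to_name[key])
        else:
            standardized.append(value)
    return mapper if return_mapper else standardized


toy_registry = {"A1CF": None, "A1BG": None, "BRCA2": "FANCD1"}
symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
print(standardize_sketch(symbols, toy_registry))
# ['A1CF', 'A1BG', 'BRCA2', 'FANCD20']
print(standardize_sketch(symbols, toy_registry, return_mapper=True))
# {'FANCD1': 'BRCA2'}
```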
- classmethod using(instance)¶
Use a non-default LaminDB instance.
- Parameters:
instance (str | None) – An instance identifier of form “account_handle/instance_name”.
- Return type:
QuerySet
Examples
>>> ln.ULabel.using("account_handle/instance_name").search("ULabel7", field="name")
              uid  score
name
ULabel7  g7Hk9b2v  100.0
ULabel5  t4Jm6s0q   75.0
ULabel6  r2Xw8p1z   75.0
- classmethod validate(values, field=None, *, mute=False, organism=None, source=None)¶
Validate values against existing values of a string field.
Note this is strict validation, only asserts exact matches.
- Parameters:
values (List[str] | Series | array) – Values that will be validated against the field.
field (str | DeferredAttribute | None, default: None) – The field of values. Examples are 'ontology_id' to map against the source ID or 'name' to map against the ontologies field names.
mute (bool, default: False) – Whether to mute logging.
organism (str | Record | None, default: None) – An Organism name or record.
source (Record | None, default: None) – A bionty.Source record that specifies the version to validate against.
- Return type:
ndarray
- Returns:
A vector of booleans indicating if an element is validated.
See also
Examples
>>> import bionty as bt
>>> bt.settings.organism = "human"
>>> ln.save(bt.Gene.from_values(["A1CF", "A1BG", "BRCA2"], field="symbol"))
>>> gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
>>> bt.Gene.validate(gene_symbols, field=bt.Gene.symbol)
array([ True, True, False, False])
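Strict validation amounts to an exact-match membership test per element. The sketch below illustrates that idea with a hypothetical `validate_sketch` helper that returns a plain list of booleans rather than the ndarray documented above, and shows how such a mask can be used to pick out the values that failed.

```python
def validate_sketch(values, known):
    """Return one boolean per value: True only on an exact match in `known`."""
    known = set(known)
    return [value in known for value in values]


gene_symbols = ["A1CF", "A1BG", "FANCD1", "FANCD20"]
mask = validate_sketch(gene_symbols, known=["A1CF", "A1BG", "BRCA2"])
print(mask)  # [True, True, False, False]

# Use the mask to collect the values that did not validate
failed = [value for value, ok in zip(gene_symbols, mask) if not ok]
print(failed)  # ['FANCD1', 'FANCD20']

# Exact matching means a different casing does not validate
print(validate_sketch(["a1cf"], known=["A1CF"]))  # [False]
```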
Methods¶
- add_synonym(synonym, force=False, save=None)¶
Add synonyms to a record.
- Parameters:
synonym (str | List[str] | Series | array) – The synonyms to add to the record.
force (bool, default: False) – Whether to add synonyms even if they are already synonyms of other records.
save (bool | None, default: None) – Whether to save the record to the database.
See also
remove_synonym()
Remove synonyms.
Examples
>>> import bionty as bt
>>> bt.CellType.from_source(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.add_synonym("T cells")
>>> record.synonyms
'T cells|T-cell|T-lymphocyte|T lymphocyte'
- async adelete(using=None, keep_parents=False)¶
- async arefresh_from_db(using=None, fields=None, from_queryset=None)¶
- async asave(*args, force_insert=False, force_update=False, using=None, update_fields=None)¶
- clean()¶
Hook for doing any extra model-wide validation after clean() has been called on every field by self.clean_fields. Any ValidationError raised by this method will not be associated with a particular field; it will have a special-case association with the field defined by NON_FIELD_ERRORS.
- clean_fields(exclude=None)¶
Clean all fields and raise a ValidationError containing a dict of all validation errors if any occur.
- date_error_message(lookup_type, field_name, unique_for)¶
- delete()¶
Delete.
- Return type:
None
- get_constraints()¶
- get_deferred_fields()¶
Return a set containing names of deferred fields for this instance.
- prepare_database_save(field)¶
- refresh_from_db(using=None, fields=None, from_queryset=None)¶
Reload field values from the database.
By default, the reloading happens from the database this instance was loaded from, or by the read router if this instance wasn’t loaded from any database. The using parameter will override the default.
Fields can be used to specify which fields to reload. The fields should be an iterable of field attnames. If fields is None, then all non-deferred fields are reloaded.
When accessing deferred fields of an instance, the deferred loading of the field will call this method.
- remove_synonym(synonym)¶
Remove synonyms from a record.
- Parameters:
synonym (str | List[str] | Series | array) – The synonym values to remove.
See also
add_synonym()
Add synonyms
Examples
>>> import bionty as bt
>>> bt.CellType.from_source(name="T cell").save()
>>> lookup = bt.CellType.lookup()
>>> record = lookup.t_cell
>>> record.synonyms
'T-cell|T lymphocyte|T-lymphocyte'
>>> record.remove_synonym("T-cell")
'T lymphocyte|T-lymphocyte'
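Both add_synonym() and remove_synonym() operate on the bar-separated synonyms string. A minimal sketch of the underlying string manipulation is given below; the two helper functions are hypothetical, they do not check other records as the force parameter does, and the ordering of the result may differ from what bionty stores.

```python
def add_synonym_sketch(synonyms, new):
    """Append one or more synonyms to a bar-separated string, skipping duplicates."""
    current = [s for s in (synonyms or "").split("|") if s]
    additions = [new] if isinstance(new, str) else list(new)
    for synonym in additions:
        if "|" in synonym:
            raise ValueError("a synonym must not contain the '|' separator")
        if synonym not in current:
            current.append(synonym)
    return "|".join(current)


def remove_synonym_sketch(synonyms, old):
    """Drop one or more synonyms from a bar-separated string."""
    removals = {old} if isinstance(old, str) else set(old)
    kept = [s for s in (synonyms or "").split("|") if s and s not in removals]
    return "|".join(kept)


synonyms = "T-cell|T lymphocyte|T-lymphocyte"
print(add_synonym_sketch(synonyms, "T cells"))
# T-cell|T lymphocyte|T-lymphocyte|T cells
print(remove_synonym_sketch(synonyms, "T-cell"))
# T lymphocyte|T-lymphocyte
```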
- save_base(raw=False, force_insert=False, force_update=False, using=None, update_fields=None)¶
Handle the parts of saving which should be done only once per save, yet need to be done in raw saves, too. This includes some sanity checks and signal sending.
The ‘raw’ argument is telling save_base not to save any parent models and not to do any changes to the values before save. This is used by fixture loading.
- serializable_value(field_name)¶
Return the value of the field name for this instance. If the field is a foreign key, return the id value instead of the object. If there’s no Field object with this name on the model, return the model attribute’s value.
Used to serialize a field’s value (in the serializer, or form output, for example). Normally, you would just access the attribute directly and not use this method.
- set_abbr(value)¶
Set value for abbr field and add to synonyms.
- Parameters:
value (str) – A value for an abbreviation.
See also
Examples
>>> import bionty as bt
>>> bt.ExperimentalFactor.from_source(name="single-cell RNA sequencing").save()
>>> scrna = bt.ExperimentalFactor.get(name="single-cell RNA sequencing")
>>> scrna.abbr
None
>>> scrna.synonyms
'single-cell RNA-seq|single-cell transcriptome sequencing|scRNA-seq|single cell RNA sequencing'
>>> scrna.set_abbr("scRNA")
>>> scrna.abbr
'scRNA'
>>> scrna.synonyms
'scRNA|single-cell RNA-seq|single cell RNA sequencing|single-cell transcriptome sequencing|scRNA-seq'
>>> scrna.save()
- unique_error_message(model_class, unique_check)¶
- validate_constraints(exclude=None)¶
- validate_unique(exclude=None)¶
Check unique constraints on the model and raise ValidationError if any failed.
- view_parents(field=None, with_children=False, distance=5)¶
View parents in an ontology.
- Parameters:
field (str | DeferredAttribute | None, default: None) – Field to display on graph.
with_children (bool, default: False) – Whether to also show children.
distance (int, default: 5) – Maximum distance still shown.
Ontological hierarchies: ULabel (project & sub-project), CellType (cell type & subtype).
Examples
>>> import bionty as bt
>>> bt.Tissue.from_source(name="subsegmental bronchus").save()
>>> record = bt.Tissue.get(name="respiratory tube")
>>> record.view_parents()
>>> record.view_parents(with_children=True)
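The distance parameter limits how far up the hierarchy the view reaches. As a rough illustration of that idea, the sketch below collects ancestors level by level from a toy parent mapping. `ancestors_within` and the project hierarchy are hypothetical, and view_parents() itself renders a graph rather than returning a dictionary.

```python
def ancestors_within(term, parents, distance=5):
    """Return {ancestor: depth} for ancestors at most `distance` steps above `term`.

    `parents` maps each term to a list of its direct parents.
    """
    found = {}
    frontier = [term]
    for depth in range(1, distance + 1):
        next_frontier = []
        for node in frontier:
            for parent in parents.get(node, ()):
                if parent != term and parent not in found:
                    found[parent] = depth
                    next_frontier.append(parent)
        frontier = next_frontier
        if not frontier:
            break  # reached the top of the hierarchy
    return found


# Toy hierarchy in the spirit of "project & sub-project"
toy_parents = {
    "sub-project A1": ["project A"],
    "project A": ["program"],
    "program": ["portfolio"],
}
print(ancestors_within("sub-project A1", toy_parents, distance=2))
# {'project A': 1, 'program': 2}
```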