Welcome to pytorch-lifestream docs
Library content
Here is a brief overview of the library with links to detailed descriptions.
Library modules:

- `ptls.preprocessing` - transforms data to a `ptls`-compatible format with `pandas` or `pyspark`: categorical encoding, datetime transformation, numerical feature preprocessing.
- `ptls.data_load` - all that you need to prepare your data for training and validation.
    - `ptls.data_load.datasets` - PyTorch `Dataset` API implementation for data access.
    - `ptls.data_load.iterable_processing` - generator-style filters for data transformation.
    - `ptls.data_load.augmentations` - functions for data augmentation.
- `ptls.frames` - tools for training encoders with popular frameworks like CoLES, SimCLR, CPC, VICReg, ...
    - `ptls.frames.coles` - contrastive learning on sub-sequences.
    - `ptls.frames.cpc` - contrastive learning for future event state prediction.
    - `ptls.frames.bert` - methods inspired by NLP and transformer models.
    - `ptls.frames.supervised` - modules for supervised training.
    - `ptls.frames.inference` - inference module.
- `ptls.nn` - layers for model creation:
    - `ptls.nn.trx_encoder` - layers that produce the representation of a single transaction.
    - `ptls.nn.seq_encoder` - layers for sequence processing, like `RNN` or `Transformer`.
    - `ptls.nn.pb` - `PaddedBatch`-compatible layers, similar to `torch.nn` modules but working with `ptls` data.
    - `ptls.nn.head` - composite layers for final embedding transformation.
    - `ptls.nn.seq_step.py` - layers that change the sequence along the time axis.
    - `ptls.nn.binarization`, `ptls.nn.normalization` - other groups of layers.
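All of these modules operate on the same record layout. Below is a minimal sketch of a single `ptls`-style record, assuming plain Python lists in place of the `torch` tensors the library actually uses; the feature names `mcc_code` and `amount` are illustrative:

```python
# One client = one dict: scalar id fields plus equal-length feature sequences.
# Plain lists stand in for torch tensors; feature names are illustrative.
record = {
    "client_id": 42,                     # scalar, not a sequence
    "event_time": [10, 25, 31, 57],      # monotonically increasing timestamps
    "mcc_code": [3, 1, 1, 7],            # categorical feature, label-encoded
    "amount": [120.0, 15.5, 8.0, 93.2],  # numerical feature
}

def check_record(rec, seq_keys=("event_time", "mcc_code", "amount")):
    """All sequence features must share one length, and time must be sorted."""
    lengths = {len(rec[k]) for k in seq_keys}
    assert len(lengths) == 1, "sequence features must be aligned"
    t = rec["event_time"]
    assert all(a <= b for a, b in zip(t, t[1:])), "events must be time-ordered"
    return lengths.pop()

print(check_record(record))  # 4 events in this record
```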
How to guide
- Prepare your data.
    - Use `Pyspark` in local or cluster mode for big datasets and `Pandas` for small ones.
    - Split data into the required parts (train, valid, test, ...).
    - Use `ptls.preprocessing` for simple data preparation.
    - Transform features to a compatible format using `Pyspark` or `Pandas` functions. You can also use `ptls.data_load.preprocessing` for common data transformation patterns.
    - Split sequences into the `ptls`-data format with `ptls.data_load.split_tools`. Save prepared data in `Parquet` format or keep it in memory (`Pickle` also works).
    - Use one of the available `ptls.data_load.datasets` to define input for the models.
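The preparation steps above can be sketched in plain Python: group a flat event log into one time-ordered record per client. This is the transformation that `ptls.preprocessing` and your `Pandas`/`Pyspark` pipeline automate; the features and vocabulary here are illustrative:

```python
from collections import defaultdict

# Flat event log as (client_id, event_time, mcc_code, amount) rows --
# the shape you would typically get from a Pandas/Pyspark table.
rows = [
    (1, 20, "food", 15.0),
    (2, 11, "fuel", 40.0),
    (1, 5, "food", 7.5),
    (2, 30, "food", 12.0),
]

# Label-encode the categorical feature; 0 is commonly reserved for padding.
vocab = {"<pad>": 0, "food": 1, "fuel": 2}

grouped = defaultdict(lambda: {"event_time": [], "mcc_code": [], "amount": []})
for client_id, t, mcc, amount in sorted(rows, key=lambda r: (r[0], r[1])):
    rec = grouped[client_id]
    rec["event_time"].append(t)
    rec["mcc_code"].append(vocab[mcc])
    rec["amount"].append(amount)

dataset = [{"client_id": cid, **rec} for cid, rec in grouped.items()]
print(dataset[0])
# {'client_id': 1, 'event_time': [5, 20], 'mcc_code': [1, 1], 'amount': [7.5, 15.0]}
```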
- Choose a framework for encoder training.
    - There are both supervised and unsupervised frameworks in `ptls.frames`.
    - Keep in mind that each framework requires its own batch format. Tools for batch collation can be found in the selected framework package.
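As an example of a framework-specific batch format, CoLES-style training builds positive pairs from several sub-sequences of the same client. A simplified sketch of such a sampler (not the library's implementation; the parameter names are illustrative):

```python
import random

def sample_subsequences(seq_len, n_views=3, min_len=2, max_len=5, rng=None):
    """Cut `n_views` random contiguous windows out of a sequence of length
    `seq_len` -- the kind of positive pairs CoLES-style contrastive learning
    is built on. A simplified sketch, not the library sampler."""
    rng = rng or random.Random(0)
    views = []
    for _ in range(n_views):
        length = rng.randint(min_len, min(max_len, seq_len))
        start = rng.randint(0, seq_len - length)
        views.append((start, start + length))
    return views

views = sample_subsequences(seq_len=10)
# Every view is a valid window inside the original sequence.
assert all(0 <= s < e <= 10 for s, e in views)
```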
- Build the encoder.
    - All parts are available in `ptls.nn`.
    - You can also use pretrained layers.
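The `ptls.nn` layers consume padded batches rather than ragged Python lists. Here is a plain-Python sketch of the padding step behind a `PaddedBatch`-style container (the library stores `torch` tensors and keeps the true lengths alongside the padded values):

```python
def collate_padded(batch, pad_value=0):
    """Pad variable-length sequences to a common length and keep the true
    lengths -- a plain-Python sketch of what a PaddedBatch-style container
    carries (the library works with torch tensors instead of lists)."""
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in batch]
    return padded, lengths

padded, lengths = collate_padded([[3, 1, 7], [5], [2, 2]])
print(padded)   # [[3, 1, 7], [5, 0, 0], [2, 2, 0]]
print(lengths)  # [3, 1, 2]
```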
- Train your encoder with the selected framework and `pytorch_lightning`.
    - Provide data with one of the DataLoaders compatible with the selected framework.
    - Monitor the progress on tensorboard.
    - Optionally tune hyperparameters.
- Save the trained encoder for future use.
    - You can use it as a standalone solution (e.g. to get class label probabilities).
    - Or it can be a pretrained part of another neural network.
- Use the encoder in your project.
    - Run predict on your data and get logits, probabilities, scores, or embeddings.
    - Use `ptls.data_load` and `ptls.data_load.datasets` tools to keep your data transformations and collect batches for inference.
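The inference flow can be sketched with a toy encoder: iterate the prepared records in batches and collect one embedding per client. A real setup would run a trained `ptls` encoder over DataLoader batches; the count/mean "embedding" below is purely illustrative:

```python
def batches(dataset, batch_size):
    """Yield fixed-size chunks of records, as an inference DataLoader would."""
    for i in range(0, len(dataset), batch_size):
        yield dataset[i:i + batch_size]

def toy_encode(record):
    """Stand-in for a trained encoder: a 2-d 'embedding' of
    (event count, mean amount). Purely illustrative."""
    amounts = record["amount"]
    return [float(len(amounts)), sum(amounts) / len(amounts)]

dataset = [
    {"client_id": 1, "amount": [7.5, 15.0]},
    {"client_id": 2, "amount": [40.0, 12.0]},
    {"client_id": 3, "amount": [5.0]},
]

embeddings = {rec["client_id"]: toy_encode(rec)
              for batch in batches(dataset, batch_size=2) for rec in batch}
print(embeddings[1])  # [2.0, 11.25]
```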
How to create your own components
It is possible to create a specific component for every library module. Here are the links to the detailed descriptions: