Models management

Prepare a model for Firefox

Models that can be used with Firefox should have ONNX weights at different quantization levels.

To make sure we are compatible with Transformers.js, we use the conversion script provided by that project, which verifies that the model architecture is supported and has been tested.

To do this, follow these steps:

  • make sure your model is published on Hugging Face with PyTorch or safetensors weights.

  • clone https://github.com/xenova/transformers.js and check out the v3 branch

  • go into scripts/

  • create a virtualenv there and install requirements from the local requirements.txt file

Then you can run:

python convert.py --model_id organizationId/modelId --quantize --modes fp16 q8 q4 --task the-inference-task
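For example, converting an image-to-text model might look like this (the model id and task here are illustrative; substitute your own):

python convert.py --model_id mozilla/distilvit --quantize --modes fp16 q8 q4 --task image-to-text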

You will get a new directory in models/organizationId/modelId that includes an onnx subdirectory and other files. Upload everything to Hugging Face.

Congratulations! You have a Firefox-compatible model. You can now try it in about:inference.

Note that for encoder-decoder models with two ONNX files, you may need to rename decoder_model_quantized.onnx to decoder_model_merged_quantized.onnx, and make the same change for the fp16 and q4 variants. The encoder files do not need to be renamed.
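A minimal sketch of that renaming, assuming the converted files live under models/organizationId/modelId/onnx (adjust the path to your own output):

from pathlib import Path

# Hypothetical local path to the conversion output; adjust as needed.
onnx_dir = Path("models/organizationId/modelId/onnx")

# Rename decoder files to the "merged" names; encoder files stay as-is.
for suffix in ("quantized", "fp16", "q4"):
    src = onnx_dir / f"decoder_model_{suffix}.onnx"
    dst = onnx_dir / f"decoder_model_merged_{suffix}.onnx"
    if src.exists() and not dst.exists():
        src.rename(dst)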

Lifecycle

When Firefox uses a model, it will

  1. read metadata stored in Remote Settings

  2. download model files from our hub

  3. store the files in IndexedDB

1. Remote Settings

We have two collections in Remote Settings:

  • ml-onnx-runtime: provides all the WASM files we need to run the inference platform.

  • ml-inference-options: provides, for each task, a list of running options, such as the modelId.

Running the inference API downloads the WASM files if needed, and then looks for an entry for the task in ml-inference-options to get its options. That allows us to set the default running options for each task.

This is also how we can update a model without changing Firefox’s code: setting a new revision for a model in Remote Settings will trigger a new download for our users.

Records in ml-inference-options are uniquely identified by featureId. When a record does not set one, the taskName is used instead.
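As an illustration, an ml-inference-options record might look like the following Python dict. featureId, taskName, modelId, and the notion of a revision come from this page; the exact field names beyond those are assumptions:

record = {
    "featureId": "example-feature",      # unique id; falls back to taskName
    "taskName": "summarization",         # the inference task
    "modelId": "mozilla/example-model",  # which model to run
    "modelRevision": "v1.0",             # assumed name; bumping the revision triggers a re-download
}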

2. Model Hub

Our Model hub follows the same structure as Hugging Face: each file for a model is served under a unique URL:

https://model-hub.mozilla.org/<organization>/<model>/<revision>/<path>

Where:

  • organization and model form the model id, for example "mozilla/distilvit"

  • revision is the branch or version

  • path is the path to the file
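A small sketch of how such a URL is assembled (the helper name is ours; the pattern is the one above):

def model_file_url(organization, model, revision, path):
    """Build the hub URL for one model file, following the pattern above."""
    return f"https://model-hub.mozilla.org/{organization}/{model}/{revision}/{path}"

# For example, the quantized decoder weights of a hypothetical model:
print(model_file_url("mozilla", "distilvit", "main",
                     "onnx/decoder_model_merged_quantized.onnx"))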

Model files downloaded from the hub are stored in IndexedDB so users don’t need to download them again.

Model files

A model consists of several files, such as its configuration, tokenizer, training metadata, and weights.

Below are the most common files you’ll encounter:

1. Model Weights

  • pytorch_model.bin: Contains the model’s weights for PyTorch models. It is a serialized file that holds the parameters of the neural network.

  • tf_model.h5: TensorFlow’s version of the model weights.

  • flax_model.msgpack: For models built with the Flax framework, this file contains the model weights in a format used by JAX and Flax.

  • onnx: A subdirectory containing ONNX weight files at different quantization levels. These are the files our platform uses (an illustrative listing follows this list).
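As an illustration, after conversion the onnx subdirectory of an encoder-decoder model might contain files along these lines. The exact names depend on the model and the --modes you passed; this listing is an assumption based on the renaming note above:

# Hypothetical contents of the onnx/ subdirectory for an encoder-decoder
# model; exact names vary with the model and the quantization modes chosen.
onnx_files = [
    "encoder_model.onnx",
    "encoder_model_fp16.onnx",
    "encoder_model_quantized.onnx",         # q8
    "decoder_model_merged.onnx",
    "decoder_model_merged_fp16.onnx",
    "decoder_model_merged_quantized.onnx",  # q8
    "decoder_model_merged_q4.onnx",
]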

2. Model Configuration

The config.json file contains all the necessary configurations for the model architecture, such as the number of layers, hidden units, attention heads, activation functions, and more. This allows the Hugging Face library to reconstruct the model exactly as it was defined.
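As a sketch, an abridged config.json for a BERT-style model could look like this (shown as a Python dict; the values are examples, not a recommendation):

# Abridged, illustrative config.json for a BERT-style model.
config_json = {
    "model_type": "bert",
    "num_hidden_layers": 12,
    "hidden_size": 768,
    "num_attention_heads": 12,
    "hidden_act": "gelu",
    "vocab_size": 30522,
}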

3. Tokenizer Files

  • vocab.txt or vocab.json: Vocabulary files that map tokens (words, subwords, or characters) to IDs. Different tokenizers (BERT, GPT-2, etc.) will have different formats.

  • tokenizer.json: Stores the full tokenizer configuration and mappings.

  • tokenizer_config.json: This file contains settings that are specific to the tokenizer used by the model, such as whether it is case-sensitive or the special tokens it uses (e.g., [CLS], [SEP], etc.).
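For example, an abridged tokenizer_config.json for a case-insensitive BERT-style tokenizer might contain (again shown as a Python dict, with illustrative values):

tokenizer_config_json = {
    "do_lower_case": True,    # the tokenizer is not case-sensitive
    "model_max_length": 512,  # longest sequence the tokenizer will produce
    "cls_token": "[CLS]",
    "sep_token": "[SEP]",
    "pad_token": "[PAD]",
    "unk_token": "[UNK]",
}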

4. Preprocessing Files

  • special_tokens_map.json: Maps the special token roles (padding, CLS, SEP, etc.) to the actual tokens used by the tokenizer.

  • added_tokens.json: If any additional tokens were added beyond the original vocabulary (like custom tokens or domain-specific tokens), they are stored in this file.

5. Training Metadata

  • training_args.bin: Contains the arguments that were used during training, such as learning rates, batch size, and other hyperparameters. This file allows for easier replication of the training process.

  • trainer_state.json: Captures the state of the trainer, such as epoch information and optimizer state, which can be useful for resuming training.

  • optimizer.pt: Stores the optimizer’s state for PyTorch models, allowing for a resumption of training from where it left off.

6. Model Card

README.md or model_card.json: The model card provides documentation about the model, including its intended use, training data, performance metrics, ethical considerations, and any limitations.

7. Tokenization and Feature Extraction Files

  • merges.txt: For byte pair encoding (BPE) tokenizers, this file contains the merge operations used to split words into subwords; a short illustration follows this list.

  • preprocessor_config.json: Contains configuration details for any pre-processing or feature extraction steps applied to the input before passing it to the model.
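To make the BPE merge list concrete, here is a tiny, hypothetical head of a merges.txt. Each line records one merge of two symbols, most frequent first; exact contents and symbol conventions vary by tokenizer:

# Hypothetical first merges of a BPE tokenizer: each line joins two
# symbols; later merges can build on earlier ones ('th' + 'e' -> 'the').
merges_head = [
    "t h",
    "th e",
    "a n",
    "an d",
]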