API Reference

Droplet-Film Model Development Project — Technical Documentation.

Overview

This document describes the classes, methods, and parameters of the DFT Development project. The API supports both research and industrial applications.

The project follows object-oriented design with clear separation between physics modeling, data management, and machine learning components.

Core Classes and Modules

The project consists of several key modules:

  • dft_model.py: Core physics model implementation

  • utils.py: Data management and utility functions

  • Individual Jupyter notebooks for different approaches

DFT Class — Core Physics Model

The DFT class implements the Droplet-Film Model for predicting critical flow rates in gas wells.

Class definition

class DFT:
    """
    Droplet-Film Model for predicting critical flow rates in gas wells.

    This class implements a physics-informed machine learning approach that combines
    fundamental fluid dynamics principles with data-driven optimization to predict
    when gas wells will experience liquid loading.
    """

Constructor

__init__(self, seed=42, feature_tol=1.0, dev_tol=1e-3, multiple_dev_policy="max")

Parameters:

  • seed (int): Random seed for reproducibility. Default: 42

  • feature_tol (float): Feature distance threshold for matching. Default: 1.0

  • dev_tol (float): Deviation tolerance for angle matching. Default: 1e-3

  • multiple_dev_policy (str): Policy for handling multiple matches. Options: "max", "min", "mean", "median". Default: "max"

Attributes: seed, feature_tol, dev_tol, multiple_dev_policy, opt_params (set after fitting), n_train (set after fitting).
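The constructor's defaults can be illustrated with a minimal stand-in class. The explicit validation of multiple_dev_policy is an assumption added for clarity; the real class may not raise on an unknown policy.

```python
# Minimal stand-in for the DFT constructor, showing the documented defaults.
# The policy validation is an illustrative assumption, not the real behavior.
class DFTSketch:
    VALID_POLICIES = ("max", "min", "mean", "median")

    def __init__(self, seed=42, feature_tol=1.0, dev_tol=1e-3,
                 multiple_dev_policy="max"):
        if multiple_dev_policy not in self.VALID_POLICIES:
            raise ValueError(f"unknown policy: {multiple_dev_policy!r}")
        self.seed = seed
        self.feature_tol = feature_tol
        self.dev_tol = dev_tol
        self.multiple_dev_policy = multiple_dev_policy

model = DFTSketch()
```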

Methods — fit

fit(self, X, y)

Train the DFT model on provided data.

Parameters: X (np.ndarray) shape (n_samples, 10), y (np.ndarray) shape (n_samples,). Returns: self.

Features (in order): Dia, Dev(deg), Area (m2), z, GasDens, LiquidDens, g (m/s2), P/T, friction_factor, critical_film_thickness.

Implementation: Uses Powell optimization from scipy.optimize; optimizes 5 global parameters (p1–p5) plus alpha per sample; bounds alpha in [0, 1]; max 5000 iterations, 10000 function calls.
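The optimization setup above can be sketched with a toy objective: Powell minimization over five global parameters plus one bounded alpha per sample, using the documented iteration limits. The objective here is a stand-in, not the project's actual loss.

```python
import numpy as np
from scipy.optimize import minimize

# Toy illustration of the fit setup: Powell optimization over 5 global
# parameters (p1..p5) plus one alpha per sample, each alpha in [0, 1].
# The quadratic objective is a stand-in for the model's real loss.
n_samples = 3
x0 = np.concatenate([np.ones(5), np.full(n_samples, 0.5)])
bounds = [(None, None)] * 5 + [(0.0, 1.0)] * n_samples

def toy_loss(params):
    p, alphas = params[:5], params[5:]
    # pull the globals toward 2.0 and the alphas toward 0.3
    return np.sum((p - 2.0) ** 2) + np.sum((alphas - 0.3) ** 2)

res = minimize(toy_loss, x0, method="Powell", bounds=bounds,
               options={"maxiter": 5000, "maxfev": 10000})
```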

Methods — predict

predict(self, X, dev_train=None, alpha_strategy='enhanced_dev_based')

Make predictions on new data.

Parameters: X (np.ndarray), optional dev_train, alpha_strategy (must be 'enhanced_dev_based'). Returns: np.ndarray of shape (n_samples,).

Alpha assignment strategy (by well deviation angle):

  1. Dev < 10°: Regular deviation-based matching

    • Find training samples within dev_tol

    • Apply multiple_dev_policy if multiple matches

    • Use mean training alpha if no matches

  2. 10° ≤ Dev < 20°: Minimum alpha strategy

    • Find training samples within dev_tol

    • Use minimum alpha among matches

    • Use mean training alpha if no matches

  3. Dev ≥ 20°: Full-feature matching

    • Compute Euclidean distance to all training samples

    • Use closest sample’s alpha if distance < feature_tol

    • Use mean training alpha otherwise
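The three-branch strategy above can be sketched as a standalone function. The argument names (X_train, dev_train, alpha_train) and the policy dispatch are assumptions pieced together from the text, not the project's exact implementation.

```python
import numpy as np

# Hedged sketch of the documented alpha-assignment strategy.
def assign_alpha(x, dev, X_train, dev_train, alpha_train,
                 dev_tol=1e-3, feature_tol=1.0, policy="max"):
    policy_fn = {"max": np.max, "min": np.min,
                 "mean": np.mean, "median": np.median}[policy]
    fallback = alpha_train.mean()  # mean training alpha when nothing matches
    if dev < 10.0:  # 1. regular deviation-based matching
        matches = alpha_train[np.abs(dev_train - dev) < dev_tol]
        return policy_fn(matches) if matches.size else fallback
    if dev < 20.0:  # 2. minimum-alpha strategy
        matches = alpha_train[np.abs(dev_train - dev) < dev_tol]
        return matches.min() if matches.size else fallback
    # 3. Dev >= 20: full-feature matching by Euclidean distance
    dists = np.linalg.norm(X_train - x, axis=1)
    i = dists.argmin()
    return alpha_train[i] if dists[i] < feature_tol else fallback
```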

Methods — _eq (physics equation)

_eq(self, params, X)

Compute predicted values using the physics equation.

Physics equation:

\[Q_{cr} = p_1 \sqrt{\left| \mathrm{term}_1 \cdot \alpha + (1-\alpha) \cdot \mathrm{term}_2 \right| \cdot \frac{1}{z} \cdot \frac{P}{T}}\]

Where:

  • term1 involves \(2 g \, \mathrm{Dia}\), \((\rho_l - \rho_g)\), \(\cos(\mathrm{Dev})\), and parameter \(p_4\).

  • term2 involves \(|\sin(p_5 \cdot \mathrm{Dev})|^{p_3}\) and \((\rho_l - \rho_g)^{p_2} / \rho_g^2\).
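A hypothetical NumPy rendering of the equation follows. Only the outer Q_cr expression is taken from the documented formula; the exact composition of term1 and term2 is an assumption inferred from the descriptions above.

```python
import numpy as np

# Hypothetical sketch of the physics equation; term1 and term2 forms are
# assumptions assembled from the prose, not the project's exact code.
def q_critical(p, Dia, Dev_deg, z, rho_g, rho_l, g, P_over_T, alpha):
    p1, p2, p3, p4, p5 = p
    dev = np.deg2rad(Dev_deg)
    term1 = 2.0 * g * Dia * (rho_l - rho_g) * np.cos(dev) * p4          # assumed form
    term2 = np.abs(np.sin(p5 * dev)) ** p3 * (rho_l - rho_g) ** p2 / rho_g ** 2  # assumed form
    inner = np.abs(term1 * alpha + (1.0 - alpha) * term2) * (1.0 / z) * P_over_T
    return p1 * np.sqrt(inner)
```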

Methods — _loss

_loss(self, params)

Compute loss function for optimization. Returns: float (MSE).
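Since _loss is documented to return the mean squared error, a minimal equivalent looks like this (the real method evaluates _eq internally; the prediction array is passed in directly here for illustration):

```python
import numpy as np

# Minimal MSE sketch matching the documented return value of _loss.
def mse_loss(y_pred, y_true):
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))
```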

Helm Class — Data Management

The Helm class (in models.utils) handles dataset loading, train/test splitting, scaling, and model training/evaluation. The CSV must include the feature columns, plus Qcr (target), Gasflowrate, and Test status.

Class definition

from models.utils import Helm

Constructor

__init__(self, path, seed=42, drop_cols=None, includ_cols=None, test_size=0.20, scale=True)

Parameters: path (str), seed (int), drop_cols, includ_cols (lists), test_size (float, default 0.20), scale (bool, default True).

Attributes (set after initialization): X_train, X_test, y_train, y_test (numpy arrays). If scale=True, use X_train_rdy, X_test_rdy (and optionally y_train_rdy, y_test_rdy) for scaled data. Also: feature_names, scaler_X, scaler_y (when scale=True).
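The split-and-scale behavior can be sketched in a self-contained form. Helm itself reads a CSV and manages columns; this stand-in only mirrors the documented defaults (seed=42, test_size=0.20) and the scaled *_rdy arrays via standardization.

```python
import numpy as np

# Self-contained sketch of Helm's split-and-scale behavior (a stand-in,
# not the real class): shuffle, hold out test_size, standardize on train.
def split_and_scale(X, y, seed=42, test_size=0.20):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train, X_test = X[train_idx], X[test_idx]
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    X_train_rdy = (X_train - mu) / sigma   # scaled copies, as in X_train_rdy
    X_test_rdy = (X_test - mu) / sigma
    return X_train_rdy, X_test_rdy, y[train_idx], y[test_idx]
```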

Methods — evolv_model

evolv_model(self, build_model, hparam_grid, k_folds=5)

Train model with hyperparameter optimization. Returns: best trained model. Performs grid search and k-fold cross-validation; stores predictions and metrics.
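The grid-search-with-cross-validation loop can be illustrated with a simple closed-form ridge model. The model and the lambda grid are stand-ins; evolv_model's actual build_model/hparam_grid interface is not reproduced here.

```python
import numpy as np

# Illustrative grid search with k-fold cross-validation, sketching what
# evolv_model is documented to do; the ridge model is a stand-in.
def cv_grid_search(X, y, lambdas, k_folds=5, seed=42):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k_folds)
    scores = {}
    for lam in lambdas:
        errs = []
        for i in range(k_folds):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k_folds) if j != i])
            # closed-form ridge regression on the training folds
            A = X[trn].T @ X[trn] + lam * np.eye(X.shape[1])
            w = np.linalg.solve(A, X[trn].T @ y[trn])
            errs.append(np.mean((X[val] @ w - y[val]) ** 2))
        scores[lam] = float(np.mean(errs))
    best = min(scores, key=scores.get)   # hyperparameter with lowest CV error
    return best, scores
```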

QLatticeWrapper Class — Symbolic Regression

Wrapper for Feyn QLattice for automated symbolic regression with a scikit-learn compatible interface.

Constructor

__init__(self, feature_tags, output_tag="Qcr", seed=42, max_complexity=10, n_epochs=10, criterion="bic")

Parameters: feature_tags (List[str]), output_tag, seed, max_complexity, n_epochs, criterion ("bic", "aic", "r2").

Methods: fit(X, y), predict(X), express() (returns SymPy expression).

Data Format Requirements

Input CSV must contain the 10 features (Dia, Dev(deg), Area (m2), z, GasDens, LiquidDens, g (m/s2), P/T, friction_factor, critical_film_thickness) plus Qcr, Gasflowrate, and Test status for Helm.
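A small standard-library helper (not part of the project) can verify that a CSV header carries every required column before handing the file to Helm:

```python
import csv
import io

# Hypothetical helper: report which required columns a CSV header lacks.
REQUIRED = ["Dia", "Dev(deg)", "Area (m2)", "z", "GasDens", "LiquidDens",
            "g (m/s2)", "P/T", "friction_factor", "critical_film_thickness",
            "Qcr", "Gasflowrate", "Test status"]

def missing_columns(csv_text):
    header = next(csv.reader(io.StringIO(csv_text)))
    return [c for c in REQUIRED if c not in header]
```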

Data Validation, Error Handling, Performance

Helm loads the CSV and performs a stratified train/test split on the loading label (the Test status column). The API includes error handling for invalid inputs, missing columns, optimization failures, and (for QLattice) network issues.
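Stratification on the loading label can be sketched as follows; this stand-in preserves each class's proportion in the held-out set, mirroring the documented behavior without reproducing Helm's internals.

```python
import numpy as np

# Sketch of a stratified split on a binary loading label (a stand-in for
# Helm's stratify-by-Test-status split): sample test_size of each class.
def stratified_indices(labels, test_size=0.20, seed=42):
    rng = np.random.default_rng(seed)
    test_idx = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)
        n_test = int(round(len(cls_idx) * test_size))
        test_idx.extend(cls_idx[:n_test])
    test_idx = np.array(sorted(test_idx))
    train_idx = np.setdiff1d(np.arange(len(labels)), test_idx)
    return train_idx, test_idx
```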

Support and Resources

GitHub repository, documentation, community forums, issue tracker, and research papers. For examples, see Usage Examples and the Jupyter notebooks.