API Reference

Droplet-Film Model Development Project — Technical Documentation.

Overview

This document describes the classes, methods, and parameters of the DFT Development project. The API supports both research and industrial applications.

The project follows object-oriented design with clear separation between physics modeling, data management, and machine learning components.

Core Classes and Modules

The project consists of several key modules:

  • dft_model.py: Core physics model implementation

  • utils.py: Data management and utility functions

  • Individual Jupyter notebooks for different approaches

DFT Class — Core Physics Model

The DFT class implements the Droplet-Film Model for predicting critical flow rates in gas wells.

Class definition

class DFT:
    """
    Droplet-Film Model for predicting critical flow rates in gas wells.

    This class implements a physics-informed machine learning approach that combines
    fundamental fluid dynamics principles with data-driven optimization to predict
    when gas wells will experience liquid loading.
    """

Constructor

__init__(self, seed=42, feature_tol=1.0, dev_tol=1e-3, multiple_dev_policy="max")

Parameters:

  • seed (int): Random seed for reproducibility. Default: 42

  • feature_tol (float): Feature distance threshold for matching. Default: 1.0

  • dev_tol (float): Deviation tolerance for angle matching. Default: 1e-3

  • multiple_dev_policy (str): Policy for handling multiple matches. Options: "max", "min", "mean", "median". Default: "max"

Attributes: seed, feature_tol, dev_tol, multiple_dev_policy, opt_params (set after fitting), n_train (set after fitting).
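The constructor's defaults can be illustrated with a minimal stand-in class. The explicit validation of multiple_dev_policy is an assumption added for clarity; the real class may not raise on an unknown policy.

```python
# Minimal stand-in for the DFT constructor, showing the documented defaults.
# The policy validation is an illustrative assumption, not the real behavior.
class DFTSketch:
    VALID_POLICIES = ("max", "min", "mean", "median")

    def __init__(self, seed=42, feature_tol=1.0, dev_tol=1e-3,
                 multiple_dev_policy="max"):
        if multiple_dev_policy not in self.VALID_POLICIES:
            raise ValueError(f"unknown policy: {multiple_dev_policy!r}")
        self.seed = seed
        self.feature_tol = feature_tol
        self.dev_tol = dev_tol
        self.multiple_dev_policy = multiple_dev_policy

model = DFTSketch()
```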

Methods — fit

fit(self, X, y)

Train the DFT model on provided data.

Parameters: X (np.ndarray) shape (n_samples, 10), y (np.ndarray) shape (n_samples,). Returns: self.

Features (in order): Dia, Dev(deg), Area (m2), z, GasDens, LiquidDens, g (m/s2), P/T, friction_factor, critical_film_thickness.

Implementation: Uses Powell optimization from scipy.optimize; optimizes 5 global parameters (p1–p5) plus alpha per sample; bounds alpha in [0, 1]; max 5000 iterations, 10000 function calls.
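The optimization setup above can be sketched with a toy objective: Powell minimization over five global parameters plus one bounded alpha per sample, using the documented iteration limits. The objective here is a stand-in, not the project's actual loss.

```python
import numpy as np
from scipy.optimize import minimize

# Toy illustration of the fit setup: Powell optimization over 5 global
# parameters (p1..p5) plus one alpha per sample, each alpha in [0, 1].
# The quadratic objective is a stand-in for the model's real loss.
n_samples = 3
x0 = np.concatenate([np.ones(5), np.full(n_samples, 0.5)])
bounds = [(None, None)] * 5 + [(0.0, 1.0)] * n_samples

def toy_loss(params):
    p, alphas = params[:5], params[5:]
    # pull the globals toward 2.0 and the alphas toward 0.3
    return np.sum((p - 2.0) ** 2) + np.sum((alphas - 0.3) ** 2)

res = minimize(toy_loss, x0, method="Powell", bounds=bounds,
               options={"maxiter": 5000, "maxfev": 10000})
```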

Methods — predict

predict(self, X, dev_train=None, alpha_strategy='enhanced_dev_based')

Make predictions on new data.

Parameters: X (np.ndarray), optional dev_train, alpha_strategy (must be 'enhanced_dev_based'). Returns: np.ndarray of shape (n_samples,).

Alpha assignment strategy (by well deviation angle):

  1. Dev < 10°: Regular deviation-based matching

    • Find training samples within dev_tol

    • Apply multiple_dev_policy if multiple matches

    • Use mean training alpha if no matches

  2. 10° ≤ Dev < 20°: Minimum alpha strategy

    • Find training samples within dev_tol

    • Use minimum alpha among matches

    • Use mean training alpha if no matches

  3. Dev ≥ 20°: Full-feature matching

    • Compute Euclidean distance to all training samples

    • Use closest sample’s alpha if distance < feature_tol

    • Use mean training alpha otherwise
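The three-branch strategy above can be sketched as a standalone function. The argument names (X_train, dev_train, alpha_train) and the policy dispatch are assumptions pieced together from the text, not the project's exact implementation.

```python
import numpy as np

# Hedged sketch of the documented alpha-assignment strategy.
def assign_alpha(x, dev, X_train, dev_train, alpha_train,
                 dev_tol=1e-3, feature_tol=1.0, policy="max"):
    policy_fn = {"max": np.max, "min": np.min,
                 "mean": np.mean, "median": np.median}[policy]
    fallback = alpha_train.mean()  # mean training alpha when nothing matches
    if dev < 10.0:  # 1. regular deviation-based matching
        matches = alpha_train[np.abs(dev_train - dev) < dev_tol]
        return policy_fn(matches) if matches.size else fallback
    if dev < 20.0:  # 2. minimum-alpha strategy
        matches = alpha_train[np.abs(dev_train - dev) < dev_tol]
        return matches.min() if matches.size else fallback
    # 3. Dev >= 20: full-feature matching by Euclidean distance
    dists = np.linalg.norm(X_train - x, axis=1)
    i = dists.argmin()
    return alpha_train[i] if dists[i] < feature_tol else fallback
```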

Methods — _eq (physics equation)

_eq(self, params, X)

Compute predicted values using the physics equation.

Physics equation:

\[Q_{cr} = p_1 \sqrt{\left| \mathrm{term}_1 \cdot \alpha + (1-\alpha) \cdot \mathrm{term}_2 \right| \cdot \frac{1}{z} \cdot \frac{P}{T}}\]

Where:

  • term1 involves \(2 g \, \mathrm{Dia}\), \((\rho_l - \rho_g)\), \(\cos(\mathrm{Dev})\), and parameter \(p_4\).

  • term2 involves \(|\sin(p_5 \cdot \mathrm{Dev})|^{p_3}\) and \((\rho_l - \rho_g)^{p_2} / \rho_g^2\).
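A hypothetical NumPy rendering of the equation follows. Only the outer Q_cr expression is taken from the documented formula; the exact composition of term1 and term2 is an assumption inferred from the descriptions above.

```python
import numpy as np

# Hypothetical sketch of the physics equation; term1 and term2 forms are
# assumptions assembled from the prose, not the project's exact code.
def q_critical(p, Dia, Dev_deg, z, rho_g, rho_l, g, P_over_T, alpha):
    p1, p2, p3, p4, p5 = p
    dev = np.deg2rad(Dev_deg)
    term1 = 2.0 * g * Dia * (rho_l - rho_g) * np.cos(dev) * p4          # assumed form
    term2 = np.abs(np.sin(p5 * dev)) ** p3 * (rho_l - rho_g) ** p2 / rho_g ** 2  # assumed form
    inner = np.abs(term1 * alpha + (1.0 - alpha) * term2) * (1.0 / z) * P_over_T
    return p1 * np.sqrt(inner)
```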

Methods — _loss

_loss(self, params)

Compute loss function for optimization. Returns: float (MSE).
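Since _loss is documented to return the mean squared error, a minimal equivalent looks like this (the real method evaluates _eq internally; the prediction array is passed in directly here for illustration):

```python
import numpy as np

# Minimal MSE sketch matching the documented return value of _loss.
def mse_loss(y_pred, y_true):
    return float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))
```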

Helm Class — Data Management

The Helm class (in models.utils) handles dataset loading, train/test splitting, scaling, and model training/evaluation. The CSV must include the feature columns, plus Qcr (target), Gasflowrate, and Test status.

Class definition

from models.utils import Helm

Constructor

__init__(self, path, seed=42, drop_cols=None, includ_cols=None, test_size=0.20, scale=True)

Parameters: path (str), seed (int), drop_cols, includ_cols (lists), test_size (float, default 0.20), scale (bool, default True).

Attributes (set after initialization): X_train, X_test, y_train, y_test (numpy arrays). If scale=True, use X_train_rdy, X_test_rdy (and optionally y_train_rdy, y_test_rdy) for scaled data. Also: feature_names, scaler_X, scaler_y (when scale=True).
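The split-and-scale behavior can be sketched in a self-contained form. Helm itself reads a CSV and manages columns; this stand-in only mirrors the documented defaults (seed=42, test_size=0.20) and the scaled *_rdy arrays via standardization.

```python
import numpy as np

# Self-contained sketch of Helm's split-and-scale behavior (a stand-in,
# not the real class): shuffle, hold out test_size, standardize on train.
def split_and_scale(X, y, seed=42, test_size=0.20):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    X_train, X_test = X[train_idx], X[test_idx]
    mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
    X_train_rdy = (X_train - mu) / sigma   # scaled copies, as in X_train_rdy
    X_test_rdy = (X_test - mu) / sigma
    return X_train_rdy, X_test_rdy, y[train_idx], y[test_idx]
```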

Methods — evolv_model

evolv_model(self, build_model, hparam_grid, k_folds=5)

Train model with hyperparameter optimization. Returns: best trained model. Performs grid search and k-fold cross-validation; stores predictions and metrics.
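The grid-search-with-cross-validation loop can be illustrated with a simple closed-form ridge model. The model and the lambda grid are stand-ins; evolv_model's actual build_model/hparam_grid interface is not reproduced here.

```python
import numpy as np

# Illustrative grid search with k-fold cross-validation, sketching what
# evolv_model is documented to do; the ridge model is a stand-in.
def cv_grid_search(X, y, lambdas, k_folds=5, seed=42):
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k_folds)
    scores = {}
    for lam in lambdas:
        errs = []
        for i in range(k_folds):
            val = folds[i]
            trn = np.concatenate([folds[j] for j in range(k_folds) if j != i])
            # closed-form ridge regression on the training folds
            A = X[trn].T @ X[trn] + lam * np.eye(X.shape[1])
            w = np.linalg.solve(A, X[trn].T @ y[trn])
            errs.append(np.mean((X[val] @ w - y[val]) ** 2))
        scores[lam] = float(np.mean(errs))
    best = min(scores, key=scores.get)   # hyperparameter with lowest CV error
    return best, scores
```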

QLatticeWrapper Class — Symbolic Regression

Wrapper for Feyn QLattice for automated symbolic regression with a scikit-learn compatible interface.

Constructor

__init__(self, feature_tags, output_tag="Qcr", seed=42, max_complexity=10, n_epochs=10, criterion="bic")

Parameters: feature_tags (List[str]), output_tag, seed, max_complexity, n_epochs, criterion ("bic", "aic", "r2").

Methods: fit(X, y), predict(X), express() (returns SymPy expression).

Data Format Requirements

Input CSV must contain the 10 features (Dia, Dev(deg), Area (m2), z, GasDens, LiquidDens, g (m/s2), P/T, friction_factor, critical_film_thickness) plus Qcr, Gasflowrate, and Test status for Helm.
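A small standard-library helper (not part of the project) can verify that a CSV header carries every required column before handing the file to Helm:

```python
import csv
import io

# Hypothetical helper: report which required columns a CSV header lacks.
REQUIRED = ["Dia", "Dev(deg)", "Area (m2)", "z", "GasDens", "LiquidDens",
            "g (m/s2)", "P/T", "friction_factor", "critical_film_thickness",
            "Qcr", "Gasflowrate", "Test status"]

def missing_columns(csv_text):
    header = next(csv.reader(io.StringIO(csv_text)))
    return [c for c in REQUIRED if c not in header]
```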

Data Validation, Error Handling, Performance

Helm loads the CSV and performs a stratified train/test split on the loading label (the Test status column). The API includes error handling for invalid inputs, missing columns, optimization failures, and (for QLattice) network issues.
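Stratification on the loading label can be sketched as follows; this stand-in preserves each class's proportion in the held-out set, mirroring the documented behavior without reproducing Helm's internals.

```python
import numpy as np

# Sketch of a stratified split on a binary loading label (a stand-in for
# Helm's stratify-by-Test-status split): sample test_size of each class.
def stratified_indices(labels, test_size=0.20, seed=42):
    rng = np.random.default_rng(seed)
    test_idx = []
    for cls in np.unique(labels):
        cls_idx = np.flatnonzero(labels == cls)
        rng.shuffle(cls_idx)
        n_test = int(round(len(cls_idx) * test_size))
        test_idx.extend(cls_idx[:n_test])
    test_idx = np.array(sorted(test_idx))
    train_idx = np.setdiff1d(np.arange(len(labels)), test_idx)
    return train_idx, test_idx
```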

Support and Resources

GitHub repository, documentation, community forums, issue tracker, and research papers. For examples, see Usage Examples and the Jupyter notebooks.