Example usage

To use okridge in a project:

Import Necessary Packages

import os
import numpy as np
from okridge.tree import BNBTree
from okridge.utils import download_file_from_google_drive
from pathlib import Path

Download Sample Synthetic Data

data_file_path = "../tests/Synthetic_n=6000_p=3000_k=10_rho=0.5_snr=5.0_seed=0.npy"

if not os.path.isfile(data_file_path):
    download_file_from_google_drive('1lizlnufRBmEzMNpr0OlgE-P7otC8opkX', data_file_path)

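# The file holds a pickled dictionary, so allow_pickle=True is required.
# Only load pickled files from sources you trust.
loaded_data = np.load(data_file_path, allow_pickle=True).item()
X, y = loaded_data.get("X"), loaded_data.get("y")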

print("Shape of feature matrix is", X.shape)
print("There are {} samples".format(len(y)))

Shape of feature matrix is (6000, 3000)
There are 6000 samples
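
The loading step above relies on NumPy storing a Python dictionary as a zero-dimensional object array, which is why `.item()` is needed to get the dictionary back. The sketch below reproduces that round trip on a toy dictionary. It assumes the sample file was written by calling `np.save` on a dictionary with keys "X" and "y"; the names `toy`, `toy_path`, and `restored` are made up for illustration.

```python
import os
import tempfile

import numpy as np

# Toy stand-in for the sample data: a dict with a feature matrix and a target.
rng = np.random.default_rng(0)
toy = {"X": rng.standard_normal((5, 3)), "y": rng.standard_normal(5)}

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy.npy")

    # np.save wraps the dict in a 0-d object array and pickles it.
    np.save(toy_path, toy)

    # Loading pickled objects must be enabled explicitly.
    loaded = np.load(toy_path, allow_pickle=True)

# .item() unwraps the 0-d object array back into the original dict.
restored = loaded.item()

print("Array shape on disk:", loaded.shape)
print("Keys:", sorted(restored))
print("Shape of feature matrix is", restored["X"].shape)
```

Because unpickling can execute arbitrary code, `allow_pickle=True` is best reserved for files from a trusted source.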

Apply OKRidge Software

k = 10 # cardinality constraint
lambda2 = 0.1 # l2 regularization parameter
gap_tol = 1e-4 # optimality gap tolerance
verbose = True # print out the progress
time_limit = 180 # time limit in seconds

BnB_optimizer = BNBTree(X=X, y=y, lambda2=lambda2)

upper_bound, betas, optimality_gap, max_lower_bound, running_time = BnB_optimizer.solve(
    k=k,
    gap_tol=gap_tol,
    verbose=verbose,
    time_limit=time_limit,
)

Using max memory (300 GB)
using breadth-first search
'l' -> level(depth) of BnB tree,  'd' -> best dual bound,  'u' -> best upper(primal) bound,  'g' -> optimiality gap,  't' -> time
l: 0,    d: -659444.6478942606,    u: -657798.0645024217,    g: 0.0025031746,  t: 2.66743 s
l: 1,    d: -659387.8270570670,    u: -657798.0645024217,    g: 0.0024167942,  t: 5.36963 s
l: 2,    d: -659317.8396916100,    u: -657798.0645024217,    g: 0.0023103978,  t: 8.06236 s
l: 3,    d: -659250.2763933846,    u: -657798.0645024217,    g: 0.0022076865,  t: 10.70843 s
l: 4,    d: -659169.0430832551,    u: -657798.0645024217,    g: 0.0020841937,  t: 13.27604 s
l: 5,    d: -659066.9599284586,    u: -657798.0645024217,    g: 0.0019290045,  t: 15.77262 s
l: 6,    d: -658941.7627230188,    u: -657798.0645024217,    g: 0.0017386768,  t: 18.27392 s
l: 7,    d: -658798.4741379283,    u: -657798.0645024217,    g: 0.0015208461,  t: 20.69671 s
l: 8,    d: -658604.8032165651,    u: -657798.0645024217,    g: 0.0012264231,  t: 23.12765 s
l: 9,    d: -658331.2630668270,    u: -657798.0645024217,    g: 0.0008105809,  t: 25.51869 s
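
The 'g' column in the log above is consistent with a relative gap between the upper and dual bounds, (u - d) / |u|. That formula is inferred from the logged numbers rather than taken from the okridge source, so treat the sketch below as an assumption. The helper `relative_gap` is a hypothetical name.

```python
def relative_gap(upper, lower):
    """Relative optimality gap, assuming the form (u - d) / |u|."""
    return (upper - lower) / abs(upper)


# (level, dual bound, upper bound) copied from the first and last log rows.
log_rows = [
    (0, -659444.6478942606, -657798.0645024217),
    (9, -658331.2630668270, -657798.0645024217),
]

for level, dual_bound, upper_bound_value in log_rows:
    gap = relative_gap(upper_bound_value, dual_bound)
    print("l: {}, g: {:.10f}".format(level, gap))
```

Under this reading, the solver stops once the gap falls to `gap_tol` or below, or when `time_limit` is reached.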

print("Loss of best solution is", upper_bound)
print("Best lower bound is", max_lower_bound)
print("indices of nonzero coefficients are", np.where(betas != 0)[0])
print("Optimality gap is {}%".format(optimality_gap * 100))
print("Running time is {} seconds".format(running_time))

Loss of best solution is -657798.0645024217
Best lower bound is -657798.0645024217
indices of nonzero coefficients are [   0  300  600  900 1200 1500 1800 2100 2400 2700]
Optimality gap is 0.0%
Running time is 27.08906078338623 seconds
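
To make the meaning of the cardinality constraint `k` and the regularization weight `lambda2` concrete, the sketch below solves a tiny k-sparse ridge problem by enumerating every support of size `k`. It is not how okridge works internally (the library uses branch and bound) and is only feasible for very small `p`. The loss form, beta' X'X beta - 2 y'X beta + lambda2 ||beta||^2, is an assumption that matches the negative loss values printed above. All function and variable names here are hypothetical.

```python
from itertools import combinations

import numpy as np


def ridge_on_support(X, y, support, lambda2):
    """Solve ridge regression restricted to the given columns.

    Returns (loss, coefficients), where the loss omits the constant y'y.
    """
    X_s = X[:, list(support)]
    gram = X_s.T @ X_s
    rhs = X_s.T @ y
    coef = np.linalg.solve(gram + lambda2 * np.eye(len(support)), rhs)
    loss = coef @ gram @ coef - 2 * rhs @ coef + lambda2 * coef @ coef
    return loss, coef


def brute_force_sparse_ridge(X, y, k, lambda2):
    """Enumerate all size-k supports and keep the one with the lowest loss.

    Adding a feature never increases the ridge loss, so checking supports
    of size exactly k also covers the "at most k" constraint.
    """
    p = X.shape[1]
    best_loss, best_support, best_coef = None, None, None
    for support in combinations(range(p), k):
        loss, coef = ridge_on_support(X, y, support, lambda2)
        if best_loss is None or loss < best_loss:
            best_loss, best_support, best_coef = loss, support, coef
    beta = np.zeros(p)
    beta[list(best_support)] = best_coef
    return best_loss, beta


# Noise-free toy problem: only features 0 and 3 carry signal.
rng = np.random.default_rng(0)
X_small = rng.standard_normal((50, 8))
beta_true = np.zeros(8)
beta_true[[0, 3]] = [3.0, -2.0]
y_small = X_small @ beta_true

toy_loss, beta_hat = brute_force_sparse_ridge(X_small, y_small, k=2, lambda2=1e-6)
print("indices of nonzero coefficients are", np.where(beta_hat != 0)[0])
```

Enumeration visits every one of the "p choose k" supports, which is out of reach for the p=3000, k=10 problem above; that scale is where a branch-and-bound solver with a certified optimality gap is needed.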