Taking multiple results from a database randomly at once

Suppose that you log some data at regular intervals in a database, perhaps from an automated service. You know some of the keywords that are stored in the process, but would also like to find when the records for particular items were stored.

This can be slightly problematic as relational databases are not very good at random access and subqueries or unions may not be very fast to execute. If the result set we expect is relatively small, we could retrieve it all at once and then iterate over it, using a filter for the data we need. This will work fine, but for a large dataset, such iteration could take some time, when we cannot stop our search early.

We could eventually use the prior information that we have to take multiple items from our result set randomly, all at once. Here is a small example to illustrate this. We insert a set of fruits in a database and attach different datetime in the past to each one. At some later point in time we decide that we need the datetime related to the fruits guava, apple, tangerine and apricots (see fruits_to_examine). We retrieve only the datatimes, find the indexes of the fruits to examine and use them to take all datetimes at once. This way we avoid iterating over rows that do not contain our fruits.

import sqlite3 import numpy as np from random import randint from datetime import datetime, timedelta conn = sqlite3.connect('fruits.db') cur = conn.cursor() fruits = [ 'orange', 'banana', 'grapes', 'pears', 'plums', 'lemon', 'avocado', 'papaya', 'pineapple', 'apple', 'cherry', 'apricots', 'blueberries', 'guava', 'grapefruit', 'kiwifruit', 'lychee', 'mandarin', 'mango', 'nectarine', 'pomegranate', 'raspberry', 'strawberries', 'tangerine', 'watermelon', 'melon' ] fruits_to_examine = ['guava', 'apple', 'tangerine', 'apricots'] dt = datetime.now() dates = [] for _ in range(len(fruits)): diff = randint(1,30) delta = timedelta(days=diff) dt -= delta dates.append(dt) # cur.execute('DROP TABLE fruits') cur.execute(""" CREATE TABLE IF NOT EXISTS fruits ( fruitID INT PRIMARY KEY, datetime TEXT, fruit TEXT )""" ) for i, d in enumerate(dates): cur.execute(""" INSERT INTO fruits (fruitID, datetime, fruit) VALUES(?, ?, ?)""", (i+1, d, fruits[i]) ) res = cur.execute('select datetime from fruits').fetchall() conn.close() fruits, res = np.array(fruits), np.array(res) indices = np.where([fruits==fruit for fruit in fruits_to_examine])[1] print(np.take(res, indices))

This gives us the following dates:

'2016-09-02 23:49:35.413437' '2016-09-25 23:49:35.413437' '2016-04-02 23:49:35.413437' '2016-09-14 23:49:35.413437'