SMS Lora Training

October 5, 2023

Loading your life into an AI

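Using your history of text messages from an iPhone (or Android, with a few tweaks), you can create a Lora file that augments an AI model so it sounds more like you. To start, back up your iPhone to your computer using a cable and Finder (or iTunes on Windows). The backup has to be a local, unencrypted one so the message database can be read; an iCloud backup is not saved to your computer.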

Locating the Backup

The backup location on your computer depends on your operating system:

For Mac:

Path: ~/Library/Application Support/MobileSync/Backup/

For Windows:

Path: C:\Users\[YourUsername]\AppData\Roaming\Apple Computer\MobileSync\Backup\

Note: Within the backup, text messages are stored in the file named 3d0d7e5fb2ce288813306e4d4636395e047a3d28, which is a SQLite database. It typically sits in the 3d subfolder of the device's backup folder.
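If you would rather not hunt through the folders by hand, the lookup can be sketched in a few lines of Python. `find_sms_db` is a hypothetical helper written for this post, not part of any Apple tool, and it assumes the backup is unencrypted so the file it finds is readable.

```python
from pathlib import Path
from typing import Optional

# File name given in the note above
SMS_DB_NAME = "3d0d7e5fb2ce288813306e4d4636395e047a3d28"

def find_sms_db(backup_root: Path) -> Optional[Path]:
    """Return the most recently modified messages file under backup_root, or None."""
    if not backup_root.is_dir():
        return None
    # Search every device backup, since the root can hold more than one
    matches = [p for p in backup_root.rglob(SMS_DB_NAME) if p.is_file()]
    if not matches:
        return None
    return max(matches, key=lambda p: p.stat().st_mtime)
```

On a Mac you would call it with `Path.home() / "Library" / "Application Support" / "MobileSync" / "Backup"` and copy the returned file somewhere convenient before working on it.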

Transforming the Data

To train the model, we need the data in a specific JSON format, which is described in the official documentation.
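Judging by the script in this post, each record is an object with an `instruction` string (what the other person said) and an `output` string (what you replied). A small check like the one below can catch malformed records before training. `invalid_entries` is a hypothetical helper and the sample messages are made up for illustration.

```python
import json

REQUIRED_KEYS = ("instruction", "output")

def invalid_entries(entries):
    """Return the indexes of records that lack a non-empty string for a required key."""
    bad = []
    for i, entry in enumerate(entries):
        ok = isinstance(entry, dict) and all(
            isinstance(entry.get(key), str) and entry[key].strip()
            for key in REQUIRED_KEYS
        )
        if not ok:
            bad.append(i)
    return bad

# Made-up sample data: the second record has an empty reply
sample = [
    {"instruction": "are you free tonight", "output": "yes what time"},
    {"instruction": "call me", "output": ""},
]

print(json.dumps(sample[0], indent=4))  # the shape of one training record
print(invalid_entries(sample))
```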

I wrote a quick script, available on my GitHub, that transforms this data into a Lora dataset.

text-to-train

import sqlite3
import json
import re

# Strip punctuation and symbols (anything that is not a word character or whitespace)
def clean_text(text):
    return re.sub(r'[^\w\s]', '', text)

# Connect to the SQLite database
conn = sqlite3.connect('3d0d7e5fb2ce288813306e4d4636395e047a3d28')
cursor = conn.cursor()

# Query the message table
cursor.execute("SELECT is_from_me, text FROM message ORDER BY handle_id, date")
rows = cursor.fetchall()

# Process the rows into alpaca-chatbot-format.json format
data_entries = []
buffered_messages = []
buffered_responses = []

for row in rows:
    message_text = row[1]
    if message_text is None:  # skip None messages
        continue

    if row[0] == 0:  # message from friends
        if buffered_responses:  # check if there are buffered responses from the previous set
            combined_responses = clean_text(" ".join(buffered_responses))
            combined_messages = clean_text(" ".join(buffered_messages))

            # Only add if both are present and not just whitespace
            if combined_messages.strip() and combined_responses.strip():
                # Create a dictionary entry with the required format
                data_entries.append({
                    "instruction": combined_messages,
                    "output": combined_responses
                })

            buffered_responses = []  # clear the buffered responses
            buffered_messages = []  # clear the buffered messages

        buffered_messages.append(message_text)
    else:  # your message
        buffered_responses.append(message_text)

# The final buffered exchange is deliberately dropped: one message pair
# is not worth repeating the block above

# Convert the list to JSON and write to a file
with open('training_data.json', 'w') as file:
    json.dump(data_entries, file, indent=4)

# Close the database connection
conn.close()
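One thing the script above does not do is reset its buffers when the rows move from one contact to the next, so an unanswered text from one person can end up joined to the next person's messages. The variant below is a sketch of one way to handle that, not the script from the repo. It assumes `handle_id` is added to the SELECT so each row is `(handle_id, is_from_me, text)`, it leaves out the punctuation cleaning for brevity, and it also keeps the last exchange of each conversation. `pair_messages` and the sample rows are hypothetical.

```python
from itertools import groupby

def pair_messages(rows):
    """Pair incoming texts with the replies that follow, one conversation at a time.

    rows: (handle_id, is_from_me, text) tuples sorted by handle_id, then date.
    """
    entries = []

    def flush(incoming, outgoing):
        # Only keep exchanges that have both a prompt and a reply
        if incoming and outgoing:
            entries.append({
                "instruction": " ".join(incoming),
                "output": " ".join(outgoing),
            })

    for _, conversation in groupby(rows, key=lambda row: row[0]):
        incoming, outgoing = [], []  # fresh buffers for every contact
        for _, is_from_me, text in conversation:
            if text is None:
                continue
            if is_from_me:
                outgoing.append(text)
                continue
            if outgoing:  # a new incoming text closes the previous exchange
                flush(incoming, outgoing)
                incoming, outgoing = [], []
            incoming.append(text)
        flush(incoming, outgoing)  # keep the last exchange of the conversation

    return entries

# Made-up rows: contact 1 leaves one text unanswered before contact 2 starts
rows = [
    (1, 0, "hi"),
    (1, 1, "hello"),
    (1, 0, "unanswered"),
    (2, 0, "yo"),
    (2, 1, "sup"),
]
print(pair_messages(rows))
```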

Load the Model

Launch text-generation-webui.
Download the model TheBloke/Llama-2-13B-chat-GPTQ or any other of your choice.
Load the chosen model using transformers (note: I had to disable exllama).

Train the Model

Copy training_data.json from text-to-train to ~/Code/text-generation-webui/training/datasets, which is the folder the Dataset dropdown reads from.
Name your Lora file appropriately.
Select alpaca-chatbot-format for Data Format.
Choose training_data for Dataset.
Start the training.
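To get a feel for what the trainer sees, it helps to picture how a data format turns each record into a single block of text. The sketch below assumes the alpaca-chatbot-format template looks roughly like the string shown; that template is an assumption on my part, so check the format file in your own install. `render` is a hypothetical helper, not a text-generation-webui function.

```python
# Assumed template, modeled on alpaca-chatbot-format; the real file may differ
ASSUMED_TEMPLATE = "User: %instruction%\nAssistant: %output%"

def render(entry, template=ASSUMED_TEMPLATE):
    """Fill each %key% placeholder in the template with the record's value."""
    text = template
    for key, value in entry.items():
        text = text.replace(f"%{key}%", value)
    return text

# Made-up record in the same shape as training_data.json
print(render({"instruction": "are you free tonight", "output": "yes what time"}))
```

Seen this way, your replies are what the model learns to produce after the "Assistant:" marker, which is why the script puts your own messages in `output`.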

Load Lora and Test

Load the Lora on the Model page.
Begin chatting to see the results!


Joshua Pfaendler