SMS LoRA Training
Loading your life into an AI
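Using your history of text messages from an iPhone (or Android, with a few tweaks), you can train a LoRA (Low-Rank Adaptation) file that augments an AI model so it sounds more like you. To start, back up your iPhone to your computer over a cable, using Finder on a Mac or iTunes (or the Apple Devices app) on Windows. An iCloud backup is stored on Apple's servers rather than on your computer, so it won't give you the local files used below.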
Locating the Backup
If you’re trying to locate your device’s backup on your computer, follow the path specific to your operating system:
For Mac:
- Path: ~/Library/Application Support/MobileSync/Backup/
For Windows:
- Path: C:\Users\[YourUsername]\AppData\Roaming\Apple Computer\MobileSync\Backup\
- Path (iTunes installed from the Microsoft Store): C:\Users\[YourUsername]\Apple\MobileSync\Backup\
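Note: Within the backup, text messages are stored in the file named 3d0d7e5fb2ce288813306e4d4636395e047a3d28, which is the Messages SQLite database (sms.db). In current backups it sits in a subfolder named 3d inside the device's backup folder. The backup must be unencrypted for this file to be readable.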
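If you have several device backups, a small helper can find every copy of the Messages database for you. The sketch below assumes the usual layout of one folder per device under the Backup directory. It checks both the two-character subfolder used by current backups (here "3d") and the top level of the device folder, where older backups kept files. The function name find_sms_databases is hypothetical.

```python
from pathlib import Path

# Hash-named file that holds the Messages database in an iOS backup
SMS_DB_NAME = "3d0d7e5fb2ce288813306e4d4636395e047a3d28"


def find_sms_databases(backup_root):
    """Return every copy of the Messages database under a MobileSync Backup folder.

    Hypothetical helper: assumes one subfolder per device backup and checks
    both the "3d" subfolder (current layout) and the device folder itself
    (older, flat layout).
    """
    root = Path(backup_root).expanduser()
    found = []
    if not root.is_dir():
        return found
    for device_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for candidate in (device_dir / SMS_DB_NAME[:2] / SMS_DB_NAME,
                          device_dir / SMS_DB_NAME):
            if candidate.is_file():
                found.append(candidate)
    return found


if __name__ == "__main__":
    # macOS default location; on Windows pass the AppData path instead
    for path in find_sms_databases("~/Library/Application Support/MobileSync/Backup"):
        print(path)
```

Copy the file it reports next to the transformation script (or pass its full path to sqlite3.connect) rather than working on the backup in place.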
Transforming the Data
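To train the model, the data needs to be in a specific JSON format: a list of records, each pairing an "instruction" (the messages you received) with an "output" (your reply). The format is described in the official text-generation-webui training documentation.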
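For reference, here is a minimal sketch of that record shape. It builds the list from already-paired (received, replied) strings and drops pairs where either side is blank, the same rule the script below applies. The helper name to_alpaca_entries and the sample messages are made up for illustration.

```python
import json


def to_alpaca_entries(pairs):
    """Turn (received_text, reply_text) pairs into alpaca-style records.

    Hypothetical helper: each record has an "instruction" (what you received)
    and an "output" (what you replied); pairs with a blank side are dropped.
    """
    entries = []
    for received, reply in pairs:
        if received.strip() and reply.strip():
            entries.append({"instruction": received, "output": reply})
    return entries


# Made-up sample pairs
pairs = [
    ("are you coming tonight", "yeah be there at 8"),
    ("   ", "ignored because the received side is blank"),
]
entries = to_alpaca_entries(pairs)
print(json.dumps(entries, indent=4))
```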
I wrote a quick script to transform this data into a LoRA dataset. It is also available on my GitHub.
import sqlite3
import json
import re

# Clean the text by removing punctuation and other non-word characters
# (note: this also strips apostrophes and emoji)
def clean_text(text):
    return re.sub(r'[^\w\s]', '', text)

# Connect to the SQLite database copied from the backup
conn = sqlite3.connect('3d0d7e5fb2ce288813306e4d4636395e047a3d28')
cursor = conn.cursor()

# Query the message table, grouped by conversation partner and in time order
cursor.execute("SELECT handle_id, is_from_me, text FROM message ORDER BY handle_id, date")
rows = cursor.fetchall()

# Process the rows into alpaca-chatbot-format.json format
data_entries = []
buffered_messages = []
buffered_responses = []

def flush():
    """Turn the buffered messages and responses into one entry, then clear both buffers."""
    combined_messages = clean_text(" ".join(buffered_messages))
    combined_responses = clean_text(" ".join(buffered_responses))
    # Only add if both are present and not just whitespace
    if combined_messages.strip() and combined_responses.strip():
        data_entries.append({
            "instruction": combined_messages,
            "output": combined_responses
        })
    buffered_messages.clear()
    buffered_responses.clear()

current_handle = None
for handle_id, is_from_me, message_text in rows:
    # Skip rows with no plain text (e.g. attachments; on newer iOS versions
    # some message text may be stored only in the attributedBody column)
    if message_text is None:
        continue
    if handle_id != current_handle:  # new conversation: don't mix chats
        flush()
        current_handle = handle_id
    if is_from_me == 0:  # message from a friend
        if buffered_responses:  # a friend message after your replies starts a new pair
            flush()
        buffered_messages.append(message_text)
    else:  # your message
        buffered_responses.append(message_text)

flush()  # keep the final exchange too

# Convert the list to JSON and write to a file
with open('training_data.json', 'w', encoding='utf-8') as file:
    json.dump(data_entries, file, indent=4)

# Close the database connection
conn.close()
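The pairing step at the heart of the script can be pulled out into a pure function and tried on a few in-memory rows without opening the database. This sketch assumes rows of (handle_id, is_from_me, text) sorted by conversation and date, and leaves out text cleaning. It starts a fresh pair whenever handle_id changes, so an unanswered message in one chat is never paired with your first message in the next. The function name build_pairs and the sample rows are hypothetical.

```python
def build_pairs(rows):
    """Group consecutive received messages with the replies that follow them.

    rows: (handle_id, is_from_me, text) tuples, sorted by handle_id then date.
    Returns a list of {"instruction": ..., "output": ...} dicts.
    """
    entries = []
    received, replies = [], []
    current = None

    def flush():
        prompt, reply = " ".join(received), " ".join(replies)
        if prompt.strip() and reply.strip():
            entries.append({"instruction": prompt, "output": reply})
        received.clear()
        replies.clear()

    for handle_id, is_from_me, text in rows:
        if text is None:
            continue  # no plain text (e.g. an attachment)
        if handle_id != current:
            flush()  # new conversation: never mix chats
            current = handle_id
        if is_from_me:
            replies.append(text)
        else:
            if replies:
                flush()  # an incoming message after your replies starts a new pair
            received.append(text)
    flush()  # keep the final exchange
    return entries


# Made-up sample rows
rows = [
    (1, 0, "hey"), (1, 0, "you around?"), (1, 1, "yep"),
    (1, 0, "lunch?"), (1, 1, "sure"), (1, 1, "noon works"),
    (2, 0, "did you see the game"),  # never answered
    (3, 1, "happy birthday!"),       # you wrote first, nothing to pair with
]
for entry in build_pairs(rows):
    print(entry)
```

Only the two exchanges from conversation 1 become training pairs. The unanswered message and the unprompted greeting are dropped instead of being glued together.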
Load the Model
- Launch text-generation-webui.
- Download the model TheBloke/Llama-2-13B-chat-GPTQ, or any other model of your choice.
- Load the chosen model with the Transformers loader (note: I had to disable ExLlama).
Train the Model
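- Copy training_data.json from text-to-train to ~/Code/text-generation-webui/training/datasets/, the folder the Dataset list on the Training tab reads from (the trained LoRA itself is saved to the loras folder).
- Give your LoRA an appropriate name.
- Select alpaca-chatbot-format for Data Format.
- Choose training_data for Dataset.
- Start the training.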
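Roughly, the Data Format option tells the trainer how to turn each record into a plain-text training example. The sketch below assumes the format file maps a comma-separated set of field names to a template such as "User: %instruction%\nAssistant: %output%". That template text is an assumption about what alpaca-chatbot-format contains, so check the file in your install's training/formats folder. The function render_example is hypothetical and is not the webui's own code.

```python
# Assumed contents of an alpaca-chatbot-style format file: keys are the
# comma-separated field names a record has, values are prompt templates.
FORMAT = {
    "instruction,output": "User: %instruction%\nAssistant: %output%",
    "instruction,input,output": "User: %instruction%: %input%\nAssistant: %output%",
}


def render_example(record, fmt=FORMAT):
    """Fill in the template whose field names match the record's non-empty fields."""
    present = {k for k, v in record.items() if isinstance(v, str) and v.strip()}
    for keys, template in fmt.items():
        if set(keys.split(",")) == present:
            # Substitute each %field% placeholder with the record's value
            for name, value in record.items():
                if isinstance(value, str):
                    template = template.replace(f"%{name}%", value)
            return template
    raise ValueError(f"no template for fields: {sorted(present)}")


print(render_example({"instruction": "lunch?", "output": "sure noon works"}))
```

Under this assumption, each of your text exchanges becomes a short "User: ... / Assistant: ..." transcript, which is what the LoRA learns to continue in your voice.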
Load the LoRA and Test
- Load the LoRA on the Model page.
- Begin chatting to see the results!