
Scenes from a large CSV
Need to work with CSV files that are too large to open in ExcelHave you ever come across such a situation?
For example,
Sales data in the hundreds of thousands to millions of lines
Survey results and log data
Report CSV spit out from the business system
...Cases where you want to look at this kind of data and consider or analyze it a little.
I think it's a very common case if you work in the corporate world 📓.
How big are these files? I would say
for example
A company with annual sales of 500 million yen and a unit sales price of about 10,000 yen
...then there are 50,000 lines of basic sales data alone.
For 5 years, 250,000 lines.
When this is combined with data on product types, discounts, and other measures, the total number of rows can quickly grow to more than 1 million!
Naturally, if sales are large, such as 5 billion yen or 50 billion yen...
Data will increase proportionally.
That's just one example, but it's someone who works in a company,It is a rather common case that you want to analyze data from millions of rows of data.I think that's what it means.
CSV file, not database.
In a company of a certain size, there is usually an IT department and an engineer, and the engineer may be someone who works with databases on a daily basis,
As an engineer, it is normal to deal with millions of rows of dataabout.
When analyzing data within the IT department,Set up and work on a database for this purpose.I know you do.
However, it is often the case that the people in a company who want to do data analysis in the first person are in the marketing department, product department, sales department, human resources department, etc. ・・・・
Sometimes they're not in the IT department.
And,
suchNo database for data analysis for non-IT departmentsI'm sure there are many companies that have
The alual in such cases is,IT department only gives me CSV.This is a pattern called
IT department engineer: "I've pulled 5 million rows of CSVs from the database, and you can do whatever analysis you want with them."
He looks at the huge file he handed me and says, "Hmmm...
...I think there are quite a few phases like this.
What to do in such a situation
If possible, I wish I could analyze it in Excel, which I am familiar with, but unfortunately Excel is not.When the number of lines exceeds several tens of thousands (depending on the performance of the computer), the operation becomes very slow and tiring.
Also, if the number of lines exceeds 1,040,000, it cannot be opened in the normal way
The Excel Power Query method can be used to open the data, but it does not avoid the fact that the analysis will take time,Heavy operation and risk of crashes can be a bottleneck
However, uploading to some unknown cloud is not a security measure.
Of course, it's hard for me to say, "Well, I'll set up a database in-house," like an engineer.
Similarly, "Let's implement a BI tool such as Tableau or Power BI" is a good suggestion for a future plan, but not something you can do today or tomorrow on your own.
I need to analyze it today and I'm stuck...
We want to do it.I just want to handle large data quickly on my local PC! (If there were just a few more lines, I could do it in a snap with Excel...!)something like ...
Therefore, in this issue.DBBrowser for SQLiteto and AI (Chat GPT, CoPilot, Gemini, Claude.... Any AI that can be consulted in a chat format) approaches to
Why DB Browser for SQLite?
DB Browser fro SQLite is SQLite is an ultra-lightweight database with GUI operation and has the following features
✅ fully localNo net connection or server required
✅ High capacity OKLightweight operation even with hundreds of thousands to millions of lines
✅ Free and easy to install
All of the following can be operated with a mouse
Data import (CSV → table)
Extraction by conditions (search filter)
Editing and deleting
Export results (e.g., re-CSV)
Only when we do the final analysis, we use a language called SQL, but this can be solved in an instant with the support of AI.
Analyze using SQL → Solve using AI
Operation Procedure (Preparation of DB Browser for SQLite)
1. download and install the tool
Let's download it first!
Official site:https://sqlitebrowser.org/
Windows / macOS / Linux all supported
For Windows users, choose "DB.Browser.for.SQLite-*.win64.exe".
The screen images from here on out will be done on a Mac.
Homebrew is a command line installation from the terminal
Here we will use the installer (.dmg file on Mac)
When you open the dmg file, you will see a screen like this, so copy it to Applications by drag & drop
Create a new database file
Import CSV files as tables
A table is data in table form
File→Import→CSV File to Table
This is a sample, so it is 900 rows, but it works with CSVs more than 100 times larger than this.
- sales detail data
- Merchandise Cost Master
The sales detail data is an image of the data in the POS data or sales management system, with columns like this inside
- SalesID (Serial number of the sales invoice)
- SalesDate
- ProductID (product ID of the sold product)
- Category (Category of sold products)
- Gender (whether the product sold is men's or women's)
- Size (size of item sold)
- SalesValue
- ProductID (Product ID)
- ProductCost (the cost amount of the item)
If you open it in Excel, you will see an image like this
Now let's import the CSV
Select CSV file and "Open
You should see a preview similar to the following
In this example, the 6th column is garbled because the character encoding of the Japanese part of the CSV file (Shift JIS) is different from the character encoding specified when opening the file (default is UTF-8). The problem can be solved by adjusting the character encoding to the actual character encoding of the file (in this case, Shift JIS).
Set encoding to "Other
If you select "Other," another field will appear.
Since this field is a blank text field, you may think, "Oh, what am I supposed to do with this?
(In fact, this is where I got hooked myself.)
The system is such that if you enter the appropriate letter, a list of candidates will be displayed.
Let's enter "S", this time specifying Shift_JIS.
Japanese reads correctly.
Check "First row as column name".
Column names are column names (headers)
It will consider the contents of the first line of the CSV as the column name.
Press OK to close the screen.
You can see it added under "Tables" in the database structure.
Another CSV was imported in the same way
I see another table lined up.
Now the data is ready.
Now you can analyze data using the "SQL" specification method (language)
Analysis using SQL (made by AI)!
Don't worry, this is an area where AI excels.
Write down in words what you want to do and let the AI create SQL for you!
Open an AI such as Chat GPT and instruct it to do the following
There is a table called "Sales Detail Data" with the following columns
which is composed of the following columns
Create SQL that sums and lists the SalesValue (sales amount)
and list them by SalesDate (date of sale).
SalesID
SalesDate
ProductID
Category
Gender
Size
SalesValue</xmp
The answer came back like this
SELECT
SalesDate,.
SUM(SalesValue) AS TotalSalesValue
FROM
Sales Detail Data
GROUP BY
SalesDate
ORDER BY
SalesDate;.
Open the "Execute SQL" tab of DB Browser for SQLite and copy and paste this part of the SQL statement
Then, press the button in the shape of ▶ in the menu to execute this SQL
The result is returned immediately (Sample is a group aggregation of 900 rows of data, 6 milliseconds = 0.006 seconds).
The result was in the mid-section of the screen.
Output results can be copied
Let me give you another example.
Now let's do a tally that can only be produced by merging two tables (two CSVs)
It consists of ProductID, ProductCost columns.
There is a table named ProductCostMaster that consists of the columns
The ProductID in this ProductID and the ProductID in the sales detail data
the same ProductID in the sales invoice data.
Please write SQL to aggregate and list the total sales and total cost per Category column of the sales detail data
Please write SQL to output the total sales and total cost per Category column in the sales detail data as a list.
The answer was something like this
SELECT
s.Category,.
SUM(s.SalesValue) AS TotalSalesValue,.
SUM(c.ProductCost) AS TotalProductCost
FROM
Sales detail data AS s
JOIN
Product Cost Master AS c
ON s.ProductID = c.ProductID
GROUP BY
s.Category
ORDER BY
s.Category;.
Analysis Results.
Total sales and total cost of sales have been tabulated by category
You can also think about the analysis you want to do, have the AI write the SQL, run it through DB Browser for SQLite, and so on....
I think this will be a case for a pleasant collaboration with AI. I recommend it.
You will also be able to experience what engineers have been saying about how it feels to have an AI helping you with your programming and coding.
summary
For safe and fast handling of large CSVs,DB Browser for SQLite + AIis a very viable option.
This is especially useful in situations where "cloud computing cannot be used for internal data" or "Excel will not open.
Can be handled by non-engineers with GUI operation only
Collaborate with AII enjoy doing analysis using SQL.
To download filesUser RegistrationOr log in
