(Free) How Non-Engineers Can Sigodeki with Huge Data Analysis

Add this entry to Hatena Bookmarks
Big CSV to pass

Scenes from a large CSV

Need to work with CSV files that are too large to open in ExcelHave you ever come across such a situation?

For example,

  • Sales data in the hundreds of thousands to millions of lines

  • Survey results and log data

  • Report CSV spit out from the business system

...Cases where you want to look at this kind of data and consider or analyze it a little. 

I think it's a very common case if you work in the corporate world 📓.

How big are these files? I would say

for example

A company with annual sales of 500 million yen and a unit sales price of about 10,000 yen

...then there are 50,000 lines of basic sales data alone.

For 5 years, 250,000 lines.

When this is combined with data on product types, discounts, and other measures, the total number of rows can quickly grow to more than 1 million!

Naturally, if sales are large, such as 5 billion yen or 50 billion yen...

Data will increase proportionally.

That's just one example, but it's someone who works in a company,It is a rather common case that you want to analyze data from millions of rows of data.I think that's what it means.

DB icon

CSV file, not database.

In a company of a certain size, there is usually an IT department and an engineer, and the engineer may be someone who works with databases on a daily basis,

As an engineer, it is normal to deal with millions of rows of dataabout.

When analyzing data within the IT department,Set up and work on a database for this purpose.I know you do.

However, it is often the case that the people in a company who want to do data analysis in the first person are in the marketing department, product department, sales department, human resources department, etc. ・・・・

Sometimes they're not in the IT department.

And,

suchNo database for data analysis for non-IT departmentsI'm sure there are many companies that have

The alual in such cases is,IT department only gives me CSV.This is a pattern called

IT department engineer: "I've pulled 5 million rows of CSVs from the database, and you can do whatever analysis you want with them."

Engineer and marketer

He looks at the huge file he handed me and says, "Hmmm...

...I think there are quite a few phases like this.

What to do in such a situation

If possible, I wish I could analyze it in Excel, which I am familiar with, but unfortunately Excel is not.When the number of lines exceeds several tens of thousands (depending on the performance of the computer), the operation becomes very slow and tiring.

Also, if the number of lines exceeds 1,040,000, it cannot be opened in the normal way

The Excel Power Query method can be used to open the data, but it does not avoid the fact that the analysis will take time,Heavy operation and risk of crashes can be a bottleneck

However, uploading to some unknown cloud is not a security measure.

Of course, it's hard for me to say, "Well, I'll set up a database in-house," like an engineer.

Similarly, "Let's implement a BI tool such as Tableau or Power BI" is a good suggestion for a future plan, but not something you can do today or tomorrow on your own.

I need to analyze it today and I'm stuck...

We want to do it.I just want to handle large data quickly on my local PC! (If there were just a few more lines, I could do it in a snap with Excel...!)something like ...

 

Therefore, in this issue.DBBrowser for SQLiteto and AI (Chat GPT, CoPilot, Gemini, Claude.... Any AI that can be consulted in a chat format) approaches to


 

Why DB Browser for SQLite?

DB Browser fro SQLite is SQLite is an ultra-lightweight database with GUI operation and has the following features

  • ✅ fully localNo net connection or server required

  • ✅ High capacity OKLightweight operation even with hundreds of thousands to millions of lines

  • ✅ Free and easy to install

All of the following can be operated with a mouse

  • Data import (CSV → table)

  • Extraction by conditions (search filter)

  • Editing and deleting

  • Export results (e.g., re-CSV)

Only when we do the final analysis, we use a language called SQL, but this can be solved in an instant with the support of AI.

  • Analyze using SQL → Solve using AI

 

 

Operation Procedure (Preparation of DB Browser for SQLite)

1. download and install the tool

Let's download it first!

For Windows users, choose "DB.Browser.for.SQLite-*.win64.exe".

The screen images from here on out will be done on a Mac.

 Homebrew is a command line installation from the terminal

Here we will use the installer (.dmg file on Mac)

When you open the dmg file, you will see a screen like this, so copy it to Applications by drag & drop

 

Double-click DB Browser for SQLite in the Applications folder to open it

 

 You may get a screen like this, but it will open.
 DB Browser for SQLite has been opened. If you want to configure Japanese settings, please click  You can configure it by going to DB Browser for SQLite → Preferences
 Now ready for use

Create a new database file

 

Choose File→New Database
Choose where to save the database

 

Next is the table definition screen.
 

Import CSV files as tables

After this, import the CSV file so that it can be handled as a database (table)
A table is data in table form

File→Import→CSV File to Table

 

 

In this case, we have prepared two CSV files as samples
This is a sample, so it is 900 rows, but it works with CSVs more than 100 times larger than this.
  1. sales detail data
  2. Merchandise Cost Master

The sales detail data is an image of the data in the POS data or sales management system, with columns like this inside

  • SalesID (Serial number of the sales invoice)
  • SalesDate
  • ProductID (product ID of the sold product)
  • Category (Category of sold products)
  • Gender (whether the product sold is men's or women's)
  • Size (size of item sold)
  • SalesValue
The product cost master is an image of the master that manages the product cost of each product and contains columns such as
  • ProductID (Product ID)
  • ProductCost (the cost amount of the item)
This is a completely fictitious sample, but I am sure similar examples are common. 
If you open it in Excel, you will see an image like this

 

Now let's import the CSV

Select CSV file and "Open
You should see a preview similar to the following

In this example, the 6th column is garbled because the character encoding of the Japanese part of the CSV file (Shift JIS) is different from the character encoding specified when opening the file (default is UTF-8). The problem can be solved by adjusting the character encoding to the actual character encoding of the file (in this case, Shift JIS).

Set encoding to "Other

If you select "Other," another field will appear.

Since this field is a blank text field, you may think, "Oh, what am I supposed to do with this?

(In fact, this is where I got hooked myself.)

The system is such that if you enter the appropriate letter, a list of candidates will be displayed.

Let's enter "S", this time specifying Shift_JIS.

Japanese reads correctly.

Check "First row as column name".

Column names are column names (headers)

It will consider the contents of the first line of the CSV as the column name.

Press OK to close the screen.

You can see it added under "Tables" in the database structure.

Another CSV was imported in the same way

I see another table lined up.

Now the data is ready.

Now you can analyze data using the "SQL" specification method (language)

Analysis using SQL (made by AI)!

I'm assuming you are a non-engineer, so you might think, "I can't write SQL.

Don't worry, this is an area where AI excels.

Write down in words what you want to do and let the AI create SQL for you!

Open an AI such as Chat GPT and instruct it to do the following

				
					There is a table called "Sales Detail Data" with the following columns
which is composed of the following columns
Create SQL that sums and lists the SalesValue (sales amount)
and list them by SalesDate (date of sale).

SalesID
SalesDate
ProductID
Category
Gender
Size
SalesValue</xmp
				
			

The answer came back like this

				
					SELECT
    SalesDate,.
    SUM(SalesValue) AS TotalSalesValue
FROM
    Sales Detail Data
GROUP BY
    SalesDate
ORDER BY
    SalesDate;.
				
			

Open the "Execute SQL" tab of DB Browser for SQLite and copy and paste this part of the SQL statement

Then, press the button in the shape of ▶ in the menu to execute this SQL

The result is returned immediately (Sample is a group aggregation of 900 rows of data, 6 milliseconds = 0.006 seconds).

The result was in the mid-section of the screen.

Output results can be copied

Let me give you another example.

Now let's do a tally that can only be produced by merging two tables (two CSVs)

				
					It consists of ProductID, ProductCost columns.
There is a table named ProductCostMaster that consists of the columns

The ProductID in this ProductID and the ProductID in the sales detail data
the same ProductID in the sales invoice data.

Please write SQL to aggregate and list the total sales and total cost per Category column of the sales detail data
Please write SQL to output the total sales and total cost per Category column in the sales detail data as a list.
				
			

The answer was something like this

				
					SELECT
    s.Category,.
    SUM(s.SalesValue) AS TotalSalesValue,.
    SUM(c.ProductCost) AS TotalProductCost
FROM
    Sales detail data AS s
JOIN
    Product Cost Master AS c
    ON s.ProductID = c.ProductID
GROUP BY
    s.Category
ORDER BY
    s.Category;.
				
			

 

 Analysis Results.

Total sales and total cost of sales have been tabulated by category

You can also think about the analysis you want to do, have the AI write the SQL, run it through DB Browser for SQLite, and so on....

I think this will be a case for a pleasant collaboration with AI. I recommend it.

You will also be able to experience what engineers have been saying about how it feels to have an AI helping you with your programming and coding.

  summary

  • For safe and fast handling of large CSVs,DB Browser for SQLite + AIis a very viable option.

  • This is especially useful in situations where "cloud computing cannot be used for internal data" or "Excel will not open.

  • Can be handled by non-engineers with GUI operation only

  • Collaborate with AII enjoy doing analysis using SQL.

 
If you want the sample data (two CSVs) used in the above demo, please click here.

To download filesUser RegistrationOr log in

Idea switch!
created by Rinker.
Adopted by Softbank, NTT Data, Mitsui Fudosan, and more! Only 5 minutes to understand and 30 minutes to get ideas! A completely new way of thinking! The popular training program for companies and schools is now available in book form! Just by following the 4 steps, anyone can automatically turn on the "Idea Brain" and get a flood of new ideas!
Did you know? Actually, Shikoku is not rectangular. Neither is your life.

Did you know? Actually, Shikoku is not rectangular. Neither is your life.

Did you know? Actually, Shikoku is not a rectangle...
Learn from Barley Tea and Valentine's Day! Gutsy know-how on business branding that you can use right away.

Learn from Barley Tea and Valentine's Day! Gutsy know-how on business branding that you can use right away.

The Glory and Failure of a World-Class Company, "Barley Tea" and "Eels...