select searches records that are matched to specified conditions from a table and then outputs them.
select is the most important command in groonga. You need to understand select to use the full power of groonga.
select has many parameters. The required parameter is only table and others are optional:
select table
[match_columns=null]
[query=null]
[filter=null]
[scorer=null]
[sortby=null]
[output_columns="_id, _key, *"]
[offset=0]
[limit=10]
[drilldown=null]
[drilldown_sortby=null]
[drilldown_output_columns=null]
[drilldown_offset=0]
[drilldown_limit=10]
[cache=yes]
[match_escalation_threshold=0]
[query_expansion=null]
Let's learn about select usage with examples. This section shows many popular usages.
Here are a schema definition and sample data to show usage.
Execution example:
> table_create Entries TABLE_HASH_KEY ShortText
[[0,1335594537.43378,0.000381708145141602],true]
> column_create Entries content COLUMN_SCALAR Text
[[0,1335594537.63582,0.000931501388549805],true]
> table_create Terms TABLE_PAT_KEY|KEY_NORMALIZE ShortText --default_tokenizer TokenBigram
[[0,1335594537.83776,0.000288009643554688],true]
> column_create Terms entries_key_index COLUMN_INDEX|WITH_POSITION Entries _key
[[0,1335594538.03901,0.0059964656829834],true]
> column_create Terms entries_content_index COLUMN_INDEX|WITH_POSITION Entries content
[[0,1335594538.24584,0.00617790222167969],true]
> load --table Entries
> [
> {"_key": "The first post!", "content": "Welcome! This is my first post!"},
> {"_key": "Groonga", "content": "I started to use groonga. It's very fast!"}
> {"_key": "Mroonga", "content": "I also started to use mroonga. It's also very fast! Really fast!"}
> ]
There is a table, Entries, for blog entries. An entry has title and content. Title is key of Entries. Content is value of Entries.content column.
Entries._key column and Entries.content column are indexed using TokenBigram tokenizer. So both Entries._key and Entries.content are fulltext search ready.
OK. The schema and data for examples are ready.
Here is the most simple usage with the above shema and data. It outputs all records in Entries table.
Execution example:
> select Entries
[
[
0,
1335519592.67133,
0.000493049621582031
],
[
[
[
3
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
],
[
"content",
"Text"
]
],
[
1,
"The first post!",
"Welcome! This is my first post!"
],
[
2,
"Groonga",
"I started to use groonga. It's very fast!"
],
[
3,
"Mroonga",
"I also started to use mroonga. It's also very fast! Really fast!"
]
]
]
]
Why does the command output all records? There are two reasons. The first reason is that the command doesn't specify any search conditions. No search condition means all records are matched. The second reason is that the number of all records is 3. select command outputs 10 records at a maximum by default. There are only 3 records. It is less than 10. So the command outputs all records.
Search conditions are specified by query or filter. You can also specify both query and filter. It means that selected records must be matched against both query and filter.
query is designed for search box in Web page. Imagine a search box in google.com. You specify search conditions for query as space separated keywords. For example, search engine means a matched record should contain two words, search and engine.
Normally, query parameter is used for specifying fulltext search conditions. It can be used for non fulltext search conditions but filter is used for the propose.
query parameter is used with match_columns parameter when query parameter is used for specifying fulltext search conditions. match_columns specifies which columnes and indexes are matched against query.
Here is a simple query usage example.
Execution example:
> select Entries --match_columns content --query fast
[
[
0,
1335519592.87374,
0.000868797302246094
],
[
[
[
2
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
],
[
"content",
"Text"
]
],
[
2,
"Groonga",
"I started to use groonga. It's very fast!"
],
[
3,
"Mroonga",
"I also started to use mroonga. It's also very fast! Really fast!"
]
]
]
]
The select command searches records that contain a word fast in content column value from Entries table.
query has query syntax but its deatils aren't described here. See Query syntax for datails.
filter is designed for complex search conditions. You specify search conditions for filter as ECMAScript like syntax.
Here is a simple filter usage example.
Execution example:
> select Entries --filter 'content @ "fast" && _key == "Groonga"'
[
[
0,
1335520300.98383,
0.000544071197509766
],
[
[
[
1
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
],
[
"content",
"Text"
]
],
[
2,
"Groonga",
"I started to use groonga. It's very fast!"
]
]
]
]
The select command searches records that contain a word fast in content column value and has Groonga as _key from Entries table. There are three operators in the command, @, && and ==. @ is fulltext search operator. && and == are the same as ECMAScript. && is logical AND operator and == is equality operator.
filter has more operators and syntax like grouping by (...) its deatils aren't described here. See Script Syntax for datails.
You can specify range of outputted records by offset and limit. Here is an example to output only the 2nd record.
Execution example:
> select Entries --offset 1 --limit 1
[
[
0,
1335519593.28077,
0.000288248062133789
],
[
[
[
3
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
],
[
"content",
"Text"
]
],
[
2,
"Groonga",
"I started to use groonga. It's very fast!"
]
]
]
]
offset is zero-origin. --offset 1 means output range is started from the 2nd record.
limit specifies the max number of output records. --limit 1 means the number of output records is 1 at a maximium. If no records are matched, select command outputs no records.
You can use --limit 0 to retrieve the total number of recrods without any contents of records.
Execution example:
> select Entries --limit 0
[
[
0,
1335519593.48351,
0.000329017639160156
],
[
[
[
3
],
[
[
"_id",
"UInt32"
],
[
"_key",
"ShortText"
],
[
"content",
"Text"
]
]
]
]
]
--limit 0 is also useful for retrieving only the number of matched records.
This section describes all parameters. Parameters are categorized.
There is a required parameter, table.
It specifies a table to be searched. table must be specified.
If nonexistent table is specified, an error is returned.
Execution example:
> select Nonexistent
[
[
-22,
1335519593.68574,
0.000583171844482422,
"invalid table name: <Nonexistent>",
[
[
"grn_select",
"proc.c",
542
]
]
]
]
TODO: add example.
It specifies threshold to determine whether search storategy escalation is used or not. The threshold is compared against the number of matched records. If the number of matched records is equal to or less than the threshold, search storategy escalation is used. See 検索 about search storategy escalation.
The default threshold is 0. It means that search storategy escalation is used only when no records are matched.
The default threshold can be customized by one of the followings.
- --with-match-escalation-threshold option of configure
- --match-escalation-threshold option of groogna command
- match-escalation-threshold configuration item in configuration file
TODO: write in English and add example.
It specifies a column that is used to expand (substitute) query parameter value.
query_expansionパラメータには、queryパラメータに指定された文字列を置換(拡張)する条件となるテーブル・カラムを指定します。フォーマットは「${テーブル名}.${カラム名}」となります。指定するテーブルは文字列を主キーとするハッシュ型あるいはパトリシア木型のテーブルで、一つ以上の文字列型のカラムが定義されている必要があります。(ここでは置換テーブルと呼びます。)
queryパラメータに指定された文字列が、指定されたテーブルの主キーと完全一致する場合、その文字列を指定されたカラム値の文字列に置換します。queryパラメータが、空白、括弧、演算子などを含む場合は、その演算子によって区切られた文字列の単位で置換が実行されます。ダブルクォート("")で括られた範囲は、その内部に空白を含んでいても一つの置換される単位と見なされます。検索文字列と置換テーブルの主キー値との比較に際して大文字小文字等を区別したくない場合には、置換テーブルを定義する際にKEY_NORMALIZEを指定します。置換後の文字列となるカラムの値には、括弧や*, ORなど、queryパラメータで利用可能な全ての演算子を指定することができます。
TODO: write in English and add example.
以下のようなjson形式で値が返却されます。
[[リターンコード, 処理開始時間, 処理時間], [検索結果, ドリルダウン結果]]
リターンコード
grn_rcに対応する数値が返されます。0(GRN_SUCCESS)以外の場合は、続いてエラー内容を示す 文字列が返されます。
処理開始時間
処理を開始した時間について、1970年1月1日0時0分0秒を起点とした秒数を小数で返します。
処理時間
処理にかかった秒数を返します。
検索結果
drilldown条件が実行される前の検索結果が以下のように出力されます。:
[[検索件数], [[カラム名1,カラム型1],..], 検索結果1,..]検索件数
検索件数が出力されます。カラム名n
output_columnsに指定された条件に従って、対象となるカラム名が出力されます。カラム型n
output_columnsに指定された条件に従って、対象となるカラム型が出力されます。検索結果n
output_columns, offset, limitによって指定された条件に従って各レコードの値が出力されます。
drilldown結果
drilldown処理の結果が以下のように出力されます。:
[[[件数], [[カラム名1,カラム型1],..], 検索結果1,..],..]件数
drilldownに指定されたカラムの値の異なり数が出力されます。カラム名n
drilldown_output_columnsに指定された条件に従って、対象となるカラム名が出力されます。カラム型n
drilldown_output_columnsに指定された条件に従って、対象となるカラム型が出力されます。ドリルダウン結果n
drilldown_output_columns, drilldown_offset, drilldown_limitによって指定された条件に従って各レコードの値が出力されます。