The good, the bad, and the ugly...
Fields that are not terribly unique are poor candidates for indexing.
For instance, if a field takes on only a small number of unique values, it may actually be cheaper to perform a sequential scan that looks at every page of a table instead of using the index.
#!/usr/bin/perl print qq{ create table nonunique ( id serial, notuniq character(4) ); }; print "begin;\n"; foreach $i (0..10000) { # Insert 10,000 rows my $randnum = int (rand(4)); my $notuniq; if ($randnum == 0) { # Choose from 4 values for $notuniq = "val1"; # field "notuniq" } elsif ($randnum == 1) { $notuniq = "val2"; } elsif ($randnum == 2) { $notuniq = "val3"; } elsif ($randnum == 3) { $notuniq = "val4"; } print "insert into nonunique (notuniq) values ('$notuniq');\n"; } print qq{commit; create index nonuniq_idx on nonunique(notuniq); -- Create an index analyze nonunique; -- Analyze the table so there are decent statistics explain analyze select id from nonunique where notuniq = 'val3'; set enable_seqscan to false; -- force optimizer to NOT do seq scan explain analyze select id from nonunique where notuniq = 'val3'; set enable_seqscan to true; };
The typical query plan for searches based on the notuniq
field looks like the following:
QUERY PLAN ---------------------------------------------------------------------------- Seq Scan on nonunique (cost=0.00..182.19 rows=2462 width=4) (actual time=0.05..31.32 rows=2437 loops=1) Filter: (notuniq = 'val1'::bpchar) Total runtime: 33.77 msec (3 rows)
The query optimizer chooses a Seq Scan, in effect, ignoring the index. That is because since there are so many instances of the value, " val1", it is more efficient to look through all 55 pages of the table instead of walking through the index, which would lead to visiting much the same set of pages. Using the index would require reading 55 pages of the table as well as much of the 30 pages of the index.
An index on a
and b
provides an
index on a
, "for free".
As a result, if you need a UNIQUE index on a table that is comprised of several fields, it is a good idea to choose the order carefully. This can allow avoiding the need to create an extra index on the field.
PostgreSQL supports the creation of indexes on parts of tables.
create index pindex_1 on table1 (field1, field2) where field1 is not null;
create index pindex_2 on table1 (field2) where field2 > 1000;
Which are pretty cool...